Setting up the draft

Reply "发票" (invoice) to get the source code.

Source code:

Invoice type enum:

import java.util.Arrays;

public enum InvoiceType {
    // VAT electronic ordinary invoice; the Chinese name is matched against the text on the invoice
    PLAIN_INVOICE("增值税电子普通发票", "0");

    String name;
    String type;

    InvoiceType(String name, String type) {
        this.name = name;
        this.type = type;
    }

    public String getName() {
        return name;
    }

    public String getType() {
        return type;
    }

    // Returns the constant whose display name equals the argument, or null if none matches
    public static InvoiceType getByName(String name) {
        return Arrays.stream(InvoiceType.values())
                .filter((n) -> n.name.equals(name))
                .findFirst()
                .orElse(null);
    }

    @Override
    public String toString() {
        return "InvoiceType{" +
                "name='" + name + '\'' +
                ", type='" + type + '\'' +
                '}';
    }
}
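getByName returns null when no constant matches, so every caller needs a null check. As an alternative, here is a minimal sketch in which the lookup returns an Optional instead. The findByName method and the InvoiceTypeLookup wrapper class are hypothetical names introduced for illustration; they are not part of the original source.

```java
import java.util.Arrays;
import java.util.Optional;

public class InvoiceTypeLookup {

    public enum InvoiceType {
        PLAIN_INVOICE("增值税电子普通发票", "0");

        private final String name;
        private final String type;

        InvoiceType(String name, String type) {
            this.name = name;
            this.type = type;
        }

        public String getName() {
            return name;
        }

        public String getType() {
            return type;
        }

        // Hypothetical variant of getByName: an empty Optional replaces the null result
        public static Optional<InvoiceType> findByName(String name) {
            return Arrays.stream(values())
                    .filter(t -> t.name.equals(name))
                    .findFirst();
        }
    }

    public static void main(String[] args) {
        // Falls back to "unknown" when the name is not a known invoice type
        String type = InvoiceType.findByName("增值税电子普通发票")
                .map(InvoiceType::getType)
                .orElse("unknown");
        System.out.println(type);
    }
}
```

With this shape the caller decides on the fallback at the call site, and a missing match cannot surface later as a NullPointerException.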

Invoice information class:

import lombok.Data;

@Data
public class InvoiceInfo {
    // Buyer information
    private String purchaserName;
    private String purchaserTaxNo;
    private String purchaserAddr;
    private String purchaserTel;
    private String purchaserAddrAndTel;
    private String purchaserBank;
    private String purchaserBankNo;
    private String purchaserBankAndNo;

    // Seller information
    private String sellerName;
    private String sellerTaxNo;
    private String sellerAddr;
    private String sellerTel;
    private String sellerAddrAndTel;
    private String sellerBank;
    private String sellerBankNo;
    private String sellerBankAndNo;

    private String invoiceNo;
    private String invoiceCode;
    private String invoiceType;
    // Issue date, format: yyyy-MM-dd
    private String kprq;
    // Cipher (password) area
    private String secretArea;
    // Check code
    private String checkCode;
    // Amount including tax
    private String hsje;
    // Amount excluding tax
    private String bhsje;
    // Tax amount
    private String se;
    // Remark
    private String remark;
    // Payee
    private String skr;
    // Reviewer
    private String fhr;
    // Issuer
    private String kpr;
}

PDF scanning utility:

import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Paths;
import java.util.Arrays;

import org.apache.commons.lang3.StringUtils; // assumed: Apache Commons Lang 3
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.io.RandomAccessBuffer;
import org.apache.pdfbox.io.RandomAccessRead;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.PDFTextStripperByArea;

import cn.hutool.json.JSONUtil; // assumed: Hutool

// Uses the PDFBox 2.x parser API. OcrUtils is the author's own helper class and is not shown here.
public class ScanPdfInvoiceUtils {

    // Dumps the XObject names of every page and the full page text to stdout
    public InvoiceInfo ocrInvoice(String filePath) throws IOException {
        InvoiceInfo info = new InvoiceInfo();
        File file = Paths.get(filePath).toFile();
        try (InputStream in = new FileInputStream(file);
             RandomAccessRead rbuffer = new RandomAccessBuffer(in)) {
            PDFParser parser = new PDFParser(rbuffer);
            parser.parse();
            try (PDDocument document = parser.getPDDocument()) {
                for (PDPage pdPage : document.getPages()) {
                    PDResources pdResources = pdPage.getResources();
                    Iterable<COSName> names = pdResources.getXObjectNames();
                    if (names != null) {
                        // The loop was commented out in the original, so next() ran once
                        // and could fail on a page without XObjects
                        for (COSName cosName : names) {
                            System.out.println(">>>>>>>>>>>" + cosName.getName());
                        }
                    }
                }
                int pages = document.getNumberOfPages();
                PDFTextStripper stripper = new PDFTextStripper();
                // Output text in position order
                stripper.setSortByPosition(true);
                stripper.setStartPage(1);
                stripper.setEndPage(pages);
                String content = stripper.getText(document);
                System.out.println(content);
            }
        }
        return info;
    }

    // Extracts text from fixed regions of page 1, then OCRs the remaining fields
    public InvoiceInfo ocrInoivceArea(String filePath) throws IOException {
        InvoiceInfo info = new InvoiceInfo();
        File file = Paths.get(filePath).toFile();
        try (InputStream in = new FileInputStream(file);
             RandomAccessRead rbuffer = new RandomAccessBuffer(in)) {
            PDFParser parser = new PDFParser(rbuffer);
            parser.parse();
            try (PDDocument document = parser.getPDDocument()) {
                PDPage pdPage = document.getPage(0);
                System.out.println(pdPage.getCropBox().getHeight());
                System.out.println(pdPage.getCropBox().getWidth());

                PDFTextStripperByArea area = new PDFTextStripperByArea();
                area.setSortByPosition(true);
                // Regions are (x, y, width, height) in PDF points
                area.addRegion("invoiceType", new Rectangle2D.Double(190, 0, 210, 90));
                area.addRegion("invoiceRightT", new Rectangle2D.Double(417, 0, 171, 90));
                area.addRegion("secretArea", new Rectangle2D.Double(355, 59, 354, 84));
                area.addRegion("purchaser", new Rectangle2D.Double(40, 95, 200, 60));
                area.addRegion("seller", new Rectangle2D.Double(42, 291, 171, 60));
                area.addRegion("remark", new Rectangle2D.Double(360, 297, 208, 57));
                area.addRegion("hsje", new Rectangle2D.Double(471, 280, 105, 17));
                area.addRegion("bhsje", new Rectangle2D.Double(381, 253, 88, 17));
                area.addRegion("se", new Rectangle2D.Double(500, 253, 88, 17));
                area.addRegion("skr", new Rectangle2D.Double(30, 360, 120, 15));
                area.addRegion("fhr", new Rectangle2D.Double(180, 360, 120, 15));
                area.addRegion("kpr", new Rectangle2D.Double(310, 360, 80, 15));
                area.extractRegions(pdPage);

                for (String name : area.getRegions()) {
                    String temp = area.getTextForRegion(name);
                    switch (name) {
                        case "invoiceType":
                            if (temp.contains(InvoiceType.PLAIN_INVOICE.getName()) || temp.contains("普")) {
                                info.setInvoiceType(InvoiceType.PLAIN_INVOICE.getType());
                            }
                            break;
                        case "hsje":
                            if (null != temp && !temp.isEmpty()) {
                                info.setHsje(temp.replaceAll("¥", "").trim());
                            }
                            break;
                        case "bhsje":
                            if (null != temp && !temp.isEmpty()) {
                                info.setBhsje(temp.replaceAll("¥", "").trim());
                            }
                            break;
                        case "se":
                            if (null != temp && !temp.isEmpty()) {
                                info.setSe(temp.replaceAll("¥", "").trim());
                            }
                            break;
                        case "secretArea":
                            info.setSecretArea(temp);
                            break;
                        case "invoiceRightT":
                            // One line each: invoice code, invoice number, issue date, check code.
                            // Drop the label, whitespace, and all non-Latin-1 characters.
                            String[] sp = Arrays.stream(temp.split("\n")).map(s -> {
                                String xx = subString(s);
                                xx = StringUtils.replaceAll(xx, "\\s*", "");
                                return StringUtils.replaceAll(xx, "[^\\x00-\\xff]", "");
                            }).toArray(String[]::new);
                            // Guard added: the original indexed sp[0..3] unconditionally
                            if (sp.length >= 4) {
                                info.setInvoiceCode(sp[0]);
                                info.setInvoiceNo(sp[1]);
                                info.setKprq(sp[2]);
                                info.setCheckCode(sp[3]);
                            }
                            break;
                        default:
                            break;
                    }
                }
            }
        }
        saveAsPng(filePath, info);
        return info;
    }

    // Renders page 1 at 300 DPI and OCRs fixed pixel regions into info
    public static void saveAsPng(String filePath, InvoiceInfo info) throws IOException {
        File file = Paths.get(filePath).toFile();
        try (InputStream in = new FileInputStream(file);
             RandomAccessRead rbuffer = new RandomAccessBuffer(in)) {
            PDFParser parser = new PDFParser(rbuffer);
            parser.parse();
            try (PDDocument document = parser.getPDDocument()) {
                PDFRenderer pdfRenderer = new PDFRenderer(document);
                BufferedImage img = pdfRenderer.renderImageWithDPI(0, 300f);

                // Remark (kept as raw OCR output)
                info.setRemark(ocrRegion(img, 1517, 1221, 954, 233));

                // Payee, reviewer, issuer
                info.setSkr(ocrField(img, 83, 1458, 705, 183));
                info.setFhr(ocrField(img, 790, 1458, 509, 183));
                info.setKpr(ocrField(img, 1300, 1458, 499, 183));

                // Seller information
                info.setSellerName(ocrField(img, 183, 1214, 1260, 60));
                info.setSellerTaxNo(ocrField(img, 183, 1271, 1260, 60));
                info.setSellerAddrAndTel(ocrField(img, 183, 1335, 1260, 60));
                info.setSellerBankAndNo(ocrField(img, 183, 1393, 1260, 60));

                // Buyer information
                info.setPurchaserName(ocrField(img, 172, 348, 1270, 69));
                info.setPurchaserTaxNo(ocrField(img, 172, 417, 1270, 69));
                info.setPurchaserAddrAndTel(ocrField(img, 172, 486, 1270, 69));
                info.setPurchaserBankAndNo(ocrField(img, 172, 555, 1270, 69));
            }
        }
    }

    // Crops a pixel region from the rendered page and returns the raw OCR text
    private static String ocrRegion(BufferedImage img, int x, int y, int w, int h) throws IOException {
        byte[] bytes = OcrUtils.imageToBytes(img.getSubimage(x, y, w, h));
        return OcrUtils.ocrImg(bytes);
    }

    // OCRs a region, removes whitespace, and strips the leading field label
    private static String ocrField(BufferedImage img, int x, int y, int w, int h) throws IOException {
        String text = StringUtils.replaceAll(ocrRegion(img, x, y, w, h), "\\s*", "");
        return subString(text);
    }

    // Returns the text after the last colon, i.e. the value without its label
    private static String subString(String str) {
        if (str == null) {
            return null;
        }
        // Assumption: the first branch targets the full-width colon (:) printed on invoices,
        // the second the ASCII colon; both branches read the same in the source as published
        if (str.contains(":")) {
            str = StringUtils.substringAfterLast(str, ":");
        } else if (str.contains(":")) {
            str = StringUtils.substringAfterLast(str, ":");
        }
        return str;
    }

    /**
     * Length: coefficient 0.2345
     * Width: coefficient 0.2388
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        ScanPdfInvoiceUtils pdfInvoiceUtils = new ScanPdfInvoiceUtils();
        InvoiceInfo info = pdfInvoiceUtils.ocrInoivceArea("/Users/grant/Pictures/03300180011130339349.pdf");
        System.out.println(JSONUtil.toJsonStr(info));
//        pdfInvoiceUtils.saveAsPng("/Users/grant/Pictures/03300180011130339349.pdf");
    }
}
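The text regions in ocrInoivceArea are given in PDF points, while saveAsPng crops an image rendered at 300 DPI. One point is 1/72 inch, so a point coordinate maps to pixels by a factor of dpi / 72. The inverse, 72 / 300 = 0.24, is close to the 0.2345 and 0.2388 coefficients in the Javadoc above main, which appear to be measured versions of the same ratio. The sketch below shows that conversion; the RegionScale class and toPixels method are hypothetical names, not part of the original source.

```java
import java.awt.Rectangle;
import java.awt.geom.Rectangle2D;

public class RegionScale {

    // PDF user space uses 72 units (points) per inch by default
    private static final double POINTS_PER_INCH = 72.0;

    // Converts a region in PDF points to pixel coordinates in an image rendered at the given DPI
    public static Rectangle toPixels(Rectangle2D region, float dpi) {
        double scale = dpi / POINTS_PER_INCH;
        int x = (int) Math.round(region.getX() * scale);
        int y = (int) Math.round(region.getY() * scale);
        int w = (int) Math.round(region.getWidth() * scale);
        int h = (int) Math.round(region.getHeight() * scale);
        return new Rectangle(x, y, w, h);
    }

    public static void main(String[] args) {
        // The "seller" text region from ocrInoivceArea, converted for a 300 DPI render
        Rectangle px = toPixels(new Rectangle2D.Double(42, 291, 171, 60), 300f);
        System.out.println(px);
    }
}
```

Deriving the pixel rectangles from the point rectangles this way would keep the two sets of coordinates in sync if the DPI changes. The hard-coded pixel values in saveAsPng were presumably tuned by hand, so they will not match the computed values exactly.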
