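After replacing some of the tags in 100 templates, merge the 100 PDF template documents and ten 400 KB images into a single PDF document.

The whole run takes about 20 s.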

 1. Import pdfbox

 <dependencies>
     <dependency>
         <groupId>org.apache.pdfbox</groupId>
         <artifactId>pdfbox</artifactId>
         <version>2.0.1</version>
     </dependency>
     <!-- https://mvnrepository.com/artifact/log4j/log4j -->
     <dependency>
         <groupId>log4j</groupId>
         <artifactId>log4j</artifactId>
         <version>1.2.17</version>
     </dependency>
     <dependency>
         <groupId>junit</groupId>
         <artifactId>junit</artifactId>
         <version>4.13.2</version>
     </dependency>
     <!-- Word to PDF -->
     <dependency>
         <groupId>fr.opensagres.xdocreport</groupId>
         <artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
         <version>2.0.2</version>
     </dependency>
 </dependencies>

2. The code

package main.java;

import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter;
import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.log4j.Logger;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.junit.Test;

import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import java.awt.image.BufferedImage;
import java.io.*;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;

/**
 * @ClassName PdfboxSummary
 */
public class PdfboxSummary {

    private final static Logger log = Logger.getLogger(PdfboxSummary.class);

    /**
     * Replaces the tags in the PDF templates under a folder, inserts any images
     * found there as extra pages, and produces a single PDF document.
     * @throws Exception
     */
    @Test
    public void pdfMergeONE() throws Exception {
        // Path of the merged output file
        String outputFile = "D:\\merged.pdf";
        long start = System.currentTimeMillis();
        System.out.println("===start===" + start);
        // Tag data to replace: the key is the tag, the value is the text that replaces it
        HashMap<String, String> replaceMap = new HashMap<>();
        replaceMap.put("<<D1>>", "D1D1D1");
        replaceMap.put("<<F7>>", "F7F7F7");
        replaceMap.put("<<Annual>>", "AnnualAnnualAnnual");
        replaceMap.put("<<E6>>", "E6E6E6E6E6");
        replaceMap.put("<<Month>>", "MonthMonthMonth");
        replaceMap.put("<<EffDate>>", "EffDateEffDateEffDate");
        replaceMap.put("<<R22>>", "R22R22R22R22");
        PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
        //pdfMergerUtility.setDestinationFileName(FILEPATH + "test\\merged.pdf");
        PDDocument destination = new PDDocument();
        // Get the paths of the files to process in the input directory
        List<String> fileNameList = getFile("D:\\merge");
        for (int i = 0; i < fileNameList.size(); i++) {
            String filePath = fileNameList.get(i);
            // Take the extension after the last dot, so "a.b.pdf" still yields "pdf"
            String typeStr = filePath.substring(filePath.lastIndexOf(".") + 1);
            //System.out.println(typeStr);
            if ("pdf".equalsIgnoreCase(typeStr)) {
                // Handle a PDF document
                File pdfFile = new File(filePath);
                PDDocument pdfDocument = PDDocument.load(pdfFile);
                for (String key : replaceMap.keySet()) {
                    replacePdfText(pdfDocument, key, replaceMap.get(key));
                }
                // Append the PDDocument whose tags have been replaced to the destination PDDocument
                pdfMergerUtility.appendDocument(destination, pdfDocument);
                pdfDocument.close();
            }
            if ("jpg".equalsIgnoreCase(typeStr) || "png".equalsIgnoreCase(typeStr) || "jpeg".equalsIgnoreCase(typeStr)) {
                // Insert the image as a new page
                insertImageToPdf(destination, filePath);
            }
        }
        // Merge the PDFs (the pages themselves were already appended by appendDocument above)
        pdfMergerUtility.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
        destination.save(outputFile);
        destination.close();
        long end = System.currentTimeMillis();
        System.out.println("===end===" + end);
        long total = end - start;
        System.out.println("===total===" + total);
    }

    /**
     * Inserts an image into the PDF on a new page.
     * @param document
     * @param imagePath
     * @return
     * @throws IOException
     */
    private static PDDocument insertImageToPdf(PDDocument document, String imagePath) throws IOException {
        PDPage page = new PDPage();
        // Create the PDImageXObject
        PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, document);
        // Create the PDPageContentStream
        PDPageContentStream contents = new PDPageContentStream(document, page);
        // Draw the image; if it is wider than the page, shrink it by the scale factor
        float pageWidth = page.getMediaBox().getWidth();
        float pageHeight = page.getMediaBox().getHeight();
        int imageHeight = pdImage.getHeight();
        int imageWidth = pdImage.getWidth();
        float scale = pageWidth / imageWidth;
        scale = Math.min(1, scale);
        contents.drawImage(pdImage, (pageWidth - imageWidth * scale) / 2, (pageHeight - imageHeight * scale) / 2, imageWidth * scale, imageHeight * scale);
        document.addPage(page);
        contents.close();
        return document;
    }

    /**
     * Replaces a tag string in the PDF.
     * @param document
     * @param searchString
     * @param replacement
     * @return
     * @throws IOException
     */
    private static PDDocument replacePdfText(PDDocument document, String searchString, String replacement) throws IOException {
        for (PDPage page : document.getPages()) {
            PDFStreamParser parser = new PDFStreamParser(page);
            parser.parse();
            List<Object> tokens = parser.getTokens();
            // TJ arrays that hold the pieces of a tag that is still being assembled
            List<COSArray> keyList = new ArrayList<>();
            // Text collected so far for a tag that may be split into several pieces
            String pstring = "";
            boolean isStart = false;
            for (int j = 0; j < tokens.size(); j++) {
                Object next = tokens.get(j);
                if (next instanceof Operator) {
                    Operator op = (Operator) next;
                    // Tj and TJ are the two operators that show strings in a PDF
                    if (op.getName().equals("Tj")) {
                        // Tj takes a single operand, the string to show, so that operand can be updated directly
                        COSString previous = (COSString) tokens.get(j - 1);
                        String string = previous.getString();
                        string = string.replace(searchString, replacement);
                        previous.setValue(string.getBytes());
                    } else if (op.getName().equals("TJ")) {
                        // TJ takes an array of strings (and spacing numbers) as its operand
                        COSArray previous = (COSArray) tokens.get(j - 1);
                        for (int k = 0; k < previous.size(); k++) {
                            Object arrElement = previous.getObject(k);
                            if (arrElement instanceof COSString) {
                                COSString cosString = (COSString) arrElement;
                                String string = cosString.getString();
                                // When parsed, a tag such as <<A1>> may come out as "<<A1" ">>"
                                // or as "<" "<" "A1" ">" ">", so the pieces are collected here
                                //System.out.println(string);
                                if (pstring.contains("<") || string.contains("<")) {
                                    pstring += string;
                                }
                            }
                        }
                        if (pstring.contains("<<")) {
                            isStart = true;
                            //System.out.println(pstring);
                        }
                        //if (searchString.equals(pstring.trim())) {
                        if (pstring.contains("<<") && pstring.contains(">>") && searchString.equals(pstring.trim())) {
                            System.out.println(pstring);
                            keyList.add(previous);
                            for (int i = 0; i < keyList.size(); i++) {
                                COSArray item = keyList.get(i);
                                if (i == 0) {
                                    // The first array keeps one element, which now carries the replacement
                                    COSString cosString2 = (COSString) item.getObject(0);
                                    cosString2.setValue(replacement.getBytes());
                                    int total = item.size() - 1;
                                    for (int k = total; k > 0; k--) {
                                        item.remove(k);
                                    }
                                } else {
                                    // The remaining arrays of the tag are emptied
                                    while (item.size() > 0) {
                                        item.remove(0);
                                    }
                                }
                            }
                            keyList.clear();
                            pstring = "";
                            isStart = false;
                        } else {
                            if (isStart) {
                                keyList.add(previous);
                            }
                        }
                    }
                    // A closed tag that did not match: start over
                    if (pstring.contains(">>")) {
                        pstring = "";
                        isStart = false;
                        keyList.clear();
                    }
                }
            }
            // Write the modified tokens back as the page's content stream
            PDStream updatedStream = new PDStream(document);
            OutputStream out = updatedStream.createOutputStream(COSName.FLATE_DECODE);
            ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
            tokenWriter.writeTokens(tokens);
            out.close();
            page.setContents(updatedStream);
        }
        return document;
    }

    /**
     * Returns the paths of the files directly under the given folder.
     * @param path folder path
     * @return
     */
    private static List<String> getFile(String path) {
        File file = new File(path);
        // Get the file list
        File[] array = file.listFiles();
        List<String> fileNameList = new ArrayList<>(100);
        // listFiles() returns null when the path is not a readable directory
        if (array == null) {
            return fileNameList;
        }
        for (int i = 0; i < array.length; i++) {
            if (array[i].isFile()) {
                fileNameList.add(array[i].getPath());
            }
            //else if (array[i].isDirectory()) {
            //    getFile(array[i].getPath());
            //}
        }
        return fileNameList;
    }

    /**
     * Inserts content into the PDF by reading the image through a stream.
     * @param pdfDocument
     * @param filePath
     * @return
     * @throws Exception
     */
    private static PDDocument insertToPdfByStream(PDDocument pdfDocument, String filePath) throws Exception {
        //Iterator<ImageReader> iterator = ImageIO.getImageReadersByFormatName("tiff");
        Iterator<ImageReader> iterator = ImageIO.getImageReadersByFormatName("jpeg");
        if (!iterator.hasNext()) {
            throw new Exception("The JDK does not support");
        }
        ImageReader imageReader = iterator.next();
        long timeMillis = System.currentTimeMillis();
        try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream();) {
            //ImageInputStream imageInputStream = ImageIO.createImageInputStream(new ByteArrayInputStream(tiffByte));
            ImageInputStream imageInputStream = ImageIO.createImageInputStream(new FileInputStream(filePath));
            imageReader.setInput(imageInputStream);
            int size = imageReader.getNumImages(true);
            for (int i = 0; i < size; i++) {
                BufferedImage image = imageReader.read(i);
                pageAddImage(pdfDocument, image);
            }
            pdfDocument.save(outputStream);
            return pdfDocument;
            //return outputStream.toByteArray();
        } catch (IOException e) {
            log.error("To PDF Page Error", e);
            throw new Exception("Conversion PDF Error");
        } finally {
            log.info("to pdf used time: " + (System.currentTimeMillis() - timeMillis));
        }
    }

    /**
     * Adds an image to the PDDocument newPdf on a new page.
     * @param newPdf
     * @param image
     * @throws IOException
     */
    private static void pageAddImage(PDDocument newPdf, BufferedImage image) throws IOException {
        //PDPage page = new PDPage(PDRectangle.A4);
        PDPage page = new PDPage();
        newPdf.addPage(page);
        float width = page.getMediaBox().getWidth();
        float height = page.getMediaBox().getHeight();
        float scale = page.getMediaBox().getWidth() / image.getWidth();
        scale = Math.min(1, scale);
        float imgWidth = image.getWidth() * scale;
        float imgHeight = image.getHeight() * scale;
        try (PDPageContentStream pageContentStream = new PDPageContentStream(newPdf, page)) {
            PDImageXObject pdImage = LosslessFactory.createFromImage(newPdf, image);
            pageContentStream.drawImage(pdImage, (width - imgWidth) / 2, height - image.getHeight() * scale, imgWidth, imgHeight);
        }
    }

    /**
     * Converts Word to PDF. Some content may be lost in the conversion.
     * @param docFilePath
     * @param pdfFilePath
     * @throws Exception
     */
    private static void wordToPdf(String docFilePath, String pdfFilePath) throws Exception {
        InputStream docFile = new FileInputStream(docFilePath);
        XWPFDocument doc = new XWPFDocument(docFile);
        PdfOptions pdfOptions = PdfOptions.create();
        OutputStream out = new FileOutputStream(pdfFilePath);
        PdfConverter.getInstance().convert(doc, out, pdfOptions);
        doc.close();
        out.close();
        System.out.println(pdfFilePath);
    }
}
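
The hardest part of the class above is the TJ branch of replacePdfText, where one tag can arrive in several pieces. The sketch below isolates that idea in plain Java so it can be run without PDFBox. It is only an illustration under stated assumptions: each inner list stands in for the string operands of one TJ array, and the names SplitTagSketch and replaceTag are hypothetical, not part of PDFBox or of the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone model of the split-tag handling; not PDFBox API.
final class SplitTagSketch {

    /**
     * Replaces a tag whose pieces may be spread over several runs.
     * Each run models the string operands of one TJ array.
     * Returns how many times the tag was replaced.
     */
    static int replaceTag(List<List<String>> runs, String tag, String replacement) {
        StringBuilder buffer = new StringBuilder();      // text collected since a '<' was seen
        List<List<String>> pending = new ArrayList<>();  // runs that contributed to the buffer
        int replaced = 0;

        for (List<String> run : runs) {
            boolean contributed = false;
            for (String fragment : run) {
                // Start collecting at the first '<' and keep collecting afterwards
                if (buffer.length() > 0 || fragment.contains("<")) {
                    buffer.append(fragment);
                    contributed = true;
                }
            }
            if (!contributed) {
                continue;
            }
            pending.add(run);
            String candidate = buffer.toString().trim();
            if (candidate.equals(tag)) {
                // The first run carries the replacement, the other runs are emptied
                List<String> first = pending.get(0);
                first.clear();
                first.add(replacement);
                for (int i = 1; i < pending.size(); i++) {
                    pending.get(i).clear();
                }
                replaced++;
            }
            if (candidate.contains(">>")) {
                // The tag is closed, matched or not: start over
                buffer.setLength(0);
                pending.clear();
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        List<List<String>> runs = new ArrayList<>();
        runs.add(new ArrayList<>(Arrays.asList("Name: ")));
        runs.add(new ArrayList<>(Arrays.asList("<<D", "1")));
        runs.add(new ArrayList<>(Arrays.asList(">>")));
        runs.add(new ArrayList<>(Arrays.asList(" end")));

        int replaced = replaceTag(runs, "<<D1>>", "D1D1D1");
        // prints: 1 [[Name: ], [D1D1D1], [], [ end]]
        System.out.println(replaced + " " + runs);
    }
}
```

Like the original method, the sketch only matches when the collected text is exactly the tag, so a tag that shares a run with other text (for example "Total <<D1" followed by ">>") is left untouched, and a stray "<" that is never followed by ">>" keeps the buffer growing until the next ">>".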

Reference posts:

Replacing or removing text in a PDF with PDFBox in Java - IT屋-程序员软件开发技术分享社区

https://www.cnblogs.com/tankqiu/articles/4246776.html

Tutorial - PDFBox Chinese documentation - 文江博客

Converting Word to PDF (Java implementation), chengp919's blog on CSDN
