Several approaches to converting Word to PDF in Java, and the problems I ran into (HTML as an intermediate format, Jacob, Docx4j)

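This post draws on the following articles:

Java to PDF (HTML to PDF): fixing garbled Chinese text, malformed tags, and similar issues

Converting Word documents to PDF files in Java with jacob

Generating Word documents in Java and converting them to PDF

docx4j Word to PDF: garbled output when converting the Chinese font “宋体 (中文正文)”

HTML to PDF in Java, with support for Chinese text, CSS, and Chinese line wrapping

Implementing an HTML-to-PDF requirement in a Java project (with Chinese and CSS support)

Additional notes on HTML-to-PDF conversion with flying-saucer-pdf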

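1. Read the articles above first; their authors know this better than I do and go into more detail.
2. The material available online is of uneven quality and good articles are hard to find, so I wrote this summary to spare later readers from reinventing the wheel and to remind myself to keep learning.
3. My knowledge is limited, and I did not take notes right after finishing the work, so this post is a best effort. If you hit problems using it, or can solve the issues left open here, replies and discussion are welcome.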

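A while ago I had a requirement at work: convert generated Word documents to PDF. The specific requirements were:

1. Preferably it runs across operating systems (Windows and Linux).
2. As little extra configuration as possible, and no third-party software may be installed.
3. The formatting must not fall apart; the output should match the original as closely as possible.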

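Without the cross-platform requirement this would be simple, so the real difficulty lies in running on more than one OS. Fortunately my lead gave me two days to explore. After some studying (Baidu), research (Google), and discussion (Stack Overflow), I worked out the following approaches against the requirements and built demos; the problems I found with each are noted as well.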

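1. Direct conversion with POI (output is badly scrambled for complex formatting; abandoned)
2. HTML as an intermediate format (the whole document is shifted)
3. Aspose (the full-version jar is paid; abandoned)
4. Jacob (not cross-platform; one of the solutions currently in use)
5. Docx4j (spaces are lost; one of the solutions currently in use)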

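PS: There are also approaches based on third-party software such as LibreOffice and OpenOffice. The requirements rule them out, so I will not cover them here.

PS2: For personal study I would recommend the Aspose trial version: it is convenient and the code is simple.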
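
Since both Jacob and Docx4j ended up in use, one way to combine them is to pick the backend from the operating system at runtime. The sketch below is an assumption about how that selection could look, not code from the project: `ConverterChooser`, `Backend`, and `choose` are hypothetical names, and it assumes Word is installed wherever the OS reports Windows.

```java
import java.util.Locale;

/** Picks a conversion backend from the OS name. The backend values are labels only. */
final class ConverterChooser {

    enum Backend { JACOB, DOCX4J }

    /** Jacob drives a local Word installation over COM, so it is only an option on Windows. */
    static Backend choose(String osName) {
        String os = osName == null ? "" : osName.toLowerCase(Locale.ROOT);
        return os.contains("windows") ? Backend.JACOB : Backend.DOCX4J;
    }

    public static void main(String[] args) {
        Backend backend = choose(System.getProperty("os.name"));
        System.out.println("Selected backend: " + backend);
    }
}
```

The returned value would then decide whether the Jacob-based or the Docx4j-based conversion method is called.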

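HTML as an intermediate format

This approach grew out of the failed approach 1. Approach 1 lost too much formatting, and since HTML preserves formatting fairly well, I tried going through HTML.
Using Flying Saucer (flying-saucer-pdf) for the HTML-to-PDF step is considerably more comfortable than using iText directly.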

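Pitfalls:
1. When the document font is “宋体 (中文正文)”, it seems to be recognized as Calibri rather than SimSun, so the text is lost. I tried forcing the font in the HTML, with no effect. In the end I solved it by forcing every “宋体 (中文正文)” run in the Word file to another font that is recognized.
2. Pay close attention to the POI and xdocreport versions. The ones used here are fairly old. In later POI versions (for example the commonly used POI 4.0), the path of classes that xdocreport loads was changed, while xdocreport on Maven has not been updated to match, so class loading fails.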
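
Version mismatches like the one in pitfall 2 tend to surface only when the conversion actually runs. A minimal sketch of a startup check follows. `ClasspathCheck` and `isPresent` are hypothetical names, and the two class names in `main` are my assumption about what the POI 3.14 and converter 1.0.6 jars provide, so adjust them to your versions.

```java
/** Startup check for the classes a conversion path depends on. */
final class ClasspathCheck {

    /** Returns true if the named class can be located, without running its static initializers. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class names: a POI 3.x class location and the 1.0.6 XHTML converter.
        String[] required = {
            "org.apache.poi.POIXMLDocumentPart",
            "org.apache.poi.xwpf.converter.xhtml.XHTMLConverter"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isPresent(name) ? "found" : "MISSING"));
        }
    }
}
```

Running this once at startup makes an incompatible combination visible before the first document is converted.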

**Unresolved problem:** the layout of the generated PDF is shifted to one side as a whole, so some of the text is cut off.

Maven dependencies first:

    <!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.11.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-compress</artifactId>
        <version>1.19</version>
    </dependency>
    <!--iText and flying saucer-->
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-collections4</artifactId>
        <version>4.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.xmlbeans</groupId>
        <artifactId>xmlbeans</artifactId>
        <version>3.1.0</version>
    </dependency>
    <dependency>
        <groupId>com.itextpdf</groupId>
        <artifactId>itextpdf</artifactId>
        <version>5.5.13</version>
    </dependency>
    <dependency>
        <groupId>com.itextpdf.tool</groupId>
        <artifactId>xmlworker</artifactId>
        <version>5.5.13</version>
    </dependency>
    <dependency>
        <groupId>com.itextpdf</groupId>
        <artifactId>itext-asian</artifactId>
        <version>5.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.xhtmlrenderer</groupId>
        <artifactId>flying-saucer-pdf</artifactId>
        <version>9.0.3</version>
    </dependency>
    <!--poi-->
    <dependency>
        <groupId>xerces</groupId>
        <artifactId>xercesImpl</artifactId>
        <version>2.11.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>3.14</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml-schemas</artifactId>
        <version>3.14</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-scratchpad</artifactId>
        <version>3.14</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>3.14</version>
    </dependency>
    <!--XWPF-->
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>xdocreport</artifactId>
        <version>2.0.2</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.document</artifactId>
        <version>2.0.2</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>org.apache.poi.xwpf.converter.core</artifactId>
        <version>1.0.6</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>org.apache.poi.xwpf.converter.pdf</artifactId>
        <version>1.0.6</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>org.apache.poi.xwpf.converter.xhtml</artifactId>
        <version>1.0.6</version>
    </dependency>

Then the main code:


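    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.TransformerException;
    import org.apache.poi.xwpf.converter.core.FileImageExtractor;
    import org.apache.poi.xwpf.converter.core.FileURIResolver;
    import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
    import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFParagraph;
    import org.apache.poi.xwpf.usermodel.XWPFRun;

    /**
     * Converts a docx Word file to HTML.
     *
     * @param fileName   path of the docx file
     * @param outPutFile path of the HTML output file
     * @param imagePath  directory for extracted images
     */
    public static void docx2Html(String fileName, String outPutFile, String imagePath)
            throws TransformerException, IOException, ParserConfigurationException {
        long startTime = System.currentTimeMillis();
        File outFile = new File(outPutFile);
        if (outFile.getParentFile() != null) {
            outFile.getParentFile().mkdirs();
        }

        try (FileInputStream in = new FileInputStream(fileName)) {
            XWPFDocument document = new XWPFDocument(in);

            // 1. Force the fonts of every run so that Chinese text does not disappear.
            for (XWPFParagraph paragraph : document.getParagraphs()) {
                for (XWPFRun run : paragraph.getRuns()) {
                    // Compare strings with equals(); == only compares references.
                    if ("Calibri".equals(run.getFontFamily())) {
                        run.setFontFamily("SimHei");
                    }
                    run.setFontFamily("SimSun", XWPFRun.FontCharRange.ascii);
                }
            }

            XHTMLOptions options = XHTMLOptions.create().indent(4);
            // Export images and resolve their URIs against the image folder.
            File imageFolder = new File(imagePath);
            options.setExtractor(new FileImageExtractor(imageFolder));
            options.URIResolver(new FileURIResolver(imageFolder));

            // Close the writer before the file is read back below.
            try (Writer out = new OutputStreamWriter(
                    new FileOutputStream(outFile), StandardCharsets.UTF_8)) {
                XHTMLConverter xhtmlConverter = (XHTMLConverter) XHTMLConverter.getInstance();
                xhtmlConverter.convert(document, out, options);
            }
        }

        // Read the generated HTML back (the original used a project-local FileUtil helper here).
        String html = new String(Files.readAllBytes(Paths.get(outPutFile)), StandardCharsets.UTF_8);

        // 2. Prepend a standard XHTML DOCTYPE to fix garbled Chinese text,
        //    then insert a style block right after <head>.
        html = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">" + html;
        int headPos = html.indexOf("<head>");
        if (headPos >= 0) {
            html = new StringBuilder(html).insert(headPos + 6,
                    "<style type=\"text/css\">\n"
                    + "    *\n"
                    + "    {\n"
                    + "        padding-left: 20pt;\n"
                    + "        padding-right: -20pt;\n"
                    + "    }\n"
                    + "</style>").toString();
        }
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(outFile), StandardCharsets.UTF_8)) {
            writer.write(html);
        }
        System.out.println("Generate " + outPutFile + " with "
                + (System.currentTimeMillis() - startTime) + " ms.");
    }
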
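    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import com.lowagie.text.pdf.BaseFont; // flying-saucer-pdf 9.0.3 builds on iText 2.1.7
    import org.xhtmlrenderer.pdf.ITextFontResolver;
    import org.xhtmlrenderer.pdf.ITextRenderer;

    /**
     * Converts HTML to PDF.
     *
     * @param html        HTML markup to render (passed to setDocumentFromString)
     * @param pdfName     name of the PDF file
     * @param fontDir     directory containing the font files
     * @param pdfDestPath output directory for the PDF
     */
    public static void html2pdf(String html, String pdfName, String fontDir, String pdfDestPath) {
        try {
            ByteArrayOutputStream os = new ByteArrayOutputStream();
            ITextRenderer renderer = new ITextRenderer();
            ITextFontResolver fontResolver =
                    (ITextFontResolver) renderer.getSharedContext().getFontResolver();

            // Register every font file in fontDir so that Chinese glyphs can be rendered.
            // listFiles returns null when fontDir is not a readable directory.
            File[] files = new File(fontDir).listFiles((dir, name) -> {
                String lower = name.toLowerCase();
                return lower.endsWith(".otf") || lower.endsWith(".ttf") || lower.endsWith(".ttc");
            });
            if (files != null) {
                for (File font : files) {
                    fontResolver.addFont(font.getAbsolutePath(),
                            BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
                }
            }

            renderer.setDocumentFromString(html);
            renderer.layout();
            renderer.createPDF(os);
            renderer.finishPDF();

            // Save to disk (the original used a project-local FileUtil.byte2File helper here).
            Files.createDirectories(Paths.get(pdfDestPath));
            Files.write(Paths.get(pdfDestPath, pdfName), os.toByteArray());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }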
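
Step 2 in `docx2Html` prepends a DOCTYPE and inserts a style block right after `<head>`. The same idea can be isolated as plain string handling, sketched below. `HtmlHeadPatcher` and `patch` are hypothetical names, and the fallback for markup without a `<head>` element is an addition of this sketch, not something the original code does.

```java
/** Adds a DOCTYPE and a style block to converter output using plain string handling. */
final class HtmlHeadPatcher {

    static final String DOCTYPE =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">";

    /** Inserts the css right after <head>, creating a <head> if the markup has none. */
    static String patch(String html, String css) {
        String style = "<style type=\"text/css\">" + css + "</style>";
        StringBuilder sb = new StringBuilder(html);
        int head = sb.indexOf("<head>");
        if (head >= 0) {
            sb.insert(head + "<head>".length(), style);
        } else {
            // No <head>: add one right after the opening <html ...> tag, if there is one.
            int htmlOpen = sb.indexOf("<html");
            int htmlClose = htmlOpen >= 0 ? sb.indexOf(">", htmlOpen) : -1;
            if (htmlClose >= 0) {
                sb.insert(htmlClose + 1, "<head>" + style + "</head>");
            }
        }
        if (!html.startsWith("<!DOCTYPE")) {
            sb.insert(0, DOCTYPE);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String in = "<html><head><title>demo</title></head><body><p>text</p></body></html>";
        System.out.println(patch(in, "* { padding-left: 20pt; }"));
    }
}
```

Checking the result of `indexOf` matters here: with a return value of -1, an unguarded `insert(i + 6, ...)` would place the style block inside the DOCTYPE string.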

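Using Jacob

This is the solution when cross-platform support is not a concern. Both in configuration effort and in conversion fidelity it is excellent, with almost no problems (in practice it simply calls the Save As feature of the locally installed Office).
Drawbacks:
1. Not cross-platform.
2. Requires Office and the SaveAsPDF add-in on the Windows machine.
3. The jacob DLL has to be placed in the local jre/bin, which pollutes the JRE to some extent.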

Link: https://pan.baidu.com/s/1eBCOnYkem2XdwXI8d_A7_g
Extraction code: 813q
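
Drawback 3 comes down to the JVM being able to find the native library. As far as I know, a DLL loaded through `System.loadLibrary` is looked up on `java.library.path`, which on Windows includes the JRE's `bin` directory. The sketch below checks whether a given file is visible on that path. `NativeLibLocator` and `find` are hypothetical names, and the DLL file name in `main` is an assumption for the 64-bit build of Jacob 1.18.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Looks for a native library file in the directories of a search path. */
final class NativeLibLocator {

    /** Returns the first match of fileName in the directories listed in searchPath, if any. */
    static Optional<Path> find(String searchPath, String fileName) {
        if (searchPath == null || searchPath.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed file name; adjust it to the DLL that ships with your Jacob download.
        String dll = "jacob-1.18-x64.dll";
        Optional<Path> hit = find(System.getProperty("java.library.path"), dll);
        System.out.println(hit.map(p -> "Found " + p)
                .orElse(dll + " is not on java.library.path"));
    }
}
```

If that assumption about the lookup holds, starting the JVM with `-Djava.library.path` pointing at a project-local folder would be an alternative to copying the DLL into jre/bin.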

Maven dependency first:

    <dependency>
        <groupId>com.hynnet</groupId>
        <artifactId>jacob</artifactId>
        <version>1.18</version>
    </dependency>

Then the main code:

    import java.io.File;
    import com.jacob.activeX.ActiveXComponent;
    import com.jacob.com.ComThread;
    import com.jacob.com.Dispatch;
    import com.jacob.com.Variant;

    private static final int wdFormatPDF = 17; // Word's SaveAs format code for PDF

    public static void word2PDF(String sfileName, String toFileName) {
        System.out.println("Starting Word...");
        long start = System.currentTimeMillis();
        ActiveXComponent app = null;
        Dispatch doc = null;
        try {
            app = new ActiveXComponent("Word.Application");
            app.setProperty("Visible", new Variant(false));
            Dispatch docs = app.getProperty("Documents").toDispatch();
            doc = Dispatch.call(docs, "Open", sfileName).toDispatch();
            System.out.println("Opened document: " + sfileName);
            System.out.println("Converting document to PDF: " + toFileName);
            File tofile = new File(toFileName);
            if (tofile.exists()) {
                tofile.delete();
            }
            Dispatch.call(doc, "SaveAs", toFileName, // FileName
                    wdFormatPDF);
            long end = System.currentTimeMillis();
            System.out.println("Conversion finished in " + (end - start) + " ms.");
        } catch (Exception e) {
            System.out.println("========Error: conversion failed: " + e.getMessage());
        } finally {
            try {
                // doc stays null if Word failed to start or the file could not be opened.
                if (doc != null) {
                    Dispatch.call(doc, "Close", false);
                    System.out.println("Document closed");
                }
                if (app != null) {
                    app.invoke("Quit", new Variant[]{});
                }
            } finally {
                // Without this call the winword.exe process does not exit.
                ComThread.Release();
            }
        }
    }

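Using Docx4j

This is a fairly complete solution: no extra configuration, no add-ins to install, just the Maven dependencies. It also works across platforms, and the conversion fidelity is acceptable.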

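Pitfalls:
1. Depending on the project there may be dependency conflicts; inspect the Maven dependency tree to resolve them.
2. Garbled Chinese text still occurs. The Chinese fonts have to be mapped in, and it is best to force the font as well (as in approach 2, “宋体 (中文正文)” also comes out garbled here).
3. On Linux, the Chinese fonts must already be installed before the font mapping can work.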
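
For pitfall 3, it can help to verify that the font files are present before building the font mapping. A minimal sketch, assuming the fonts are installed as files under the usual Linux font directories; `FontFileCheck`, `anyFontPresent`, the directory list, and the file names `simsun.ttc` and `simhei.ttf` are assumptions to adapt to your system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Checks whether any of the expected font files exists under a set of font directories. */
final class FontFileCheck {

    /** True if a file below one of the roots has one of the given names (case-insensitive). */
    static boolean anyFontPresent(List<Path> roots, List<String> fileNames) {
        Set<String> wanted = fileNames.stream()
                .map(n -> n.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        for (Path root : roots) {
            if (!Files.isDirectory(root)) {
                continue;
            }
            try (Stream<Path> files = Files.walk(root)) {
                boolean hit = files
                        .filter(Files::isRegularFile)
                        .map(p -> p.getFileName().toString().toLowerCase(Locale.ROOT))
                        .anyMatch(wanted::contains);
                if (hit) {
                    return true;
                }
            } catch (IOException | UncheckedIOException e) {
                // Unreadable directory: skip it and try the next root.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed locations and file names; adjust both to the target machine.
        List<Path> roots = Arrays.asList(
                Paths.get("/usr/share/fonts"),
                Paths.get("/usr/local/share/fonts"),
                Paths.get(System.getProperty("user.home"), ".fonts"));
        List<String> names = Arrays.asList("simsun.ttc", "simhei.ttf");
        System.out.println("Chinese font files found: " + anyFontPresent(roots, names));
    }
}
```

A `false` result on a fresh Linux host is a hint that the fonts still need to be installed before the conversion is attempted.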

Unresolved problem: spaces are lost, and the formatting is slightly off.

Maven dependencies first:


    <!--docx4j-->
    <dependency>
        <groupId>com.itextpdf</groupId>
        <artifactId>itextpdf</artifactId>
        <version>5.4.3</version>
    </dependency>
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j</artifactId>
        <version>6.1.2</version>
    </dependency>
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-export-fo</artifactId>
        <version>6.0.0</version>
    </dependency>
    <!--poi-->
    <dependency>
        <groupId>xerces</groupId>
        <artifactId>xercesImpl</artifactId>
        <version>2.11.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>4.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml-schemas</artifactId>
        <version>4.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-scratchpad</artifactId>
        <version>4.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>4.0.0</version>
    </dependency>

Main code:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFParagraph;
    import org.apache.poi.xwpf.usermodel.XWPFRun;
    import org.docx4j.Docx4J;
    import org.docx4j.convert.out.FOSettings;
    import org.docx4j.fonts.IdentityPlusMapper;
    import org.docx4j.fonts.Mapper;
    import org.docx4j.fonts.PhysicalFonts;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

    /**
     * Converts a Word (docx) file to PDF.
     *
     * @param wordPath   path of the docx file
     * @param pdfOutPath output path of the PDF file
     */
    public static void convertDocx2Pdf(String wordPath, String pdfOutPath) throws IOException {
        long startTime = System.currentTimeMillis();

        // Step 1: force every run to 宋体 and write the document back to the same path.
        try {
            XWPFDocument document;
            try (FileInputStream fis = new FileInputStream(wordPath)) {
                document = new XWPFDocument(fis);
            }
            for (XWPFParagraph paragraph : document.getParagraphs()) {
                for (XWPFRun run : paragraph.getRuns()) {
                    run.setFontFamily("宋体");
                }
            }
            try (FileOutputStream fos = new FileOutputStream(wordPath)) {
                document.write(fos);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Step 2: map the Chinese font names to physical fonts, then export to PDF via XSL-FO.
        try (InputStream is = new FileInputStream(new File(wordPath))) {
            WordprocessingMLPackage mlPackage = WordprocessingMLPackage.load(is);
            Mapper fontMapper = new IdentityPlusMapper();
            fontMapper.put("隶书", PhysicalFonts.get("LiSu"));
            fontMapper.put("宋体", PhysicalFonts.get("SimSun"));
            fontMapper.put("微软雅黑", PhysicalFonts.get("Microsoft Yahei"));
            fontMapper.put("黑体", PhysicalFonts.get("SimHei"));
            fontMapper.put("楷体", PhysicalFonts.get("KaiTi"));
            fontMapper.put("新宋体", PhysicalFonts.get("NSimSun"));
            fontMapper.put("华文行楷", PhysicalFonts.get("STXingkai"));
            fontMapper.put("华文仿宋", PhysicalFonts.get("STFangsong"));
            fontMapper.put("宋体扩展", PhysicalFonts.get("simsun-extB"));
            fontMapper.put("仿宋", PhysicalFonts.get("FangSong"));
            fontMapper.put("仿宋_GB2312", PhysicalFonts.get("FangSong_GB2312"));
            fontMapper.put("幼圆", PhysicalFonts.get("YouYuan"));
            fontMapper.put("华文宋体", PhysicalFonts.get("STSong"));
            fontMapper.put("华文中宋", PhysicalFonts.get("STZhongsong"));
            mlPackage.setFontMapper(fontMapper);

            // docx4j: docx to PDF
            try (OutputStream os = new FileOutputStream(pdfOutPath)) {
                FOSettings foSettings = Docx4J.createFOSettings();
                foSettings.setWmlPackage(mlPackage);
                Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
            }
            System.out.println("Conversion finished in "
                    + (System.currentTimeMillis() - startTime) + " ms.");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Note: the source docx is deleted once the conversion has been attempted.
            File file = new File(wordPath);
            if (file.isFile()) {
                file.delete();
            }
        }
    }
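
The method above rewrites the file at `wordPath` and deletes it in the `finally` block, which is fine for a generated temporary file but destructive otherwise. If the source has to survive, one option is to run the conversion on a temporary copy. The sketch below shows only that wrapper, using the JDK; `TempCopy`, `FileStep`, and `withTempCopy` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Runs a file-modifying step on a temporary copy so that the source stays untouched. */
final class TempCopy {

    /** A step that works on a file and may fail with an I/O error. */
    interface FileStep {
        void run(Path file) throws IOException;
    }

    /** Copies source to a temp file, runs the step on the copy, then removes the copy. */
    static void withTempCopy(Path source, FileStep step) throws IOException {
        Path copy = Files.createTempFile("word2pdf-", ".docx");
        try {
            Files.copy(source, copy, StandardCopyOption.REPLACE_EXISTING);
            step.run(copy);
        } finally {
            // Harmless if the step already deleted the copy itself.
            Files.deleteIfExists(copy);
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("input-", ".docx");
        Files.write(source, new byte[] {1, 2, 3});
        withTempCopy(source, copy ->
                System.out.println("Working on " + copy + " (" + Files.size(copy) + " bytes)"));
        System.out.println("Source still exists: " + Files.exists(source));
        Files.delete(source);
    }
}
```

With it, a call could look like `TempCopy.withTempCopy(Paths.get(src), copy -> convertDocx2Pdf(copy.toString(), pdfOutPath));`, where `src` stands for the path of the original document.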