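Document Conversion Performance Test
The finance system uses two PDF conversion components.
The first is com.artofsolving, the component the system adopted originally: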

<!-- https://mvnrepository.com/artifact/com.artofsolving/jodconverter -->
<dependency>
    <groupId>com.artofsolving</groupId>
    <artifactId>jodconverter</artifactId>
    <version>2.2.1</version>
</dependency>

The second is org.artofsolving, the component the system adopted later for uploaded files:

<!-- https://mvnrepository.com/artifact/org.artofsolving.jodconverter/jodconverter-core -->
<dependency>
    <groupId>org.artofsolving.jodconverter</groupId>
    <artifactId>jodconverter-core</artifactId>
    <version>3.0-beta-4</version>
</dependency>

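The two behaved differently during development and testing.
Both were run against OpenOffice 4.1.2, whose recommended system requirements are:
* Microsoft Windows XP, Vista, Windows 7, or Windows 8
* Pentium III or later processor
* 256 MB RAM (512 MB recommended)
* Up to 1.5 GB of free disk space
* 1024x768 resolution (higher recommended), at least 256 colors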


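Dropzone upload configuration:
Maximum size of a single uploaded file: 100 MB
No limit on the number of uploaded files
Number of files uploaded concurrently: 3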

What follows focuses on how efficiently files are converted to PDF after upload.
Test files: test.doc, test.ppt, test.xls

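Comparing the two on these files, org supports more formats but has no speed advantage, and the text in its converted output is slightly less sharp than com's.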

Notably, if a file name contains parentheses "()", the uploaded file cannot be read on Linux.
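One possible workaround is to normalize file names before they are stored and handed to the converter. The sketch below is only an illustration under assumptions: the class name `UploadFileNames` and the rule "keep letters, digits, dot, underscore and hyphen, replace everything else with an underscore" are hypothetical and are not taken from the system described here.

```java
import java.util.regex.Pattern;

final class UploadFileNames {
    // Assumption: anything that is not a letter, digit, '.', '_' or '-' is treated as unsafe.
    private static final Pattern UNSAFE = Pattern.compile("[^\\p{L}\\p{N}._-]");

    /** Replaces every unsafe character (parentheses, spaces, ...) with an underscore. */
    static String sanitize(String originalName) {
        return UNSAFE.matcher(originalName).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("report(1).doc")); // prints report_1_.doc
    }
}
```

Sanitizing once at upload time keeps the stored name and the name passed to the conversion step identical, so the two cannot drift apart.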
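Why do the supported file types differ? In the com.artofsolving source, DocumentFormatRegistry is an interface with several implementations, and the documentFormats list of the default registry contains no MS Office 2007 formats: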

public class DefaultDocumentFormatRegistry extends BasicDocumentFormatRegistry {

    public DefaultDocumentFormatRegistry() {
        final DocumentFormat pdf = new DocumentFormat("Portable Document Format", "application/pdf", "pdf");
        pdf.setExportFilter(DocumentFamily.DRAWING, "draw_pdf_Export");
        pdf.setExportFilter(DocumentFamily.PRESENTATION, "impress_pdf_Export");
        pdf.setExportFilter(DocumentFamily.SPREADSHEET, "calc_pdf_Export");
        pdf.setExportFilter(DocumentFamily.TEXT, "writer_pdf_Export");
        addDocumentFormat(pdf);

        final DocumentFormat swf = new DocumentFormat("Macromedia Flash", "application/x-shockwave-flash", "swf");
        swf.setExportFilter(DocumentFamily.DRAWING, "draw_flash_Export");
        swf.setExportFilter(DocumentFamily.PRESENTATION, "impress_flash_Export");
        addDocumentFormat(swf);

        final DocumentFormat xhtml = new DocumentFormat("XHTML", "application/xhtml+xml", "xhtml");
        xhtml.setExportFilter(DocumentFamily.PRESENTATION, "XHTML Impress File");
        xhtml.setExportFilter(DocumentFamily.SPREADSHEET, "XHTML Calc File");
        xhtml.setExportFilter(DocumentFamily.TEXT, "XHTML Writer File");
        addDocumentFormat(xhtml);

        // HTML is treated as Text when supplied as input, but as an output it is also
        // available for exporting Spreadsheet and Presentation formats
        final DocumentFormat html = new DocumentFormat("HTML", DocumentFamily.TEXT, "text/html", "html");
        html.setExportFilter(DocumentFamily.PRESENTATION, "impress_html_Export");
        html.setExportFilter(DocumentFamily.SPREADSHEET, "HTML (StarCalc)");
        html.setExportFilter(DocumentFamily.TEXT, "HTML (StarWriter)");
        addDocumentFormat(html);

        final DocumentFormat odt = new DocumentFormat("OpenDocument Text", DocumentFamily.TEXT, "application/vnd.oasis.opendocument.text", "odt");
        odt.setExportFilter(DocumentFamily.TEXT, "writer8");
        addDocumentFormat(odt);

        final DocumentFormat sxw = new DocumentFormat("OpenOffice.org 1.0 Text Document", DocumentFamily.TEXT, "application/vnd.sun.xml.writer", "sxw");
        sxw.setExportFilter(DocumentFamily.TEXT, "StarOffice XML (Writer)");
        addDocumentFormat(sxw);

        final DocumentFormat doc = new DocumentFormat("Microsoft Word", DocumentFamily.TEXT, "application/msword", "doc");
        doc.setExportFilter(DocumentFamily.TEXT, "MS Word 97");
        addDocumentFormat(doc);

        final DocumentFormat rtf = new DocumentFormat("Rich Text Format", DocumentFamily.TEXT, "text/rtf", "rtf");
        rtf.setExportFilter(DocumentFamily.TEXT, "Rich Text Format");
        addDocumentFormat(rtf);

        final DocumentFormat wpd = new DocumentFormat("WordPerfect", DocumentFamily.TEXT, "application/wordperfect", "wpd");
        addDocumentFormat(wpd);

        final DocumentFormat txt = new DocumentFormat("Plain Text", DocumentFamily.TEXT, "text/plain", "txt");
        // set FilterName to "Text" to prevent OOo from trying to display the "ASCII Filter Options" dialog
        // alternatively FilterName could be "Text (encoded)" and FilterOptions used to set encoding if needed
        txt.setImportOption("FilterName", "Text");
        txt.setExportFilter(DocumentFamily.TEXT, "Text");
        addDocumentFormat(txt);

        final DocumentFormat wikitext = new DocumentFormat("MediaWiki wikitext", "text/x-wiki", "wiki");
        wikitext.setExportFilter(DocumentFamily.TEXT, "MediaWiki");
        addDocumentFormat(wikitext);

        final DocumentFormat ods = new DocumentFormat("OpenDocument Spreadsheet", DocumentFamily.SPREADSHEET, "application/vnd.oasis.opendocument.spreadsheet", "ods");
        ods.setExportFilter(DocumentFamily.SPREADSHEET, "calc8");
        addDocumentFormat(ods);

        final DocumentFormat sxc = new DocumentFormat("OpenOffice.org 1.0 Spreadsheet", DocumentFamily.SPREADSHEET, "application/vnd.sun.xml.calc", "sxc");
        sxc.setExportFilter(DocumentFamily.SPREADSHEET, "StarOffice XML (Calc)");
        addDocumentFormat(sxc);

        final DocumentFormat xls = new DocumentFormat("Microsoft Excel", DocumentFamily.SPREADSHEET, "application/vnd.ms-excel", "xls");
        xls.setExportFilter(DocumentFamily.SPREADSHEET, "MS Excel 97");
        addDocumentFormat(xls);

        final DocumentFormat csv = new DocumentFormat("CSV", DocumentFamily.SPREADSHEET, "text/csv", "csv");
        csv.setImportOption("FilterName", "Text - txt - csv (StarCalc)");
        csv.setImportOption("FilterOptions", "44,34,0");  // Field Separator: ','; Text Delimiter: '"'
        csv.setExportFilter(DocumentFamily.SPREADSHEET, "Text - txt - csv (StarCalc)");
        csv.setExportOption(DocumentFamily.SPREADSHEET, "FilterOptions", "44,34,0");
        addDocumentFormat(csv);

        final DocumentFormat tsv = new DocumentFormat("Tab-separated Values", DocumentFamily.SPREADSHEET, "text/tab-separated-values", "tsv");
        tsv.setImportOption("FilterName", "Text - txt - csv (StarCalc)");
        tsv.setImportOption("FilterOptions", "9,34,0");  // Field Separator: '\t'; Text Delimiter: '"'
        tsv.setExportFilter(DocumentFamily.SPREADSHEET, "Text - txt - csv (StarCalc)");
        tsv.setExportOption(DocumentFamily.SPREADSHEET, "FilterOptions", "9,34,0");
        addDocumentFormat(tsv);

        final DocumentFormat odp = new DocumentFormat("OpenDocument Presentation", DocumentFamily.PRESENTATION, "application/vnd.oasis.opendocument.presentation", "odp");
        odp.setExportFilter(DocumentFamily.PRESENTATION, "impress8");
        addDocumentFormat(odp);

        final DocumentFormat sxi = new DocumentFormat("OpenOffice.org 1.0 Presentation", DocumentFamily.PRESENTATION, "application/vnd.sun.xml.impress", "sxi");
        sxi.setExportFilter(DocumentFamily.PRESENTATION, "StarOffice XML (Impress)");
        addDocumentFormat(sxi);

        final DocumentFormat ppt = new DocumentFormat("Microsoft PowerPoint", DocumentFamily.PRESENTATION, "application/vnd.ms-powerpoint", "ppt");
        ppt.setExportFilter(DocumentFamily.PRESENTATION, "MS PowerPoint 97");
        addDocumentFormat(ppt);

        final DocumentFormat odg = new DocumentFormat("OpenDocument Drawing", DocumentFamily.DRAWING, "application/vnd.oasis.opendocument.graphics", "odg");
        odg.setExportFilter(DocumentFamily.DRAWING, "draw8");
        addDocumentFormat(odg);

        final DocumentFormat svg = new DocumentFormat("Scalable Vector Graphics", "image/svg+xml", "svg");
        svg.setExportFilter(DocumentFamily.DRAWING, "draw_svg_Export");
        addDocumentFormat(svg);
    }
}
The org source, by contrast, reads as follows:

public class DefaultDocumentFormatRegistry extends SimpleDocumentFormatRegistry {

    public DefaultDocumentFormatRegistry() {
        DocumentFormat pdf = new DocumentFormat("Portable Document Format", "pdf", "application/pdf");
        pdf.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "writer_pdf_Export"));
        pdf.setStoreProperties(DocumentFamily.SPREADSHEET, Collections.singletonMap("FilterName", "calc_pdf_Export"));
        pdf.setStoreProperties(DocumentFamily.PRESENTATION, Collections.singletonMap("FilterName", "impress_pdf_Export"));
        pdf.setStoreProperties(DocumentFamily.DRAWING, Collections.singletonMap("FilterName", "draw_pdf_Export"));
        this.addFormat(pdf);

        DocumentFormat swf = new DocumentFormat("Macromedia Flash", "swf", "application/x-shockwave-flash");
        swf.setStoreProperties(DocumentFamily.PRESENTATION, Collections.singletonMap("FilterName", "impress_flash_Export"));
        swf.setStoreProperties(DocumentFamily.DRAWING, Collections.singletonMap("FilterName", "draw_flash_Export"));
        this.addFormat(swf);

        DocumentFormat html = new DocumentFormat("HTML", "html", "text/html");
        html.setInputFamily(DocumentFamily.TEXT);
        html.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "HTML (StarWriter)"));
        html.setStoreProperties(DocumentFamily.SPREADSHEET, Collections.singletonMap("FilterName", "HTML (StarCalc)"));
        html.setStoreProperties(DocumentFamily.PRESENTATION, Collections.singletonMap("FilterName", "impress_html_Export"));
        this.addFormat(html);

        DocumentFormat odt = new DocumentFormat("OpenDocument Text", "odt", "application/vnd.oasis.opendocument.text");
        odt.setInputFamily(DocumentFamily.TEXT);
        odt.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "writer8"));
        this.addFormat(odt);

        DocumentFormat sxw = new DocumentFormat("OpenOffice.org 1.0 Text Document", "sxw", "application/vnd.sun.xml.writer");
        sxw.setInputFamily(DocumentFamily.TEXT);
        sxw.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "StarOffice XML (Writer)"));
        this.addFormat(sxw);

        DocumentFormat doc = new DocumentFormat("Microsoft Word", "doc", "application/msword");
        doc.setInputFamily(DocumentFamily.TEXT);
        doc.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "MS Word 97"));
        this.addFormat(doc);

        DocumentFormat docx = new DocumentFormat("Microsoft Word 2007 XML", "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        docx.setInputFamily(DocumentFamily.TEXT);
        this.addFormat(docx);

        DocumentFormat rtf = new DocumentFormat("Rich Text Format", "rtf", "text/rtf");
        rtf.setInputFamily(DocumentFamily.TEXT);
        rtf.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "Rich Text Format"));
        this.addFormat(rtf);

        DocumentFormat wpd = new DocumentFormat("WordPerfect", "wpd", "application/wordperfect");
        wpd.setInputFamily(DocumentFamily.TEXT);
        this.addFormat(wpd);

        DocumentFormat txt = new DocumentFormat("Plain Text", "txt", "text/plain");
        txt.setInputFamily(DocumentFamily.TEXT);
        LinkedHashMap txtLoadAndStoreProperties = new LinkedHashMap();
        txtLoadAndStoreProperties.put("FilterName", "Text (encoded)");
        txtLoadAndStoreProperties.put("FilterOptions", "utf8");
        txt.setLoadProperties(txtLoadAndStoreProperties);
        txt.setStoreProperties(DocumentFamily.TEXT, txtLoadAndStoreProperties);
        this.addFormat(txt);

        // Note: in this decompiled source, wikitext is configured but never added to the registry.
        DocumentFormat wikitext = new DocumentFormat("MediaWiki wikitext", "wiki", "text/x-wiki");
        wikitext.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "MediaWiki"));

        DocumentFormat ods = new DocumentFormat("OpenDocument Spreadsheet", "ods", "application/vnd.oasis.opendocument.spreadsheet");
        ods.setInputFamily(DocumentFamily.SPREADSHEET);
        ods.setStoreProperties(DocumentFamily.SPREADSHEET, Collections.singletonMap("FilterName", "calc8"));
        this.addFormat(ods);

        DocumentFormat sxc = new DocumentFormat("OpenOffice.org 1.0 Spreadsheet", "sxc", "application/vnd.sun.xml.calc");
        sxc.setInputFamily(DocumentFamily.SPREADSHEET);
        sxc.setStoreProperties(DocumentFamily.SPREADSHEET, Collections.singletonMap("FilterName", "StarOffice XML (Calc)"));
        this.addFormat(sxc);

        DocumentFormat xls = new DocumentFormat("Microsoft Excel", "xls", "application/vnd.ms-excel");
        xls.setInputFamily(DocumentFamily.SPREADSHEET);
        xls.setStoreProperties(DocumentFamily.SPREADSHEET, Collections.singletonMap("FilterName", "MS Excel 97"));
        this.addFormat(xls);

        DocumentFormat xlsx = new DocumentFormat("Microsoft Excel 2007 XML", "xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        xlsx.setInputFamily(DocumentFamily.SPREADSHEET);
        this.addFormat(xlsx);

        DocumentFormat csv = new DocumentFormat("Comma Separated Values", "csv", "text/csv");
        csv.setInputFamily(DocumentFamily.SPREADSHEET);
        LinkedHashMap csvLoadAndStoreProperties = new LinkedHashMap();
        csvLoadAndStoreProperties.put("FilterName", "Text - txt - csv (StarCalc)");
        csvLoadAndStoreProperties.put("FilterOptions", "44,34,0");
        csv.setLoadProperties(csvLoadAndStoreProperties);
        csv.setStoreProperties(DocumentFamily.SPREADSHEET, csvLoadAndStoreProperties);
        this.addFormat(csv);

        DocumentFormat tsv = new DocumentFormat("Tab Separated Values", "tsv", "text/tab-separated-values");
        tsv.setInputFamily(DocumentFamily.SPREADSHEET);
        LinkedHashMap tsvLoadAndStoreProperties = new LinkedHashMap();
        tsvLoadAndStoreProperties.put("FilterName", "Text - txt - csv (StarCalc)");
        tsvLoadAndStoreProperties.put("FilterOptions", "9,34,0");
        tsv.setLoadProperties(tsvLoadAndStoreProperties);
        tsv.setStoreProperties(DocumentFamily.SPREADSHEET, tsvLoadAndStoreProperties);
        this.addFormat(tsv);

        DocumentFormat odp = new DocumentFormat("OpenDocument Presentation", "odp", "application/vnd.oasis.opendocument.presentation");
        odp.setInputFamily(DocumentFamily.PRESENTATION);
        odp.setStoreProperties(DocumentFamily.PRESENTATION, Collections.singletonMap("FilterName", "impress8"));
        this.addFormat(odp);

        DocumentFormat sxi = new DocumentFormat("OpenOffice.org 1.0 Presentation", "sxi", "application/vnd.sun.xml.impress");
        sxi.setInputFamily(DocumentFamily.PRESENTATION);
        sxi.setStoreProperties(DocumentFamily.PRESENTATION, Collections.singletonMap("FilterName", "StarOffice XML (Impress)"));
        this.addFormat(sxi);

        DocumentFormat ppt = new DocumentFormat("Microsoft PowerPoint", "ppt", "application/vnd.ms-powerpoint");
        ppt.setInputFamily(DocumentFamily.PRESENTATION);
        ppt.setStoreProperties(DocumentFamily.PRESENTATION, Collections.singletonMap("FilterName", "MS PowerPoint 97"));
        this.addFormat(ppt);

        DocumentFormat pptx = new DocumentFormat("Microsoft PowerPoint 2007 XML", "pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
        pptx.setInputFamily(DocumentFamily.PRESENTATION);
        this.addFormat(pptx);

        DocumentFormat odg = new DocumentFormat("OpenDocument Drawing", "odg", "application/vnd.oasis.opendocument.graphics");
        odg.setInputFamily(DocumentFamily.DRAWING);
        odg.setStoreProperties(DocumentFamily.DRAWING, Collections.singletonMap("FilterName", "draw8"));
        this.addFormat(odg);

        DocumentFormat svg = new DocumentFormat("Scalable Vector Graphics", "svg", "image/svg+xml");
        svg.setStoreProperties(DocumentFamily.DRAWING, Collections.singletonMap("FilterName", "draw_svg_Export"));
        this.addFormat(svg);
    }
}

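The two implementations work on essentially the same principle, so it may be possible to make com support more file formats through customization, for example by registering the missing formats.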
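To make the registry idea concrete, here is a deliberately simplified, standalone model of an extension-based format registry. It is not the JODConverter API: `ExtensionRegistry`, `register`, and `lookup` are hypothetical names used only to show why a lookup for "docx" fails until that format has been registered.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class ExtensionRegistry {
    // Maps a lower-cased file extension to the MIME type of its format.
    private final Map<String, String> mimeByExtension = new LinkedHashMap<>();

    void register(String extension, String mimeType) {
        mimeByExtension.put(extension.toLowerCase(Locale.ROOT), mimeType);
    }

    /** Returns the MIME type for the extension, or null when the format was never registered. */
    String lookup(String extension) {
        return mimeByExtension.get(extension.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        ExtensionRegistry registry = new ExtensionRegistry();
        registry.register("doc", "application/msword");
        System.out.println(registry.lookup("docx")); // prints null: no such format registered yet

        registry.register("docx",
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        System.out.println(registry.lookup("docx")); // now prints the registered MIME type
    }
}
```

In this model, a converter that resolves formats by extension rejects any file whose extension is absent from the registry, which matches the behavior seen with the default com registry.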
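Conversion time varies with the number and size of the test files. Converting many medium-sized files to PDF serially inevitably takes a long time; switching to parallel processing could speed this up.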
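A minimal sketch of the parallel approach, using only the JDK's ExecutorService. The actual conversion call is replaced by a stand-in function, and the sketch assumes the real converter can safely be called from several threads at once, which would have to be verified against the component and the OpenOffice service actually in use. `ParallelConversion` and `convertAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelConversion {
    /** Runs convertOne for every input on a fixed-size pool and returns the results in input order. */
    static <I, O> List<O> convertAll(List<I> inputs, Function<I, O> convertOne, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<O>> pending = new ArrayList<>();
            for (I input : inputs) {
                pending.add(pool.submit(() -> convertOne.apply(input)));
            }
            List<O> results = new ArrayList<>();
            for (Future<O> future : pending) {
                results.add(future.get()); // blocks until this particular conversion is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real conversion: it only derives the output file name.
        Function<String, String> fakeConvert = name -> name.replaceAll("\\.[^.]+$", ".pdf");
        System.out.println(convertAll(List.of("test.doc", "test.ppt", "test.xls"), fakeConvert, 3));
    }
}
```

The pool size of 3 mirrors the Dropzone setting of three concurrent uploads; it is a starting point, not a measured optimum.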
Notably, org has two ways of creating a converter: one supports MS Office 2007 formats and the other does not.
Without MS Office 2007 support:

DocumentConverter converter = new StreamOpenOfficeDocumentConverter(connection);
However, posts online say this approach resolves the following exception, which comes down to how file paths are resolved on Linux:
com.artofsolving.jodconverter.openoffice.connection.OpenOfficeException: conversion failed: could not load input document
With MS Office 2007 support:
OfficeManager officeManager = getOfficeManager();
// Connect to OpenOffice
OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
com creates its converter like this:
connection.connect();
DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
com can also create a converter through StreamOpenOfficeDocumentConverter, but this system does not use that approach.

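In summary, if the uploaded files average no more than 5 MB and there are no more than 5 of them, the system finishes processing within 10 seconds.
In later tests, when files were larger than 10 MB and conversions were requested frequently, system resources were exhausted and conversions could not complete; conversion tasks submitted afterwards were not accepted into the component's task queue. There is a performance problem here: converting large files (around 20 MB) sometimes times out. The source sets the execution time limit for a single PDF conversion task to 120 s; on timeout it reports an error, reconnects, and moves on to the next task.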
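The "time limit per task, then move on" behavior can be modeled with plain JDK concurrency utilities. This is a generic sketch, not the component's internal implementation; `TimedConversion` and `runWithTimeout` are hypothetical names, and the task passed in stands for one conversion.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedConversion {
    /** Returns true if the task finished within the limit, false if it was cancelled on timeout. */
    static boolean runWithTimeout(Callable<Void> task, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<Void> future = worker.submit(task);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck task so the caller can move on
            return false;
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in task that finishes quickly; a real task would perform one conversion.
        boolean finished = runWithTimeout(() -> { Thread.sleep(50); return null; }, 120, TimeUnit.SECONDS);
        System.out.println(finished ? "converted" : "timed out, continue with the next task");
    }
}
```

A caller that gets false back can log the failure and continue with the next file instead of blocking the whole queue.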

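Developing the attachment upload feature ran into many pitfalls:

  1. The input file cannot be read: the port was occupied; fixed by restarting.
  2. Special characters in file names cannot be parsed: this is related to the Alibaba Cloud file upload.
  3. Port occupied, so conversion of other small files cannot proceed.
    The code that connects to OpenOffice was changed to first connect to an already running OpenOffice service, and otherwise to restart and create a new conversion service connection. (Code-level fix.)
  4. No support for docx and other newer MS Office documents. (Solved by adding the org component.)
  5. No support for concurrent processing or for large-file conversion.
    After reading the source I found no way to optimize this; the components come with essentially no source, and what I could read was only decompiled code.
  6. OpenOffice behaves somewhat differently on Windows and Linux, mainly in conversion time and in how file formats, file names, and file types are parsed.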
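The fix for pitfall 3 (reuse a running service before starting a new one) hinges on detecting whether something is already listening on the service port. A minimal sketch of that check follows; `OfficePortProbe` is a hypothetical helper, and port 8100 is an assumption, since the post does not say which port the system uses.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class OfficePortProbe {
    /** Returns true if something accepts TCP connections on host:port within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused or timed out: nothing usable on that port
        }
    }

    public static void main(String[] args) {
        int port = 8100; // assumed port, not taken from the system described here
        if (isListening("127.0.0.1", port, 500)) {
            System.out.println("Reuse the running OpenOffice service on port " + port);
        } else {
            System.out.println("Nothing listening on port " + port + "; start the service first");
        }
    }
}
```

A successful probe only shows that the port is open, not that the process behind it is a healthy OpenOffice instance, so a failed connection attempt should still fall back to a restart.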
