文章目录

一、前言
- 1、aspose
- 2 、poi + pdfbox
- 3 spire
二、将文件转换成html字符串
- 1、将word文件转成html字符串
- - 1.1 使用aspose
  - 1.2 使用poi
  - 1.3 使用spire
- 2、将pdf文件转成html字符串
- - 2.1 使用aspose
  - 2.2 使用 poi + pbfbox
  - 2.3 使用spire
- 3、将excel文件转成html字符串
- - 3.1 使用aspose
  - 3.2 使用poi + pdfbox
  - 3.3 使用spire
- 三、将文件转换成html，并生成html文件
- - FileUtils类将html字符串生成html文件示例：
- 1、将word文件转换成html文件
- - 1.1 使用aspose
  - 1.2 使用poi + pdfbox
  - 1.3 使用spire
- 2、将pdf文件转换成html文件
- - 2.1 使用aspose
  - 2.2 使用poi + pdfbox
  - 2.3 使用spire
- 3、将excel文件转换成html文件
- - 3.1 使用aspose
  - 3.2 使用poi
  - 3.3 使用spire
- 四、总结

一、前言

    实现文档在线预览的方式除了上篇文章[《文档在线预览（一）通过将txt、word、pdf转成图片实现在线预览功能》](https://blog.csdn.net/q2qwert/article/details/130884607)说的将文档转成图片的实现方式外，还有转成pdf，前端通过pdf.js、pdfobject.js等插件来实现在线预览，以及本文将要说到的将文档转成html的方式来实现在线预览。

以下代码分别提供基于aspose、pdfbox、spire来实现来实现txt、word、pdf、ppt、word等文件转图片的需求。

1、aspose

Aspose 是一家致力于.Net ,Java,SharePoint,JasperReports和SSRS组件的提供商，数十个国家的数千机构都有用过aspose组件，创建、编辑、转换或渲染 Office、OpenOffice、PDF、图像、ZIP、CAD、XPS、EPS、PSD 和更多文件格式。注意aspose是商用组件，未经授权导出文件里面都是是水印（尊重版权，远离破解版）。

需要在项目的pom文件里添加如下依赖

        <dependency><groupId>com.aspose</groupId><artifactId>aspose-words</artifactId><version>23.1</version></dependency><dependency><groupId>com.aspose</groupId><artifactId>aspose-pdf</artifactId><version>23.1</version></dependency><dependency><groupId>com.aspose</groupId><artifactId>aspose-cells</artifactId><version>23.1</version></dependency><dependency><groupId>com.aspose</groupId><artifactId>aspose-slides</artifactId><version>23.1</version></dependency>

2 、poi + pdfbox

因为aspose和spire虽然好用，但是都是是商用组件，所以这里也提供使用开源库操作的方式的方式。

POI是Apache软件基金会用Java编写的免费开源的跨平台的 Java API，Apache POI提供API给Java程序对Microsoft Office格式档案读和写的功能。

Apache PDFBox是一个开源Java库，支持PDF文档的开发和转换。使用此库，您可以开发用于创建，转换和操作PDF文档的Java程序。

需要在项目的pom文件里添加如下依赖

      <dependency><groupId>org.apache.pdfbox</groupId><artifactId>pdfbox</artifactId><version>2.0.4</version></dependency><dependency><groupId>org.apache.poi</groupId><artifactId>poi</artifactId><version>5.2.0</version></dependency><dependency><groupId>org.apache.poi</groupId><artifactId>poi-ooxml</artifactId><version>5.2.0</version></dependency><dependency><groupId>org.apache.poi</groupId><artifactId>poi-scratchpad</artifactId><version>5.2.0</version></dependency><dependency><groupId>org.apache.poi</groupId><artifactId>poi-excelant</artifactId><version>5.2.0</version></dependency>

3 spire

spire一款专业的Office编程组件，涵盖了对Word、Excel、PPT、PDF等文件的读写、编辑、查看功能。spire提供免费版本，但是存在只能导出前3页以及只能导出前500行的限制，只要达到其一就会触发限制。需要超出前3页以及只能导出前500行的限制的这需要购买付费版（尊重版权，远离破解版）。这里使用免费版进行演示。

spire在添加pom之前还得先添加maven仓库来源

      <repository><id>com.e-iceblue</id><name>e-iceblue</name><url>https://repo.e-iceblue.cn/repository/maven-public/</url></repository>

接着在项目的pom文件里添加如下依赖

免费版：

      <dependency><groupId>e-iceblue</groupId><artifactId>spire.office.free</artifactId><version>5.3.1</version></dependency>

付费版版：

      <dependency><groupId>e-iceblue</groupId><artifactId>spire.office</artifactId><version>5.3.1</version></dependency>

二、将文件转换成html字符串

1、将word文件转成html字符串

1.1 使用aspose

public static String wordToHtmlStr(String wordPath) {try {Document doc = new Document(wordPath); // Address是将要被转化的word文档String htmlStr = doc.toString();return htmlStr;} catch (Exception e) {e.printStackTrace();}return null;}

验证结果：

1.2 使用poi

public String wordToHtmlStr(String wordPath) throws TransformerException, IOException, ParserConfigurationException {String htmlStr = null;String ext = wordPath.substring(wordPath.lastIndexOf("."));if (ext.equals(".docx")) {htmlStr = word2007ToHtmlStr(wordPath);} else if (ext.equals(".doc")){htmlStr = word2003ToHtmlStr(wordPath);} else {throw new RuntimeException("文件格式不正确");}return htmlStr;}public String word2007ToHtmlStr(String wordPath) throws IOException {// 使用内存输出流try(ByteArrayOutputStream out = new ByteArrayOutputStream()){word2007ToHtmlOutputStream(wordPath, out);return out.toString();}}private void word2007ToHtmlOutputStream(String wordPath,OutputStream out) throws IOException {ZipSecureFile.setMinInflateRatio(-1.0d);InputStream in = Files.newInputStream(Paths.get(wordPath));XWPFDocument document = new XWPFDocument(in);XHTMLOptions options = XHTMLOptions.create().setIgnoreStylesIfUnused(false).setImageManager(new Base64EmbedImgManager());// 使用内存输出流XHTMLConverter.getInstance().convert(document, out, options);}private String word2003ToHtmlStr(String wordPath) throws TransformerException, IOException, ParserConfigurationException {org.w3c.dom.Document htmlDocument = word2003ToHtmlDocument(wordPath);// Transform document to stringStringWriter writer = new StringWriter();TransformerFactory tf = TransformerFactory.newInstance();Transformer transformer = tf.newTransformer();transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");transformer.setOutputProperty(OutputKeys.METHOD, "html");transformer.setOutputProperty(OutputKeys.INDENT, "yes");transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");transformer.transform(new DOMSource(htmlDocument), new StreamResult(writer));return writer.toString();}private org.w3c.dom.Document word2003ToHtmlDocument(String wordPath) throws IOException, ParserConfigurationException {InputStream input = Files.newInputStream(Paths.get(wordPath));HWPFDocument wordDocument = new HWPFDocument(input);WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());wordToHtmlConverter.setPicturesManager((content, pictureType, suggestedName, widthInches, heightInches) -> {System.out.println(pictureType);if (PictureType.UNKNOWN.equals(pictureType)) {return null;}BufferedImage bufferedImage = ImgUtil.toImage(content);String base64Img = ImgUtil.toBase64(bufferedImage, pictureType.getExtension());//  带图片的word，则将图片转为base64编码，保存在一个页面中StringBuilder sb = (new StringBuilder(base64Img.length() + "data:;base64,".length()).append("data:;base64,").append(base64Img));return sb.toString();});// 解析word文档wordToHtmlConverter.processDocument(wordDocument);return wordToHtmlConverter.getDocument();}

1.3 使用spire

 public String wordToHtmlStr(String wordPath) throws IOException {try(ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {Document document = new Document();document.loadFromFile(wordPath);document.saveToFile(outputStream, FileFormat.Html);return outputStream.toString();}}

2、将pdf文件转成html字符串

2.1 使用aspose

public static String pdfToHtmlStr(String pdfPath) throws IOException, ParserConfigurationException {PDDocument document = PDDocument.load(new File(pdfPath));Writer writer = new StringWriter();new PDFDomTree().writeText(document, writer);writer.close();document.close();return writer.toString();}

验证结果：

2.2 使用 poi + pbfbox

public String pdfToHtmlStr(String pdfPath) throws IOException, ParserConfigurationException {PDDocument document = PDDocument.load(new File(pdfPath));Writer writer = new StringWriter();new PDFDomTree().writeText(document, writer);writer.close();document.close();return writer.toString();}

2.3 使用spire

public String pdfToHtmlStr(String pdfPath) throws IOException, ParserConfigurationException {try(ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {PdfDocument pdf = new PdfDocument();pdf.loadFromFile(pdfPath);return outputStream.toString();}}

3、将excel文件转成html字符串

3.1 使用aspose

public static String excelToHtmlStr(String excelPath) throws Exception {FileInputStream fileInputStream = new FileInputStream(excelPath);Workbook workbook = new XSSFWorkbook(fileInputStream);DataFormatter dataFormatter = new DataFormatter();FormulaEvaluator formulaEvaluator = workbook.getCreationHelper().createFormulaEvaluator();Sheet sheet = workbook.getSheetAt(0);StringBuilder htmlStringBuilder = new StringBuilder();htmlStringBuilder.append("<html><head><title>Excel to HTML using Java and POI library</title>");htmlStringBuilder.append("<style>table, th, td { border: 1px solid black; }</style>");htmlStringBuilder.append("</head><body><table>");for (Row row : sheet) {htmlStringBuilder.append("<tr>");for (Cell cell : row) {CellType cellType = cell.getCellType();if (cellType == CellType.FORMULA) {formulaEvaluator.evaluateFormulaCell(cell);cellType = cell.getCachedFormulaResultType();}String cellValue = dataFormatter.formatCellValue(cell, formulaEvaluator);htmlStringBuilder.append("<td>").append(cellValue).append("</td>");}htmlStringBuilder.append("</tr>");}htmlStringBuilder.append("</table></body></html>");return htmlStringBuilder.toString();}

返回的html字符串：

<html><head><title>Excel to HTML using Java and POI library</title><style>table, th, td { border: 1px solid black; }</style></head><body><table><tr><td>序号</td><td>姓名</td><td>性别</td><td>联系方式</td><td>地址</td></tr><tr><td>1</td><td>张晓玲</td><td>女</td><td>11111111111</td><td>上海市浦东新区xx路xx弄xx号</td></tr><tr><td>2</td><td>王小二</td><td>男</td><td>1222222</td><td>上海市浦东新区xx路xx弄xx号</td></tr><tr><td>1</td><td>张晓玲</td><td>女</td><td>11111111111</td><td>上海市浦东新区xx路xx弄xx号</td></tr><tr><td>2</td><td>王小二</td><td>男</td><td>1222222</td><td>上海市浦东新区xx路xx弄xx号</td></tr><tr><td>1</td><td>张晓玲</td><td>女</td><td>11111111111</td><td>上海市浦东新区xx路xx弄xx号</td></tr><tr><td>2</td><td>王小二</td><td>男</td><td>1222222</td><td>上海市浦东新区xx路xx弄xx号</td></tr><tr><td>1</td><td>张晓玲</td><td>女</td><td>11111111111</td><td>上海市浦东新区xx路xx弄xx号</td></tr><tr><td>2</td><td>王小二</td><td>男</td><td>1222222</td><td>上海市浦东新区xx路xx弄xx号</td></tr><tr><td>1</td><td>张晓玲</td><td>女</td><td>11111111111</td><td>上海市浦东新区xx路xx弄xx号</td></tr><tr><td>2</td><td>王小二</td><td>男</td><td>1222222</td><td>上海市浦东新区xx路xx弄xx号</td></tr><tr><td>1</td><td>张晓玲</td><td>女</td><td>11111111111</td><td>上海市浦东新区xx路xx弄xx号</td></tr><tr><td>2</td><td>王小二</td><td>男</td><td>1222222</td><td>上海市浦东新区xx路xx弄xx号</td></tr><tr><td>1</td><td>张晓玲</td><td>女</td><td>11111111111</td><td>上海市浦东新区xx路xx弄xx号</td></tr><tr><td>2</td><td>王小二</td><td>男</td><td>1222222</td><td>上海市浦东新区xx路xx弄xx号</td></tr></table></body></html>

3.2 使用poi + pdfbox

public String excelToHtmlStr(String excelPath) throws Exception {FileInputStream fileInputStream = new FileInputStream(excelPath);try (Workbook workbook = WorkbookFactory.create(new File(excelPath))){DataFormatter dataFormatter = new DataFormatter();FormulaEvaluator formulaEvaluator = workbook.getCreationHelper().createFormulaEvaluator();org.apache.poi.ss.usermodel.Sheet sheet = workbook.getSheetAt(0);StringBuilder htmlStringBuilder = new StringBuilder();htmlStringBuilder.append("<html><head><title>Excel to HTML using Java and POI library</title>");htmlStringBuilder.append("<style>table, th, td { border: 1px solid black; }</style>");htmlStringBuilder.append("</head><body><table>");for (Row row : sheet) {htmlStringBuilder.append("<tr>");for (Cell cell : row) {CellType cellType = cell.getCellType();if (cellType == CellType.FORMULA) {formulaEvaluator.evaluateFormulaCell(cell);cellType = cell.getCachedFormulaResultType();}String cellValue = dataFormatter.formatCellValue(cell, formulaEvaluator);htmlStringBuilder.append("<td>").append(cellValue).append("</td>");}htmlStringBuilder.append("</tr>");}htmlStringBuilder.append("</table></body></html>");return htmlStringBuilder.toString();}}

3.3 使用spire

public String excelToHtmlStr(String excelPath) throws Exception {try(ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {Workbook workbook = new Workbook();workbook.loadFromFile(excelPath);workbook.saveToStream(outputStream, com.spire.xls.FileFormat.HTML);return outputStream.toString();}}

三、将文件转换成html，并生成html文件

有时我们是需要的不仅仅返回html字符串，而是需要生成一个html文件这时应该怎么做呢？一个改动量小的做法就是使用org.apache.commons.io包下的FileUtils工具类写入目标地址：

FileUtils类将html字符串生成html文件示例：

首先需要引入pom：

      <dependency><groupId>commons-io</groupId><artifactId>commons-io</artifactId><version>2.8.0</version></dependency>

1、将word文件转换成html文件

word原文件效果：

1.1 使用aspose

public static void wordToHtml(String wordPath, String htmlPath) {try {File sourceFile = new File(wordPath);String path = htmlPath + File.separator + sourceFile.getName().substring(0, sourceFile.getName().lastIndexOf(".")) + ".html";File file = new File(path); // 新建一个空白pdf文档FileOutputStream os = new FileOutputStream(file);Document doc = new Document(wordPath); // Address是将要被转化的word文档HtmlSaveOptions options = new HtmlSaveOptions();options.setExportImagesAsBase64(true);options.setExportRelativeFontSize(true);doc.save(os, options);} catch (Exception e) {e.printStackTrace();}}

转换成html的效果：

1.2 使用poi + pdfbox

public void wordToHtml(String wordPath, String htmlPath) throws TransformerException, IOException, ParserConfigurationException {htmlPath = FileUtil.getNewFileFullPath(wordPath, htmlPath, "html");String ext = wordPath.substring(wordPath.lastIndexOf("."));if (ext.equals(".docx")) {word2007ToHtml(wordPath, htmlPath);} else if (ext.equals(".doc")){word2003ToHtml(wordPath, htmlPath);} else {throw new RuntimeException("文件格式不正确");}}public void word2007ToHtml(String wordPath, String htmlPath) throws TransformerException, IOException, ParserConfigurationException {//try(OutputStream out = Files.newOutputStream(Paths.get(path))){try(FileOutputStream out = new FileOutputStream(htmlPath)){word2007ToHtmlOutputStream(wordPath, out);}}private void word2007ToHtmlOutputStream(String wordPath,OutputStream out) throws IOException {ZipSecureFile.setMinInflateRatio(-1.0d);InputStream in = Files.newInputStream(Paths.get(wordPath));XWPFDocument document = new XWPFDocument(in);XHTMLOptions options = XHTMLOptions.create().setIgnoreStylesIfUnused(false).setImageManager(new Base64EmbedImgManager());// 使用内存输出流XHTMLConverter.getInstance().convert(document, out, options);}public void word2003ToHtml(String wordPath, String htmlPath) throws TransformerException, IOException, ParserConfigurationException {org.w3c.dom.Document htmlDocument = word2003ToHtmlDocument(wordPath);// 生成html文件地址try(OutputStream outStream = Files.newOutputStream(Paths.get(htmlPath))){DOMSource domSource = new DOMSource(htmlDocument);StreamResult streamResult = new StreamResult(outStream);TransformerFactory factory = TransformerFactory.newInstance();Transformer serializer = factory.newTransformer();serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");serializer.setOutputProperty(OutputKeys.INDENT, "yes");serializer.setOutputProperty(OutputKeys.METHOD, "html");serializer.transform(domSource, streamResult);}}private org.w3c.dom.Document word2003ToHtmlDocument(String wordPath) throws IOException, ParserConfigurationException {InputStream input = Files.newInputStream(Paths.get(wordPath));HWPFDocument wordDocument = new HWPFDocument(input);WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());wordToHtmlConverter.setPicturesManager((content, pictureType, suggestedName, widthInches, heightInches) -> {System.out.println(pictureType);if (PictureType.UNKNOWN.equals(pictureType)) {return null;}BufferedImage bufferedImage = ImgUtil.toImage(content);String base64Img = ImgUtil.toBase64(bufferedImage, pictureType.getExtension());//  带图片的word，则将图片转为base64编码，保存在一个页面中StringBuilder sb = (new StringBuilder(base64Img.length() + "data:;base64,".length()).append("data:;base64,").append(base64Img));return sb.toString();});// 解析word文档wordToHtmlConverter.processDocument(wordDocument);return wordToHtmlConverter.getDocument();}

转换成html的效果：

1.3 使用spire

public void wordToHtml(String wordPath, String htmlPath) {htmlPath = FileUtil.getNewFileFullPath(wordPath, htmlPath, "html");Document document = new Document();document.loadFromFile(wordPath);document.saveToFile(htmlPath, FileFormat.Html);}

转换成html的效果：

因为使用的是免费版，存在页数和字数限制，需要完整功能的的可以选择付费版本。PS：这回76页的文档居然转成功了前50页。

2、将pdf文件转换成html文件

图片版pdf原文件效果：

文字版pdf原文件效果：

2.1 使用aspose

public static void pdfToHtml(String pdfPath, String htmlPath) throws IOException, ParserConfigurationException {File file = new File(pdfPath);String path = htmlPath + File.separator + file.getName().substring(0, file.getName().lastIndexOf(".")) + ".html";PDDocument document = PDDocument.load(new File(pdfPath));Writer writer = new PrintWriter(path, "UTF-8");new PDFDomTree().writeText(document, writer);writer.close();document.close();}

图片版PDF文件验证结果：

文字版PDF文件验证结果：

2.2 使用poi + pdfbox

public void pdfToHtml(String pdfPath, String htmlPath) throws IOException, ParserConfigurationException {String path = FileUtil.getNewFileFullPath(pdfPath, htmlPath, "html");PDDocument document = PDDocument.load(new File(pdfPath));Writer writer = new PrintWriter(path, "UTF-8");new PDFDomTree().writeText(document, writer);writer.close();document.close();}

图片版PDF文件验证结果：

文字版PDF原文件效果：

2.3 使用spire

public void pdfToHtml(String pdfPath, String htmlPath) throws IOException, ParserConfigurationException {htmlPath = FileUtil.getNewFileFullPath(pdfPath, htmlPath, "html");PdfDocument pdf = new PdfDocument();pdf.loadFromFile(pdfPath);pdf.saveToFile(htmlPath, com.spire.pdf.FileFormat.HTML);}

图片版PDF文件验证结果：
因为使用的是免费版，所以只有前三页是正常的。。。有超过三页需求的可以选择付费版本。

文字版PDF原文件效果：

报错了无法转换。。。

java.lang.NullPointerExceptionat com.spire.pdf.PdfPageWidget.spr┢⅛(Unknown Source)at com.spire.pdf.PdfPageWidget.getSize(Unknown Source)at com.spire.pdf.PdfPageBase.spr†™—(Unknown Source)at com.spire.pdf.PdfPageBase.getActualSize(Unknown Source)at com.spire.pdf.PdfPageBase.getSection(Unknown Source)at com.spire.pdf.general.PdfDestination.spr︻┎—(Unknown Source)at com.spire.pdf.general.PdfDestination.spr┻┑—(Unknown Source)at com.spire.pdf.general.PdfDestination.getElement(Unknown Source)at com.spire.pdf.primitives.PdfDictionary.setProperty(Unknown Source)at com.spire.pdf.bookmarks.PdfBookmark.setDestination(Unknown Source)at com.spire.pdf.bookmarks.PdfBookmarkWidget.spr┭┘—(Unknown Source)at com.spire.pdf.bookmarks.PdfBookmarkWidget.getDestination(Unknown Source)at com.spire.pdf.PdfDocumentBase.spr╻⅝(Unknown Source)at com.spire.pdf.widget.PdfPageCollection.spr┦⅝(Unknown Source)at com.spire.pdf.widget.PdfPageCollection.removeAt(Unknown Source)at com.spire.pdf.PdfDocumentBase.spr┞⅝(Unknown Source)at com.spire.pdf.PdfDocument.loadFromFile(Unknown Source)

3、将excel文件转换成html文件

excel原文件效果：

3.1 使用aspose

public void excelToHtml(String excelPath, String htmlPath) throws Exception {htmlPath = FileUtil.getNewFileFullPath(excelPath, htmlPath, "html");Workbook workbook = new Workbook(excelPath);com.aspose.cells.HtmlSaveOptions options = new com.aspose.cells.HtmlSaveOptions();workbook.save(htmlPath, options);}

转换成html的效果：

3.2 使用poi

public void excelToHtml(String excelPath, String htmlPath) throws Exception {String path = FileUtil.getNewFileFullPath(excelPath, htmlPath, "html");try(FileOutputStream fileOutputStream = new FileOutputStream(path)){String htmlStr = excelToHtmlStr(excelPath);byte[] bytes = htmlStr.getBytes();fileOutputStream.write(bytes);}}public String excelToHtmlStr(String excelPath) throws Exception {FileInputStream fileInputStream = new FileInputStream(excelPath);try (Workbook workbook = WorkbookFactory.create(new File(excelPath))){DataFormatter dataFormatter = new DataFormatter();FormulaEvaluator formulaEvaluator = workbook.getCreationHelper().createFormulaEvaluator();org.apache.poi.ss.usermodel.Sheet sheet = workbook.getSheetAt(0);StringBuilder htmlStringBuilder = new StringBuilder();htmlStringBuilder.append("<html><head><title>Excel to HTML using Java and POI library</title>");htmlStringBuilder.append("<style>table, th, td { border: 1px solid black; }</style>");htmlStringBuilder.append("</head><body><table>");for (Row row : sheet) {htmlStringBuilder.append("<tr>");for (Cell cell : row) {CellType cellType = cell.getCellType();if (cellType == CellType.FORMULA) {formulaEvaluator.evaluateFormulaCell(cell);cellType = cell.getCachedFormulaResultType();}String cellValue = dataFormatter.formatCellValue(cell, formulaEvaluator);htmlStringBuilder.append("<td>").append(cellValue).append("</td>");}htmlStringBuilder.append("</tr>");}htmlStringBuilder.append("</table></body></html>");return htmlStringBuilder.toString();}}

转换成html的效果：

3.3 使用spire

public void excelToHtml(String excelPath, String htmlPath) throws Exception {htmlPath = FileUtil.getNewFileFullPath(excelPath, htmlPath, "html");Workbook workbook = new Workbook();workbook.loadFromFile(excelPath);workbook.saveToFile(htmlPath, com.spire.xls.FileFormat.HTML);}

转换成html的效果：

四、总结

从上述的效果展示我们可以发现其实转成html效果不是太理想，很多细节样式没有还原，这其实是因为这类转换往往都是追求目标是通过使用文档中的语义信息并忽略其他细节来生成简单干净的 HTML，所以在转换过程中复杂样式被忽略，比如居中、首行缩进、字体，文本大小，颜色。举个例子在转换是会将应用标题 1 样式的任何段落转换为 h1 元素，而不是尝试完全复制标题的样式。所以转成html的显示效果往往和原文档不太一样。这意味着对于较复杂的文档而言，这种转换不太可能是完美的。但如果都是只使用简单样式文档或者对文档样式不太关心的这种方式也不妨一试。

PS：如果想要展示效果好的话，其实可以将上篇文章《文档在线预览（一）通过将txt、word、pdf转成图片实现在线预览功能》说的内容和本文结合起来使用，即将文档里的内容都生成成图片（很可能是多张图片），然后将生成的图片全都放到一个html页面里，用html+css来保持样式并实现多张图片展示，再将html返回。开源组件kkfilevie就是用的就是这种做法。

kkfileview展示效果如下：

下图是kkfileview返回的html代码，从html代码我们可以看到kkfileview其实是将文件（txt文件除外）每页的内容都转成了图片，然后将这些图片都嵌入到一个html里，再返回给用户一个html页面。

文档在线预览（二）word、pdf、excel文件转html以实现文档在线预览相关推荐

vue实战--vue+elementUI实现多文件上传+预览（word/PDF/图片/docx/doc/xlxs/txt）
需求最近在做vue2.0+element UI的项目中遇到了一个需求:需求是多个文件上传的同时实现文件的在线预览功能.需求图如下: 看到这个需求的时候,小栗脑袋一炸.并不知道该如何下手,之前的实践项 ...
VSTO 得到Office文档的选中内容（Word、Excel、PPT、Outlook）
VSTO 得到Office文档的选中内容(Word.Excel.PPT.Outlook) 原文:VSTO 得到Office文档的选中内容(Word.Excel.PPT.Outlook) 目的:得到在W ...
java实现word,pdf,excel,图片添加水印
gitee项目地址:https://gitee.com/betelnutandwine/meutilswatermark: java实现pdf,word,excel,ppt,图片加水印 jar地址:s ...
如何保存PDF、Word和Excel文件到数据库中
在项目中,有时候我们很需要把PDF.Word和Excel文档等等上传到数据库,以便日后使用.今天这篇文章向大家讲解如何将这些文件保存到数据库的. 详细步骤第一步:打开数据库,单击新建查询,创建一个名 ...
给pdf、word、excel文件添加水印
今天公司让给pdf.word.excel文件添加水印, 在网上导了一堆,最后总结了一下方法. /** * @author CArom_XUE 2021-03-18 * * TODO 生成文 ...
如何将PDF如何存入MySQL_如何保存PDF、Word和Excel文件到数据库中
在项目中,有时候我们很需要把PDF.Word和Excel文档等等上传到数据库,以便日后使用.今天这篇文章向大家讲解如何将这些文件保存到数据库的. 详细步骤第一步:打开数据库,单击新建查询,创建一个名 ...
Vue下载本地pdf、word、excel文件
Vue下载本地pdf.word.excel文件项目需求具体实现注意项目需求在项目中需要对pdf.word.excel等文档的下载也就是获取文件的静态路径,下载到本地. 方案 :利用 axi ...
poi 顺序解析word_利用POI读取word、Excel文件的最佳实践教程
前言 POI是 Apache 旗下一款读写微软家文档声名显赫的类库.应该很多人在做报表的导出,或者创建 word 文档以及读取之类的都是用过 POI.POI 也的确对于这些操作带来很大的便利性.我最近 ...
ipad里excel文件计算机,ipad怎么看excel和Word?ipad查看Word和Excel文件
小编刚入手ipad4,想将Word和Excel文件放到ipad中,上班坐车时可以随时查看,刚开始还不知道怎么将文件导入到ipad中并可以查看,折腾了好久总算是可以查看了,相信很多人在购买iPad时会犹 ...
VC操作word和excel文件，查询与读写[依赖office环境]
2016-07-26 10:15AM 田海涛>>记录>>合肥工业大学由于孟师兄的碗扣式支架建模系统需求,须在程序中加入office相关处理,具体为读取excel文件与生成wo ...

文档在线预览（二）word、pdf、excel文件转html以实现文档在线预览

文章目录

一、前言

1、aspose

2 、poi + pdfbox

3 spire

二、将文件转换成html字符串

1、将word文件转成html字符串

1.1 使用aspose

1.2 使用poi

1.3 使用spire

2、将pdf文件转成html字符串

2.1 使用aspose

2.2 使用 poi + pbfbox

2.3 使用spire

3、将excel文件转成html字符串

3.1 使用aspose

3.2 使用poi + pdfbox

3.3 使用spire

三、将文件转换成html，并生成html文件

FileUtils类将html字符串生成html文件示例：

1、将word文件转换成html文件

1.1 使用aspose

1.2 使用poi + pdfbox

1.3 使用spire

2、将pdf文件转换成html文件

2.1 使用aspose

2.2 使用poi + pdfbox

2.3 使用spire

3、将excel文件转换成html文件

3.1 使用aspose

3.2 使用poi

3.3 使用spire

四、总结

文档在线预览（二）word、pdf、excel文件转html以实现文档在线预览相关推荐

最新文章

热门文章