Word to PDF / Word to image / Word to HTML / HTML to Word conversions in Java

Converting Word files to and from PDF, HTML, images, and other formats

The demo code is available here:

GitHub - document-demo: https://github.com/git-wuxianglong/document-demo

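Introduction

Aspose.Words is an API dedicated to document processing. It creates, reads, edits, prints, and saves documents in popular formats such as Word, OpenDocument, Markdown, HTML, and PDF.

Besides converting between these formats, Aspose.Words uses a Document Object Model (DOM) to support rendering, printing, reporting, mail merge, and advanced formatting of any document element.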

Supported file formats

DOC, DOCX, DOT, DOTX, DOCM, DOTM, Word 6.0 or Word 95
XML, WordML, XAML, Flat OPC, Flat OPC Macro-Enabled, Flat OPC Template, Flat OPC Macro-Enabled Template
HTML, MHTML, MD
PDF
EPUB, MOBI, CHM, AZW3
SVG, TIFF, PNG, BMP, JPEG, GIF, EMF
XPS, OpenXPS
TXT
RTF
ODT, OTT
PS
PCL

Usage

Adding the jar

Download the jar, create a lib folder at the same level as src, and copy the downloaded jar into it.
Reference the downloaded jar in the pom file:

<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-words</artifactId>
    <version>19.1</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/lib/aspose-words-19.1.jar</systemPath>
</dependency>
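Because the jar is referenced with system scope, it can go missing at runtime (some packaging plugins may leave system-scoped dependencies out of the final artifact unless configured otherwise). A small helper like the one below can confirm that com.aspose.words.Document is actually loadable before attempting a conversion. ClasspathCheck and isOnClasspath are hypothetical names, not part of the demo project; the check itself uses only the JDK.

```java
public class ClasspathCheck {

    /**
     * Returns true if the named class can be loaded by this class's loader.
     * Static initializers are not run (initialize = false).
     * Hypothetical helper, not part of the document-demo project.
     */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // com.aspose.words.Document is the entry class used by WordUtils
        System.out.println("Aspose.Words on classpath: "
                + isOnClasspath("com.aspose.words.Document"));
    }
}
```

Running main from the packaged artifact (rather than from the IDE) is the useful case, since that is where a system-scoped jar is most likely to be absent.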

Code

Conversion utility class

import com.aspose.words.*;
import com.google.common.collect.ImmutableMap;
import lombok.extern.slf4j.Slf4j;

import javax.imageio.ImageIO;
import javax.imageio.stream.ImageInputStream;
import java.awt.image.BufferedImage;
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Aspose.Words conversion utility class.
 *
 * @author wuxianglong
 */
@Slf4j
public class WordUtils {

    private static final String OS_NAME_STR = "os.name";
    private static final String WINDOWS_STR = "windows";
    private static final String FORM_TEXT = "FORMTEXT";

    /**
     * On Linux, PDF conversion needs an explicit font folder.
     * Font directory on CentOS 8.
     */
    private static final String LINUX_FONTS_PATH = "/usr/share/fonts";

    public static void main(String[] args) throws Exception {
        checkLicense();
        String inPath = "C:\\Users\\username\\Desktop\\test.docx";
        // docToPdf writes PDF content, so the output file should use the .pdf extension
        String outPath = "C:\\Users\\username\\Desktop\\test.pdf";
        docToPdf(inPath, outPath);
    }

    /**
     * Word to HTML.
     *
     * @param inPath  input file path
     * @param outPath output file path
     * @throws Exception if the conversion fails
     */
    public static void docToHtml(String inPath, String outPath) throws Exception {
        long start = System.currentTimeMillis();
        Document doc = new Document(inPath);
        HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.HTML);
        opts.setHtmlVersion(HtmlVersion.XHTML);
        // Embed images as Base64 so the HTML is a single self-contained file
        opts.setExportImagesAsBase64(true);
        opts.setExportPageMargins(true);
        opts.setExportXhtmlTransitional(true);
        opts.setExportDocumentProperties(true);
        doc.save(outPath, opts);
        log.info("Word to HTML succeeded, elapsed: {} ms", System.currentTimeMillis() - start);
    }

    /**
     * Word to PDF.
     *
     * @param inPath  input file path
     * @param outPath output file path
     * @throws Exception if the conversion fails
     */
    public static void docToPdf(String inPath, String outPath) throws Exception {
        long start = System.currentTimeMillis();
        log.info("Word to PDF output path: {}", outPath);
        // try-with-resources closes the stream even if loading or saving fails
        try (FileOutputStream os = getFileOutputStream(outPath)) {
            Document doc = new Document(inPath);
            doc.save(os, SaveFormat.PDF);
        }
        log.info("Word to PDF succeeded, elapsed: {} ms", System.currentTimeMillis() - start);
    }

    /**
     * Word to PDF.
     *
     * @param inputStream file input stream
     * @param outPath     output file path
     * @throws Exception if the conversion fails
     */
    public static void docToPdf(InputStream inputStream, String outPath) throws Exception {
        long start = System.currentTimeMillis();
        // try-with-resources closes the output stream even if loading or saving fails
        try (FileOutputStream os = getFileOutputStream(outPath)) {
            Document doc = new Document(inputStream);
            doc.save(os, SaveFormat.PDF);
        }
        log.info("Word to PDF succeeded, elapsed: {} ms", System.currentTimeMillis() - start);
    }

    /**
     * Word to images, one image per page.
     *
     * @param inPath Word file path
     * @throws Exception if the conversion fails
     */
    public static void docToImage(String inPath) throws Exception {
        long start = System.currentTimeMillis();
        log.info("Converting Word to one image per page");
        File file = new File(inPath);
        String name = file.getName();
        String fileName = name.substring(0, name.lastIndexOf("."));
        // Parent directory of the Word file
        String parent = file.getParent();
        log.info("parent:{}", parent);
        // Create a sub-directory named after the file to hold the page images
        boolean mkdir = new File(parent + "/" + fileName).mkdir();
        log.info("mkdir:{}", mkdir);
        List<BufferedImage> bufferedImages;
        try (InputStream inputStream = Files.newInputStream(Paths.get(inPath))) {
            bufferedImages = wordToImg(inputStream);
        }
        for (int i = 0; i < bufferedImages.size(); i++) {
            // Write each page to its own file; page numbers in file names start at 1
            ImageIO.write(bufferedImages.get(i), "png",
                    new File(parent + "/" + fileName + "/page" + (i + 1) + "_" + fileName + ".png"));
        }
        log.info("Word to images succeeded, elapsed: {} ms", System.currentTimeMillis() - start);
    }

    /**
     * Word to images, merged into a single image.
     *
     * @param inPath Word file path
     * @throws Exception if the conversion fails
     */
    public static void docToOneImage(String inPath) throws Exception {
        long start = System.currentTimeMillis();
        log.info("Converting Word to a single image");
        File file = new File(inPath);
        String name = file.getName();
        String fileName = name.substring(0, name.lastIndexOf("."));
        String parent = file.getParent();
        List<BufferedImage> bufferedImages;
        try (InputStream inputStream = Files.newInputStream(Paths.get(inPath))) {
            bufferedImages = wordToImg(inputStream);
        }
        // Stack all pages vertically into one image
        BufferedImage image = MergeImage.mergeImage(false, bufferedImages);
        ImageIO.write(image, "png", new File(parent + "/" + fileName + ".png"));
        log.info("Word to image succeeded, elapsed: {} ms", System.currentTimeMillis() - start);
    }

    /**
     * HTML to Word.
     *
     * @param inPath  input file path
     * @param outPath output file path
     * @throws Exception if the conversion fails
     */
    public static void htmlToWord(String inPath, String outPath) throws Exception {
        Document wordDoc = new Document(inPath);
        DocumentBuilder builder = new DocumentBuilder(wordDoc);
        for (Field field : wordDoc.getRange().getFields()) {
            if (field.getFieldCode().contains(FORM_TEXT)) {
                // Remove text form fields, keeping their result text in place
                builder.moveToField(field, true);
                builder.write(field.getResult());
                field.remove();
            }
        }
        wordDoc.save(outPath, SaveFormat.DOCX);
    }

    /**
     * HTML to Word, replacing the specified text.
     *
     * @param inPath  input file path
     * @param outPath output file path
     * @throws Exception if the conversion fails
     */
    public static void htmlToWordAndReplaceField(String inPath, String outPath) throws Exception {
        Document wordDoc = new Document(inPath);
        Range range = wordDoc.getRange();
        // Sample replacements: "张三" -> "李四" and "20" -> "40"
        ImmutableMap<String, String> map = ImmutableMap.of("张三", "李四", "20", "40");
        for (Map.Entry<String, String> str : map.entrySet()) {
            range.replace(str.getKey(), str.getValue(), new FindReplaceOptions());
        }
        wordDoc.save(outPath, SaveFormat.DOCX);
    }

    /**
     * Word to PDF helper: on non-Windows systems, sets the font folder, then returns a FileOutputStream.
     *
     * @param outPath PDF output path
     * @return FileOutputStream for outPath
     * @throws FileNotFoundException if the file cannot be opened for writing
     */
    private static FileOutputStream getFileOutputStream(String outPath) throws FileNotFoundException {
        if (!System.getProperty(OS_NAME_STR).toLowerCase().startsWith(WINDOWS_STR)) {
            // Linux needs an explicit font folder; false means sub-folders are not scanned
            log.info("[WordUtils -> docToPdf] Linux font folder: {}", LINUX_FONTS_PATH);
            FontSettings.getDefaultInstance().setFontsFolder(LINUX_FONTS_PATH, false);
        }
        return new FileOutputStream(outPath);
    }

    /**
     * Word to images.
     *
     * @param inputStream Word input stream
     * @return one BufferedImage per page
     * @throws Exception if rendering fails
     */
    private static List<BufferedImage> wordToImg(InputStream inputStream) throws Exception {
        Document doc = new Document(inputStream);
        ImageSaveOptions options = new ImageSaveOptions(SaveFormat.PNG);
        options.setPrettyFormat(true);
        options.setUseAntiAliasing(true);
        options.setUseHighQualityRendering(true);
        int pageCount = doc.getPageCount();
        List<BufferedImage> imageList = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            // Render page i to PNG bytes in memory, then decode them into a BufferedImage
            OutputStream output = new ByteArrayOutputStream();
            options.setPageIndex(i);
            doc.save(output, options);
            ImageInputStream imageInputStream = ImageIO.createImageInputStream(parse(output));
            imageList.add(ImageIO.read(imageInputStream));
        }
        return imageList;
    }

    /**
     * Converts an OutputStream to an InputStream.
     *
     * @param out OutputStream (must be a ByteArrayOutputStream)
     * @return InputStream over the bytes written to out
     */
    private static ByteArrayInputStream parse(OutputStream out) {
        return new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray());
    }

    /**
     * Loads the license file.
     */
    private static void checkLicense() {
        try (InputStream is = com.aspose.words.Document.class.getResourceAsStream("/com.aspose.words.lic_2999.xml")) {
            if (is == null) {
                // No license file on the classpath: Aspose.Words stays in evaluation mode
                return;
            }
            License asposeLicense = new License();
            asposeLicense.setLicense(is);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}
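Two steps in the class above are easy to check without Aspose.Words: wordToImg encodes each page to PNG bytes in a ByteArrayOutputStream and decodes them again with ImageIO, and docToImage derives the output name with name.substring(0, name.lastIndexOf(".")), which throws StringIndexOutOfBoundsException for a file name without a dot. The JDK-only sketch below isolates both steps. PageImageSketch, baseName, pageFile, roundTripPng, and the pageN_name.png naming scheme are hypothetical, not taken from the demo repository.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Path;

/** Hypothetical JDK-only sketch of the per-page image steps in WordUtils. */
public class PageImageSketch {

    /**
     * Strips the last extension: "test.docx" -> "test".
     * Names with no dot (or only a leading dot) are returned unchanged instead of throwing.
     */
    public static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    /** Builds parent/base/page{N}_base.png, where N = pageIndex + 1. */
    public static Path pageFile(Path wordFile, int pageIndex) {
        String base = baseName(wordFile.getFileName().toString());
        Path parent = wordFile.toAbsolutePath().getParent();
        return parent.resolve(base).resolve("page" + (pageIndex + 1) + "_" + base + ".png");
    }

    /** Encodes an image to PNG bytes in memory and decodes it again, as wordToImg + parse do. */
    public static BufferedImage roundTripPng(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No PNG writer available");
        }
        return ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
    }
}
```

Using java.nio.file.Path.resolve instead of concatenating "/" keeps the separators correct on both Windows and Linux, which matters here because the same class runs on both.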

Image merge utility class

import java.awt.image.BufferedImage;
import java.util.List;

/**
 * Image merge utility.
 *
 * @author wuxianglong
 */
public class MergeImage {

    /**
     * Merges any number of images into one image.
     *
     * @param isHorizontal true to merge horizontally, false to merge vertically
     * @param images       images to merge
     * @return the merged BufferedImage
     */
    public static BufferedImage mergeImage(boolean isHorizontal, List<BufferedImage> images) {
        // The merged image
        BufferedImage destImage;
        // Total width, total height (with 2 px gaps between images), max width, max height
        int allWidth = 0, allHeight = 0, allWidthMax = 0, allHeightMax = 0;
        for (int i = 0; i < images.size(); i++) {
            BufferedImage img = images.get(i);
            allWidth += img.getWidth();
            if (images.size() != i + 1) {
                allHeight += img.getHeight() + 2;
            } else {
                allHeight += img.getHeight();
            }
            if (img.getWidth() > allWidthMax) {
                allWidthMax = img.getWidth();
            }
            if (img.getHeight() > allHeightMax) {
                allHeightMax = img.getHeight();
            }
        }
        // Create the merged image
        if (isHorizontal) {
            destImage = new BufferedImage(allWidth, allHeightMax, BufferedImage.TYPE_INT_RGB);
        } else {
            destImage = new BufferedImage(allWidthMax, allHeight, BufferedImage.TYPE_INT_RGB);
        }
        // Copy every source image into the merged image
        int wx = 0, wy = 0;
        for (BufferedImage img : images) {
            int w1 = img.getWidth();
            int h1 = img.getHeight();
            // Read the source image's RGB values row by row into an array
            int[] imageArrayOne = new int[w1 * h1];
            imageArrayOne = img.getRGB(0, 0, w1, h1, imageArrayOne, 0, w1);
            if (isHorizontal) {
                // Horizontal merge: place this image to the right of the previous one
                destImage.setRGB(wx, 0, w1, h1, imageArrayOne, 0, w1);
            } else {
                // Vertical merge: place this image below the previous one, 2 px apart
                destImage.setRGB(0, wy, w1, h1, imageArrayOne, 0, w1);
            }
            wx += w1;
            wy += h1 + 2;
        }
        return destImage;
    }

}
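MergeImage copies pixels with getRGB/setRGB onto a fresh TYPE_INT_RGB canvas, whose pixels start at 0, so the 2 px gaps (and any area beside a narrower page) come out black. An alternative, sketched below with the JDK's Graphics2D.drawImage, paints a white background first and then draws each page. MergeImageSketch and mergeVertically are hypothetical names, and the white background is an assumption about the desired look rather than something the original code specifies.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

/** Hypothetical alternative to MergeImage.mergeImage(false, ...) using Graphics2D. */
public class MergeImageSketch {

    /** Gap between stacked pages, matching the 2 px used by MergeImage. */
    private static final int GAP = 2;

    /** Stacks images top to bottom on a white canvas, GAP pixels apart. */
    public static BufferedImage mergeVertically(List<BufferedImage> images) {
        if (images.isEmpty()) {
            throw new IllegalArgumentException("images must not be empty");
        }
        int width = 0;
        int height = GAP * (images.size() - 1);
        for (BufferedImage img : images) {
            width = Math.max(width, img.getWidth());
            height += img.getHeight();
        }
        BufferedImage dest = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dest.createGraphics();
        try {
            // Fill the background so gaps and area beside narrower pages are white, not black
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            int y = 0;
            for (BufferedImage img : images) {
                g.drawImage(img, 0, y, null);
                y += img.getHeight() + GAP;
            }
        } finally {
            g.dispose();
        }
        return dest;
    }
}
```

drawImage also converts between image types on the fly, so pages decoded by ImageIO in a different BufferedImage type can be drawn without copying pixel arrays by hand.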

Official documentation

Aspose.Words official documentation
