PDF to HTML? PDF to images

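The project needed PDF files to be viewable directly inside WeChat. On iOS this works out of the box, but in WeChat on Android you have to download the file first and then open it with another tool, such as QQ Browser (a pitfall).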

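Where there is a requirement, there has to be a solution. The original approach was PDF to HTML, an idea suggested earlier (which, as it turned out, led straight into a pit). As always, I asked Baidu and quickly found a method: convert with Pdf2Dom.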

Dependencies

implementation 'net.sf.cssbox:pdf2dom:1.9'
implementation 'org.apache.pdfbox:pdfbox:2.0.19'
implementation 'org.apache.pdfbox:pdfbox-tools:2.0.19'

Utility class (I went through too many sources, so I won't list the reference links)

import org.apache.pdfbox.pdmodel.PDDocument;
import org.fit.pdfdom.PDFDomTree;

import java.io.*;

/**
 * @description: PDF to HTML
 * @author:
 * @create: 2020/05/15
 **/
public class PDFToHTMLUtils {

    /* Convert a PDF to HTML; the .html file is written next to the source file */
    public static String pdfToHtml(String filePath) {
        String outputPath = filePath.substring(0, filePath.lastIndexOf(".") + 1) + "html";
        byte[] bytes = getBytes(filePath);
        // Resources declared inside try (...) are closed automatically
        try (BufferedWriter out = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(new File(outputPath)), "UTF-8"));
             // Load the PDF document
             PDDocument document = PDDocument.load(bytes)) {
            PDFDomTree pdfDomTree = new PDFDomTree();
            pdfDomTree.writeText(document, out);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return outputPath;
    }

    /* Read a file into a byte array; returns null if the file cannot be read */
    private static byte[] getBytes(String filePath) {
        byte[] buffer = null;
        try (FileInputStream fis = new FileInputStream(new File(filePath));
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            byte[] b = new byte[4 * 1024];
            int n;
            while ((n = fis.read(b)) != -1) {
                bos.write(b, 0, n);
            }
            buffer = bos.toByteArray();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return buffer;
    }
}

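There is nothing wrong with this code; its only drawback is efficiency. Converting a 1 MB PDF takes about 30 seconds and the output is 9 times larger; a 10 MB PDF takes more than a minute and the output is 5 times larger. The layout can also come out garbled.
The fix would be to convert page by page, but I couldn't find code for that and don't know whether it is unsupported or what. Reading the source? Forget it, too lazy. There is also an approach that does page-by-page conversion with jacob.jar, but that jar is not well supported and requires environment changes (I looked into it once for parsing Excel without success; after reading the code for a while I dropped it).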

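Later I saw the output of a third-party conversion service and got an idea: if HTML doesn't work, convert to images instead. A quick Baidu search turned up plenty of ways to do that.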

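I started with PDFRenderer, which is said to be efficient but is not friendly to Chinese characters and needs fonts installed on the system (not investigated). In my own test, much of the text in tables was missing, so pass.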

import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;

import javax.imageio.ImageIO;
import java.awt.*;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FilenameFilter;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

/**
 * @description: conversion
 * @author:
 * @create: 2020/05/21
 **/
public class PDFToImage {

    private final int maxPage = 30;

    public Map<String, Object> change(String PDFPath) {
        // Read the PDF at the given path and convert it to images page by page
        if (PDFPath == null || "".equals(PDFPath)) {
            throw new IllegalArgumentException("PDFPath must not be empty");
        }
        PDFFile pdfFile = this.getPdfFile(PDFPath);
        String path = PDFPath.substring(0, PDFPath.lastIndexOf("/") + 1);
        String imageFile = PDFPath.substring(PDFPath.lastIndexOf("/") + 1, PDFPath.lastIndexOf("."));
        return this.pdf2Images(pdfFile, path, imageFile);
    }

    /**
     * Read a PDF document.
     *
     * @param filePath -- path of the PDF file to read.
     * @return null or a PDFFile instance.
     */
    private PDFFile getPdfFile(String filePath) {
        try {
            // load a pdf file from byte buffer.
            File file = new File(filePath);
            RandomAccessFile raf = new RandomAccessFile(file, "r");
            FileChannel channel = raf.getChannel();
            ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return new PDFFile(buf);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return null;
    }

    /**
     * Convert a PDF document to images, one per page.
     *
     * @param pdfFile       -- PDFFile instance
     * @param imageSavePath -- directory in which to save the images.
     * @param fileName      -- name of the folder that holds the images.
     */
    private Map<String, Object> pdf2Images(PDFFile pdfFile, String imageSavePath, String fileName) {
        // Store the converted images under path
        String path = imageSavePath + fileName + "/";
        File filePath = new File(path);
        if (!filePath.exists()) { // create the folder named after the file if it does not exist.
            filePath.mkdirs();
        }
        // Get the names of all jpg files in the folder.
        String[] imageNames = filePath.list(new ImageFilter());
        if (imageNames.length == 0) { // the folder is empty: convert the PDF page by page.
            String imagePath = "";
            try {
                // Limit the number of pages converted to the first maxPage pages.
                int pages = pdfFile.getNumPages();
                if (pages > maxPage) {
                    pages = maxPage;
                }
                for (int i = 1; i <= pages; i++) {
                    // draw the page to an image
                    PDFPage page = pdfFile.getPage(i);
                    // get the width and height for the doc at the default zoom
                    Rectangle rect = new Rectangle(0, 0,
                            (int) page.getBBox().getWidth(),
                            (int) page.getBBox().getHeight());
                    // generate the image
                    Image img = page.getImage(rect.width, rect.height, // width & height
                            rect, // clip rect
                            null, // null for the ImageObserver
                            true, // fill background with white
                            true  // block until drawing is done
                    );
                    BufferedImage tag = new BufferedImage(rect.width, rect.height, BufferedImage.TYPE_INT_RGB);
                    tag.getGraphics().drawImage(img, 0, 0, rect.width, rect.height, null);
                    imagePath = path + i + ".jpg";
                    // Write to a file stream; try-with-resources closes it.
                    try (FileOutputStream out = new FileOutputStream(imagePath)) {
                        ImageIO.write(tag, "jpg", out);
                    }
                }
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
        // Get the names of all jpg files in the folder.
        imageNames = filePath.list(new ImageFilter());
        // Sort the file names by page number.
        Arrays.sort(imageNames, new FileNameComparator());
        Map<String, Object> map = new HashMap<>();
        map.put("fileName", fileName);
        map.put("imageNames", imageNames);
        return map;
    }

    // Orders names such as "2.jpg" and "10.jpg" by their numeric page number
    static class FileNameComparator implements Comparator<String> {
        public final int compare(String first, String second) {
            int firstPage = Integer.parseInt(first.split("\\.")[0]);
            int secondPage = Integer.parseInt(second.split("\\.")[0]);
            return Integer.compare(firstPage, secondPage);
        }
    }

    // Filter that accepts jpg images only
    static class ImageFilter implements FilenameFilter {
        public boolean isImageFile(String fileName) {
            return fileName.toLowerCase().endsWith("jpg");
        }

        public boolean accept(File dir, String name) {
            return isImageFile(name);
        }
    }
}
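The numeric comparator in the class above matters because a plain string sort puts "10.jpg" before "2.jpg". Here is a minimal standalone sketch of the same idea, assuming page files are named `<page>.jpg` as in the code above; `PageNameSort`, `pageNumber`, and `sortByPage` are illustrative names, not part of any library.

```java
import java.util.Arrays;
import java.util.Comparator;

public class PageNameSort {

    // Hypothetical helper: extracts the page number from a name such as "12.jpg".
    static int pageNumber(String fileName) {
        return Integer.parseInt(fileName.substring(0, fileName.indexOf('.')));
    }

    // Returns a copy of the names ordered by numeric page number.
    static String[] sortByPage(String[] names) {
        String[] sorted = names.clone();
        Arrays.sort(sorted, Comparator.comparingInt(PageNameSort::pageNumber));
        return sorted;
    }

    public static void main(String[] args) {
        String[] names = {"10.jpg", "2.jpg", "1.jpg"};

        String[] lexical = names.clone();
        Arrays.sort(lexical); // default String ordering
        System.out.println(Arrays.toString(lexical));           // [1.jpg, 10.jpg, 2.jpg]
        System.out.println(Arrays.toString(sortByPage(names))); // [1.jpg, 2.jpg, 10.jpg]
    }
}
```

Any file in the folder whose name does not start with a number would make `Integer.parseInt` throw, so this only holds as long as the folder contains nothing but the generated page images.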

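PDFBox is the solution I'm happiest with so far: conversion speed, output size, and sharpness are all satisfactory. (If you know a better way, let's compare notes.)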

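About the jars: I created a standalone Java file and kept getting class-not-found errors. The cause was that the commons-logging jar had not been included, so keep that in mind. (With a framework this probably wouldn't show up as a bug. Headache.)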

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.*;

/**
 * @description: conversion 2
 * @author:
 * @create: 2020/05/21
 **/
public class PdfBox {

    public void toImage() {
        File file = new File("F:/**.pdf");
        // try-with-resources closes the document even if rendering fails
        try (PDDocument doc = PDDocument.load(file, MemoryUsageSetting.setupTempFileOnly())) {
            int pageCount = doc.getNumberOfPages();
            System.out.println(pageCount);
            PDFRenderer pdfRenderer = new PDFRenderer(doc);
            String imgPath;
            float dpi = 90;
            // Renders only the first page; change the bound to pageCount to render every page
            for (int i = 0; i < 1; i++) {
                imgPath = "F:/**/" + i + ".png";
                try (BufferedOutputStream outputStream =
                             new BufferedOutputStream(new FileOutputStream(imgPath))) {
                    BufferedImage image = pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB);
                    ImageIO.write(image, "png", outputStream);
                }
            }
            System.out.println("over");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
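To get a feel for what the `dpi` value of 90 in the code above means for image size: PDF page dimensions are given in points (1/72 inch), so rendering at a given DPI scales the page by dpi / 72. The sketch below is plain arithmetic under that assumption; `DpiMath` and `toPixels` are illustrative names, and the exact rounding PDFBox applies internally is assumed, not confirmed here.

```java
public class DpiMath {

    // PDF user space uses 72 points per inch.
    static final double POINTS_PER_INCH = 72.0;

    // Approximate pixel length of a page side rendered at the given DPI.
    // Rounding down is an assumption for illustration.
    static int toPixels(double points, double dpi) {
        return (int) Math.max(Math.floor(points * dpi / POINTS_PER_INCH), 1);
    }

    public static void main(String[] args) {
        // US Letter page: 612 x 792 points
        System.out.println(toPixels(612, 90) + " x " + toPixels(792, 90)); // 765 x 990
    }
}
```

Raising the DPI sharpens the image, but the pixel count, and with it the rendering time and file size, grows roughly with the square of the DPI.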

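As for the frontend, so far I can only think of serving the images by network URL or as base64; if there are other ways, please leave a comment.
As for the other side, we decided to go with an existing, fairly mature solution rather than slog through polishing this ourselves (I won't say I'm lazy...).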
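For the base64 option mentioned above, here is a minimal sketch that encodes a rendered page as a `data:` URI, which a page can place directly in an `<img src="...">` attribute. It uses only the JDK; `ImageDataUri` and `toPngDataUri` are illustrative names, and the tiny blank image stands in for a page produced by the renderer.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;

public class ImageDataUri {

    // Encodes an image as PNG and wraps the bytes in a base64 data URI.
    static String toPngDataUri(BufferedImage image) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(image, "png", buffer)) {
                throw new IOException("No PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return "data:image/png;base64," + Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    public static void main(String[] args) {
        // Placeholder for a rendered page image
        BufferedImage page = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        String uri = toPngDataUri(page);
        System.out.println(uri.substring(0, 22)); // data:image/png;base64,
    }
}
```

Base64 makes the payload about a third larger than the raw image, so for documents with many pages, serving image URLs is probably the lighter choice.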

The code has been copied back and forth so many times that I've forgotten where it came from; don't sweat the details.

That's a wrap O(∩_∩)O haha~
