Python: Storing Images in a PDF

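Requirement

One of my projects needed to place multiple images into a PDF, each with a descriptive caption underneath, so that the file could be downloaded and viewed on a phone.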

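Problem

I tried many of the approaches found online, but the resulting PDF was huge: fewer than ten images already produced a 70 MB file. I also tried compressing the images before adding them to the PDF, and it made no difference.
It is worth noting that when Python puts an image into a PDF, it is drawing the image rather than storing the file, so as far as I could tell the size of the image file has little to do with the size of the PDF.
Just when I thought there was no way around it, I took a close look at the docstring of the drawImage method (the imgDoc.drawImage call in the code below). The last part of that docstring is the key.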

Original docstring:

"""
Unlike drawInlineImage, this creates 'external images' which
are only stored once in the PDF file but can be drawn many times.
If you give it the same filename twice, even at different locations
and sizes, it will reuse the first occurrence, resulting in a saving
in file size and generation time. If you use ImageReader objects,
it tests whether the image content has changed before deciding
whether to reuse it.

In general you should use drawImage in preference to drawInlineImage
unless you have read the PDF Spec and understand the tradeoffs.
"""

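In short:

drawImage creates "external images" that are stored in the PDF file only once, no matter how many times they are drawn.
Passing the same file name twice, even at different positions and sizes, reuses the first occurrence, which saves both file size and generation time.
With ImageReader objects, it checks whether the image content has changed before deciding whether to reuse it.

Unless you have read the PDF specification and understand the tradeoffs, prefer drawImage over drawInlineImage.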

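My understanding

Calling drawImage creates an "external image" in the PDF. In ImageReader(image_file), image_file is the path to the image; as I understand it, a different path makes drawImage create a new "external image".
So while looping over the folder, I rename each new image to 1.jpg (any name will do), overwriting the previous one. If you want to keep the original images, use shutil.copy(image_file, new_path) instead, which copies the file rather than moving it.
Note that the code below passes an ImageReader object, and the docstring says that in that case reuse is decided by image content, so the file name may not be the whole explanation for the smaller file.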
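
The copy-versus-move choice described above can be wrapped in a small helper. This is only an illustrative sketch: stage_image and its keep_original parameter are hypothetical names, not part of the original code.

```python
import os
import shutil


def stage_image(source_path, staging_path, keep_original=True):
    """Place source_path at staging_path, replacing any file already there.

    keep_original=True copies the file (the source stays in place);
    keep_original=False moves it (the source is removed).
    """
    if keep_original:
        shutil.copy(source_path, staging_path)
    else:
        shutil.move(source_path, staging_path)
    return staging_path
```

For example, stage_image(image_file, new_path) keeps the original image on disk, while stage_image(image_file, new_path, keep_original=False) behaves like the shutil.move call in the code below.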

Code

import logging
import os
import shutil

from PIL import Image
from reportlab.lib.pagesizes import A4
from reportlab.lib.utils import ImageReader
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfgen import canvas

logger = logging.getLogger(__name__)

# Resource, ROOT_PATH and self.pci_path (the image folder) come from the
# surrounding project and are not shown here.


# Generate the PDF
class PicturePDF(Resource):

    # Register the Chinese font
    def registerFont(self):
        fullPath = os.path.join(ROOT_PATH, "etc", "文泉驿正黑.ttf")
        pdfmetrics.registerFont(TTFont('song', fullPath))

    # Convert the saved images into a PDF
    def imgtopdf(self, pdf_path):
        self.registerFont()
        # Path of the output PDF
        imgDoc = canvas.Canvas(pdf_path)
        # Use the A4 page size
        imgDoc.setPageSize(A4)
        document_width, document_height = A4
        i = 0
        for image_name in os.listdir(self.pci_path):
            image_file = os.path.join(self.pci_path, image_name)
            new_path = os.path.join(self.pci_path, "1.jpg")
            print(i)
            shutil.move(image_file, new_path)
            # Try to open the image; on failure, skip it and continue
            try:
                with Image.open(new_path) as image:
                    # Image dimensions
                    image_width, image_height = image.size
            except Exception as e:
                print(f'{image_file = }')
                logger.error(f'PicturePDF, {image_name}, {e}')
                continue
            # Scale the image to the page width, keeping its aspect ratio
            image_aspect = image_height / float(image_width)
            print_width = document_width
            print_height = document_width * image_aspect
            if i % 2 == 0:
                d_h = document_height - print_height
            if i % 2 == 1:
                d_h = d_h - print_height - 30
            # Add the image
            imgDoc.drawImage(ImageReader(new_path), document_width - print_width,
                             d_h, width=print_width,
                             height=print_height, preserveAspectRatio=True)
            # Use the registered Chinese font
            imgDoc.setFont('song', 16)
            # Add the caption
            imgDoc.drawString(document_width - print_width, d_h - 15, image_name)
            # Two images per page; start a new page after every second image
            if i % 2 == 1:
                imgDoc.showPage()
            i += 1
        imgDoc.save()
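
The placement arithmetic in the loop (scale each image to the page width, two images per page, a 30-point gap for the caption) can be isolated into a pure function, which makes it easy to check without generating a PDF. This is a sketch under that reading of the code; layout_images and caption_gap are hypothetical names.

```python
def layout_images(image_sizes, page_width, page_height, caption_gap=30):
    """Return (page_index, x, y, width, height) for each image, two per page.

    image_sizes is a list of (pixel_width, pixel_height) tuples. Each image
    is scaled to the full page width, keeping its aspect ratio.
    """
    placements = []
    y = page_height
    for i, (img_w, img_h) in enumerate(image_sizes):
        width = page_width
        height = page_width * img_h / img_w
        if i % 2 == 0:
            # First image on a page: align its top with the top of the page.
            y = page_height - height
        else:
            # Second image: below the first, leaving room for its caption.
            y = y - height - caption_gap
        placements.append((i // 2, 0, y, width, height))
    return placements
```

As in the original loop, nothing prevents the second image from running off the bottom of the page when both images are tall; a negative y in the result is a quick way to spot that case.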

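Key points

The main point of this post is that a PDF can stay small even after images are drawn into it; in my case, the key was that every image had the same file name while its content differed.
Also, adding Chinese text requires registering a Chinese font: download a TTF file locally and register it from its path.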