Python: Merging Multiple PDF Files

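Suppose you have the boring job of merging several dozen PDF documents into a single PDF file. Each of them has a cover sheet as its first page, but you don't want the cover sheet repeated in the final result. There are many free programs for combining PDFs, but many of them simply merge entire files together. Let's write a Python program that lets you choose which pages go into the combined PDF. At a high level, here's what the program will do: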

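Find all PDF files in the current working directory.

Sort the filenames so the PDFs are added in order.

Write each page of each PDF, excluding the first page, to the output file.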

In terms of implementation, your code will need to do the following:

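Call os.listdir() to find all the files in the working directory and remove any non-PDF files.

Call Python's sort() list method to alphabetize the filenames.

Create a PdfFileWriter object for the output PDF.

Loop over each PDF file, creating a PdfFileReader object for it.

Loop over each page (except the first) in each PDF file.

Add the pages to the output PDF.

Write the output PDF to a file named allminutes.pdf.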

For this project, open a new file editor window and save it as combinePdfs.py.

Step 1: Find All the PDF Files

First, your program needs to get a list of all the files with a .pdf extension in the current working directory and sort them. The code for this step is shown below.

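After the shebang line and the descriptive comment about what the program does, this code imports the os and PyPDF2 modules. The os.listdir('.') call returns a list of every file in the current working directory. The code loops over this list and adds only the files with a .pdf extension to pdfFiles. Afterward, the list is sorted alphabetically; the key=str.lower keyword argument to sort() makes the comparison case-insensitive. A PdfFileWriter object is created to hold the combined PDF pages. Finally, a few comments outline the rest of the program.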

```python
#! /usr/bin/python3
# combinePdfs.py - Combines all the PDFs in the current working directory into
# a single PDF.

import PyPDF2, os

# Get all the PDF filenames.
pdfFiles = []
for filename in os.listdir('.'):
    if filename.endswith('.pdf'):
        pdfFiles.append(filename)
pdfFiles.sort(key=str.lower)

pdfWriter = PyPDF2.PdfFileWriter()

# TODO: Loop through all the PDF files.

# TODO: Loop through all the pages (except the first) and add them.

# TODO: Save the resulting PDF to a file.
```
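The key=str.lower argument matters when filenames mix uppercase and lowercase letters, because a plain sort() places every uppercase letter before every lowercase one. The sketch below applies the same filter-and-sort logic to a hard-coded list of names instead of a real directory. The function pdf_names_sorted and the filenames are hypothetical, used only for illustration.

```python
def pdf_names_sorted(names):
    """Keep only names ending in '.pdf' and sort them case-insensitively,
    mirroring the filter and sort used in combinePdfs.py."""
    pdfs = [name for name in names if name.endswith('.pdf')]
    pdfs.sort(key=str.lower)  # e.g. 'Meetingminutes.pdf' is compared as 'meetingminutes.pdf'
    return pdfs

# Hypothetical directory listing, hard-coded so no real files are needed.
names = ['meetingminutes2.pdf', 'Meetingminutes.pdf', 'notes.txt', 'a.PDF', 'budget.pdf']

# Plain sort: every uppercase letter sorts before every lowercase letter.
print(sorted(n for n in names if n.endswith('.pdf')))
# ['Meetingminutes.pdf', 'budget.pdf', 'meetingminutes2.pdf']

# Case-insensitive sort, as in combinePdfs.py.
print(pdf_names_sorted(names))
# ['budget.pdf', 'Meetingminutes.pdf', 'meetingminutes2.pdf']
```

Notice that 'a.PDF' is dropped in both cases because endswith() is case-sensitive. If your files may use an uppercase extension, filename.lower().endswith('.pdf') would also match them. This is also plain text ordering, so 'file10.pdf' sorts before 'file2.pdf'.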

Step 2: Open Each PDF

Now the program must read each PDF file in pdfFiles. Add the following to your program:

```python
#! /usr/bin/python3
# combinePdfs.py - Combines all the PDFs in the current working directory into
# a single PDF.

import PyPDF2, os

# Get all the PDF filenames.
pdfFiles = []
for filename in os.listdir('.'):
    if filename.endswith('.pdf'):
        pdfFiles.append(filename)
pdfFiles.sort(key=str.lower)

pdfWriter = PyPDF2.PdfFileWriter()

# Loop through all the PDF files.
for filename in pdfFiles:
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    # TODO: Loop through all the pages (except the first) and add them.

# TODO: Save the resulting PDF to a file.
```

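For each PDF, the loop calls open() in read-binary mode, passing 'rb' as the second argument. The open() call returns a File object, which is passed to PyPDF2.PdfFileReader() to create a PdfFileReader object for that PDF file.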
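PDF is a binary format, which is why the file must be opened with 'rb' rather than in text mode. As a small illustration that is not part of the original program, the hypothetical helper below opens a file in binary mode and checks for the %PDF- signature that well-formed PDF files normally start with. You could use it as an extra sanity check on top of the .pdf extension test.

```python
def looks_like_pdf(path):
    """Return True if the file at path starts with the PDF signature b'%PDF-'."""
    # Read-binary mode: we compare raw bytes, not decoded text.
    with open(path, 'rb') as f:
        return f.read(5) == b'%PDF-'
```

For example, you could add "if not looks_like_pdf(filename): continue" at the top of the loop body to skip files that only pretend to be PDFs.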

Step 3: Add Each Page

For each PDF, you'll want to loop over every page except the first. Add this code to your program:

```python
#! /usr/bin/python3
# combinePdfs.py - Combines all the PDFs in the current working directory into
# a single PDF.

import PyPDF2, os

# Get all the PDF filenames.
pdfFiles = []
for filename in os.listdir('.'):
    if filename.endswith('.pdf'):
        pdfFiles.append(filename)
pdfFiles.sort(key=str.lower)

pdfWriter = PyPDF2.PdfFileWriter()

# Loop through all the PDF files.
for filename in pdfFiles:
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)

    # Loop through all the pages (except the first) and add them.
    for pageNum in range(1, pdfReader.numPages):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)

# TODO: Save the resulting PDF to a file.
```

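The code inside the inner for loop copies each Page object individually to the PdfFileWriter object. Remember that you want to skip the first page. Since PyPDF2 considers 0 to be the first page, your loop should start at 1 and go up to, but not include, the integer in pdfReader.numPages.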

Step 4: Save the Results

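After these nested for loops finish, the pdfWriter variable contains a PdfFileWriter object holding the pages from all the PDFs. The last step is to write this content to a file on the hard drive. Add this code to your program: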

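```python
#!/usr/bin/python3
# combinePdfs.py - Combines all the PDFs in the current working directory into
# a single PDF.
# Written for the PyPDF2 1.x API (PdfFileReader, PdfFileWriter, numPages,
# getPage, addPage); newer PyPDF2 and pypdf releases renamed these
# (for example PdfReader and PdfWriter).

import PyPDF2, os

# Get all the PDF filenames.
pdfFiles = []
for filename in os.listdir('.'):
    if filename.endswith('.pdf'):
        pdfFiles.append(filename)
pdfFiles.sort(key=str.lower)

pdfWriter = PyPDF2.PdfFileWriter()

# Loop through all the PDF files.
for filename in pdfFiles:
    # Keep each input file open until the output is written: the pages
    # added to pdfWriter still read their data from these files.
    pdfFileObj = open(filename, 'rb')
    # strict=False makes PyPDF2 warn about, rather than fail on,
    # correctable problems in slightly malformed PDFs.
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj, strict=False)

    # Loop through all the pages (except the first) and add them.
    for pageNum in range(1, pdfReader.numPages):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)

# Save the resulting PDF to a file.
pdfOutput = open('allminutes.pdf', 'wb')
pdfWriter.write(pdfOutput)
pdfOutput.close()
```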
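One practical caveat: if the output file is written into the same directory the PDFs are read from, running the program a second time picks up the previous allminutes.pdf as an input, because its name also ends in .pdf. The hypothetical helper below, an assumption and not part of the original program, skips the output filename. It also matches the extension case-insensitively so that files ending in .PDF are included.

```python
import os

OUTPUT_NAME = 'allminutes.pdf'  # must match the name used when saving

def input_pdfs(directory='.', output_name=OUTPUT_NAME):
    """Return the PDF filenames in directory, sorted case-insensitively,
    excluding the merged output file itself."""
    names = [name for name in os.listdir(directory)
             if name.lower().endswith('.pdf') and name != output_name]
    names.sort(key=str.lower)
    return names
```

With this helper, the lines that build pdfFiles could be replaced by pdfFiles = input_pdfs(). The function returns bare filenames, so if you pass a directory other than '.', join each one with os.path.join(directory, name) before opening it.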
