Decrypting PDF Files with Python

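1. Use the pypdf library to remove the password from PDF files in bulk. PyPDF4 is used here; the same approach should also serve as a reference for PyPDF2, PyPDF3, and so on. The code is as follows: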

import os
from PyPDF4 import PdfFileReader
from PyPDF4 import PdfFileWriter

# Decrypted copies are written to this directory
res_path = "./resdir/"


def decrypt_pdf(srcfname, resfname, password):
    try:
        file = open(srcfname, 'rb')
    except Exception as err:
        print('file open failed! ' + str(err))
        return None
    # "with" closes the source file on every return path
    with file:
        pdf_reader = PdfFileReader(file, strict=False)
        if not pdf_reader.isEncrypted:
            print('file is not encrypted, do nothing. file: %s' % srcfname)
            return None
        # decrypt() returns 0 on failure, 1 if the user password
        # matched, 2 if the owner password matched
        ret = pdf_reader.decrypt(password)
        if ret == 0:
            print("%s: password (%s) is wrong" % (srcfname, password))
            return None
        # Copy all pages into a new, unencrypted document
        pdf_writer = PdfFileWriter()
        pdf_writer.appendPagesFromReader(pdf_reader)
        with open(resfname, 'wb') as res_file:
            pdf_writer.write(res_file)
    return None


def main():
    # Do not fail if the output directory already exists
    os.makedirs(res_path, exist_ok=True)
    src_path = input(r"input pdf path(example: D:\pdf\): ")
    password = input(r"input passwd(example: 123456): ")
    if src_path == "" or password == "":
        print('please input right path and password !!!')
        return
    for filename in os.listdir(src_path):
        # os.path.join works whether or not src_path ends with a separator
        sfname = os.path.join(src_path, filename)
        rfname = os.path.join(res_path, filename)
        print("----- start decrypting file-----------")
        decrypt_pdf(sfname, rfname, password)
        print("----- end decrypting file-------------")


if __name__ == '__main__':
    main()

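Environment: Python 3. Put this script in the same directory as the folder of PDFs to decrypt, then run it. The decrypted copies are written to ./resdir/ under the current working directory.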
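The script above hands every entry returned by os.listdir to decrypt_pdf, including subfolders and non-PDF files. Below is a minimal sketch of one way to filter the list first. It assumes a ".pdf" extension (case-insensitive) is a good enough test, and the helper name list_pdf_files is hypothetical, not part of the original script.

```python
import os


def list_pdf_files(src_dir):
    """Return sorted full paths of regular files in src_dir ending in .pdf.

    Hypothetical helper: the extension check is an assumption, it does not
    inspect file contents.
    """
    paths = []
    for name in sorted(os.listdir(src_dir)):
        full = os.path.join(src_dir, name)
        # Skip directories and anything without a .pdf extension
        if os.path.isfile(full) and name.lower().endswith(".pdf"):
            paths.append(full)
    return paths
```

In main(), the loop could then iterate over list_pdf_files(src_path) and build the output name with os.path.basename.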

2. A problem encountered during decryption:

  File "/xxx/lib/python3.10/site-packages/PyPDF4/utils.py", line 237, in b_
    r = s.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u02c6' in position 0: ordinal not in range(256)

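This error shows up when the pypdf library parses PDF documents containing Chinese text. The workaround is to edit utils.py inside the library, as follows: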

Original code:

...
r = s.encode('latin-1')
if len(s) < 2:
    bc[s] = r
return r
...

After modification:

...
try:
    r = s.encode('latin-1')
except Exception as e:
    r = s.encode('utf-8')
if len(s) < 2:
    bc[s] = r
return r
...

After making the change, rerun the script above and the problem is resolved.
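The effect of that patch can be seen in isolation. The sketch below is a standalone illustration, not PyPDF4 code: the function name encode_with_fallback is hypothetical, and it catches UnicodeEncodeError specifically rather than every Exception. It only shows that the fallback stops the exception; whether the UTF-8 bytes are what a given PDF needs is an assumption the source does not verify.

```python
def encode_with_fallback(s):
    """Encode s as latin-1, falling back to UTF-8 when that fails.

    Hypothetical helper mirroring the patched lines in utils.py.
    """
    try:
        return s.encode("latin-1")
    except UnicodeEncodeError:
        # Characters above U+00FF (such as U+02C6) cannot be latin-1 encoded
        return s.encode("utf-8")


print(encode_with_fallback("abc"))      # plain ASCII encodes as latin-1
print(encode_with_fallback("\u02c6"))   # the character from the traceback
```

Catching only UnicodeEncodeError keeps unrelated errors (for example, a non-string argument) visible instead of silently swallowing them.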
