PyPDF2 encoding problem: PyPDF2.utils.PdfReadError: Illegal character in Name Object

Reference: https://github.com/mstamy2/PyPDF2/issues/438

Merging PDF files with PyPDF2 fails with the following error:

Traceback (most recent call last):
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 484, in readFromStream
    return NameObject(name.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 8: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\projects\myproject\apps\backstage\views\busi_contract_manage_view.py", line 703, in post
    merge_pdf_result = merge_pdf(final_files, pdf_path)
  File "D:\projects\myproject\apps\utils\doc_convert_util.py", line 86, in merge_pdf
    pdf_writer.write(new_file)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 579, in readFromStream
    value = readObject(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 60, in readObject
    return NameObject.readFromStream(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 492, in readFromStream
    raise utils.PdfReadError("Illegal character in Name Object")
PyPDF2.utils.PdfReadError: Illegal character in Name Object
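
The first exception in the traceback can be reproduced without PyPDF2. The following is a minimal sketch that assumes the offending name contains raw GBK bytes after an ASCII prefix. The sample bytes are hypothetical and were not taken from the failing PDF.

```python
# Hypothetical name bytes: an ASCII prefix followed by GBK-encoded "宋体".
# These bytes are an assumption for illustration, not data from the failing PDF.
raw_name = b"/ABCDEF+" + "宋体".encode("gbk")

try:
    raw_name.decode("utf-8")
except UnicodeDecodeError as exc:
    # The GBK bytes are not valid UTF-8, so decoding stops at the first of them.
    error_position = exc.start
    error_reason = exc.reason
    print(f"utf-8 failed at byte {error_position}: {error_reason}")

# The same bytes decode cleanly once a matching codec is used.
print(raw_name.decode("gbk"))
```

If the name in a real file was written in some other legacy encoding, the GBK decode may fail too or silently produce the wrong characters.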

Locate the file named in the traceback:

File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 484

Original code at line 484:

try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    # Name objects should represent irregular characters
    # with a '#' followed by the symbol's hex number
    if not pdf.strict:
        warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
        return NameObject(name)
    else:
        raise utils.PdfReadError("Illegal character in Name Object")

Add a GBK fallback inside the except block:

return NameObject(name.decode('gbk'))

After the change:

try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    try:
        return NameObject(name.decode('gbk'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        # Name objects should represent irregular characters
        # with a '#' followed by the symbol's hex number
        if not pdf.strict:
            warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
            return NameObject(name)
        else:
            raise utils.PdfReadError("Illegal character in Name Object")
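
The fallback pattern used in this patch can be tried on its own, outside PyPDF2. This is only a sketch: `decode_name` and its `encodings` parameter are hypothetical names introduced here, not part of the library.

```python
def decode_name(name: bytes, encodings=("utf-8", "gbk")) -> str:
    """Return the first successful decoding of `name`, trying each codec in order."""
    for encoding in encodings:
        try:
            return name.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing matched: surface the problem instead of guessing further.
    raise ValueError("Illegal character in Name Object")


# Hypothetical sample: GBK bytes that are not valid UTF-8.
print(decode_name(b"/" + "宋体".encode("gbk")))
# Plain ASCII names are still handled by the first (UTF-8) attempt.
print(decode_name(b"/Type"))
```

The order of the codecs matters. Some GBK byte pairs also happen to be valid UTF-8, so trying UTF-8 first means such names are decoded as UTF-8. The GBK step only rescues names that UTF-8 rejects outright.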

The error persists after this change, so a second location also has to be modified:

Lib/site-packages/PyPDF2/utils.py, line 238

Original code:

r = s.encode('latin-1')
if len(s) < 2:
    bc[s] = r
return r

Modified code:

try:
    r = s.encode('latin-1')
except Exception as e:
    r = s.encode('utf-8')
if len(s) < 2:
    bc[s] = r
return r
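
The second change exists because Latin-1 can only represent code points up to U+00FF, so a name decoded from GBK can no longer be encoded that way. A minimal standalone sketch of the same fallback follows. `encode_text` is a hypothetical helper, and it catches `UnicodeEncodeError` rather than the broader `Exception` used in the patch above.

```python
def encode_text(s: str) -> bytes:
    """Encode to Latin-1 when possible, otherwise fall back to UTF-8."""
    try:
        return s.encode("latin-1")
    except UnicodeEncodeError:
        # Characters above U+00FF have no Latin-1 byte.
        return s.encode("utf-8")


print(encode_text("abc"))   # pure ASCII stays on the Latin-1 path
print(encode_text("宋体"))  # CJK text takes the UTF-8 fallback
```

Catching only `UnicodeEncodeError` keeps unrelated failures visible. Note also that a name read as GBK and written through this path is stored as UTF-8 bytes in the output, not as the original bytes.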
