[Python] Handling UnicodeDecodeError: 'gbk' codec can't decode byte 0xa2 in position…
Encoding problem
Traceback (most recent call last):
  File "D:/PyCharm/text_processing/解析并清洗HTML.py", line 6, in <module>
    lines = f.readlines()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa2 in position 216: illegal multibyte sequence
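This error means the file is being decoded with the wrong encoding. When open() is called without an encoding argument, Python falls back to the platform's default encoding (GBK here), but the file is saved as UTF-8. Pass the encoding explicitly:
with open('test.html', 'r', encoding='UTF-8') as f:
    lines = f.readlines()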
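When the encoding of an input file is not known in advance, one possible approach is to try a short list of candidate encodings in order. The sketch below is only an illustration: the helper name `read_text` and the candidate list `("utf-8", "gbk")` are assumptions, not part of the original script. UTF-8 is tried first because it is strict, whereas GBK may decode some UTF-8 byte sequences into garbage without raising an error.

```python
import os
import tempfile


def read_text(path, encodings=("utf-8", "gbk")):
    """Decode a file with the first candidate encoding that works.

    Returns a (text, encoding) tuple. The helper and its default
    candidate list are illustrative assumptions, not a standard API.
    """
    with open(path, "rb") as f:  # read raw bytes and decode manually
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate failed, try the next one
    raise ValueError("none of %r could decode %s" % (encodings, path))


# Demo: write the same text as UTF-8 and as GBK, then read both files back.
sample = "大碗宽面,真香!"
with tempfile.TemporaryDirectory() as tmp:
    for enc in ("utf-8", "gbk"):
        path = os.path.join(tmp, "test_%s.html" % enc)
        with open(path, "w", encoding=enc) as f:
            f.write(sample)
        text, used = read_text(path)
        print(enc, "->", used, text == sample)
```

If the encoding is known, as it is for test.html here, passing `encoding='UTF-8'` to `open()` directly remains the simpler and more reliable fix.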
bs4.FeatureNotFound
Traceback (most recent call last):
  File "D:/PyCharm/text_processing/解析并清洗HTML.py", line 11, in <module>
    soup = BeautifulSoup(string, "lxml")
  File "D:\PyCharm\text_processing\venv\lib\site-packages\bs4\__init__.py", line 228, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
This error occurs because the lxml library is not installed in the current environment. Install it with pip:
pip3 install lxml
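If installing lxml is not an option, Beautiful Soup can also use Python's built-in "html.parser". The following is a minimal sketch, assuming a hypothetical helper named `make_soup` (not part of the original script) that prefers lxml and falls back to the built-in parser when `FeatureNotFound` is raised.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred="lxml"):
    """Build a BeautifulSoup object, falling back to the built-in parser.

    `make_soup` is an illustrative helper, not a bs4 API.
    """
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # The preferred parser library is not installed; use the stdlib one.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup('<div id="a_id"><div id="b_id">hello</div></div>')
print(soup.find("div", {"id": "b_id"}).text)
```

Note that different parsers can build slightly different trees for malformed HTML, so results may not be identical across environments.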
Test code
from bs4 import BeautifulSoup

string = ""
with open('test.html', 'r', encoding='UTF-8') as f:
    lines = f.readlines()
    for i in lines:
        string = '%s %s' % (string, i)

# Parse the HTML
soup = BeautifulSoup(string, "lxml")
# Find the div tag whose id is "b_id" and print its text
print(soup.find("div", {"id": "b_id"}).text)
<!DOCTYPE html>
<html lang="ch">
<head><meta charset="UTF-8"><title>测试页面</title>
</head>
<body><div id="a_id"><div id="b_id"><div id="c_id">大碗宽面,真香!</div></div></div>
</body>
</html>
大碗宽面,真香!
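The text returned by .text depends on how the HTML file is formatted: Beautiful Soup keeps the whitespace characters (line breaks, indentation) that appear between and inside tags, whereas a browser collapses them when rendering the page.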
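The whitespace behavior can be checked with a small experiment. The sketch below is an assumption-laden illustration: the two markup strings are made up for this example (they are not the original test.html), and it uses the built-in "html.parser" so that it runs without lxml. The helper name `b_text` is hypothetical.

```python
from bs4 import BeautifulSoup

# The same structure written on one line and written with indentation.
compact = '<div id="b_id"><div id="c_id">大碗宽面,真香!</div></div>'
indented = (
    '<div id="b_id">\n'
    '    <div id="c_id">\n'
    '        大碗宽面,真香!\n'
    '    </div>\n'
    '</div>'
)


def b_text(markup, strip=False):
    """Return the text of the div with id "b_id" (illustrative helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    div = soup.find("div", {"id": "b_id"})
    # get_text(strip=True) strips each text node and drops empty ones.
    return div.get_text(strip=True) if strip else div.text


# repr() makes the preserved line breaks and spaces visible.
print(repr(b_text(compact)))
print(repr(b_text(indented)))
print(repr(b_text(indented, strip=True)))
```

Calling `get_text(strip=True)`, or applying `str.strip()` to the result, is one way to get output that does not depend on how the source file is indented.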