[python3 - TroubleShooting] Garbled text when printing or saving to CSV after scraping a Chinese website with requests

Problem:

The Chinese text returned by page = requests.get().text comes out garbled, both when printed directly and when saved to a CSV file.

Background:

The site I was scraping declares charset=utf-8 in its head.

The default encoding of my local Windows system is GBK.
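
The local half of this background can be checked from Python itself. This is a minimal sketch using only the standard library. The values it prints depend on the machine; on a Chinese-locale Windows system the first is typically cp936, which is GBK.

```python
import locale
import sys

# Encoding that open() falls back to when no encoding= argument is given.
default_file_encoding = locale.getpreferredencoding(False)

# Encoding that print() uses for the console (None if stdout is not a
# regular text stream, for example when output is captured).
console_encoding = getattr(sys.stdout, "encoding", None)

print("default file encoding:", default_file_encoding)
print("console encoding:", console_encoding)
```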

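Attempts:

Print page directly; write the CSV file without specifying an encoding
- writerow() raises an error - UnicodeEncodeError: 'gbk' codec can't encode character '\xe6' in position 0: illegal multibyte sequence
Print page directly; write the CSV file with encoding='gbk'
- same error as above
Print page directly; write the CSV file with encoding='utf-8'
- print and CSV both garbled (garbling 1)
page.encode('utf-8').decode('gbk') - encode page as UTF-8, then decode it as GBK
- print and CSV both garbled (garbling 1)
page.encode('gbk','ignore').decode('gbk'); write the CSV file with encoding='gbk' - encode page as GBK (ignoring errors), then decode it as GBK
- print and CSV both garbled (garbling 2)
page.encode('gbk','ignore').decode('gbk'); write the CSV file with encoding='utf-8' - encode page as GBK (ignoring errors), then decode it as GBK
- print garbled (garbling 2), CSV garbled (garbling 3)
page.encode(requests.get().encoding).decode('gbk'); write the CSV file with encoding='utf-8'
- print displays correctly, CSV garbled (garbling 4)
page.encode(requests.get().encoding).decode('gbk'); write the CSV file with encoding='gbk'
- both display correctly
page.encode(requests.get().encoding).decode('gbk'); write the CSV file without specifying an encoding
- both display correctly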
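
Why the last three attempts work can be reproduced without any network access. The sketch below assumes, as the results above suggest, that the server sent GBK bytes and that requests decoded them as ISO-8859-1. The sample string and all variable names are made up for illustration.

```python
# Hypothetical sample standing in for the page body.
original = "中文"

# Assumption: the server sends the text as GBK-encoded bytes.
raw_bytes = original.encode("gbk")

# What .text looks like when those bytes are decoded with the wrong codec.
page = raw_bytes.decode("iso-8859-1")

# ISO-8859-1 maps every byte to exactly one character, so encoding with it
# again gives back the original bytes, which can then be decoded as GBK.
recovered = page.encode("iso-8859-1").decode("gbk")
print(recovered == original)

# The wrongly decoded text contains characters GBK cannot represent, which
# is why writing it to a GBK-encoded file raises UnicodeEncodeError.
try:
    page.encode("gbk")
    gbk_encode_failed = False
except UnicodeEncodeError:
    gbk_encode_failed = True
print(gbk_encode_failed)
```

The same reasoning explains the 'ignore' attempts: dropping the characters GBK cannot encode throws away part of the original bytes, so nothing decoded afterwards can be correct.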

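Conclusion:

Don't assume which encoding a page uses; use the encoding that the requests.get() response reports. For the site in my example, requests reported ISO-8859-1, which is what it assumes for a text response whose HTTP header names no charset... Encoding page with that reported encoding recovers the raw bytes, which in my case decoded correctly as GBK.
When writing a CSV file, the default encoding is the Windows system encoding. So on a typical Chinese-locale machine, writing Chinese text does not require specifying encoding.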
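
The second point can also be made explicit rather than left to the system default. This is a minimal sketch, assuming the rows are already correctly decoded str values. The file name and sample rows are hypothetical. Passing encoding='gbk' reproduces what a Chinese-locale Windows machine does by default, while keeping the script's behavior the same on other systems.

```python
import csv
import os
import tempfile

# Hypothetical rows standing in for scraped data.
rows = [["标题", "链接"], ["中文", "http://example.com"]]

# Hypothetical output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "result.csv")

# newline="" is what the csv module documentation recommends for files
# handed to csv.writer; the encoding is stated explicitly instead of
# relying on the platform default.
with open(path, "w", newline="", encoding="gbk") as f:
    csv.writer(f).writerows(rows)

# Read the file back with the same encoding to check the round trip.
with open(path, newline="", encoding="gbk") as f:
    read_back = list(csv.reader(f))

print(read_back == rows)
```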

Reposted from: https://www.cnblogs.com/break-dawnn/p/9044075.html
