Python crawler (requests): sending an HTTP POST request with Chinese-encoded form data

Sending the POST request the usual way fails

Site information

Form page

Result

The page is encoded in gb2312.

Sending the POST request with requests

In [2]: import requests

In [3]: from bs4 import BeautifulSoup as BS

In [4]: url = 'http://example.com/ip/search.asp'

In [5]: data = {
   ...:     'loudong': '女生九栋',
   ...:     'fangjian': '101-1'}

In [6]: res = requests.post(url, data=data)

In [9]: res.encoding = 'gb2312'

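The query fails.

Comparing the data sent by the browser with the data sent by requests, using Wireshark

POST data sent by the browser

POST data sent by requests

The captures show that the encoded value of loudong differs between the two:

The browser encodes it as gb2312.

requests encodes it as utf-8.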
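The difference can be reproduced without a packet capture. The sketch below is only an illustration using the standard library's `urlencode`, with the same field name and value as the session above. It percent-encodes the value once per charset and then decodes the gb2312 form the way a gb2312 server presumably would.

```python
from urllib.parse import parse_qs, urlencode

# Same field name and value as in the session above.
data = {'loudong': '女生九栋'}

# UTF-8 bytes, percent-encoded: the form requests produces for a dict.
utf8_body = urlencode(data, encoding='utf-8')

# gb2312 bytes, percent-encoded: the form the browser sends for this page.
gb2312_body = urlencode(data, encoding='gb2312')

print(utf8_body)
print(gb2312_body)

# Decoding the gb2312 body as gb2312 recovers the original text.
decoded = parse_qs(gb2312_body, encoding='gb2312')
print(decoded)
```

The two bodies have different lengths because each character takes three bytes in UTF-8 and two bytes in gb2312, so a server that expects one encoding cannot recover the text from the other.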

Sending a POST request with data you encode yourself

Steps:

1. Manually add Content-Type: application/x-www-form-urlencoded to the HTTP headers.

2. Pass the already-encoded POST data to the data parameter of requests as a string.

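If the form submits text, its enctype is application/x-www-form-urlencoded, which is also the default.

If the form uploads files, its enctype is multipart/form-data.

enctype specifies how the form data is encoded when it is submitted.

If a dict is passed to the data parameter, requests encodes the data itself (as utf-8).

If a string is passed to the data parameter, requests sends the string as-is.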
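Both behaviours can be checked without touching the network by preparing a request and inspecting it. This is a minimal sketch that assumes a recent requests 2.x release. `Request(...).prepare()` is part of the requests API, while the variable names `from_dict` and `from_str` are only illustrative.

```python
from urllib.parse import urlencode

import requests

url = 'http://example.com/ip/search.asp'
data = {'loudong': '女生九栋', 'fangjian': '101-1'}

# Case 1: a dict. requests encodes the fields itself (as UTF-8) and
# sets the Content-Type header for us.
from_dict = requests.Request('POST', url, data=data).prepare()
print(from_dict.body)
print(from_dict.headers.get('Content-Type'))

# Case 2: a string. requests leaves the body untouched and does not
# add a Content-Type header, which is why step 1 sets it by hand.
body = urlencode(data, encoding='gb2312')
from_str = requests.Request('POST', url, data=body).prepare()
print(from_str.body == body)
print(from_str.headers.get('Content-Type'))
```

Nothing is sent here; `prepare()` only builds the request that `requests.post` would send.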

In [12]: from urllib.parse import urlencode

# Encode the POST data as gb2312
In [13]: data_gb2312 = urlencode(data, encoding='gb2312')

# Add Content-Type: application/x-www-form-urlencoded to the HTTP headers
In [14]: headers = {
    ...:     'Content-Type': 'application/x-www-form-urlencoded'}

In [15]: res = requests.post(url, data=data_gb2312, headers=headers)

# The response is gb2312 too, so decode it accordingly
In [16]: res.encoding = 'gb2312'

In [17]: soup = BS(res.text, 'lxml')

In [18]: for item in soup.find_all('strong'):
    ...:     print(item.parent.parent.text.replace('\n', ''))
    ...:

Building: 女生九栋
Room-port number: 101-1
IP address: 10.0.79.2
Subnet mask: 255.255.255.0
Default gateway: 10.0.79.1
Preferred DNS server: 192.168.170.254
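The working session above could be collected into a small helper. The following is a sketch rather than a definitive implementation: the names `FORM_HEADERS`, `encode_form`, and `post_form`, as well as the 10-second timeout, are assumptions introduced here for illustration and do not come from requests itself.

```python
from urllib.parse import urlencode

import requests

# Header that must be set by hand when the body is passed as a string.
FORM_HEADERS = {'Content-Type': 'application/x-www-form-urlencoded'}


def encode_form(fields, charset='gb2312'):
    """Percent-encode form fields using the page's charset."""
    return urlencode(fields, encoding=charset)


def post_form(url, fields, charset='gb2312', timeout=10):
    """POST the fields encoded in `charset` and decode the reply the same way."""
    body = encode_form(fields, charset)
    res = requests.post(url, data=body, headers=FORM_HEADERS, timeout=timeout)
    # Assume the server answers in the same charset as the form page.
    res.encoding = charset
    return res


# Usage (performs a real HTTP request, so it is left commented out):
# res = post_form('http://example.com/ip/search.asp',
#                 {'loudong': '女生九栋', 'fangjian': '101-1'})
# print(res.text)
```

Keeping the charset as a parameter means the same helper should also work for pages in other legacy encodings, such as gbk, provided Python has a codec for them.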
