I. Basic Concepts

1. Request Methods

1.1 GET

Query parameters are appended to the URL, so they are all visible in the address bar.

1.2 POST

Query parameters and the data being submitted travel in the request body (typically as form fields), so they do not appear in the URL.
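To make the difference concrete, here is a minimal sketch that only builds the two kinds of request objects and sends nothing. The endpoint `https://example.com/search` and the parameter value are assumptions chosen for illustration.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint, used only to build request objects (nothing is sent).
base_url = 'https://example.com/search'
params = {'wd': 'python'}
encoded = urllib.parse.urlencode(params)

# GET: the parameters become part of the URL itself.
get_req = urllib.request.Request(base_url + '?' + encoded)

# POST: the same parameters travel in the request body, as bytes.
post_req = urllib.request.Request(base_url, data=encoded.encode('utf-8'))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

urllib treats a Request as POST whenever `data` is supplied, which is why no explicit method argument is needed here.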

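2. Referer

The Referer request header tells the server which URL the current request came from. Sites often check it as an anti-scraping measure.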
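A scraper can set this header itself. The sketch below attaches a Referer to a request object without sending it; both URLs are hypothetical, and whether a given site actually checks the header is something you would have to test.

```python
import urllib.request

# Hypothetical URLs for illustration; no request is sent here.
page_url = 'https://example.com/gallery/page1.html'
image_url = 'https://example.com/images/1.jpg'

headers = {
    'User-Agent': 'Mozilla/5.0',
    'Referer': page_url,  # claim that the image request came from the gallery page
}
req = urllib.request.Request(image_url, headers=headers)
print(req.get_header('Referer'))
```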

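3. Status Codes

200 : OK, the request succeeded
301 : Permanent redirect
302 : Temporary redirect
403 : Forbidden, the server refused the request
404 : Not found, the server could not find the requested resource (page)
500 : Internal server error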
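Python's standard library already knows the official reason phrase for each code. As a small sketch, the helper name `describe` below is made up for this example; `http.HTTPStatus` is the real standard-library enum.

```python
from http import HTTPStatus

def describe(code):
    """Return the standard reason phrase for an HTTP status code."""
    return HTTPStatus(code).phrase

for code in (200, 301, 302, 403, 404, 500):
    print(code, describe(code))
```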

II. Downloading Images

1. The requests module

import requests
url='https://timgsa.baidu.com/timg?image&quality=80&size=b9999_10000&sec=1594560732508&di=abeab1548602c8dd299bae4ae8222ddb&imgtype=0&src=http%3A%2F%2Fa1.att.hudong.com%2F05%2F00%2F01300000194285122188000535877.jpg'
req=requests.get(url)
# req.content holds the raw bytes, so the file is opened in binary mode
with open('abc.png','wb') as f:
    f.write(req.content)

2. The urllib.request module

from urllib import request
# Reuses the url defined above and saves the response straight to a local file
request.urlretrieve(url,'abc.png')
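If you want a timeout, which `urlretrieve` does not take as an argument, one possible alternative is to stream the response into a file yourself. This is only a sketch: `download_file` is a hypothetical helper name, and the default timeout of 10 seconds is an arbitrary choice.

```python
import shutil
import urllib.request

def download_file(url, path, timeout=10):
    """Stream the response body at url into path and return path."""
    with urllib.request.urlopen(url, timeout=timeout) as res, open(path, 'wb') as f:
        # Copy in chunks so large files are not held in memory all at once.
        shutil.copyfileobj(res, f)
    return path
```

It would be called the same way as `urlretrieve`, for example `download_file(url, 'abc.png')`.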

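III. Fetching Page Source with urllib.request

urllib.request.urlopen(url) sends a request to a site and returns the response. When given a bare URL string it cannot set a custom User-Agent; for that, pass it a Request object instead.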

import urllib.request
url='https://image.so.com/view?q=%E5%9B%BE%E7%89%87&listsrc=sobox&listsign=39bec53705f38065e770b1790dc15312&src=360pic_strong&correct=%E5%9B%BE%E7%89%87&ancestor=list&cmsid=b241ef363f385a18b6dc881a59fef59f&cmras=6&cn=0&gn=0&kn=50&crn=0&bxn=20&fsn=130&cuben=0&adstar=0&clw=233#id=39bec53705f38065e770b1790dc15312&currsn=0&ps=112&pc=112'
# Send a request to the site and keep the response object in a variable
response=urllib.request.urlopen(url)
# Read the data from the response object with read()
html=response.read().decode('utf-8')
print(html)
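Real requests fail fairly often (blocked, missing page, no network), so it can help to wrap the call. The following is a minimal sketch, not the only way to do it: `fetch_text` is a hypothetical helper name, and returning None on failure is a design choice made for this example.

```python
import urllib.error
import urllib.request

def fetch_text(url, encoding='utf-8', timeout=10):
    """Return the decoded body of url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as res:
            return res.read().decode(encoding)
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 403 or 404.
        print('HTTP error:', e.code)
    except urllib.error.URLError as e:
        # No usable response at all (DNS failure, refused connection, unknown scheme).
        print('Request failed:', e.reason)
    return None
```

HTTPError is caught first because it is a subclass of URLError.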

Fetching the Baidu page source
urllib.request.Request(url, headers=headers) creates a request object that carries custom headers such as User-Agent.

import urllib.request
url='https://www.baidu.com/'
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36'}
# 1. Create the request object (this is where the User-Agent is set)
req=urllib.request.Request(url,headers=headers)
# 2. Send it and get the response object (urlopen())
res=urllib.request.urlopen(req)
# 3. Read the response body (read().decode('utf-8'))
html=res.read().decode('utf-8')
print(html)
print(res.getcode())   # HTTP status code
print(res.geturl())    # URL that was actually fetched
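When many requests need the same User-Agent, an alternative to building a Request each time is an opener with default headers. This is a sketch under the assumption that a shortened User-Agent string is acceptable; the string used here is illustrative only.

```python
import urllib.request

# Build an opener whose default headers replace urllib's "Python-urllib/x.y" User-Agent.
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64)')]

# Optional: make plain urllib.request.urlopen() calls use this opener as well.
urllib.request.install_opener(opener)
```

After this, `opener.open(url)` or `urllib.request.urlopen(url)` sends the configured User-Agent without any per-request setup.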

IV. The urllib.parse Module

4.1 urllib.parse.urlencode()

import urllib.parse
r={'wd':'海贼王'}
result=urllib.parse.urlencode(r)
print(result)     # wd=%E6%B5%B7%E8%B4%BC%E7%8E%8B
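urlencode() also handles several parameters at once, and parse_qs() reverses the encoding. In this short sketch the second parameter `pn` is made up purely to show the `&` separator.

```python
import urllib.parse

params = {'wd': '海贼王', 'pn': 10}   # 'pn' is a made-up second parameter
query = urllib.parse.urlencode(params)
print(query)      # wd=%E6%B5%B7%E8%B4%BC%E7%8E%8B&pn=10

# parse_qs() decodes the query string back into a dict of lists of strings
decoded = urllib.parse.parse_qs(query)
print(decoded)    # {'wd': ['海贼王'], 'pn': ['10']}
```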

Exercise: search Baidu for a term (for example 动物, "animals") and save the result page to a local file

import urllib.request
import urllib.parse
baseurl='https://www.baidu.com/s?'
content=input('Enter a search term: ')
wd={'wd':content}
wd=urllib.parse.urlencode(wd)
url=baseurl + wd
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36'}
req=urllib.request.Request(url,headers=headers)
res=urllib.request.urlopen(req)
html=res.read().decode('utf-8')
# Save the page source to a local file
with open('动物.html','w',encoding='utf-8') as f:
    f.write(html)

4.2 urllib.parse.quote()

import urllib.parse
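key=input('Enter a search term: ')
baseurl='https://www.baidu.com/s?wd='
# quote() encodes a single string; the 'wd=' part has to be written by hand
r=urllib.parse.quote(key)
url=baseurl + r
print(url)   # for the input 动物: https://www.baidu.com/s?wd=%E5%8A%A8%E7%89%A9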
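A few related calls are worth comparing with quote(). The sample string 'a b/c' in this sketch is an arbitrary choice that shows how spaces and slashes are treated.

```python
import urllib.parse

text = 'a b/c'
print(urllib.parse.quote(text))            # a%20b/c    ('/' is kept by default)
print(urllib.parse.quote(text, safe=''))   # a%20b%2Fc  (nothing is treated as safe)
print(urllib.parse.quote_plus(text))       # a+b%2Fc    (space becomes '+')

# unquote() reverses the encoding
print(urllib.parse.unquote('%E5%8A%A8%E7%89%A9'))   # 动物
```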

