Python urllib Library in Detail
What is urllib?
Python's built-in HTTP request library, made up of four modules:
urllib.request — the request module
urllib.error — the exception handling module
urllib.parse — the URL parsing module
urllib.robotparser — the robots.txt parsing module
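The roles of these four modules can be sketched in a few lines (a minimal offline example; the URLs here are only placeholders):

```python
import urllib.request
import urllib.error
import urllib.parse
import urllib.robotparser

# urllib.parse: split a URL into its components without touching the network
parts = urllib.parse.urlparse('http://httpbin.org/get?x=1')
print(parts.scheme, parts.netloc, parts.query)  # http httpbin.org x=1

# urllib.robotparser: prepare a parser for a site's robots.txt
rp = urllib.robotparser.RobotFileParser()
rp.set_url('http://www.baidu.com/robots.txt')

# urllib.request and urllib.error work together when a request is sent:
# urllib.request.urlopen(...) raises urllib.error.URLError on failure.
```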
Changes compared with Python 2
Python 2:
import urllib2
response = urllib2.urlopen('http://www.baidu.com')
Python 3:
import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
urlopen
urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
print(response.read().decode('utf-8'))
import urllib.parse
import urllib.request

data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf8')
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read())
import urllib.request
response = urllib.request.urlopen('http://httpbin.org/get',timeout=1)
print(response.read())
import socket
import urllib.request
import urllib.error
try:
    response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.1)
except urllib.error.URLError as e:
    if isinstance(e.reason, socket.timeout):
        print('TIME OUT')
Responses
Response type
import urllib.request
response = urllib.request.urlopen('https://www.python.org')
print(type(response))
Status code and response headers
import urllib.request
response = urllib.request.urlopen('http://www.python.org')
print(response.status)
print(response.getheaders())
Request
import urllib.request
request = urllib.request.Request('https://python.org')
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))
from urllib import request,parse
url = 'http://httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    'Host': 'httpbin.org'
}
dict = {
    'name': 'puqunzhu'
}
data = bytes(parse.urlencode(dict),encoding='utf8')
req = request.Request(url=url,data=data,headers=headers,method='POST')
response = request.urlopen(req)
print(response.read().decode('utf-8'))
from urllib import request,parse
url = 'http://httpbin.org/post'
dict = {
    'name': 'puqunzhu'
}
data = bytes(parse.urlencode(dict),encoding='utf8')
req = request.Request(url=url,data=data,method='POST')
req.add_header('User-Agent','Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36')
response = request.urlopen(req)
print(response.read().decode('utf-8'))
Handler
Proxies
import urllib.request
proxy_handler = urllib.request.ProxyHandler({
    'http': 'http://61.135.217.7:80',
    'https': 'https://61.150.96.27:46111',
})
opener = urllib.request.build_opener(proxy_handler)
response = opener.open('http://www.baidu.com')
print(response.read())
Cookie
import http.cookiejar,urllib.request
cookie = http.cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
for item in cookie:
    print(item.name + "=" + item.value)
import http.cookiejar,urllib.request
filename = "cookie.txt"
cookie = http.cookiejar.MozillaCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True,ignore_expires=True)
import http.cookiejar,urllib.request
filename = "cookie.txt"
cookie = http.cookiejar.LWPCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open("http://www.baidu.com")
cookie.save(ignore_discard=True,ignore_expires=True)
import http.cookiejar,urllib.request
cookie = http.cookiejar.LWPCookieJar()
cookie.load('cookie.txt',ignore_discard=True,ignore_expires=True)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open("http://www.baidu.com")
print(response.read().decode('utf-8'))
Exception handling
from urllib import request,error
try:
    response = request.urlopen('http://cuiqingcai.com/index.htm')
except error.URLError as e:
    print(e.reason)
from urllib import request,error
try:
    response = request.urlopen('http://cuiqingcai.com/index.htm')
except error.HTTPError as e:
    print(e.reason, e.code, e.headers, sep="\n")
except error.URLError as e:
    print(e.reason)
else:
    print("Request Successfully")
import socket
import urllib.request
import urllib.error
try:
    response = urllib.request.urlopen('http://www.baiduc.com', timeout=0.01)
except urllib.error.URLError as e:
    print(type(e.reason))
    if isinstance(e.reason, socket.timeout):
        print("TIME OUT")
URL parsing
urlparse
urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)
from urllib.parse import urlparse
result = urlparse('http://www.baidu.com/index.html;user?id=5#comment')
print(type(result),result)
from urllib.parse import urlparse
result = urlparse('www.baidu.com/index.html;user?id=5#comment', scheme='https')
print(result)
from urllib.parse import urlparse
result = urlparse('http://www.baidu.com/index.html;user?id=5#comment', scheme='https')
print(result)
from urllib.parse import urlparse
result = urlparse('http://www.baidu.com/index.html;user?id=5#comment', allow_fragments=False)
print(result)
from urllib.parse import urlparse
result = urlparse('http://www.baidu.com/index.html#comment',allow_fragments=False)
print(result)
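The effect of allow_fragments=False can be seen offline by comparing the two parses side by side (a small sketch; when there is no query string, the fragment is folded into the path):

```python
from urllib.parse import urlparse

r1 = urlparse('http://www.baidu.com/index.html#comment')
r2 = urlparse('http://www.baidu.com/index.html#comment', allow_fragments=False)

print(r1.path, r1.fragment)        # /index.html comment
print(r2.path, repr(r2.fragment))  # /index.html#comment ''
```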
urlunparse
from urllib.parse import urlunparse
data =['http','www.baidu.com','index.html','user','a=6','comment']
print(urlunparse(data))
urljoin
from urllib.parse import urljoin
print(urljoin('http://www.baidu.com','?category=2#comment'))
print(urljoin('http://www.baidu.com/about.html', 'http://www.baidu.com/FAQ.html'))
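The rule behind urljoin: components supplied by the second argument take precedence, and whatever it lacks is filled in from the base. Two more cases illustrate this (a minimal sketch; example.com is a placeholder):

```python
from urllib.parse import urljoin

# A relative path replaces only the last path segment of the base
print(urljoin('http://www.baidu.com/about.html', 'FAQ.html'))
# -> http://www.baidu.com/FAQ.html

# An absolute second argument wins completely
print(urljoin('http://www.baidu.com', 'https://example.com/a'))
# -> https://example.com/a
```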
urlencode
from urllib.parse import urlencode
params = {
    'name': 'puqunzhu',
    'age': 23
}
base_url = "http://www.baidu.com?"
url = base_url + urlencode(params)
print(url)
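urlencode can also be reversed with parse_qs, and individual strings can be percent-encoded with quote/unquote (a short offline sketch):

```python
from urllib.parse import urlencode, parse_qs, quote, unquote

query = urlencode({'name': 'puqunzhu', 'age': 23})
print(query)            # name=puqunzhu&age=23
print(parse_qs(query))  # {'name': ['puqunzhu'], 'age': ['23']}

# quote/unquote handle percent-encoding of a single string
print(quote('hello world'))      # hello%20world
print(unquote('hello%20world'))  # hello world
```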
Reposted from: https://www.cnblogs.com/puqunzhu/p/9803769.html