Python Urllib库详解

Urllib库详解

什么是Urllib?

Python内置的HTTP请求库

urllib.request 请求模块
urllib.error 异常处理模块
urllib.parse url解析模块
urllib.robotparser robots.txt解析模块

相比Python2变化

python2

import urllib2
response = urllib2.urlopen('http://www.baidu.com')

python3

import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')

urllib

urlopen

urllib.request.urlopen(url,data=None,[timeout,]*,cafile=None,capath=None,cadefault=False,context=None)

import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
print(response.read().decode('utf-8'))

import urllib.parse
import urllib.requestdata = bytes(urllib.parse.urlencode({'word':'hello'}),encoding='utf8')
response = urllib.request.urlopen('http://httpbin.org/post',data=data)
print(response.read())

import urllib.request
response = urllib.request.urlopen('http://httpbin.org/get',timeout=1)
print(response.read())

import socket
import urllib.request
import urllib.error
try:response = urllib.request.urlopen('http://httpbin.org/get',timeout=0.1)
except urllib.error.URLError as e:if isinstance(e.reason,socket.timeout):print('TIME OUT')

响应

响应类型

import urllib.request
response = urllib.request.urlopen('https://www.python.org')
print(type(response))

状态码、响应头

import urllib.request
response = urllib.request.urlopen('http://www.python.org')
print(response.status)
print(response.getheaders())

Request

import urllib.request
request = urllib.request.Request('https://python.org')
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))

from urllib import request,parse
url = 'http://httpbin.org/post'
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36','Host':'httpbin.org'
}
dict = {'name':'puqunzhu'
}
data = bytes(parse.urlencode(dict),encoding='utf8')
req = request.Request(url=url,data=data,headers=headers,method='POST')
response = request.urlopen(req)
print(response.read().decode('utf-8'))

from urllib import request,parse
url = 'http://httpbin.org/post'
dict = {'name':'puqunzhu'
}
data = bytes(parse.urlencode(dict),encoding='utf8')
req = request.Request(url=url,data=data,method='POST')
req.add_header('User-Agent','Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36')
response = request.urlopen(req)
print(response.read().decode('utf-8'))

Handler

代理

import urllib.request
proxy_handler = urllib.request.ProxyHandler({'http':'http://61.135.217.7:80','https':'https://61.150.96.27:46111',
})
opener = urllib.request.build_opener(proxy_handler)
response = opener.open('http://www.baidu.com')
print(response.read())

Cookie

import http.cookiejar,urllib.request
cookie = http.cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
for item in cookie:print(item.name+"="+item.value)

import http.cookiejar,urllib.request
filename = "cookie.txt"
cookie = http.cookiejar.MozillaCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True,ignore_expires=True)

import http.cookiejar,urllib.request
filename = "cookie.txt"
cookie = http.cookiejar.LWPCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open("http://www.baidu.com")
cookie.save(ignore_discard=True,ignore_expires=True)

import http.cookiejar,urllib.request
cookie = http.cookiejar.LWPCookieJar()
cookie.load('cookie.txt',ignore_discard=True,ignore_expires=True)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open("http://www.baidu.com")
print(response.read().decode('utf-8'))

异常处理

from urllib import request,error
try:response = request.urlopen('http://cuiqingcai.com/index.htm')
except error.URLError as e:print(e.reason)

from urllib import request,error
try:response = request.urlopen('http://cuiqingcai.com/index.htm')
except error.HTTPError as e:print(e.reason,e.code,e.headers,sep="\n")
except error.URLError as e:print(e.reason)
else:print("Request Successfully")

import socket
import urllib.request
import urllib.error
try:response = urllib.request.urlopen('http://www.baiduc.com',timeout=0.01)
except urllib.error.URLError as e:print(type(e.reason))if isinstance(e.reason,socket.timeout):print("TIME OUT")

URL解析

urlparse

urllib.parse.urlparse(urlstring,scheme='',allow_fragments=True)

from urllib.parse import urlparse
result = urlparse('http://www.baidu.com/index.html;urser?id=5#comment')
print(type(result),result)

from urllib.parse import urlparse
result = urlparse('www.baidu.com/index.html;user?id=5#comment,scheme="https"')
print(result)

from urllib.parse import urlparse
result = urlparse('http://www.baidu.com/index.html;user?id=5#comment,scheme="https"')
print(result)

from urllib.parse import urlparse
result = urlparse('http://www.baidu.com/index.html;user?id=5#comment,allow_fragments=False')
print(result)

from urllib.parse import urlparse
result = urlparse('http://www.baidu.com/index.html#comment',allow_fragments=False)
print(result)

utlunoarse

from urllib.parse import urlunparse
data =['http','www.baidu.com','index.html','user','a=6','comment']
print(urlunparse(data))

urljoin

from urllib.parse import urljoin
print(urljoin('http://www.baidu.com','?category=2#comment'))
print(urljoin('htttp://www.baidu.com/about.html','htttp://www.baidu.com/FAQ.html'))

urlencode

from urllib.parse import urlencode
params = {'name':'puqunzhu','age':23
}
base_url="http://www.baidu.com?"
url=base_url + urlencode(params)
print(url)

转载于:https://www.cnblogs.com/puqunzhu/p/9803769.html

Python Urllib库详解相关推荐

python爬虫之urllib库详解
python爬虫之urllib库详解前言一.urllib库是什么? 二.urllib库的使用 urllib.request模块 urllib.parse模块利用try-except,进行超时处理 ...
爬虫入门之urllib库详解(二)
爬虫入门之urllib库详解(二) 1 urllib模块 urllib模块是一个运用于URL的包 urllib.request用于访问和读取URLS urllib.error包括了所有urllib.r ...
python re库详解(正则表达式）
python re库详解(正则表达式) 说明则表达式(英文名称:regular expression,regex,RE)是用来简洁表达一组字符串特征的表达式.最主要应用在字符串匹配中. 1).re ...
python解析库详解_PyQuery库详解
通过这篇文章为大家介绍崔庆才老师对Python爬虫PyQuery库的讲解,包括基本原理及其理论知识点本文代码较多,建议阅读时间10分钟,并且注重理论与实践相结合觉得文章比较枯燥和用电脑观看的可以点 ...
python requests库详解_python爬虫之路（一）-----requests库详解
requests库 requests库是python实现的最简单易用的http库. requests库的功能详解. 我们可以自然而然地想到这些方法其实就是http协议对资源的操作. 调用request ...
python爬虫知识点总结（三）urllib库详解
一.什么是Urllib? 官方学习文档:https://docs.python.org/3/library/urllib.html 廖雪峰的网站:https://www.liaoxuefeng.com ...
pythonurllib模块-urllib库详解 --Python3
相关:urllib是python内置的http请求库,本文介绍urllib三个模块:请求模块urllib.request.异常处理模块urllib.error.url解析模块urllib.parse. ...
爬虫笔记：Urllib库详解
如果不懂爬虫,请看链接爬虫第一课:爬虫的基本原理什么是Urllib Urllib是python 内置的http的请求库.包含四个模块 urllib.request 请求模块 urllib.erro ...
python pandas库详解_Pandas 库的详解和使用补充
pandas 库总体说明 Pandas 基亍 NumPy.SciPy 补充了大量数据操作功能,能实现统计.凾组.排序.透规表,可以代替 Excel 的绛大部凾功能. Pandas 主要有 2 种重要 ...

Python Urllib库详解

Urllib库详解

什么是Urllib?

Python内置的HTTP请求库

相比Python2变化

python2

python3

urllib

urlopen

响应

响应类型

状态码、响应头

Request

Handler

代理

Cookie

异常处理

URL解析

urlparse

utlunoarse

urljoin

urlencode

Python Urllib库详解相关推荐

最新文章

热门文章