Using urllib2 in Python

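While learning to write Python crawlers in practice, I often use functions from the urllib2 library, so I am reposting this article from http://www.cnblogs.com/youxin/archive/2013/05/07/3064434.html

urllib2 is a curl-like Python module for fetching URLs. It ships with Python 2, so it is installed by default. Official documentation: http://docs.python.org/2/library/urllib2.html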

The urllib2 module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world — basic and digest authentication, redirections, cookies and more.

urllib2.urlopen(url[, data][, timeout])

Open the URL url, which can be either a string or a Request object.

This function returns a file-like object with additional methods, including geturl() and info().
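To make the "file-like object" concrete, here is a minimal sketch that opens a temporary local file through a file:// URL, so it runs without network access. The try/except import is an assumption added so the same code also runs on Python 3, where urllib2 became urllib.request. The temporary file and the POSIX-style path are illustrative only.

```python
import os
import tempfile

try:  # Python 2
    import urllib2
except ImportError:  # Python 3: urllib2 became urllib.request
    import urllib.request as urllib2

# Write a small file so the example needs no network access.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"hello urllib2")

# Assumption: a POSIX-style absolute path such as /tmp/tmpabc.txt.
url = "file://" + path

response = urllib2.urlopen(url)
try:
    body = response.read()                 # file-like: read() returns the raw bytes
    final_url = response.geturl()          # URL of the resource actually retrieved
    content_length = response.info()["Content-length"]  # header metadata
finally:
    response.close()
    os.remove(path)

print(final_url)
print(content_length)
print(body)
```

For an HTTP URL the same three calls apply. geturl() is mainly useful there for detecting whether a redirect was followed.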

class urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])

This class is an abstraction of a URL request.

Usage:

import urllib2

req = urllib2.Request("http://www.baidu.com")  # urllib2.Request(url)
response = urllib2.urlopen(req)
html = response.read()

Or:

import urllib2
html = urllib2.urlopen('http://piratebay.se/browse/200').read()

Passing parameters via POST

import urllib
import urllib2

url = 'http://localhost/php/GetPost.php'
dataArr = {'name': 'jack', 'password': 'pass'}
data = urllib.urlencode(dataArr)  # encode a dict or a sequence of two-element tuples
req = urllib2.Request(url, data)  # passing data makes this a POST request
res = urllib2.urlopen(req)
html = res.read()
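The effect of the data argument can be checked without contacting any server. The sketch below only builds the Request and inspects it. The compatibility import and the .encode('ascii') call are assumptions added so the snippet also runs on Python 3, which requires a bytes body.

```python
try:  # Python 2
    import urllib2
    from urllib import urlencode
except ImportError:  # Python 3: urllib2 became urllib.request
    import urllib.request as urllib2
    from urllib.parse import urlencode

url = 'http://localhost/php/GetPost.php'
# A list of pairs keeps the field order fixed; a dict works as well.
fields = [('name', 'jack'), ('password', 'pass')]

body = urlencode(fields)  # form-encoded string: key=value pairs joined by '&'

# Python 3 requires bytes for the request body; the encode is harmless on Python 2.
req = urllib2.Request(url, body.encode('ascii'))

print(body)
print(req.get_method())  # the Request is only built here, never sent
```

Because a body is attached, get_method() reports POST even though the code never names the HTTP method.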

Passing parameters via GET

import urllib
import urllib2

url = 'http://localhost/php/GetPost.php'
dataArr = {'name': 'jack', 'password': 'pass'}
data = urllib.urlencode(dataArr)  # encode a dict or a sequence of two-element tuples
full_url = url + '?' + data       # append the encoded parameters as a query string
req = urllib2.Request(full_url)
res = urllib2.urlopen(req)
html = res.read()
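A small offline sketch of what happens to the query string: urlencode escapes characters that would otherwise break the URL, and the request stays a GET because no data is passed. The password value containing '&' and a space is a made-up illustration, and the compatibility import is an assumption for Python 3.

```python
try:  # Python 2
    import urllib2
    from urllib import urlencode
    from urlparse import urlparse, parse_qs
except ImportError:  # Python 3
    import urllib.request as urllib2
    from urllib.parse import urlencode, urlparse, parse_qs

url = 'http://localhost/php/GetPost.php'

# Hypothetical value with '&' and a space to show the escaping.
query = urlencode([('name', 'jack'), ('password', 'p&ss w0rd')])
full_url = url + '?' + query

req = urllib2.Request(full_url)  # no data argument, so the method stays GET

# Decode the query string again to confirm nothing was lost.
decoded = parse_qs(urlparse(req.get_full_url()).query)

print(full_url)
print(req.get_method())
print(decoded)
```

Concatenating unescaped values by hand would split 'p&ss w0rd' into two parameters, which is why the encoded string is built with urlencode first.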

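Setting headers on an HTTP request

Some sites do not like being accessed by programs rather than people, or they send different versions of a page to different browsers.

By default urllib2 identifies itself as 'Python-urllib/x.y', for example 'Python-urllib/2.7'.

This identity may confuse the site, or the request may simply not work.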
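The default identity can be read back from an opener without sending anything. This is a minimal sketch. The compatibility import is an assumption so that it also runs on Python 3, where the version part of the string is that of the running interpreter.

```python
try:  # Python 2
    import urllib2
except ImportError:  # Python 3: urllib2 became urllib.request
    import urllib.request as urllib2

# A freshly built opener carries the headers it adds to every request.
opener = urllib2.build_opener()
default_headers = dict(opener.addheaders)
default_agent = default_headers['User-agent']

print(default_agent)  # 'Python-urllib/' followed by the interpreter's x.y version
```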

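A browser identifies itself through the User-Agent header (http://baike.baidu.com/view/3398471.htm). When you create a Request object, you can pass it a dictionary of headers.

The following example makes the same kind of POST request as above, but identifies itself as Internet Explorer.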

import urllib
import urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'WHY', 'location': 'SDU', 'language': 'Python'}
headers = {'User-Agent': user_agent}
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()
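To check which headers a Request will carry, it can be inspected before it is sent. The sketch below reuses the URL and User-Agent string from the example. The Accept-Language header is a hypothetical addition for illustration, and the compatibility import is an assumption for Python 3.

```python
try:  # Python 2
    import urllib2
except ImportError:  # Python 3: urllib2 became urllib.request
    import urllib.request as urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

# Headers can be given up front as a dict ...
req = urllib2.Request(url, headers={'User-Agent': user_agent})
# ... or added one at a time afterwards (hypothetical extra header).
req.add_header('Accept-Language', 'en')

# Header names are stored in capitalized form ('User-agent'),
# so they are looked up in that form here.
sent_agent = req.get_header('User-agent')
has_language = req.has_header('Accept-language')
all_headers = sorted(req.header_items())

print(sent_agent)
print(has_language)
print(all_headers)
```

Nothing is sent here, so the snippet is a safe way to confirm that the custom User-Agent replaces the default one before pointing the code at a real site.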
