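What is the quickest way to HTTP GET in Python?

What is the quickest way to HTTP GET in Python if I know the content will be a string? I am searching the documentation for a quick one-liner like this: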

contents = url.get("http://example.com/foo/bar")

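But all I can find with Google are httplib and urllib, and I am unable to find a shortcut in those libraries.

Does standard Python 2.5 have a shortcut in some form like the one above, or should I write a url_get function?

I would prefer not to capture the output of shelling out to wget or curl.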

Answer #1

You can use a library called requests.

import requests
r = requests.get("http://example.com/foo/bar")

It is very easy. Then you can do this:

>>> print(r.status_code)
>>> print(r.headers)
>>> print(r.content)

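Answer #2

theller's wget solution is really useful; however, I found that it does not print the progress throughout the download. It is perfect if you add one line (the sys.stdout.flush() call) after the print statement in reporthook.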

import sys, urllib

def reporthook(a, b, c):
    print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c),
    sys.stdout.flush()

for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
print
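For reference, urlretrieve calls the hook with three arguments: the number of blocks transferred so far, the block size in bytes, and the total size of the file, which can be -1 when the server sends no Content-Length. Below is a minimal sketch of the same percentage calculation as a standalone function, written for Python 3. The name progress_percent is my own and not part of urllib, and returning None for an unknown size is just one possible choice.

```python
def progress_percent(block_count, block_size, total_size):
    """Return download progress as a percentage in [0, 100], or None if unknown.

    The three parameters mirror what urlretrieve passes to its reporthook.
    """
    if total_size <= 0:
        # No usable Content-Length, so a percentage cannot be computed.
        return None
    # The final block is usually partial, so cap the value at 100.
    return min(100.0, block_count * block_size * 100.0 / total_size)


def reporthook(block_count, block_size, total_size):
    """Hook suitable for urllib.request.urlretrieve (hypothetical variant)."""
    percent = progress_percent(block_count, block_size, total_size)
    if percent is not None:
        print("\r%5.1f%% of %d bytes" % (percent, total_size), end="", flush=True)
```

Guarding against a non-positive total size avoids a division by zero or a negative percentage when the server does not report the file length.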

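Answer #3

If you are working specifically with HTTP APIs, there are also more convenient choices such as Nap.

For example, here is how to get gists from GitHub created since May 1st, 2014: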

from nap.url import Url
api = Url('https://api.github.com')
gists = api.join('gists')
response = gists.get(params={'since': '2014-05-01T00:00:00Z'})
print(response.json())

More examples: https://github.com/kimmobrunfeldt/nap#examples

Answer #4

Excellent solutions, Xuan, Theller.

To make it work with Python 3, make the following changes:

import sys, urllib.request

def reporthook(a, b, c):
    # end="" keeps the cursor on the same line, like the trailing comma in Python 2
    print("% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c), end="")
    sys.stdout.flush()

for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print(url, "->", file)
    urllib.request.urlretrieve(url, file, reporthook)
print()

Also, the URL you enter must be preceded by "http://"; otherwise it raises an "unknown url type" error.
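If the URLs come from the command line, one way to sidestep that error (urllib raises a ValueError for a URL without a scheme) is to add the scheme before calling urlretrieve. Here is a minimal sketch; ensure_scheme is a hypothetical helper of my own, and defaulting to plain http is an assumption you may want to change to https.

```python
import re

# A URL scheme is a letter followed by letters, digits, "+", "." or "-", then "://".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it starts with a scheme, else prepend default_scheme."""
    if _SCHEME_RE.match(url):
        return url
    return "%s://%s" % (default_scheme, url)
```

In the script above you would then call urllib.request.urlretrieve(ensure_scheme(url), file, reporthook).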

Answer #5

Without any further imports, this solution works (for me), and it also works with https:

try:
    import urllib2 as urlreq  # Python 2.x
except ImportError:
    import urllib.request as urlreq  # Python 3.x
req = urlreq.Request("http://example.com/foo/bar")
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36')
urlreq.urlopen(req).read()

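I often have a hard time grabbing the content when no "User-Agent" is specified in the header information. In that case the request is usually rejected with something like urllib2.HTTPError: HTTP Error 403: Forbidden or urllib.error.HTTPError: HTTP Error 403: Forbidden.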
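If you would rather inspect such a rejection than let it propagate, here is a minimal Python 3 sketch that catches the exception and returns the status code along with the body. The function name get_with_user_agent, the default "Mozilla/5.0" string, and the 10-second timeout are all my own assumptions, not anything the server or urllib requires.

```python
import urllib.error
import urllib.request


def get_with_user_agent(url, user_agent="Mozilla/5.0", timeout=10):
    """Return (status, body_bytes); HTTP error statuses are returned, not raised."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so the error body is readable.
        return err.code, err.read()
```

A call such as status, body = get_with_user_agent("http://example.com/foo/bar") then lets you check for 403 yourself. Connection failures (urllib.error.URLError) are deliberately not caught here and still raise.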

Answer #6

How to also send headers

Python 3:

import urllib.request
contents = urllib.request.urlopen(urllib.request.Request(
    "https://api.github.com/repos/cirosantilli/linux-kernel-module-cheat/releases/latest",
    headers={"Accept" : 'application/vnd.github.full+json"text/html'}
)).read()
print(contents)

Python 2:

import urllib2
contents = urllib2.urlopen(urllib2.Request(
    "https://api.github.com",
    headers={"Accept" : 'application/vnd.github.full+json"text/html'}
)).read()
print(contents)
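To check what will actually be sent, you can build the Request object first and inspect it before passing it to urlopen. A minimal Python 3 sketch follows; build_get_request is a hypothetical helper, and the "application/json" Accept value is only a placeholder I picked for illustration. Nothing here touches the network.

```python
import urllib.request


def build_get_request(url, headers=None):
    """Build (but do not send) a GET request carrying the given headers."""
    return urllib.request.Request(url, headers=headers or {}, method="GET")


req = build_get_request(
    "https://api.github.com",
    headers={"Accept": "application/json", "User-Agent": "my-script"},
)
print(req.get_method())
print(req.get_header("Accept"))
# urllib stores header names via str.capitalize(), so "User-Agent"
# has to be looked up as "User-agent".
print(req.get_header("User-agent"))
```

Once the headers look right, urllib.request.urlopen(req).read() sends the request as in the examples above.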

Answer #7

Using urllib3 is easy.

Import it like this:

import urllib3

pool_manager = urllib3.PoolManager()

And make a request like this:

example_request = pool_manager.request("GET", "https://example.com")
print(example_request.data.decode("utf-8")) # Response text.
print(example_request.status) # Status code.
print(example_request.headers["Content-Type"]) # Content type.

You can add headers too:

example_request = pool_manager.request("GET", "https://example.com", headers={
    "Header1": "value1",
    "Header2": "value2",
})

Answer #8

Python 3:

import urllib.request
contents = urllib.request.urlopen("http://example.com/foo/bar").read()

Python 2:

import urllib2
contents = urllib2.urlopen("http://example.com/foo/bar").read()

See the documentation for urllib.request and read.
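Note that in Python 3 read() returns bytes, not str. Since the question asks for a string, here is a minimal sketch of the url_get function it mentions. The timeout and the UTF-8 fallback are my own assumptions; the charset is taken from the Content-Type header when the server sends one.

```python
import urllib.request


def url_get(url, timeout=10, default_encoding="utf-8"):
    """Fetch url with HTTP GET and return the response body decoded to str."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Prefer the charset declared in the Content-Type header, if any.
        charset = resp.headers.get_content_charset() or default_encoding
        return resp.read().decode(charset)
```

With that in place, contents = url_get("http://example.com/foo/bar") is close to the one-liner from the question.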

Answer #9

Have a look at httplib2, which, next to a lot of very useful features, provides exactly what you want.

import httplib2
resp, content = httplib2.Http().request("http://example.com/foo/bar")

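Here content is the response body (as a string), and resp contains the status and response headers.

It is not included in a standard Python install (although it only requires standard Python), but it is definitely worth checking out.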

Answer #10

If you want the httplib2 solution to be a one-liner, consider instantiating an anonymous Http object.

import httplib2
resp, content = httplib2.Http().request("http://example.com/foo/bar")

Answer #11

Here is a wget script in Python:

# From python cookbook, 2nd edition, page 487
import sys, urllib

def reporthook(a, b, c):
    print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c),

for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
print