A Handy Tool for Web Scraping: Fetching Free Proxy IPs with Python

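Many websites now have anti-scraping mechanisms, so a single IP cannot hit the same site too frequently. When scraping large amounts of data we therefore need to disguise ourselves behind proxies. This post gives code that scrapes proxy IPs from several free proxy-listing websites. It can be embedded in different crawler programs, and I have tested it myself and it works. Feel free to use it if you need it (I wrote it with reference to other people's crawler code, but I have forgotten the original address).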

    # coding=utf-8
    # Python 2 (uses urllib2)
    import re
    import urllib2

    proxy_list = []   # collected proxies as 'ip:port' strings
    total_proxy = 0   # running count of collected proxies


    def get_proxy_ip():
        """Scrape proxies from www.xicidaili.com (pages 3 to 10)."""
        global proxy_list
        global total_proxy
        request_list = []
        headers = {
            'Host': 'www.xicidaili.com',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)',
            'Accept': r'application/json, text/javascript, */*; q=0.01',
            'Referer': r'http://www.xicidaili.com/',
        }
        for i in range(3, 11):
            request_item = "http://www.xicidaili.com/nn/" + str(i)
            request_list.append(request_item)
        for req_id in request_list:
            req = urllib2.Request(req_id, headers=headers)
            response = urllib2.urlopen(req)
            html = response.read().decode('utf-8')
            ip_list = re.findall(r'\d+\.\d+\.\d+\.\d+', html)
            port_list = re.findall(r'<td>\d+</td>', html)
            # zip stops at the shorter list, so a length mismatch
            # cannot raise an IndexError
            for ip, port_cell in zip(ip_list, port_list):
                total_proxy += 1
                port = re.sub(r'<td>|</td>', '', port_cell)
                proxy = '%s:%s' % (ip, port)
                proxy_list.append(proxy)
        return proxy_list


    def get_proxy_ip1():
        """Scrape proxies from www.kuaidaili.com (pages 1 to 9)."""
        global proxy_list
        global total_proxy
        request_list = []
        headers = {
            'Host': 'www.kuaidaili.com',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)',
            'Accept': r'application/json, text/javascript, */*; q=0.01',
            'Referer': r'https://www.kuaidaili.com/',
        }
        for i in range(1, 10):
            request_item = "https://www.kuaidaili.com/free/inha/" + str(i) + "/"
            request_list.append(request_item)
        for req_id in request_list:
            req = urllib2.Request(req_id, headers=headers)
            response = urllib2.urlopen(req)
            html = response.read().decode('utf-8')
            ip_list = re.findall(r'\d+\.\d+\.\d+\.\d+', html)
            port_list = re.findall(r'<td data-title="PORT">\d+</td>', html)
            for ip, port_cell in zip(ip_list, port_list):
                total_proxy += 1
                # the first run of digits in the cell is the port number
                port = re.findall(r'\d+', port_cell)[0]
                proxy = '%s:%s' % (ip, port)
                proxy_list.append(proxy)
        return proxy_list


    def get_proxy_ip2():
        """Scrape proxies from www.ip3366.net (pages 1 to 9)."""
        global proxy_list
        global total_proxy
        request_list = []
        headers = {
            'Host': 'www.ip3366.net',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)',
            'Accept': r'application/json, text/javascript, */*; q=0.01',
            'Referer': r'http://www.ip3366.net/',
        }
        for i in range(1, 10):
            request_item = "http://www.ip3366.net/?stype=1&page=" + str(i)
            request_list.append(request_item)
        for req_id in request_list:
            req = urllib2.Request(req_id, headers=headers)
            response = urllib2.urlopen(req)
            # the page is left undecoded; the regexes only match ASCII text
            html = response.read()
            ip_list = re.findall(r'\d+\.\d+\.\d+\.\d+', html)
            port_list = re.findall(r'<td>\d+</td>', html)
            for ip, port_cell in zip(ip_list, port_list):
                total_proxy += 1
                port = re.sub(r'<td>|</td>', '', port_cell)
                proxy = '%s:%s' % (ip, port)
                proxy_list.append(proxy)
        return proxy_list


    if __name__ == "__main__":
        get_proxy_ip()
        # get_proxy_ip1()
        get_proxy_ip2()
        # total_proxy is an int, so format it instead of concatenating
        print("Number of proxy IPs fetched: %d" % total_proxy)
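The functions above collect IPs and ports with two separate regular expressions and then pair them by position. If a page contains any other purely numeric `<td>` cell, or an IP-like string outside the table, the two lists can drift out of step. One possible way to guard against that is to capture the IP and the port from the same row in a single pattern. The sketch below is only an illustration: `SAMPLE_HTML`, `ROW_PATTERN`, and `parse_proxies` are hypothetical names, the sample markup is made up rather than copied from any of the sites above, and it assumes the port cell directly follows the IP cell, which may not hold for every site.

```python
import re

# Hypothetical sample resembling a proxy-list table (assumption, not real site markup).
SAMPLE_HTML = """
<tr><td>10.0.0.1</td><td>8080</td><td>HTTP</td></tr>
<tr><td>10.0.0.2</td><td>3128</td><td>HTTPS</td></tr>
<tr><td>updated 2 minutes ago</td><td>12</td></tr>
"""

# One pattern captures the IP cell and the port cell that directly follows it,
# so an IP is never paired with a number taken from a different row.
ROW_PATTERN = re.compile(
    r'<td>(\d{1,3}(?:\.\d{1,3}){3})</td>\s*<td>(\d{1,5})</td>'
)


def parse_proxies(html):
    """Return a list of 'ip:port' strings found in html."""
    return ['%s:%s' % (ip, port) for ip, port in ROW_PATTERN.findall(html)]


print(parse_proxies(SAMPLE_HTML))
```

On the sample, the third row is skipped because its first cell is not an IP address, even though it contains a numeric cell that the pattern `<td>\d+</td>` would have matched.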

Result: [output of the run not included]

Below is a simple piece of code that accesses a website through a proxy:

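    import random
    import urllib2

    # proxy_list comes from the functions above; user_agent_list and url
    # must be defined by the caller.
    proxy_ip = random.choice(proxy_list)
    user_agent = random.choice(user_agent_list)
    print proxy_ip
    print user_agent
    # route HTTP requests through the chosen proxy
    proxy_support = urllib2.ProxyHandler({'http': proxy_ip})
    opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler)
    urllib2.install_opener(opener)
    req = urllib2.Request(url)
    req.add_header("User-Agent", user_agent)
    c = urllib2.urlopen(req, timeout=10)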
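For readers on Python 3, where `urllib2` has become `urllib.request`, the same idea could be sketched roughly as follows. This is not the code from this post, only an approximation under a few assumptions: `build_proxy_opener` and `fetch_via_random_proxy` are hypothetical helper names, proxies are 'ip:port' strings as produced above, and the opener is built per call instead of being installed globally with `install_opener`, so each request can use a different proxy.

```python
import random
import urllib.request


def build_proxy_opener(proxy, user_agent):
    """Return an opener that sends HTTP requests through proxy ('ip:port')."""
    handler = urllib.request.ProxyHandler({'http': 'http://' + proxy})
    opener = urllib.request.build_opener(handler)
    # addheaders replaces the default "Python-urllib/x.y" User-Agent.
    opener.addheaders = [('User-Agent', user_agent)]
    return opener


def fetch_via_random_proxy(url, proxies, user_agents, timeout=10):
    """Fetch url through one randomly chosen proxy; raises on failure."""
    opener = build_proxy_opener(random.choice(proxies),
                                random.choice(user_agents))
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

A call such as `fetch_via_random_proxy(url, proxy_list, user_agent_list)` would then play the role of the snippet above. Since it raises when the chosen proxy does not respond, a caller may want to catch the exception and try another proxy.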
