Python scraper: grabbing free high-anonymity proxy IPs from Kuaidaili (快代理)

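Here is the code with little explanation. Every step is commented, so you can copy and paste it and run it. If it does not run for you, reply "代码" ("code") and I will send you the source.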

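import json
import time

import requests
from lxml import etree


class XiciProxiesSpider(object):
    def __init__(self, max_pages=5):
        self.num = 1
        # Listing URL template; the page number is filled in on every pass of run()
        self.url_template = 'https://www.kuaidaili.com/free/inha/{}'
        # Stop after this many pages instead of looping forever
        self.max_pages = max_pages
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36'
        }

    def get_page_from_url(self, url):
        response = requests.get(url, headers=self.headers, timeout=10)
        return response.content.decode()

    def get_data_from_page(self, page):
        # Convert the page into an Element object
        html = etree.HTML(page)
        # Get the list of <tr> rows that hold proxy information
        trs = html.xpath('//tbody//tr')
        # Walk the rows and collect the data
        data = {
            'http': [],
            # 'https': []
        }
        for tr in trs:
            try:
                ip = tr.xpath('./td[1]/text()')[0]  # IP address
                port = tr.xpath('./td[2]/text()')[0]  # port
                ip_type = tr.xpath('./td[4]/text()')[0].lower()  # type, lower-cased
                # Skip rows whose type we do not collect (a bare `return` here would abort the whole page)
                if ip_type not in data:
                    continue
                # Build the proxy entry
                item = {ip_type: '{}:{}'.format(ip, port)}
                # Keep the proxy only if it passes validation
                if self.validate_ip(item, ip_type):
                    data[ip_type].append(item)
            except Exception as ex:
                print(ex)
                print(etree.tostring(tr))
        return data

    def validate_ip(self, item, ip_type):
        try:
            test_url = "{}://blog.csdn.net/weixin_43407092/article/details/89743502".format(ip_type)
            # Note: if the test URL redirects to https, the redirected request does not
            # go through an http-only proxy entry, so a 200 may not prove the proxy works
            response = requests.get(test_url, proxies=item, timeout=2)
            return response.status_code == 200
        except Exception:
            return False

    def save_data(self, data):
        # Each call appends one JSON document to the same file
        with open('快代理.txt', 'a', encoding='utf-8') as f:
            json.dump(data, f, indent=2)

    def run(self):
        while self.num <= self.max_pages:
            # Build the URL for the current page (the original built it once, so page 1 repeated forever)
            url = self.url_template.format(self.num)
            # Fetch the page content
            page = self.get_page_from_url(url)
            # Extract the usable proxy IPs
            data = self.get_data_from_page(page)
            # Save the data
            self.save_data(data)
            # Move to the next page and pause briefly between requests
            self.num += 1
            time.sleep(1)


if __name__ == '__main__':
    fps = XiciProxiesSpider()
    fps.run()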
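Because `save_data` appends one `json.dump` result per page to the same file, `快代理.txt` ends up as several JSON values written back to back, and a single `json.load` call on it fails with an "Extra data" error. The sketch below is one way to read it back, assuming the file layout produced by the code above; the helper names `extract_proxies` and `load_saved_proxies` are hypothetical and not part of the original post. It decodes the documents one at a time with `json.JSONDecoder.raw_decode`, skips any `null` documents, and flattens the per-type lists into one list of proxy dicts.

```python
import json
from typing import Any, Dict, Iterator, List


def iter_json_documents(text: str) -> Iterator[Any]:
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    idx = 0
    n = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here
        while idx < n and text[idx].isspace():
            idx += 1
        if idx >= n:
            return
        obj, end = decoder.raw_decode(text[idx:])
        idx += end
        yield obj


def extract_proxies(text: str) -> List[Dict[str, str]]:
    """Flatten {'http': [...], ...} documents into one list of proxy dicts."""
    proxies: List[Dict[str, str]] = []
    for doc in iter_json_documents(text):
        if not doc:  # skip null or empty documents
            continue
        for items in doc.values():
            proxies.extend(items)
    return proxies


def load_saved_proxies(path: str = '快代理.txt') -> List[Dict[str, str]]:
    """Hypothetical helper: read the scraper's output file and return all proxies."""
    with open(path, encoding='utf-8') as f:
        return extract_proxies(f.read())
```

For example, `load_saved_proxies()[0]` would give a dict such as `{'http': 'ip:port'}` that can be passed straight to `requests.get(..., proxies=...)`.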

In the author's run, only a few of the scraped proxies turned out to be usable.
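Since most free proxies fail, the scraper spends much of its time waiting for `validate_ip` to time out, one proxy after another. A minimal sketch of checking them in parallel with a thread pool follows; it is an alternative to the sequential loop, not the author's code. The function names, the default test URL `http://httpbin.org/ip`, and the worker count are assumptions. The checker is passed in as a parameter so it can be swapped out, and `requests` is imported lazily inside the default checker.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Proxy = Dict[str, str]  # e.g. {'http': '1.2.3.4:80'}, the same shape the scraper builds


def requests_check(proxy: Proxy, test_url: str = 'http://httpbin.org/ip',
                   timeout: float = 2.0) -> bool:
    """Assumed default checker: True if an HTTP GET through the proxy returns 200."""
    import requests  # lazy import so filter_alive can run with any checker
    try:
        resp = requests.get(test_url, proxies=proxy, timeout=timeout)
        return resp.status_code == 200
    except requests.RequestException:
        return False


def filter_alive(proxies: List[Proxy],
                 check: Callable[[Proxy], bool] = requests_check,
                 max_workers: int = 16) -> List[Proxy]:
    """Run `check` on every proxy concurrently and keep those that pass, in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(check, proxies))
    return [p for p, ok in zip(proxies, results) if ok]
```

With 16 workers and a 2-second timeout, a page of dead proxies costs roughly one timeout per batch of 16 instead of one per proxy. Keep the worker count modest so the test site is not flooded with requests.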
