Scraping Kuaidaili proxy IPs with Python and testing their availability

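Site used: https://www.kuaidaili.com/. Free IPs are very unstable and can stop working at any moment. If you need something dependable, buying paid IPs is more reliable.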

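import requests
import urllib3
from urllib import parse
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
}
session = requests.session()
session.headers.update(headers)

# verify=False is used below, so silence the InsecureRequestWarning it triggers
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


# Fetch one page of the free proxy list and test every IP on it
def getIP(url, f):
    html = session.get(url, timeout=10)  # use the session so the User-Agent is sent
    # print(html.status_code)
    # print(html.text)
    if html.status_code == 200:
        html.encoding = html.apparent_encoding
        soup = BeautifulSoup(html.text, 'lxml')
        trs = soup.select('tbody tr')
        for tr in trs:
            tds = tr.select('td')
            ip = tds[0].text.strip()
            port = tds[1].text.strip()
            proxy_type = tds[3].text.strip()  # the 类型 (type) column, e.g. HTTP
            TestIP(ip, port, proxy_type, f)


# Test whether a proxy works, and write the working ones to the file
def TestIP(ip, port, proxy_type, f):
    url = "https://www.baidu.com"
    # Most free proxies are plain HTTP proxies; HTTPS requests are tunnelled
    # through them with CONNECT, so both entries use an http:// proxy URL
    proxy_url = 'http://{}:{}'.format(ip, port)
    proxies = {'http': proxy_url, 'https': proxy_url}
    label = '{}://{}:{}'.format(proxy_type.lower(), ip, port)
    # print(proxies)
    try:
        resp = session.get(url, proxies=proxies, timeout=2, verify=False)
        print(resp.status_code)
        print("Working IP: {}".format(label))
        f.write(label + "\n")  # write to file
    except requests.RequestException:
        print("Dead IP: {}".format(label))


if __name__ == '__main__':
    t = input("Enter the number of pages to scrape (15 IPs per page): ")
    url1 = "https://www.kuaidaili.com/free/intr/"
    with open('IP代理.txt', 'w', encoding='utf-8') as f:
        for i in range(int(t)):
            url = parse.urljoin(url1, str(i + 1))
            print(url)
            getIP(url, f)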
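The script above tests proxies one at a time with a 2-second timeout, so a page of 15 dead IPs can take roughly 30 seconds or more. Below is a minimal sketch of testing them in parallel with the standard library's ThreadPoolExecutor. The helper names check_proxy and filter_alive, the worker count of 10, and the plain http:// proxy URL are assumptions of this sketch, not part of the original script.

```python
# Hypothetical helpers (not from the original post): test candidate proxies
# concurrently instead of one by one. Assumes the same check target
# (https://www.baidu.com) and 2 s timeout as the script above.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

Candidate = Tuple[str, str, str]  # (ip, port, proxy type as listed on the page)


def check_proxy(candidate: Candidate, timeout: float = 2.0) -> bool:
    """Return True if a request to Baidu through the proxy gets a 200."""
    import requests  # imported lazily so filter_alive works without it

    ip, port, _ = candidate
    proxy_url = "http://{}:{}".format(ip, port)  # assumed plain HTTP proxy
    proxies = {"http": proxy_url, "https": proxy_url}
    try:
        resp = requests.get("https://www.baidu.com", proxies=proxies, timeout=timeout)
        return resp.status_code == 200
    except requests.RequestException:
        return False


def filter_alive(
    candidates: Iterable[Candidate],
    check: Callable[[Candidate], bool] = check_proxy,
    max_workers: int = 10,
) -> List[Candidate]:
    """Run `check` on every candidate in a thread pool and keep the passing ones.

    The result keeps the input order, because ThreadPoolExecutor.map
    returns results in the order the inputs were given.
    """
    candidates = list(candidates)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(check, candidates))
    return [c for c, ok in zip(candidates, results) if ok]
```

To use it, collect (ip, port, type) tuples in getIP instead of calling TestIP directly, pass the list to filter_alive, and write the survivors to the file. Threads suit this job because each check spends almost all its time waiting on the network.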
