Python: scraping domestic high-anonymity proxy IPs from Kuaidaili (快代理)

Introduction

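Scrapes the domestic high-anonymity proxy list from Kuaidaili (快代理);
By default it scrapes everything, from the first page to the last;
Fields extracted: ip and port;
Default interval between requests: 2 s. A shorter interval tends to make the scrape fail;
Output format: a plain text file (IP.txt in the current working directory), one ip:port entry per line.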
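The page range and the request interval described above can be sketched as a small generator. This is only an illustration, not the script's own code: `iter_page_urls` and its `sleep` parameter are hypothetical names introduced here, while the URL pattern is the one the script requests.

```python
import time

# URL pattern of the listing pages requested by the script.
BASE_URL = "https://www.kuaidaili.com/free/inha/{page}"


def iter_page_urls(last_page, interval=2.0, sleep=time.sleep):
    """Yield listing-page URLs for pages 1..last_page, pausing before each one.

    `sleep` is injectable (hypothetical parameter) so the pause can be
    replaced in tests or dry runs.
    """
    for page in range(1, last_page + 1):
        sleep(interval)  # throttle requests; the default matches the 2 s above
        yield BASE_URL.format(page=page)


if __name__ == "__main__":
    # Dry run with a no-op sleep so the URLs print immediately.
    for url in iter_page_urls(3, sleep=lambda seconds: None):
        print(url)
```

Passing the sleep function as an argument keeps the throttling in one place and makes the interval easy to raise if the site starts rejecting requests.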

Create the file kuaiDaiLiHidden.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import time

import requests
from bs4 import BeautifulSoup


class KuaiDaiLiHidden(object):
    def __init__(self):
        self.session = requests.session()
        self.proxies = None
        self.timeout = 10
        self.time_interval = 2  # seconds to wait between requests
        self.headers = {
            "Accept": "text/html,application/xhtml+xml,"
                      "application/xml;q=0.9,image/webp,*/*;q=0.8",
            # Only advertise encodings that requests decodes without
            # extra packages (the original also listed sdch and br).
            "Accept-Encoding": "gzip, deflate",
            "Accept-Language": "zh-CN,zh;q=0.8",
            "Connection": "Keep-Alive",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/55.0.2883.87 Safari/537.36",
        }

    def get_status(self, url):
        """Request a URL.

        :param url: address to request
        :return: the response on HTTP 200, otherwise False
        """
        try:
            response = self.session.get(
                url=url,
                headers=self.headers,
                proxies=self.proxies,
                timeout=self.timeout,
                # verify=False,
                # allow_redirects=False
            )
        except requests.RequestException as e:
            # Timeouts and connection errors would otherwise abort the crawl.
            print("ERROR: request failed! url: %s error: %s" % (url, e))
            return False
        if response.status_code == 200:
            return response
        print("ERROR: network request failed! status: %s url: %s"
              % (response.status_code, url))
        return False

    def get_last_page(self, url):
        """Get the number of the last page.

        :param url: URL of the first page
        :return: int(last_page) or None
        """
        response = self.get_status(url)
        if not response:
            return None
        # The "html5lib" parser requires the html5lib package to be installed.
        soup = BeautifulSoup(response.text, "html5lib")
        lis = soup.select("#listnav > ul > li")
        # The pager ends with a "页" item preceded by the last page link.
        if len(lis) >= 2 and lis[-1].text == "页":
            last_page = lis[-2].find("a").text
            return int(last_page)
        return None

    def get_index(self, url):
        """Visit the home page to establish a session.

        :param url: home page URL
        :return: True on success, otherwise False
        """
        response = self.get_status(url)
        if response:
            # response.encoding = "utf-8"
            # html = response.text
            # print(html)
            print("Home page loaded, session established...")
            return True
        print("ERROR: failed to load the home page!")
        return False

    def parse_html(self, url):
        """Extract the ip:port entries from one listing page.

        :param url: listing page URL
        :return: list of "ip:port" lines, or None on failure
        """
        response = self.get_status(url)
        if not response:
            return None
        soup = BeautifulSoup(response.text, "html.parser")
        table = soup.find(id="list")
        tbody = table.find("tbody") if table else None
        if tbody is None:
            print("ERROR: proxy table not found! url: %s" % url)
            return None
        ip_port_list = []
        for item in tbody.find_all("tr"):
            ip = item.find(attrs={"data-title": "IP"}).text
            port = item.find(attrs={"data-title": "PORT"}).text
            ip_port_list.append(ip + ":" + port + "\n")
        return ip_port_list

    @staticmethod
    def write_to_text(path, content):
        """Append the given lines to the text file at path."""
        path = os.path.abspath(path)
        with open(path, 'a+', encoding='utf-8') as f:
            f.writelines(content)

    def next_page(self, last_page):
        """Scrape pages 1..last_page and append the results to IP.txt."""
        path = os.path.join(os.getcwd(), "IP.txt")
        for i in range(1, last_page + 1):
            time.sleep(self.time_interval)
            url = "https://www.kuaidaili.com/free/inha/{i}".format(i=i)
            print(url)
            ip_port_list = self.parse_html(url)
            if not ip_port_list:
                # Skip failed pages; writelines(None) would raise TypeError.
                continue
            self.write_to_text(path, ip_port_list)

    def main(self):
        url = "https://www.kuaidaili.com"
        if not self.get_index(url):
            return None
        time.sleep(self.time_interval)
        url = "https://www.kuaidaili.com/free/inha/1"
        last_page = self.get_last_page(url)
        if not last_page:
            return None
        self.next_page(last_page)


if __name__ == '__main__':
    kuai_dai_li = KuaiDaiLiHidden()
    kuai_dai_li.main()
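Because the script opens IP.txt in append mode, repeated runs can leave duplicate lines in the file. The following is a minimal sketch of reading the file back, under stated assumptions: `load_proxies` is a hypothetical helper that is not part of the script, the proxies are assumed to speak plain HTTP, and the sample addresses come from the reserved documentation ranges rather than from any real proxy list. The dict shape follows the `proxies` argument accepted by requests.

```python
import os
import tempfile


def load_proxies(path):
    """Read ip:port lines and return de-duplicated requests-style proxies dicts."""
    seen = set()
    proxies = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            entry = line.strip()
            host, sep, port = entry.partition(":")
            # Skip blank lines, malformed entries, and repeats.
            if not sep or not host or not port.isdigit() or entry in seen:
                continue
            seen.add(entry)
            address = "http://" + entry  # assumption: plain HTTP proxies
            proxies.append({"http": address, "https": address})
    return proxies


if __name__ == "__main__":
    # Demo on a temporary file in the same format the script writes.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("192.0.2.1:8080\n192.0.2.1:8080\n198.51.100.7:3128\n")
    for proxy in load_proxies(tmp.name):
        print(proxy)
    os.remove(tmp.name)
```

Each returned dict could then be assigned to `self.proxies` or passed as `proxies=` to a request. Free proxies go stale quickly, so each entry should still be tested before use.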
