[Python] Scraping the location of an IP address from ip138

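1. First, download the latest IP data from 纯真IP (CZ88) at http://www.cz88.net/. Each line holds a start IP, an end IP, a location, and a network operator, separated by whitespace. The location and operator are in Chinese (for example, 福建省 电信 is Fujian Province, China Telecom). The data looks like this: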

0.0.0.0 0.255.255.255 IANA保留地址 CZ88.NET
1.0.0.0 1.0.0.255 澳大利亚 CZ88.NET
1.0.1.0 1.0.3.255 福建省 电信
1.0.4.0 1.0.7.255 澳大利亚 CZ88.NET
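
To make the format concrete, here is a minimal sketch of splitting one such line into its fields. The name parse_record and the dictionary keys are illustrative assumptions, not part of the original script. It assumes the first two whitespace-separated tokens are the start and end IPs, the third is the location, and any remaining tokens together form the operator.

```python
def parse_record(line):
    """Split one CZ88-style line into start IP, end IP, location and operator.

    Hypothetical helper: assumes whitespace-separated fields where the
    operator may itself span several tokens.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError('expected at least 3 fields: ' + repr(line))
    return {
        'start': fields[0],
        'end': fields[1],
        'location': fields[2],
        # Join any remaining tokens back into a single operator string.
        'net': ''.join(fields[3:]),
    }


record = parse_record('1.0.1.0 1.0.3.255 福建省 电信')
print(record['start'], record['end'], record['location'], record['net'])
```

Joining the trailing tokens mirrors what format_one_line in the script below does for lines with five or six tokens.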

2. Look up the location of each IP by scraping ip138

import time
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Module-level state describing the last range written to ip_tmp.txt
START = ''
END = ''
LOCATION = ''
NET = ''


class MyHTMLParser(HTMLParser):
    """Pick the lookup result out of the ip138 response page."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.region = ''

    def handle_data(self, data):
        # The result text starts with the label '本站主数据' ("this site's
        # primary data") plus one separator character; keep the rest.
        if data[0:5] == '本站主数据':
            self.region = data[6:]


def remove_invalid_space(line):
    """Split a line on whitespace, dropping redundant spaces."""
    return line.split()


def format_one_line(line):
    """Reduce a split line to [start, end, location, net]."""
    if len(line) == 6:
        net = line[-3] + line[-2] + line[-1]
    elif len(line) == 5:
        net = line[-2] + line[-1]
    else:
        net = line[-1]
    return [line[0], line[1], line[2], net]


def get_location_from_ip(line):
    """Query ip138 for line[0] and record the range in ip_tmp.txt."""
    global START, END, LOCATION, NET

    url = 'http://www.ip138.com/ips1388.asp'
    params = urllib.parse.urlencode({'ip': line[0], 'action': '2'})
    response = urllib.request.urlopen(url + '?' + params)
    html = response.read().decode('GBK')

    parser = MyHTMLParser()
    parser.feed(html)
    parser.close()

    # Fall back to empty fields if no result was found on the page
    region = remove_invalid_space(parser.region)
    location = region[0] if region else ''
    net = region[-1] if len(region) > 1 else ''

    if LOCATION == location and NET == net:
        # Same place as the previous range: extend it by replacing
        # the last line of the tmp file
        over_write_tmp_file(START + ' ' + line[1] + ' ' + location + ' ' + net)
    else:
        write_to_tmp_file(line[0] + ' ' + line[1] + ' ' + location + ' ' + net)
        START = line[0]
        LOCATION = location
        NET = net
    END = line[1]


def write_to_tmp_file(line):
    """Append one line to ip_tmp.txt."""
    with open('ip_tmp.txt', 'a') as file:
        file.write(line + '\n')


def over_write_tmp_file(line):
    """Replace the last line of ip_tmp.txt with `line`."""
    try:
        with open('ip_tmp.txt') as file:
            curr = file.readlines()[:-1]
    except FileNotFoundError:
        print('file not found')
        curr = []
    with open('ip_tmp.txt', 'w') as file:
        curr.append(line + '\n')
        file.writelines(curr)


def format_ip_file(path):
    """Look up every range listed in the file at `path`."""
    try:
        with open(path) as file:
            for line in file:
                fields = remove_invalid_space(line)
                if len(fields) < 3:
                    continue  # skip blank or malformed lines
                # main logic of get location from ip
                get_location_from_ip(format_one_line(fields))
                time.sleep(0.1)
    except FileNotFoundError:
        print('file not found')


print('start')
# Raw string so the backslashes in the Windows path are not treated as escapes
format_ip_file(r'D:\workspace\Python\ip\ip.txt')
print('end', end='')
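
The script merges neighbouring ranges by rewriting the last line of ip_tmp.txt whenever the location and operator repeat. As an alternative sketch (not the approach the script takes), the same merge could be done in memory before anything is written. The function merge_ranges and the sample rows are hypothetical. Like the script, it only compares location and operator and does not check that the IP ranges are actually adjacent.

```python
def merge_ranges(records):
    """Merge consecutive [start, end, location, net] records that share
    the same location and net, keeping the first start and the last end."""
    merged = []
    for start, end, location, net in records:
        if merged and merged[-1][2] == location and merged[-1][3] == net:
            # Same place as the previous range: extend its end address.
            merged[-1][1] = end
        else:
            merged.append([start, end, location, net])
    return merged


# Made-up rows for illustration only, not real lookup results.
rows = [
    ['1.0.1.0', '1.0.3.255', 'region-A', 'net-X'],
    ['1.0.4.0', '1.0.7.255', 'region-A', 'net-X'],
    ['1.0.8.0', '1.0.15.255', 'region-B', 'net-X'],
]
for row in merge_ranges(rows):
    print(' '.join(row))
```

Merging in memory avoids rereading and rewriting the whole output file for every repeated range, at the cost of holding the results until the end.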

3. It is best to add a delay between requests; otherwise you may overload ip138.
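
The script above pauses with time.sleep(0.1) after each lookup. A slightly more general way to space out requests is sketched here. The generator throttled and the interval values are assumptions chosen for illustration, and what counts as a polite interval for ip138 is not stated in the original.

```python
import time


def throttled(items, min_interval=0.5):
    """Yield items one at a time, waiting so that at least
    `min_interval` seconds pass between consecutive yields."""
    last = None
    for item in items:
        if last is not None:
            wait = min_interval - (time.monotonic() - last)
            if wait > 0:
                time.sleep(wait)
        last = time.monotonic()
        yield item


# Usage: each iteration would perform one lookup.
for ip in throttled(['1.0.0.0', '1.0.1.0', '1.0.4.0'], min_interval=0.2):
    print('would query', ip)
```

Because the wait is computed from the time of the previous yield, time already spent on the request itself counts toward the interval.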
