Learning Python Web Scraping: Scraping Postal Codes for Every Province, City, and County in China

Requirement: use Python to scrape the postal codes of every province, city, and county in China from the ip138 lookup site (http://www.ip138.com/post/), and save them to an Excel file.

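Environment: Python 3.7
             requests library (third-party, not part of the standard library; install with pip install requests)
             xlwt library (third-party; install with pip install xlwt)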
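
Both libraries are third-party packages, so it can help to confirm they are importable before running the scraper. The following is only a minimal sketch; `missing_packages` is a hypothetical helper name, not part of either library.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["requests", "xlwt"])
    if missing:
        # Suggest the pip command for whatever is missing
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All dependencies are available.")
```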

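Example website: http://www.ip138.com/post/

   Step 1: view the page source of the ip138 site to find the link for each province.

   Step 2: follow a province link to see the postal codes of the cities in that province.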
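
The two steps above depend on finding the province links in the page source. As a minimal offline sketch, the snippet below extracts name/URL pairs from a `<map>` block with the standard library's `html.parser`, which is a different technique from the string searches used in the example code. The HTML sample is a made-up fragment modeled on the attributes that code looks for (`map_86`, `href`, `title`), the `/10/` and `/20/` paths are placeholders, and `ProvinceLinkParser` and `parse_province_links` are hypothetical names; the live page markup may differ.

```python
from html.parser import HTMLParser

BASE_URL = "http://www.ip138.com"

# Assumed sample markup for illustration; not copied from the live site.
SAMPLE_HTML = """
<map name="map_86" id="map_86">
<area shape="poly" href="/10/" title="北京" coords="1,2,3,4">
<area shape="poly" href="/20/" title="上海" coords="5,6,7,8">
</map>
"""


class ProvinceLinkParser(HTMLParser):
    """Collect title -> URL pairs from <area> tags inside <map id="map_86">."""

    def __init__(self):
        super().__init__()
        self.in_map = False
        self.links = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "map" and attrs.get("id") == "map_86":
            self.in_map = True
        elif tag == "area" and self.in_map:
            href, title = attrs.get("href"), attrs.get("title")
            if href and title:
                self.links[title] = BASE_URL + href

    def handle_endtag(self, tag):
        if tag == "map":
            self.in_map = False


def parse_province_links(html):
    """Return a dict mapping province name to its absolute URL."""
    parser = ProvinceLinkParser()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    for name, url in parse_province_links(SAMPLE_HTML).items():
        print(name, url)
```

A parser-based approach tends to tolerate changes in whitespace and attribute order better than fixed string offsets, at the cost of a little more code.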

Example code:

import requests
import xlwt


# Return a dict whose keys are province names and whose values are the province page URLs
def getProvinceCode(url):
    response = requests.get(url)
    response.encoding = response.apparent_encoding
    content = response.text
    start = content.find('<map name="map_86" id="map_86">') + len('<map name="map_86" id="map_86">') + len("\n")
    end = content.find('</map>')
    mapStr = content[start:end]
    # print(mapStr)
    lines = mapStr.split("\n")
    baseUrl = 'http://www.ip138.com/'
    city_urls = []
    city_name = []
    for line in lines:
        if line:
            # The province path sits between href="/ and /"
            index1 = line.find('href="/') + len('href="/')
            index2 = line.find('/"')
            code = line[index1:index2]
            url = baseUrl + code
            city_urls.append(url)
            # The province name is the value of the title attribute
            title1 = line.find('title="') + len('title="')
            title2 = line.find('"', title1)
            title = line[title1:title2]
            city_name.append(title)
    dict_prov_url = dict(zip(city_name, city_urls))
    for item in dict_prov_url.items():  # print each province name and its URL
        print(item)
    return dict_prov_url


# Given a province URL, collect each city's name, postal code, and area code,
# and return them as a two-dimensional list.
def getPostCode(url):
    response = requests.get(url)
    response.encoding = response.apparent_encoding
    content = response.text
    # '长途区号' ("area code") is the last header cell; the data rows follow it
    start = content.find('长途区号</b></td></tr>') + len("长途区号</b></td></tr>")
    end = content.find('</table>', start)
    add_post = content[start:end]
    posts = add_post.strip().split('<tr bgcolor="#ffffff">')  # one entry per table row, with the <tr bgcolor="#ffffff"> tag removed
    code_list = []
    for post in posts:
        if post:
            lines = post.strip().split('<td')
            if len(lines) >= 2:
                if 'nbsp' in lines[4]:
                    # The second group of cells is empty: the row holds one city
                    if len(lines) >= 6:
                        if 'nbsp' in lines[5]:
                            test = []
                            city = lines[1][lines[1].find('>')+len('>'):lines[1].find('</')]
                            post_code = lines[2][lines[2].find('">')+len('">'):lines[2].find('</')]
                            area_code = lines[3][lines[3].find('">')+len('">'):lines[3].find('</')]
                            test.append(city)
                            test.append(post_code)
                            test.append(area_code)
                            code_list.append(test)
                        else:
                            # The city name is wrapped in <b>...</b>
                            test = []
                            city = lines[1][lines[1].find('<b>')+len('<b>'):lines[1].find('</')]
                            post_code = lines[2][lines[2].find('">')+len('">'):lines[2].find('</')]
                            area_code = lines[3][lines[3].find('">')+len('">'):lines[3].find('</')]
                            test.append(city)
                            test.append(post_code)
                            test.append(area_code)
                            code_list.append(test)
                else:
                    # The row holds two cities: cells 1-3 and cells 4-6
                    test1 = []
                    city = lines[1][lines[1].find('>')+len('>'):lines[1].find('</')]
                    post_code = lines[2][lines[2].find('">')+len('">'):lines[2].find('</')]
                    area_code = lines[3][lines[3].find('">')+len('">'):lines[3].find('</')]
                    test1.append(city)
                    test1.append(post_code)
                    test1.append(area_code)
                    code_list.append(test1)
                    test2 = []
                    city = lines[4][lines[4].find('>')+len('>'):lines[4].find('</')]
                    post_code = lines[5][lines[5].find('">')+len('">'):lines[5].find('</')]
                    area_code = lines[6][lines[6].find('">')+len('">'):lines[6].find('</')]
                    test2.append(city)
                    test2.append(post_code)
                    test2.append(area_code)
                    code_list.append(test2)
    showPost(code_list)
    return code_list


# Print the two-dimensional list returned by getPostCode(url) to the terminal
def showPost(code_list):
    for i in range(len(code_list)):
        print(code_list[i])


# Write the results to an Excel file
def write_excel(path):
    # Create the workbook
    workbook = xlwt.Workbook(encoding='utf-8')
    # Create one sheet per province
    for title, url in getProvinceCode('http://www.ip138.com/post/').items():
        data_sheet = workbook.add_sheet(title)
        row0 = [u'城市名称', u'邮政编码', u'长途区号']  # header row of each sheet: city name, postal code, area code
        for i in range(len(row0)):
            data_sheet.write(0, i, row0[i])
        code_list = getPostCode(url)
        for i in range(len(code_list)):  # write every postal code record
            for j in range(len(code_list[i])):
                data_sheet.write(i+1, j, code_list[i][j])
    workbook.save(path)


if __name__ == '__main__':
    path = './postcode.xls'
    write_excel(path)
    print(u'Wrote postcode.xls successfully')
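
The row-parsing logic in `getPostCode` repeats the same three-cell extraction for each layout (one city per row, or two). As a hedged alternative sketch, the cells of a row can be read generically and grouped in threes, skipping groups whose city cell is empty. The sample markup and its values below are invented placeholders modeled on the patterns the code above searches for (`<tr bgcolor="#ffffff">`, `<td`, `&nbsp;`, `<b>`); `cell_text` and `parse_rows` are hypothetical names, and the live page may be structured differently.

```python
import re
from html import unescape

# Assumed sample rows with placeholder values; not real postal data.
SAMPLE_ROWS = """
<tr bgcolor="#ffffff"><td><b>City A</b></td><td class="c">000001</td><td class="c">0001</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr bgcolor="#ffffff"><td>District B</td><td class="c">000002</td><td class="c">0001</td><td>District C</td><td class="c">000003</td><td class="c">0001</td></tr>
"""

CELL_RE = re.compile(r"<td[^>]*>(.*?)</td>", re.S)  # contents of each <td> cell
TAG_RE = re.compile(r"<[^>]+>")                     # any remaining inline tag, e.g. <b>


def cell_text(cell_html):
    """Strip inline tags and decode entities; '&nbsp;' cells become ''."""
    return unescape(TAG_RE.sub("", cell_html)).strip()


def parse_rows(table_html):
    """Return [city, postal_code, area_code] lists, one per non-empty group."""
    records = []
    for row in table_html.split("<tr")[1:]:
        cells = [cell_text(c) for c in CELL_RE.findall(row)]
        # Walk the cells three at a time: (city, postal code, area code)
        for i in range(0, len(cells) - 2, 3):
            city, post_code, area_code = cells[i:i + 3]
            if city:
                records.append([city, post_code, area_code])
    return records


if __name__ == "__main__":
    for record in parse_rows(SAMPLE_ROWS):
        print(record)
```

The returned list has the same shape as `code_list`, so under these assumptions it could be passed to the same sheet-writing loop.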

Example results:

 Terminal output:

 Excel file:
