Crawler for same-day weather data of every city in China

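This project uses a crawler to fetch the current day's weather data for every city in China and then save it. The overall approach has three steps:
(1) Find a target website
(2) Fetch the weather for every city
(3) Save the data locally
Searching in a browser turned up a website that shows weather forecasts. The target URL is url=https://www.tianqi.com/
[Screenshot of the website]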

First, analyze the site. Entering a few cities at random gives the following URLs:
(1) Beijing: https://www.tianqi.com/beijing/
(2) Shanghai: https://www.tianqi.com/shanghai/
(3) Guangzhou: https://www.tianqi.com/guangzhou/

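Looking at these URLs, each one is composed of "https://www.tianqi.com/" + "city name in pinyin" + "/".
So the next step is to collect the names of all cities in China and convert them to pinyin, which lets the crawler move from one city's page to the next.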
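The URL pattern described above can be sketched as a small helper. This is only an illustration: `BASE_URL` and `city_url` are names introduced here, not part of the original script, and the sketch assumes every city page follows the same pattern as the three examples.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.tianqi.com/"  # site root taken from the article


def city_url(pinyin_slug: str) -> str:
    """Return the city page URL for a pinyin string such as "beijing"."""
    # Strip stray slashes so the result always ends with exactly one "/".
    return urljoin(BASE_URL, pinyin_slug.strip("/") + "/")


for slug in ["beijing", "shanghai", "guangzhou"]:
    print(city_url(slug))
```

Keeping the pattern in one function means a later change to the site's URL scheme only needs to be handled in one place.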

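The next task is to get the names of all cities in China. I found a JSON data file that provides all the city names, at the following address:
url=https://img.weather.com.cn/newwebgis/fc/nation_fc24h_wea_2022010420.json
The crawling code is as follows: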

import json
import requests

def main():
    # Fetch the names of all cities in China
    url = 'https://img.weather.com.cn/newwebgis/fc/nation_fc24h_wea_2022010420.json'
    # Slice off the non-JSON text around the payload (the offsets are specific to this file)
    resp = requests.get(url).text[10:337263:]
    resp1 = json.loads(resp)['data']  # parse the JSON and take the 'data' entries
    lis = []
    for i in resp1:
        lis.append(i['namecn'])  # Chinese name of the city
    name(lis)  # pass the names on to the pinyin conversion step
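The fixed slice `[10:337263:]` only works for this exact file, since any change in the response length breaks it. A minimal sketch of a less brittle alternative is to locate the outermost braces and parse what lies between them. The helper name `extract_json_object` and the `var data=` wrapper in the sample are assumptions made for illustration. The real response's wrapper text may differ.

```python
import json


def extract_json_object(text: str) -> dict:
    """Parse the outermost {...} object embedded in a larger string."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response text")
    return json.loads(text[start:end + 1])


# Hypothetical sample shaped like the data the article describes:
# a 'data' list whose entries carry a 'namecn' field.
sample = 'var data={"data": [{"namecn": "北京"}, {"namecn": "上海"}]};'
names = [item["namecn"] for item in extract_json_object(sample)["data"]]
print(names)
```

With this approach the same code keeps working if the file grows or shrinks, as long as the payload is still a single JSON object.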

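All the city names have now been obtained. The next step is to convert them to pinyin with Python's pypinyin library, which is installed with the pip install pypinyin command.
The conversion code is as follows: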

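from pypinyin import lazy_pinyin

def name(lis):
    # Convert every city name to pinyin
    lis1 = []
    for city in lis:
        # lazy_pinyin returns a list of toneless syllables; join them into one string
        pinyin = ''.join(lazy_pinyin(city))
        lis1.append([city, pinyin])  # [Chinese name, pinyin]
    fun2(lis1)  # pass the pairs on to the crawling step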
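Toneless pinyin discards information, so two different city names can end up with the same string (榆林 and 玉林 both become "yulin"), and both would then point at the same URL. The sketch below flags such cases in a list shaped like `lis1`. The function name `find_slug_collisions` is introduced here for illustration, and how the site itself distinguishes such cities is not covered by the article.

```python
from collections import defaultdict


def find_slug_collisions(pairs):
    """Map each pinyin string shared by several distinct names to those names."""
    by_slug = defaultdict(list)
    for city, slug in pairs:
        if city not in by_slug[slug]:
            by_slug[slug].append(city)
    # Keep only the pinyin strings that more than one city maps to.
    return {slug: cities for slug, cities in by_slug.items() if len(cities) > 1}


pairs = [["北京", "beijing"], ["榆林", "yulin"], ["玉林", "yulin"]]
print(find_slug_collisions(pairs))
```

Cities reported here are worth checking by hand before trusting the scraped result, since at most one of them can match the page at that URL.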

All the city names have now been converted to pinyin.
The last step is to crawl the data for every city. The complete code is as follows:

import requests
from bs4 import BeautifulSoup
import json
from pypinyin import lazy_pinyin
import time

f = open('python.txt', 'w', encoding='utf-8')  # create the output file


def main():
    # Fetch the names of all cities in China
    url = 'https://img.weather.com.cn/newwebgis/fc/nation_fc24h_wea_2022010420.json'
    # Slice off the non-JSON text around the payload (the offsets are specific to this file)
    resp = requests.get(url).text[10:337263:]
    resp1 = json.loads(resp)['data']  # parse the JSON and take the 'data' entries
    lis = []
    for i in resp1:
        lis.append(i['namecn'])
    name(lis)


def name(lis):
    # Convert every city name to pinyin
    lis1 = []
    for city in lis:
        pinyin = ''.join(lazy_pinyin(city))
        lis1.append([city, pinyin])  # [Chinese name, pinyin]
    fun2(lis1)


def fun2(lis1):
    # Request each city's page on the weather site and get its HTML source
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36',
        'cookie': 'Hm_lvt_ab6a683aa97a52202eab5b3a9042a8d2=1661411951; '
                  'Hm_lvt_30606b57e40fddacb2c26d2b789efbcb=1661413345; '
                  'Hm_lpvt_30606b57e40fddacb2c26d2b789efbcb=1661414297; '
                  'cs_prov=04; cs_city=0401; ccity=101040100; '
                  'Hm_lpvt_ab6a683aa97a52202eab5b3a9042a8d2=1661414326',
    }
    for i in lis1:
        time.sleep(1)  # pause between requests
        url = f'https://www.tianqi.com/{i[1]}/'
        print(url)
        resp = requests.get(url=url, headers=headers).text
        fun3(resp, i)


def fun3(resp, i):
    # Parse the weather page
    try:
        html = BeautifulSoup(resp, 'html.parser')
        info = html.find('div', class_="weatherbox").find("dl", class_="weather_info")
        html1 = info.find("dd", class_="week").text  # current date
        html2 = info.find('dd', class_="weather").find('span').text  # current temperature
        html3 = info.find('dd', class_="shidu").text  # humidity, wind, UV
        html4 = info.find('dd', class_="kongqi").text  # air quality
        print(i[0], html1, html2, html3, html4)
        fun4([html1, html2, html3, html4])
    except Exception as e:
        # A page with a different layout (or no page at all) ends up here
        print(e)


def fun4(lis):
    # Store the data, one line per city
    f.write(str(lis) + '\n')
    print('saved')


if __name__ == '__main__':
    main()
    f.close()
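The script stores each record as the text of a Python list, and the record does not include the city name. As an alternative for the saving step, here is a minimal sketch that writes CSV with the standard `csv` module. The column names in `FIELDS` and the function `write_rows` are assumptions introduced here, and the sample row holds placeholder strings, not real scraped values.

```python
import csv
import io

# Hypothetical column names for the four scraped fields plus the city name
FIELDS = ["city", "date", "temperature", "humidity_wind_uv", "air_quality"]


def write_rows(rows, out):
    """Write a header line followed by one CSV line per city."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    for row in rows:
        # Collapse the newlines and repeated spaces left over from the HTML text
        writer.writerow([" ".join(str(value).split()) for value in row])


# Placeholder values only; an in-memory buffer stands in for a real file
buf = io.StringIO()
write_rows([["北京", "date text", "temp\n text", "humidity text", "air text"]], buf)
print(buf.getvalue())
```

To write to disk, pass a file opened with `open('weather.csv', 'w', newline='', encoding='utf-8-sig')` in place of the buffer. The `utf-8-sig` encoding is a common choice when the file is meant to be opened in Excel.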

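The data finally obtained is shown below:
[Screenshot of the saved output]

One shortcoming of this project is that it does not use multithreading, which makes crawling too slow.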
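One way to address this is a thread pool from the standard library's `concurrent.futures`. The sketch below is not the author's implementation: `fetch_all` and `fake_fetch` are names introduced here, and `fake_fetch` simulates a network request so the example runs without touching the site.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Stand-in for requests.get(url, headers=headers).text
    time.sleep(0.1)
    return f"<html>{url}</html>"


urls = [f"https://www.tianqi.com/{slug}/" for slug in ["beijing", "shanghai", "guangzhou"]]
pages = fetch_all(urls, fake_fetch, max_workers=3)
print(len(pages))
```

In a real run it is safer to keep the worker count small and to keep some delay between requests so the site is not overloaded. Collecting the results first and writing the file from the main thread also avoids several threads writing to the same file at once.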
