1. Project Overview

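On the China Weather website (http://www.weather.com.cn), entering a city name such as Shenzhen (深圳) takes you to http://www.weather.com.cn/weather1d/101280601.shtml, which shows the weather forecast for Shenzhen. Here 101280601 is the code for Shenzhen; every city or region has its own code. See the figure below: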
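The relationship between a city code and its page address can be sketched as a small helper. This is only an illustration: `BASE_URL` and `forecast_url` are hypothetical names, and the two path segments are the ones that appear in this article (`weather1d` in the address quoted above, `weather` in the scraper in section 3).

```python
BASE_URL = "http://www.weather.com.cn"


def forecast_url(city_code, page="weather"):
    """Build the forecast page address for a city code.

    page="weather1d" is the single-day address quoted above;
    page="weather" is the path requested by the scraper in section 3.
    """
    return f"{BASE_URL}/{page}/{city_code}.shtml"


print(forecast_url("101280601", page="weather1d"))
print(forecast_url("101280601"))
```

Keeping the city code as a string avoids losing leading zeros if a code ever starts with one.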

The page shown above offers Shenzhen's weather for today, the next 7 days, days 8-15, and so on. This project scrapes the 7-day forecast.

2. HTML Code Analysis

Analyze this HTML code:

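The 7-day forecast sits inside a <ul class="t clearfix"> element. Each day is one <li> element, and the seven entries share almost the same structure (note: today's entry does not list both a high and a low temperature; it carries a single temperature value).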
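The structure described above can be tried out offline on a small sample. The sketch below is an assumption-heavy illustration: `SAMPLE` is hypothetical markup modelled on the selectors used later in this article (the live page may differ), and it uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without third-party packages. `ForecastParser` is a made-up name.

```python
from html.parser import HTMLParser

# Hypothetical markup modelled on the selectors used in this article.
SAMPLE = """
<ul class="t clearfix">
  <li><h1>22日（今天）</h1><p class="wea">晴</p>
      <p class="tem"><i>14℃</i></p></li>
  <li><h1>23日（明天）</h1><p class="wea">晴</p>
      <p class="tem"><span>23℃</span>/<i>14℃</i></p></li>
</ul>
"""


class ForecastParser(HTMLParser):
    """Collects one dict per <li> inside <ul class="t clearfix">."""

    def __init__(self):
        super().__init__()
        self.days = []        # finished entries, one per day
        self.in_list = False  # True while inside the target <ul>
        self.p_class = None   # class of the <p> currently open
        self.field = None     # key that the next text node fills

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and attrs.get("class") == "t clearfix":
            self.in_list = True
        elif not self.in_list:
            return
        elif tag == "li":
            self.days.append({"date": "", "weather": "", "high": None, "low": None})
        elif tag == "h1":
            self.field = "date"
        elif tag == "p":
            self.p_class = attrs.get("class")
            if self.p_class == "wea":
                self.field = "weather"
        elif self.p_class == "tem" and tag in ("span", "i"):
            # <span> holds the high temperature, <i> the low one
            self.field = "high" if tag == "span" else "low"

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_list = False
        elif tag == "p":
            self.p_class = None
        self.field = None

    def handle_data(self, data):
        if self.in_list and self.field and self.days:
            self.days[-1][self.field] = data.strip()


parser = ForecastParser()
parser.feed(SAMPLE)
for day in parser.days:
    # Today's entry has no <span>, so only the single value is shown.
    temp = day["low"] if day["high"] is None else day["high"] + "/" + day["low"]
    print(day["date"], day["weather"], temp)
```

Checking whether the high value is present, instead of assuming the first entry is the special one, keeps the logic independent of the position of today's entry.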

3. Scraping the Weather Forecast Data

from bs4 import BeautifulSoup
from bs4.dammit import UnicodeDammit  # ships with BeautifulSoup; guesses the document encoding
import urllib.request  # sends the request and returns the response

url = "http://www.weather.com.cn/weather/101280601.shtml"
try:
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78"}
    req = urllib.request.Request(url, headers=headers)  # build the request object
    data = urllib.request.urlopen(req)  # send the request
    data = data.read()  # read the response body (bytes)
    dammit = UnicodeDammit(data, ["utf-8", "gbk"])
    data = dammit.unicode_markup  # decoded text
    soup = BeautifulSoup(data, "lxml")
    lis = soup.select("ul[class='t clearfix'] li")
    x = 0
    for li in lis:
        try:
            date = li.select('h1')[0].text
            weather = li.select('p[class="wea"]')[0].text
            if x == 0:  # today has only one temperature: <i>14℃</i>
                x += 1
                temp = li.select('p[class="tem"] i')[0].text
            else:
                temp = li.select('p[class="tem"] span')[0].text + "/" + li.select('p[class="tem"] i')[0].text
            print(date, weather, temp)
            # Sample output:
            # 22日（今天） 晴 14℃
            # 23日（明天） 晴 23℃/14℃
            # 24日（后天） 晴转多云 25℃/13℃
            # 25日（周六） 多云 21℃/13℃
            # 26日（周日） 多云转晴 22℃/12℃
            # 27日（周一） 晴 21℃/12℃
            # 28日（周二） 晴 24℃/14℃
        except Exception as err:
            print(err)
except Exception as err:
    print(err)
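The scraper passes the candidate encodings utf-8 and gbk to UnicodeDammit. As a rough picture of what "trying candidate encodings in order" means, here is a standard-library sketch. It is a simplified stand-in and not how UnicodeDammit is implemented (the library does more than this); `decode_with_fallback` is a hypothetical helper.

```python
def decode_with_fallback(raw, encodings=("utf-8", "gbk")):
    """Decode bytes with the first candidate encoding that succeeds.

    Returns (text, encoding_used). If every candidate fails, the first
    encoding is used with undecodable bytes replaced.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return raw.decode(encodings[0], errors="replace"), encodings[0]


# The same city name encoded two different ways
print(decode_with_fallback("深圳".encode("utf-8")))
print(decode_with_fallback("深圳".encode("gbk")))
```

Order matters here: strict UTF-8 decoding rejects most GBK byte sequences, so putting utf-8 first rarely produces a false match, whereas a permissive encoding listed first could silently yield garbled text.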

4. Scraping and Storing the Weather Forecast Data

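Look up the codes for Beijing, Shanghai, Guangzhou, Shenzhen, and other cities, scrape the forecast for each of them, and store the results in the SQLite database weathers.db. The table weathers that holds the data is defined as: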

create table weathers (
    wCity varchar(16),
    wDate varchar(16),
    wWeather varchar(64),
    wTemp varchar(32),
    constraint pk_weather primary key (wCity, wDate)
)
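The composite primary key (wCity, wDate) means each city can hold at most one row per date. A minimal sketch of that behaviour, using an in-memory database so that nothing is written to disk; the Beijing temperature is a made-up sample value.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.execute(
    "create table weathers (wCity varchar(16), wDate varchar(16), "
    "wWeather varchar(64), wTemp varchar(32), "
    "constraint pk_weather primary key (wCity, wDate))"
)

row = ("深圳", "22日（今天）", "晴", "14℃")
con.execute("insert into weathers values (?,?,?,?)", row)

# A second row with the same (wCity, wDate) violates the primary key.
try:
    con.execute("insert into weathers values (?,?,?,?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The same date for a different city is accepted (made-up sample values).
con.execute("insert into weathers values (?,?,?,?)", ("北京", "22日（今天）", "晴", "5℃"))
count = con.execute("select count(*) from weathers").fetchone()[0]

print(duplicate_rejected, count)
```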

The following program scrapes the forecast for each city in turn and stores it in the database:

from bs4 import BeautifulSoup
from bs4.dammit import UnicodeDammit
import urllib.request
import sqlite3


# Weather database
class WeatherDB:
    def __init__(self):
        self.cursor = None
        self.con = None

    def openDB(self):
        self.con = sqlite3.connect("weathers.db")
        self.cursor = self.con.cursor()
        try:
            self.cursor.execute("create table weathers (wCity varchar(16),"
                                "wDate varchar(16),"
                                "wWeather varchar(64),"
                                "wTemp varchar(32),"
                                "constraint pk_weather primary key (wCity,wDate))")
        except Exception as err:
            # The table already exists: report it and clear the old rows
            print(err)
            self.cursor.execute("delete from weathers")

    def closeDB(self):
        self.con.commit()
        self.con.close()

    def insert(self, city, date, weather, temp):
        try:
            self.cursor.execute("insert into weathers (wCity,wDate,wWeather,wTemp) values (?,?,?,?)",
                                (city, date, weather, temp))
        except Exception as err:
            print(err)

    def show(self):
        self.cursor.execute("select * from weathers")
        rows = self.cursor.fetchall()
        print("%-16s%-16s%-32s%-16s" % ("city", "date", "weather", "temp"))
        for row in rows:
            print("%-16s%-16s%-32s%-16s" % (row[0], row[1], row[2], row[3]))


# Weather forecast
class WeatherForecast:
    def __init__(self):
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 6.0 x64; en-US; rv:1.9pre) "
                          "Gecko/2008072421 Minefield/3.0.2pre"}
        self.cityCode = {"北京": "101010100", "上海": "101020100", "广州": "101280101", "深圳": "101280601"}

    def forecastCity(self, city):
        if city not in self.cityCode.keys():
            print(city + ": city code not found")
            return
        url = "http://www.weather.com.cn/weather/" + self.cityCode[city] + ".shtml"
        try:
            req = urllib.request.Request(url, headers=self.headers)
            data = urllib.request.urlopen(req)
            data = data.read()
            dammit = UnicodeDammit(data, ["utf-8", "gbk"])
            data = dammit.unicode_markup
            soup = BeautifulSoup(data, "lxml")
            lis = soup.select("ul[class='t clearfix'] li")
            x = 0
            for li in lis:
                try:
                    date = li.select('h1')[0].text
                    weather = li.select('p[class="wea"]')[0].text
                    if x == 0:  # today has only one temperature: <i>14℃</i>
                        x += 1
                        temp = li.select('p[class="tem"] i')[0].text
                    else:
                        temp = li.select('p[class="tem"] span')[0].text + "/" + li.select('p[class="tem"] i')[0].text
                    print(city, date, weather, temp)
                    self.db.insert(city, date, weather, temp)
                except Exception as err:
                    print(err)
        except Exception as err:
            print(err)

    def process(self, cities):
        self.db = WeatherDB()
        self.db.openDB()
        for city in cities:
            self.forecastCity(city)
        # self.db.show()
        self.db.closeDB()


ws = WeatherForecast()
ws.process(["北京", "上海", "广州", "深圳"])
print("completed")
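The program above empties the weathers table on every run when the table already exists. One possible alternative, shown here only as a sketch and not as the article's method, is SQLite's `insert or replace`, which overwrites the row that has the same (wCity, wDate) and leaves other rows alone. `upsert` is a hypothetical helper, and the second forecast is made-up sample data.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # in-memory database for the demo
con.execute(
    "create table weathers (wCity varchar(16), wDate varchar(16), "
    "wWeather varchar(64), wTemp varchar(32), "
    "constraint pk_weather primary key (wCity, wDate))"
)


def upsert(con, city, date, weather, temp):
    """Insert a row, replacing any existing row with the same (wCity, wDate)."""
    con.execute(
        "insert or replace into weathers (wCity, wDate, wWeather, wTemp) "
        "values (?, ?, ?, ?)",
        (city, date, weather, temp),
    )


upsert(con, "深圳", "23日（明天）", "晴", "23℃/14℃")
upsert(con, "深圳", "23日（明天）", "多云", "22℃/14℃")  # made-up revised forecast
rows = con.execute("select * from weathers").fetchall()
print(rows)
```

Because wDate stores labels such as "23日（明天）" rather than full dates, rows from earlier runs would still accumulate under this approach; whether that is acceptable depends on how the data is used.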

The program output is as follows:

