Python dependencies:

requests
parsel
csv (standard library)

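What the script needs to do:

Requesting the page

Open the developer tools (press F12, or right-click and choose Inspect) and select the Network tab to see what the server returns.

The developer tools show that the site serves static HTML, so requesting the URL directly returns the data.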

import requests

url = 'https://cs.lianjia.com/ershoufang/'
# Send a browser-like User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
print(response.text)
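
A request to a live site can time out or come back with an error status, so the following is a minimal sketch of a slightly more defensive fetch. The helper name fetch_html, the HEADERS constant, and the 10-second timeout are assumptions made for illustration and are not part of the original script; the timeout argument and raise_for_status() are standard requests features.

```python
import requests

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}


def fetch_html(url, session=None, timeout=10):
    """Return the page HTML, raising on HTTP error codes.

    fetch_html is a hypothetical helper. `session` may be a
    requests.Session or any object with a compatible get() method;
    when it is omitted, the module-level requests.get is used.
    """
    client = session if session is not None else requests
    response = client.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text
```

Calling fetch_html('https://cs.lianjia.com/ershoufang/') would then return the same text as response.text above, or raise instead of silently returning an error page.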

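Parsing the data

Because the page is static, you can find where the data lives directly in the Elements panel of the developer tools.

The Elements panel shows that each listing sits inside an li tag, so the parsel library can parse the page and extract the fields.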

import parsel

selector = parsel.Selector(response.text)
lis = selector.css('.sellListContent li')
for li in lis:
    # Title
    title = li.css('.title a::text').get()
    # Location: the first link is the community, the second is the district
    positionInfo = li.css('.positionInfo a::text').getall()
    community = positionInfo[0] if len(positionInfo) > 0 else ''
    address = positionInfo[1] if len(positionInfo) > 1 else ''
    # Basic information about the house
    houseInfo = li.css('.houseInfo::text').get()
    # Total price; get() returns None when the node is missing
    txt = li.css('.totalPrice span::text').get()
    Price = txt + '万' if isinstance(txt, str) else ''
    # Unit price, with the '单价' ("unit price") label stripped
    txt = li.css('.unitPrice span::text').get()
    unitPrice = txt.replace('单价', '') if isinstance(txt, str) else ''
    # Follow and publication info
    followInfo = li.css('.followInfo::text').get()
    # Keys: title, community, district, house info, total price, unit price, publication info
    dit = {
        '标题': title,
        '小区': community,
        '地名': address,
        '房子基本信息': houseInfo,
        '房价': Price,
        '单价': unitPrice,
        '发布信息': followInfo,
    }
    print(dit)

Saving the data (persistence)

Use the csv module to save the data to a CSV file that Excel can open.

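import csv

# Create the file; utf-8-sig adds a BOM so that Excel detects the encoding
f = open('长沙二手房数据.csv', mode='a', encoding='utf-8-sig', newline='')
csv_writer = csv.DictWriter(f, fieldnames=['标题', '小区', '地名', '房子基本信息',
                                           '房价', '单价', '发布信息'])
# Write the header row once, before the loop over listings
csv_writer.writeheader()

# ... inside the `for li in lis:` loop, after building dit:
csv_writer.writerow(dit)

# After the loop has finished
f.close()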
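
The behaviour of csv.DictWriter can be checked in memory before anything is written to disk. The sketch below writes to an io.StringIO buffer instead of a file; rows_to_csv and the sample row are hypothetical and exist only for this illustration. Keys missing from a row are written as empty cells, which is DictWriter's default.

```python
import csv
import io

FIELDNAMES = ['标题', '小区', '地名', '房子基本信息', '房价', '单价', '发布信息']


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text: header once, then one line per row."""
    buffer = io.StringIO(newline='')  # keep the csv module's own line endings
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


sample = [{'标题': 'Sample listing', '小区': 'Community A', '房价': '120万'}]
print(rows_to_csv(sample))
```

The same pattern applies to a real file: open it once, write the header once, then call writerow for each listing.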

Scraping multiple pages

import csv

import parsel
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}
fieldnames = ['标题', '小区', '地名', '房子基本信息', '房价', '单价', '发布信息']


def downloadLianjia(url, csv_writer):
    # Request one listing page and write every listing on it to the CSV
    response = requests.get(url=url, headers=headers)
    selector = parsel.Selector(response.text)
    lis = selector.css('.sellListContent li')
    for li in lis:
        # Title
        title = li.css('.title a::text').get()
        # Location: the first link is the community, the second is the district
        positionInfo = li.css('.positionInfo a::text').getall()
        community = positionInfo[0] if len(positionInfo) > 0 else ''
        address = positionInfo[1] if len(positionInfo) > 1 else ''
        # Basic information about the house
        houseInfo = li.css('.houseInfo::text').get()
        # Total price; get() returns None when the node is missing
        txt = li.css('.totalPrice span::text').get()
        Price = txt + '万' if isinstance(txt, str) else ''
        # Unit price, with the '单价' ("unit price") label stripped
        txt = li.css('.unitPrice span::text').get()
        unitPrice = txt.replace('单价', '') if isinstance(txt, str) else ''
        # Follow and publication info
        followInfo = li.css('.followInfo::text').get()
        dit = {
            '标题': title,
            '小区': community,
            '地名': address,
            '房子基本信息': houseInfo,
            '房价': Price,
            '单价': unitPrice,
            '发布信息': followInfo,
        }
        print(dit)
        csv_writer.writerow(dit)


# Create the file and write the header once, then loop over the pages
with open('长沙二手房数据.csv', mode='a', encoding='utf-8-sig', newline='') as f:
    csv_writer = csv.DictWriter(f, fieldnames=fieldnames)
    csv_writer.writeheader()
    for page in range(1, 101):
        # The page number goes in the path as pg<N>; this pattern is an
        # assumption, so confirm it in the browser before relying on it
        url = f'https://cs.lianjia.com/ershoufang/pg{page}/'
        downloadLianjia(url, csv_writer)
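
When looping over many pages it helps to keep URL generation separate from the crawl loop and to pause between requests. The following is a sketch under stated assumptions: the pg<N>/ path suffix is an assumed pagination pattern that should be confirmed in the browser, and page_urls, crawl, and the one-second default delay are hypothetical names and values chosen for illustration.

```python
import time

BASE_URL = 'https://cs.lianjia.com/ershoufang/'


def page_urls(first_page, last_page, base_url=BASE_URL):
    """Yield one listing URL per page, assuming a pg<N>/ path suffix."""
    for page in range(first_page, last_page + 1):
        yield f'{base_url}pg{page}/'


def crawl(urls, handle_page, delay_seconds=1.0):
    """Call handle_page(url) for each URL, sleeping between requests."""
    for index, url in enumerate(urls):
        if index:  # no pause before the very first request
            time.sleep(delay_seconds)
        handle_page(url)


# Dry run: print the first three URLs instead of downloading them.
crawl(page_urls(1, 3), print, delay_seconds=0)
```

Passing a download function in place of print would turn the dry run into a real crawl, with the delay reducing the load placed on the site.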

Results:


Python crawler: scraping Lianjia second-hand housing data