Scraping listing updates from Fang.com (房天下) with Python and Selenium

What is Selenium

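Selenium is a complete suite for testing web applications. It covers recording tests (Selenium IDE), writing and running them (Selenium Remote Control, since superseded by WebDriver), and running them in parallel (Selenium Grid). The original Selenium Core is based on JsUnit and written entirely in JavaScript, so it can run in any browser that supports JavaScript. The script in this post uses the Selenium WebDriver API from Python to drive Chrome.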

If you are not familiar with Selenium, search for an introduction first (for example on Baidu).

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import json
import re


class FTspider(object):
    def __init__(self):
        # To run without a visible browser window, pass headless options:
        # options = Options()
        # options.add_argument("--headless")
        # self.driver = webdriver.Chrome(options=options)
        self.driver = webdriver.Chrome()
        self.num = 1
        self.base_urls = "http://nc.newhouse.fang.com/house/s/b9{}".format(self.num)

    def xinfang_list(self):
        # Collect the detail-page URL of every listing on the current page.
        # find_element(s)(By.XPATH, ...) replaces find_element(s)_by_xpath,
        # which was removed in Selenium 4.3.
        links = self.driver.find_elements(By.XPATH, '//*[@class="clearfix"]/div/a')
        house_lst = [link.get_attribute('href') for link in links]

        data_list = []
        for url in house_lst:
            self.driver.get(url)
            # Find the "动态" (updates) tab; skip listings that do not have one.
            try:
                fangyuan_url = self.driver.find_element(
                    By.XPATH, "//*[@class='navleft tf']//a[contains(text(),'动态')]")
            except NoSuchElementException:
                continue
            self.driver.get(fangyuan_url.get_attribute('href'))

            # Collect the links to the individual update articles.
            dongtai_url = self.driver.find_elements(
                By.XPATH, '//div[@id="gushi_all"]/ul/li[@id="xflpdt_A02_01"]//p//a')
            floor_class = [j.get_attribute('href') for j in dongtai_url]

            all_comment_dict = {"_id": url}
            dynamicJson = []
            for article_url in floor_class:
                self.driver.get(article_url)
                one_dongtai_url = self.driver.find_element(By.XPATH, "//div[@class='atc-wrapper']")
                data = {}
                data["source"] = "房天下"
                data["title"] = one_dongtai_url.find_element(By.XPATH, "./h1").text  # title
                if not data["title"]:
                    continue
                # Keep everything from the first digit onward as the publish time.
                time_text = one_dongtai_url.find_element(By.XPATH, "./h2").text
                match = re.search(r"\d+.*", time_text, re.S)
                data['publishDate'] = match.group() if match else None
                content = one_dongtai_url.find_elements(
                    By.XPATH, ".//div[@class='leftboxcom']//p[@style='text-indent:2em;']")
                if content:
                    data["content"] = "".join(p.text + "\n" for p in content)
                else:
                    # Fall back to the whole content box when there are no indented paragraphs.
                    data["content"] = one_dongtai_url.find_element(
                        By.XPATH, ".//div[@class='leftboxcom']|//div[@class='leftboxcom']//a").text
                data_list.append(data)
                dynamicJson.append(data)
            # The updates are stored as a JSON string inside the listing record.
            dynamicJson = json.dumps(dynamicJson, ensure_ascii=False)
            all_comment_dict.update({"dynamicJson": dynamicJson})
            self.save_data(all_comment_dict)
        return data_list

    def save_data(self, data_list):
        """Append one record to the local JSON Lines file."""
        with open('动态3100000号终极(南昌).jsonlines', 'a', encoding='utf8') as f:
            f.write(json.dumps(data_list, ensure_ascii=False))
            f.write('\n')

    def __del__(self):
        # Quit the browser.
        self.driver.quit()

    def run(self):
        while True:
            # Load the current listing page.
            self.driver.get(self.base_urls)
            # Parse it.
            self.xinfang_list()
            self.num += 1
            self.base_urls = "http://nc.newhouse.fang.com/house/s/b9{}".format(self.num)
            if self.num > 16:
                break


if __name__ == '__main__':
    GJS = FTspider()
    GJS.run()
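The non-browser parts of the script (building the page URLs, pulling the publish time out of the header text, and writing one JSON object per line) can be tried without launching Chrome. The following is a minimal sketch, not the script's own code: the helper names `page_urls`, `extract_publish_date` and `to_json_line` are hypothetical, the sample header text is made up, and unlike the script it nests the updates as a list rather than as a pre-serialized JSON string.

```python
import json
import re

# URL pattern taken from the script; the page number is appended after "b9".
PAGE_TEMPLATE = "http://nc.newhouse.fang.com/house/s/b9{}"


def page_urls(first_page, last_page):
    """Build the listing-page URLs for an inclusive page range."""
    return [PAGE_TEMPLATE.format(n) for n in range(first_page, last_page + 1)]


def extract_publish_date(header_text):
    """Return the text from the first digit onward, or None if there is no digit."""
    match = re.search(r"\d+.*", header_text, re.S)
    return match.group() if match else None


def to_json_line(listing_url, updates):
    """Serialize one listing and its updates as a single JSON Lines record."""
    record = {"_id": listing_url, "dynamicJson": updates}
    # ensure_ascii=False keeps Chinese text readable in the output file.
    return json.dumps(record, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    print(page_urls(1, 3))
    # Hypothetical header text; the real page layout may differ.
    print(extract_publish_date("发布时间：2019-05-06 10:00"))
    print(to_json_line("http://example.com/listing", [{"title": "示例"}]), end="")
```

Returning None when the header contains no digit avoids the `AttributeError` that calling `.group()` on a failed `re.search` would raise.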

A later update will cover scraping Fang.com data with the Scrapy framework.
