1. 房屋价格地图

1.1 项目思路主要分成三个步骤

  • 首先利用python爬取安居客网站上个区的房屋价格,并把房屋所在小区进行归类,求出小区均价。
  • 然后利用百度地图api中的地理编码,我们可以获取小区所在的经纬度,注意这里不是所有小区都能准确获取的,存在一定数量的小区无法获取精确的地理坐标。
  • 最后利用BDP线上分析可以绘制出如下小区均价地图。

效果如下:

1.2 项目目录

  • get_data:用于python爬取安居客房屋价格并整理
  • get_lnglat:用于百度地图api中地理编码,获取小区的经纬度并整理

2. python爬取安居客房屋价格并整理

import datetime
import reimport requests
from lxml import etree
import pandas as pd
import json
import math
import random
import time
import numpy as np
import urllib.request# 获取安居客网站中某市二手房楼盘的每个区域的网址
def get_different_area_wang_zhi(url):mozilla = ["Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0","Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50","Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0","Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.3; rv:11.0) like Gecko","Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)","Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)","Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)","Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1","Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; The World)"]headers = {"Cache-Control": "max-age=0","User-Agent": "{}".format(random.choice(mozilla)),"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8","Accept-Language": "zh-CN,zh;q=0.9"}response = requests.get(url, headers=headers).textre = etree.HTML(response)# 返回列表格式wang_zhi = re.xpath('//div[@id="content_Rd1"]/div[@class="clearfix"]/div[@class="details float_l"][1]/div[@class="areas"]/a/@href')return wang_zhi  # 第一个网址为全部的小区的网址,应该去掉# 得到一个区域的二手房楼盘的总页数(n)
def get_one_area_number(url):mozilla = ["Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0","Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50","Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0","Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.3; rv:11.0) like Gecko","Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)","Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)","Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)","Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1","Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; The World)"]headers = {"Cache-Control": "max-age=0","User-Agent": "{}".format(random.choice(mozilla)),"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8","Accept-Language": "zh-CN,zh;q=0.9"}response = requests.get(url, headers=headers).textre = etree.HTML(response)number = re.xpath('//div[@class="pagination"]/ul[@class="page"]/li[@class="page-item last"]/a/text()')if number == '':print('页面没有小区数据')else:n = int(number[0])return n# 爬取安居客的二手房小区信息
def anjuke_new(url, n, city_name):mozilla = ["Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36","Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0","Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50","Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0","Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.3; rv:11.0) like Gecko","Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)","Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)","Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)","Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1","Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)","Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; The World)"]headers = {"Cache-Control": "max-age=0","User-Agent": "{}".format(random.choice(mozilla)),"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8","Accept-Language": "zh-CN,zh;q=0.9"}l = []for i in range(1, n + 1):if i == 1:url_load = urlelse:url_load = url + "p" + str(i) + '/'address = []mian_ji = []year = []try:print("正在爬取{}的第{}页".format(url, i))res = requests.get(url_load, headers=headers).textf = etree.HTML(res)# 获取当前页所有小区的网址# 获取房屋所在小区名字name = f.xpath('//section[@class="list"]/div[@tongji_tag="fcpc_ersflist_gzcount"]//p[''@class="property-content-info-comm-name"]/text()')# 获取房屋所在小区地址# address_tmp = f.xpath(#     '//section[@class="list"]/div[@tongji_tag="fcpc_ersflist_gzcount"]//p[@class="property-content-info-comm-address"]/span/text()')# j = 0# while j < len(address_tmp):#     address.append(address_tmp[i] + address_tmp[i + 1] + address_tmp[i + 2])#     j += 3# 获取房屋均价avg_price = f.xpath('//section[@class="list"]/div[@tongji_tag="fcpc_ersflist_gzcount"]//div[@class="property-price"]/p[''@class="property-price-average"]/text()')mian_ji_tmp = f.xpath('//section[@class="list"]/div[@tongji_tag="fcpc_ersflist_gzcount"]//p[''@class="property-content-info-text"][1]/text()')for item in mian_ji_tmp:mian_ji.append(item.strip())# 获取房屋所在小区建设年份year_tmp = f.xpath('//section[@class="list"]/div[@tongji_tag="fcpc_ersflist_gzcount"]//p[@class="property-content-info-text"][4]/text()')for item in year_tmp:year.append(item.strip())except:print("爬取{}的第{}页错误".format(url, i))with open(r"./anjuke_house_error.txt", "a") as f:f.write("{}第{}页出错\n".format(url, i))continueprint("爬取{}的第{}页完成".format(url, i))for j in range(len(avg_price)):d = {}d["小区名称"] = name[j]# d["小区地址"] = address[j]d["房屋均价"] = avg_price[j]d["房屋面积"] = mian_ji[j]try:d["建造年代"] = year[j]except IndexError:d["建造年代"] = ''l.append(d)data = pd.DataFrame(l)return datadef processing_data(data, city_name):date = datetime.datetime.now()year = date.yearmonth = date.monthday = date.daydate_list = str('-').join([str(year), str(month), str(day)])flag = Falsedata_prv = ''try:data_prv = pd.read_csv('{}_anjuke_house.csv'.format(city_name), encoding='utf_8_sig')except FileNotFoundError:flag = Truel_name = []l = []for i in data.index:data_name = data.loc[i, '小区名称']if data_name in l_name:continueelse:data_tmp = data[data_name == data['小区名称']]price = []reg = '^\d+'for j in data_tmp.index:price.append(int(re.findall(reg, data_tmp.loc[j, '房屋均价'])[0]))avg_price = np.mean(np.array(price))d = {}d['小区名称'] = data_named['小区均价' + date_list] = avg_pricel.append(d)l_name.append(data_name)process_data = pd.DataFrame(l)process_data.to_csv('./{}_anjuke_house.csv'.format(city_name), index=False, encoding='utf_8_sig')if not flag:l_name = []data_prv['小区均价' + date_list] = ''for i in data.index:data_name = data.loc[i, '小区名称']if data_name in l_name:continueelse:data_tmp = data[data_name == data['小区名称']]price = []reg = '^\d+'for j in data_tmp.index:price.append(int(re.findall(reg, data_tmp.loc[j, '房屋均价'])[0]))avg_price = np.mean(np.array(price))if data_name in data_prv['小区名称'].values:data_prv.loc[data_prv[data_prv['小区名称'] == data_name].index, '小区均价' + date_list] = avg_pricel_name.append(data_name)else:d = {'小区名称': data_name, '小区均价' + date_list: avg_price}data_prv.append(d, ignore_index=True)l_name.append(data_name)data_prv.to_csv('./{}_anjuke_house.csv'.format(city_name), index=False, encoding='utf_8_sig')# 安居客二手房主函数调用
def anjuke_second_main(url):wang_zhi = get_different_area_wang_zhi(url)for item in wang_zhi:n = get_one_area_number(item)reg = '\/([A-Za-z]+)\/$'city_name = re.findall(reg, item)[0]data = anjuke_new(item, n, city_name)processing_data(data, city_name)time.sleep(random.randint(10, 15))# 爬取其他城市修改url,只需要修改城市名称的简称即可,例如西安是xa,厦门是xm,具体看安居客的网址。
if __name__ == "__main__":url = "https://wuhan.anjuke.com/?pi=PZ-baidu-pc-all-biaoti"# 用于储存文件anjuke_second_main(url)
  • get_different_area_wang_zhi()
    获取安居客网站中某市二手房楼盘的每个区域的网址。
  • get_one_area_number()
    获取二手房楼盘每个区域网址中一共有多少页,我发现安居客所显示的一共最多就是50页,但是这不一定是该区域的所有房屋价格,应该是一个不完全统计。
  • anjuke_new()
    获取二手房楼盘每个区域网址中能显示的所有房屋信息。
  • processing_data()
    对获取的数据进行处理,主要是把相同小区房屋的价格求平均以得到小区的均价。
  • anjuke_second_main()
    主函数

3. 利用百度地图api中的地理编码并整理

import json
import os
from urllib.request import urlopen
from urllib.parse import quoteimport pandas as pd
import requestsdef getfilepath(path):list_name = []for file in os.listdir(path):file_path = os.path.join(path, file)if os.path.isdir(file_path):passelse:if file_path.endswith('.csv'):list_name.append(file_path)return list_namedef getlnglat(address, city):flag = 1url = 'http://api.map.baidu.com/geocoding/v3/'output = 'json'ak = 'sQMfKlPK4yCefIC28eb3Hn3QTO9MzEUV'city = quote(city)address_url = quote(address)qing_qiu = quote('请求')uri = url + '?' + 'city=' + city + '&address=' + address_url + '&output=' + output + '&ak=' + ak + '&callback=showLocation' + '//GET' + qing_qiutry:while flag != 0:res = requests.get(uri).texttemp = json.loads(res)  # 将字符串转化为jsonflag = temp['status']except:print('发送请求失败')if temp['result']['level'] != '区县' and temp['result']['level'] != '城市':d = {'小区名称': address, 'lat': temp['result']['location']['lat'], 'lng': temp['result']['location']['lng']}return d  # 纬度 latitude,经度 longitudeelse:print('地址: ' + address + '  非地产小区: ' + temp['result']['level'] + ' lat ' + str(temp['result']['location']['lat']) + ' lng ' + str(temp['result']['location']['lng']))return 0def add_lnglat(path):city = '武汉市'area = ''if path.find('caidianz') != -1:area = '蔡甸区'if path.find('dongxihu') != -1:area = '东西湖区'if path.find('hannanz') != -1:area = '汉南区'if path.find('hanyang') != -1:area = '汉阳区'if path.find('hongshana') != -1:area = '洪山区'if path.find('huangpiz') != -1:area = '黄陂区'if path.find('jiangan') != -1:area = '江岸区'if path.find('jiangxiat') != -1:area = '江夏区'if path.find('qiaokou') != -1:area = '硚口区'if path.find('qingshan') != -1:area = '青山区'if path.find('wuchanga') != -1:area = '武昌区'if path.find('xinzhouz') != -1:area = '新洲区'if path.find('zhuankouk') != -1:area = '沌口'data = pd.read_csv(path, encoding='utf_8_sig')village_names = data['小区名称'].valuesfor village_name in village_names:coordinate = getlnglat(village_name, city + area)if coordinate != 0:data.loc[data[data['小区名称'] == coordinate['小区名称']].index, 'lat'] = coordinate['lat']data.loc[data[data['小区名称'] == coordinate['小区名称']].index, 'lng'] = coordinate['lng']data.to_csv(path, index=False, encoding='utf_8_sig')if __name__ == '__main__':list_name = getfilepath('./')for item in list_name:add_lnglat(item)
  • getfilepath()
    获取当前目录下面的所有csv文件。
  • getlnglat()
    获取一个小区地址所对应的经纬度,并对经纬度是否合理做判断。
  • add_lnglat()
    把经纬度信息添加到原先的csv文件中,方便后续的作图。

4. 利用BDP进行数据处理

网址https://me.bdp.cn/index.html#/
通过添加数据设置经纬度和把房屋均价用颜色标识即可实现。

python爬取安居客房屋价格用地图表示出来相关推荐

  1. Python爬取安居客经纪人信息

    Python爬取安居客经纪人信息 Python2.7.15 今天我们来爬取安居客经纪人的信息.这次我们不再使用正则,我们使用beautifulsoup.不了解的可以先看一下这个文档,便于理解.http ...

  2. Python爬取安居客新房信息

    由于是刚开始学习Python爬虫,做个简单的爬虫,提供一个学习思路. 由于水平有限,正则表达式写的实在是抠脚,就直接上BeautifulSoup了. BeautifulSoup的学习参考http:// ...

  3. python爬取安居客网站上北京二手房数据

    目标:爬取安居客网站上前10页北京二手房的数据,包括二手房源的名称.价格.几室几厅.大小.建造年份.联系人.地址.标签等. 网址为:https://beijing.anjuke.com/sale/ B ...

  4. 使用Python爬取安居客二手房房价数据

    作为一个Python新手,公司突然安排我爬取房价数据,真让人有点头大啊!幸好网上的大佬们经验丰富,给予了很多代码上的帮助.本文代码在网友pythoner111爬虫项目–爬取安居客二手房信息的基础上修改 ...

  5. Python爬取安居客房产经纪人信息

    引言 Python开源网络爬虫项目启动之初,我们就把网络爬虫分成两类:即时爬虫和收割式网络爬虫.为了使用各种应用场景,该项目的整个网络爬虫产品线包含了四类产品,如下图所示: Python和相关依赖库的 ...

  6. python爬取安居客二手房网站数据(转)

    之前没课的时候写过安居客的爬虫,但那也是小打小闹,那这次呢, 还是小打小闹 哈哈,现在开始正式进行爬虫书写 首先,需要分析一下要爬取的网站的结构: 作为一名河南的学生,那就看看郑州的二手房信息吧! 在 ...

  7. python爬取安居客二手房网站数据

    前言 本文的文字及图片来源于网络,仅供学习.交流使用,不具有任何商业用途,如有问题请及时联系我们以作处理. PS:如有需要Python学习资料的小伙伴可以加点击下方链接自行获取 python免费学习资 ...

  8. Python爬取安居客房产经纪人信息采集

    为了使用各种应用场景,该项目的整个网络爬虫产品线包含了四类产品,如下图所示: Python和相关依赖库的安装 运行环境:Windows10 安装Python3.5.2 Lxml 3.6.0 下载网页内 ...

  9. Python爬虫实战-详细讲解爬取安居客房价数据

    最近在尝试用python爬取安居客房价数据,在这里给需要的小伙伴们提供代码,并且给出一点小心得. 首先是爬取之前应该尽可能伪装成浏览器而不被识别出来是爬虫,基本的是加请求头,但是这样的纯文本数据爬取的 ...

  10. python爬取房源数据_python爬取安居客二手房网站数据(实例讲解)

    是小打小闹 哈哈,现在开始正式进行爬虫书写首先,需要分析一下要爬取的网站的结构:作为一名河南的学生,那就看看郑州的二手房信息吧! 在上面这个页面中,我们可以看到一条条的房源信息,从中我们发现了什么,发 ...

最新文章

  1. retrofit 源码分析
  2. 将中缀表达式转化为后缀表达式
  3. VDI序曲二 RemotoAPP晋级篇
  4. 树形控件CTreeCtrl的使用详解(一)
  5. ASP.NET MVC 3: Razor视图引擎中 @: 和text 语法【转载】
  6. Linux 命令之 declare -- 声明或显示 shell 变量
  7. spring boot (整合redis)
  8. 如何用100美元和TensorFlow来造一个能“看”东西的机器人
  9. 微博python爬虫,每日百万级数据
  10. linux下tomcat8安装详解(附图解步骤)
  11. 开始学习ruby,对此语言的简介
  12. 异速联应用交付解决方案的优势
  13. html transition属性,Transition属性详解
  14. 基频和倍频的概念_基频峰,泛频峰,倍频峰,二倍频峰的区别
  15. 我的世界服务器启动显示非正常,大佬们,HMCL启动提示非正常退出,请帮我看看怎么回事。...
  16. 静态博客网站——vuepress功能进化
  17. 树莓派有线网络设置_树莓派设置固定IP之有线网和无线网方法
  18. Bypass一款不错的分流抢票助手工具
  19. linux下防火墙iptables用法规则详解
  20. 数字电子技术基础实验 实验一 门电路的逻辑功能及参数测试(含数据及思考题)

热门文章

  1. Wendy Shijia 的「 Escher‘s Gallery」可视化作品复现系列文章(三)
  2. staruml怎么画协作图_er图怎么画?轻松绘制专业er图的软件
  3. c语言编程科学计数法,c语言编程 科学计数法
  4. 计算机软件配置项csci
  5. Java使用百度翻译api
  6. PASCAL VOC 2012
  7. 操作记录-2020-11-13:精简代码处理ChIP_seq数据
  8. 华为设备OSPF配置命令
  9. Quartus II 11.0 破解补丁
  10. 基于JAVA高校实习实训管理系统计算机毕业设计源码+数据库+lw文档+系统+部署