Scraping Short-Term Rental Listings from Xiaozhu (小猪网)

Implementation: scraping short-term rental listings from Xiaozhu

# Xiaozhu crawler 2.0
# Purpose: scrape multiple result pages and write each page's data to a CSV file
# in the current working directory. Image links are stored in the CSV; the images
# themselves are not downloaded by this code.
import re
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent sent with every request
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36"}


# Find the listing prices
def get_prices(soup):
    prices_list = []
    prices = soup.select('#page_list > ul > li > div > div > span')
    for price in prices:
        prices_list.append(price.text)
    return prices_list


# Find the image links (each entry is the list of lazy_src matches for one listing)
def get_images(links):
    images_list = []
    regex = re.compile(r' lazy_src="(.*?)"')
    for link in links:
        result = re.findall(regex, str(link))
        images_list.append(result)
    return images_list


# Find the listing addresses (assumes 24 listings per result page)
def get_address(soup):
    address_list = []
    for i in range(24):
        address = soup.select('#page_list > ul > li:nth-of-type({}) > div.result_btm_con.lodgeunitname > div.result_intro > a > span'.format(i+1))
        address_list.append(address[0].text)
    return address_list


# Find the identifier used to build each host page URL (assumes 24 listings per page)
def get_room_host_info(soup):
    host_info = []
    for i in range(24):
        title = soup.select('#page_list > ul > li:nth-of-type({}) > a > img'.format(i+1))
        host = title[0].get('data-growing-title')
        host_info.append(host)
    return host_info


# Determine the gender from the CSS class of the icon
def judeg_gender(gender):
    if gender[0] == 'member_girl_ico':
        gender = 'female'
    else:
        gender = 'male'
    return gender


# Find each host's gender and name
def get_gender_and_name(host_info):
    gender_list = []
    name_list = []
    for host in host_info:
        url = 'https://sh.xiaozhu.com//fangzi//{}.html'.format(host)
        host_res = requests.get(url, headers=HEADERS)
        host_soup = BeautifulSoup(host_res.text, 'lxml')
        gender = host_soup.select('#floatRightBox > div.js_box.clearfix > div.w_240 > h6 > span')
        name = host_soup.select('#floatRightBox > div.js_box.clearfix > div.w_240 > h6 > a')
        gender = judeg_gender(gender[0].get('class'))
        gender_list.append(gender)
        name_list.append(name[0].text)
        time.sleep(2)  # pause between host pages to avoid hammering the site
    return [gender_list, name_list]


# Main function: scrape result page i+1 and write it to Room_Data{i+1}.csv
def main(i):
    # Search results page i+1 on the sh (Shanghai) subdomain
    url = 'https://sh.xiaozhu.com//search-duanzufang-p{}-0//'.format(i+1)
    # Fetch the page and parse it with BeautifulSoup
    res = requests.get(url, headers=HEADERS)
    soup = BeautifulSoup(res.text, 'lxml')
    links = soup.select('#page_list > ul > li > a')
    print('Fetched the Xiaozhu search page (pass {})...'.format(i+1))
    prices_list = get_prices(soup)
    print('Extracted listing prices (pass {})...'.format(i+1))
    images_list = get_images(links)
    print('Extracted image links (pass {})...'.format(i+1))
    address_list = get_address(soup)
    print('Extracted listing addresses (pass {})...'.format(i+1))
    host_info = get_room_host_info(soup)
    print('Collected host page URLs (pass {})'.format(i+1))
    gender_and_name = get_gender_and_name(host_info)
    print('Extracted host genders and names (pass {})'.format(i+1))
    gender_list = gender_and_name[0]
    name_list = gender_and_name[1]
    # Combine all the lists into one table
    information = pd.DataFrame({'address': address_list, 'price': prices_list, 'name': name_list, 'gender': gender_list, 'image_links': images_list})
    # Write the table to a CSV file
    information.to_csv('Room_Data{}.csv'.format(i+1), index=False, sep=',')
    print('Data written to CSV file (pass {})'.format(i+1))


if __name__ == '__main__':
    for i in range(10):
        try:
            main(i)
            print('Pass {} finished'.format(i+1))
            print('sleeping...')
            time.sleep(10)
        except Exception:
            print("The site's anti-scraping mechanism has been triggered; please go to Xiaozhu and complete the slider verification")
            answer = input('Continue? (Y/N): ')
            if answer == 'Y':
                main(i)
            else:
                print('Program exited')
                break
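The script above writes one file per result page (Room_Data1.csv through Room_Data10.csv). As a possible follow-up step, not part of the original post, the per-page files could be combined into a single CSV. The sketch below uses only the standard library; the function names `page_number` and `merge_room_data` and the output name `Room_Data_all.csv` are hypothetical choices made for illustration, and it assumes every per-page file shares the same header row.

```python
import csv
import re
from pathlib import Path


def page_number(path):
    """Return the page number embedded in a name like Room_Data7.csv, or -1."""
    match = re.search(r"Room_Data(\d+)\.csv$", path.name)
    return int(match.group(1)) if match else -1


def merge_room_data(folder, output_name="Room_Data_all.csv"):
    """Concatenate every Room_Data<N>.csv in `folder` into one CSV file.

    Returns the number of data rows written (header excluded).
    Assumption: all per-page files have the same header row.
    """
    folder = Path(folder)
    # Sort numerically so that page 10 comes after page 9, not after page 1.
    # Only names of the exact form Room_Data<N>.csv are accepted, which also
    # keeps the merged output file out of its own input on a second run.
    sources = sorted(
        (p for p in folder.glob("Room_Data*.csv") if page_number(p) >= 0),
        key=page_number,
    )
    rows_written = 0
    header = None
    with open(folder / output_name, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for source in sources:
            with open(source, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to copy
                if header is None:
                    # Write the header once, taken from the first file
                    header = file_header
                    writer.writerow(header)
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Calling `merge_room_data('.')` in the directory where the crawler ran would write Room_Data_all.csv next to the per-page files. Sorting by the parsed page number rather than by file name matters because a plain alphabetical sort would place Room_Data10.csv before Room_Data2.csv.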
