Scraping Hotel Information from Qunar (去哪儿网)

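I'll keep this short. To scrape a different place, change the city value; to query different dates, change checkInDate and checkOutDate; to change how much gets scraped, adjust the parameters and the paging code yourself. The changes are small. If you have questions, leave a comment; I won't go through the analysis again. (Here I scrape hotel listings for Shanghai on near-term dates.)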
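As a minimal sketch of which values to change, the query parameters can be built by one small function. The name build_payload is hypothetical and not part of the original script; the field names are copied from the script in this post. The account and SDK fields (wxUnionId, wxOpenId, SDKVersion, bd_source, bd_origin) are left out for brevity and would need to be added back for a real request. It is an assumption that the endpoint accepts zero-padded ISO dates.

```python
from datetime import date


def build_payload(city, check_in, check_out, page=1):
    """Return the query parameters for one page of hotel results (hypothetical helper)."""
    if check_out <= check_in:
        raise ValueError("check_out must be later than check_in")
    return {
        "city": city,              # the place to scrape, e.g. "上海"
        "cityUrl": "",
        "page": page,              # 1-based result page
        "extra": "{}",
        "sort": "",
        "keywords": "",
        # Assumption: the endpoint accepts ISO dates (YYYY-MM-DD).
        "checkInDate": check_in.isoformat(),
        "checkOutDate": check_out.isoformat(),
        "locationAreaFilter": "",
        "comprehensiveFilter": "[]",
        "fixedComprehensiveFilter": "[]",
    }


payload = build_payload("北京", date(2020, 11, 1), date(2020, 11, 2), page=3)
print(payload["city"], payload["checkInDate"], payload["checkOutDate"], payload["page"])
```

Building the dictionary in one place means the first request and the paging requests cannot drift apart on dates, and the check-out date is validated before anything is sent.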

# coding=utf-8
import json
import time

import pandas as pd  # turns each page of results into a table and writes it to CSV
import requests

URL = 'https://wxapp.qunar.com/api/hotel/hotellist'  # target endpoint
OUTPUT_FILE = 'qunaer9.csv'

# Stay dates. The original first request used 2020-10-29 for both check-in and
# check-out; the same pair is now used for every page.
CHECK_IN_DATE = "2020-11-1"
CHECK_OUT_DATE = "2020-11-2"

# Request headers and cookie
HEADERS = {
    "wx-v": "",
    "content-type": "application/json",
    "Connection": "Keep-Alive",
    "Accept-Encoding": "gzip",
    "wx-q": "",
    "unionid": "ovaMOwE6dQvbGOmZjLLPaGSM5ZtU",
    "openid": "oIjYJ0TuQcTF_WTWsKcUPR1cRJI0",
    "wx-t": "",
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0.1; OPPO A57 Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/55.0.2883.91 Mobile Safari/537.36 MicroMessenger/6.7.2.1340(0x2607023A) NetType/WIFI Language/zh_CN",
    "charset": "utf-8",
    "referer": "https://servicewechat.com/wx799d4d93a341b368/114/page-frame.html",
    "Host": "wxapp.qunar.com",
    "Cookie": "QN48=tc_437f21c62a765ca0_165c198a408_e56b; QN1=qunar; QN66=smart_app; QN1=O5cv+luWLPthsvB1BKl0Ag==",
    "Content-Length": "0",
}

# Proxy pool (p0 to p15 in the original). Note: each entry only defines an 'http'
# proxy, and requests picks the proxy by URL scheme, so the https URL above is not
# routed through them unless an 'https' key is added.
PROXIES = [
    {'http': 'http://101.132.122.230:3128'},
    {'http': 'http://114.113.126.83:80'},
    {'http': 'http://210.45.123.127:9999'},
    {'http': 'http://118.190.217.182:80'},
    {'http': 'http://120.27.14.125:80'},
    {'http': 'http://118.31.223.194:3128'},
    {'http': 'http://101.37.79.125:3128'},
    {'http': 'http://125.62.26.197:3128'},
    {'http': 'http://218.60.8.98:3129'},
    {'http': 'http://114.215.95.188:3128'},
    {'http': 'http://218.60.8.99:3129'},
    {'http': 'http://218.60.8.83:3129'},
    {'http': 'http://118.190.217.61:80'},
    {'http': 'http://203.86.26.9:3128'},
    {'http': 'http://114.113.126.87:80'},
    {'http': 'http://106.12.32.43:3128'},
]


def build_data(city, page):
    """Query parameters for one result page."""
    return {
        "city": city,
        "cityUrl": "",
        "page": page,
        "extra": "{}",
        "sort": "",
        "keywords": "",
        "checkOutDate": CHECK_OUT_DATE,
        "checkInDate": CHECK_IN_DATE,
        "locationAreaFilter": "",
        "comprehensiveFilter": "[]",
        "fixedComprehensiveFilter": "[]",
        "SDKVersion": "2.2.4",
        "wxUnionId": "ovaMOwE6dQvbGOmZjLLPaGSM5ZtU",
        "wxOpenId": "oIjYJ0TuQcTF_WTWsKcUPR1cRJI0",
        "bd_source": "smart_app",
        "bd_origin": "pt-onl-ots-ggjd",
    }


def fetch_page(city, page, proxy):
    """Request one page and return the 'data' part of the JSON response."""
    r = requests.post(URL, headers=HEADERS, params=build_data(city, page),
                      proxies=proxy, timeout=10)
    result = json.loads(r.text)
    return result['data']


def save_hotels(hotel):
    """Append the hotels of one page to the CSV file (no header row)."""
    df = pd.DataFrame(data=hotel['hotels'])
    df.to_csv(OUTPUT_FILE, mode='a', header=False)


def crow_id(city):
    """Scrape every result page of the hotel list for one city."""
    p = PROXIES[1]  # the original used p1
    page = 1

    # The first request also tells us how many pages there are.
    hotel = fetch_page(city, page, p)
    pages = hotel['totalPage']
    print("Total pages:", pages)
    print("Page: %d" % page)
    save_hotels(hotel)

    # Remaining pages: 2 .. pages. The original while loop ran one page too far.
    for page in range(2, pages + 1):
        try:
            hotel = fetch_page(city, page, p)
            print(len(hotel['hotels']), pages - page)  # rows on this page, pages left
            save_hotels(hotel)
        except Exception as e:
            print(e)
        finally:
            print("Page: %d" % page)
            time.sleep(3.1)  # pause between requests


if __name__ == '__main__':
    a = {"areaObj": {"上海": [{"city": '上海'}]}}
    area_list = []
    for data in a['areaObj'].values():
        for d in data:
            area_list.append(d)
    for area in area_list:
        print("Start scraping area %s:" % area['city'])
        crow_id(area['city'])
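The script writes each page with header=False, so the resulting CSV has no column names. Below is a minimal sketch, using only the standard library, of how the remaining page numbers could be enumerated and how rows could be appended with the header written once. The helper names remaining_pages and append_hotels are hypothetical, the response shape (data.totalPage, data.hotels) is taken from the script, and it is an assumption that every page returns hotel records with the same keys.

```python
import csv
import os


def remaining_pages(total_pages, first_page=1):
    """Page numbers still to fetch after the first page has been read."""
    return list(range(first_page + 1, total_pages + 1))


def append_hotels(result, path):
    """Append the hotels from one parsed response to a CSV file.

    The header row is written only when the file is new or empty.
    Returns the number of rows written.
    """
    hotels = result.get("data", {}).get("hotels") or []
    if not hotels:
        return 0
    # Assumption: all pages share these keys, so columns stay aligned.
    fieldnames = sorted({key for hotel in hotels for key in hotel})
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerows(hotels)
    return len(hotels)
```

For a response reporting 3 total pages, remaining_pages(3) gives [2, 3], so no request is made past the last page. The utf-8-sig encoding is chosen so that Excel opens the Chinese hotel names correctly; plain utf-8 works just as well for pandas.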
