Crawler code:

import csv
import json
import re
import time
from urllib import request

# Output CSV. 'a+' appends, so rerunning the script adds rows to the existing file.
c = open(r'D:\安吉竹博园开元度假村.csv', 'a+', newline='', encoding='utf8')
fieldnames = ['user', 'time', 'score', 'content']
writer = csv.DictWriter(c, fieldnames=fieldnames)
writer.writeheader()

def getResponse(url, data):
    '''
    POST one request body to the comment API and return the raw response.

    The request body (data) was captured from the page below (reportedly the mobile version of the site). 26683709 is the hotel ID, which also appears in the URL of the original desktop page.
    https://m.ctrip.com/webapp/hotel/HotelDetail/dianping/26683709.html
    Original Ctrip page for 安吉竹博园开元度假村: https://hotels.ctrip.com/hotel/26683709.html?isFull=F&masterhotelid=26683709&hcityid=659#ctm_ref=hod_sr_lst_dl_n_1_6
    '''
    # json.dumps() turns the dict into a JSON string; encode it to bytes for the POST body
    body = json.dumps(data).encode(encoding='utf-8')
    header_dict = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko',
                   "Content-Type": "application/json"}
    url_request = request.Request(url=url, data=body, headers=header_dict)
    url_response = request.urlopen(url_request)
    return url_response

datas = []  # request bodies, one per page
for j in range(6):
    # "pageIndex": j + 1 selects the page to fetch
    data1 = {"hotelId": 26683709, "pageIndex": j + 1, "tagId": 0, "pageSize": 10, "groupTypeBitMap": 2, "needStatisticInfo": 0, "order": 0, "basicRoomName": "", "travelType": -1, "head": {"cid": "09031174312350135405", "ctok": "", "cver": "1.0", "lang": "01", "sid": "8888", "syscode": "09", "auth": "93C8AE20D20009DC90E6E10BB588DE61E67EBBC236DE15433FDDADFD95636F28", "extension": []}}
    datas.append(data1)

for k in datas:
    print('Fetching page ' + str(k['pageIndex']))
    time.sleep(3)  # pause between requests
    # Pass the body for this page, so each request actually asks for a different page
    http_response = getResponse("http://m.ctrip.com/restapi/soa2/16765/gethotelcomment?_fxpcqlniredt=09031144211504567945", k)
    html = http_response.read().decode('utf-8')  # the response body as a string
    html = json.loads(html)  # json.loads turns the JSON string into a dict, which makes field extraction easier
    comments = html['othersCommentList']
    for i in comments:
        user = i['id']
        time1 = i['postDate']
        score = i['ratingPoint']
        content = i['content']
        # Strip spaces and line breaks so each review stays on a single CSV row
        content = re.sub(r"\s+", "", content)
        writer.writerow({'user': user, 'time': time1, 'score': score, 'content': content})

c.close()

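import pandas as pd

# Load the CSV into a DataFrame; a separate name keeps the pandas module usable as pd
df = pd.read_csv('D:\\安吉竹博园开元度假村.csv', encoding='utf8')
print(df.head(5))
# Writing .xlsx requires an Excel engine such as openpyxl to be installed
df.to_excel('D:\\安吉竹博园开元度假村.xlsx')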
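Because the CSV is opened in 'a+' mode and writeheader() runs on every execution, rerunning the script puts another header row in the middle of the file. Here is a minimal sketch of one way around that. The helper name append_rows is hypothetical and not part of the original script; it writes the header only when the file is new or empty.

```python
import csv
import os

FIELDNAMES = ['user', 'time', 'score', 'content']


def append_rows(path, rows):
    """Append review rows to a CSV file, adding the header only once."""
    # The header is needed only if the file does not exist yet or is empty.
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # The with-block closes the file even if writing raises an exception.
    with open(path, 'a', newline='', encoding='utf8') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if need_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling append_rows once per fetched page would also mean that pages already saved are kept if a later request fails.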

Notes:

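① The mobile version of the same hotel page looks like the screenshot below. You can open the mobile page in a desktop browser, then press F12 to inspect the page; if nothing shows up, just refresh. Find and click the "Network" tab in the developer tools (pressing F12 usually lands on "Network" already). Then type "comment" into the "Filter URLs" box to narrow down the requests. We are after the user review data, so "comment" is the keyword to use.

(Screenshot: mobile page of the same hotel.)

Next, find and click the "Request" tab, and every parameter the crawler needs is listed there. If nothing appears, remember to refresh the page!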
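Once the request has been captured in the browser, its body can be reused for other pages or other hotels. The sketch below assumes that the hotel ID is the number right before ".html" in the page URL (true for both URLs quoted in the script) and that only hotelId and pageIndex need to change. The function names are made up for illustration, and the values under "head" come from the capture in the article, so they will probably have to be replaced with ones from your own session.

```python
import copy
import json
import re

# Request body as captured in the browser (copied from the script in this post).
CAPTURED_BODY = {
    "hotelId": 26683709, "pageIndex": 2, "tagId": 0, "pageSize": 10,
    "groupTypeBitMap": 2, "needStatisticInfo": 0, "order": 0,
    "basicRoomName": "", "travelType": -1,
    "head": {"cid": "09031174312350135405", "ctok": "", "cver": "1.0",
             "lang": "01", "sid": "8888", "syscode": "09",
             "auth": "93C8AE20D20009DC90E6E10BB588DE61E67EBBC236DE15433FDDADFD95636F28",
             "extension": []},
}


def hotel_id_from_url(url):
    """Return the numeric ID that precedes '.html' in a hotel page URL."""
    match = re.search(r'/(\d+)\.html', url)
    if match is None:
        raise ValueError('no hotel ID found in ' + url)
    return int(match.group(1))


def build_body(url, page_index):
    """Return the encoded POST body for one page of reviews of the hotel at url."""
    body = copy.deepcopy(CAPTURED_BODY)  # leave the captured template untouched
    body['hotelId'] = hotel_id_from_url(url)
    body['pageIndex'] = page_index
    return json.dumps(body).encode('utf-8')
```

For example, build_body('https://hotels.ctrip.com/hotel/26683709.html', 1) gives the bytes to pass as data when requesting the first page.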

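② The data returned for a single page is shown below. To keep the demonstration short, only one user's review is included.

The review fields we need sit at these positions: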

{...
"othersCommentList":[
{"id":463405361,
"postDate":"2020-10-09",
"content":"早餐丰盛,可以吃到10点,睡个懒觉不影响,周边环境优美,竹博园很近就对面,开车去镇上15分钟路程",
...
"ratingPoint":5.0,
...}
]
...}
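To make the mapping from these fields to CSV columns concrete, here is a small sketch. The function name extract_reviews is hypothetical; it takes the response after json.loads and returns one dict per review, using the same column names as the script (user, time, score, content).

```python
import re


def extract_reviews(response):
    """Turn a parsed comment response into rows for the CSV writer."""
    rows = []
    # .get() with a default avoids a KeyError if the list is missing.
    for item in response.get('othersCommentList', []):
        content = item.get('content') or ''
        rows.append({
            'user': item.get('id'),
            'time': item.get('postDate'),
            'score': item.get('ratingPoint'),
            # Remove spaces and line breaks so a review stays on one row.
            'content': re.sub(r'\s+', '', content),
        })
    return rows
```

Each returned dict can be passed straight to writer.writerow().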

{"ResponseStatus":{"Timestamp":"/Date(1604107459363+0800)/","Ack":"Success","Errors":[],"Extension":[{"Id":"request-id","Value":"ab464e9e-34d9-4844-b094-01cc37a0e02e"},{"Id":"CLOGGING_TRACE_ID","Value":"6206733633293836208"},{"Id":"RootMessageId","Value":"100025527-0a3d537c-445585-3179358"}]},

"tdk":{"title":"安吉竹博园开元度假村点评-安吉竹博园开元度假村怎么样-【携程旅行】","description":"携程旅行为您提供安吉竹博园开元度假村真实的服务点评、设施点评,帮您更好地入住安吉竹博园开元度假村,真实详细的安吉竹博园开元度假村点评、酒店图片信息,手机订安吉竹博园开元度假村,来携程旅行享受有房保证!","keywords":"安吉竹博园开元度假村怎么样,安吉竹博园开元度假村点评,携程酒店,手机订酒店,携程旅行"},

"statisticInfo":{"ratingAll":0.0,"commentDesc":"","recommendRate":100,"healthPoint":0.0,"environmentPoint":0.0,"servicePoint":0.0,"facilityPoint":0.0,"tagList":[{"id":"115","name":"早餐丰富","commentCount":69,"type":1},{"id":"104","name":"服务周到","commentCount":67,"type":1},{"id":"108","name":"环境不错","commentCount":44,"type":1},{"id":"132","name":"风景很好","commentCount":39,"type":1},{"id":"106","name":"位置好","commentCount":32,"type":1},{"id":"112","name":"设施齐全","commentCount":31,"type":1},{"id":"155582","name":"面朝安吉竹博园","commentCount":4,"type":1}],"tabList":[{"id":0,"name":"全部","count":0},{"id":3,"name":"有图","count":0},{"id":2,"name":"差评","count":0}]},

"nextGroupTypeBitMap":2,

"isLastPage":0,

"myCommentList":[],

"othersCommentList":[

{"id":463405361,

"baseRoomId":111904627,

"baseRoomName":"竹博园高级园景双床房",

"checkInDate":"2020-10",

"postDate":"2020-10-09",

"content":"早餐丰盛,可以吃到10点,睡个懒觉不影响,周边环境优美,竹博园很近就对面,开车去镇上15分钟路程",

"highlightPosition":"",

"feedbackList":[{"title":"酒店回复", "content":"尊敬的顾客:非常感谢您对安吉竹博园开元度假村的认可与支持!这好评看得我们心里美滋滋哒!“用心”为每一个客人服务,想你所想,入住期间有任何需求,一直以来是我们所坚持的 。欢迎下次入住体验哦。","source":2,"imageList":[]}],

"hasHotelFeedback":1,

"isCanFeedback":0,

"imageList":[

{"smallImage":"https://dimg04.c-ctrip.com/images/0236j1200086e6iwbE32D_C_150_150_Q50.jpg",

"bigImage":"https://dimg04.c-ctrip.com/images/0236j1200086e6iwbE32D_W_640_640_Q50.jpg"},

{"smallImage":"https://dimg04.c-ctrip.com/images/0231m1200086e83rb32B2_C_150_150_Q50.jpg",

"bigImage":"https://dimg04.c-ctrip.com/images/0231m1200086e83rb32B2_W_640_640_Q50.jpg"},

{"smallImage":"https://dimg04.c-ctrip.com/images/023271200086e7tpn38D5_C_150_150_Q50.jpg",

"bigImage":"https://dimg04.c-ctrip.com/images/023271200086e7tpn38D5_W_640_640_Q50.jpg"},

{"smallImage":"https://dimg04.c-ctrip.com/images/023251200086e72oi4813_C_150_150_Q50.jpg",

"bigImage":"https://dimg04.c-ctrip.com/images/023251200086e72oi4813_W_640_640_Q50.jpg"},

{"smallImage":"https://dimg04.c-ctrip.com/images/0231p1200086e6fpy33A3_C_150_150_Q50.jpg",

"bigImage":"https://dimg04.c-ctrip.com/images/0231p1200086e6fpy33A3_W_640_640_Q50.jpg"},

{"smallImage":"https://dimg04.c-ctrip.com/images/0233n1200086e5kcjCB9A_C_150_150_Q50.jpg",

"bigImage":"https://dimg04.c-ctrip.com/images/0233n1200086e5kcjCB9A_W_640_640_Q50.jpg"},

{"smallImage":"https://dimg04.c-ctrip.com/images/0234t1200086e5sfy6D69_C_150_150_Q50.jpg",

"bigImage":"https://dimg04.c-ctrip.com/images/0234t1200086e5sfy6D69_W_640_640_Q50.jpg"},

{"smallImage":"https://dimg04.c-ctrip.com/images/0234v1200086e4bg727E9_C_150_150_Q50.jpg",

"bigImage":"https://dimg04.c-ctrip.com/images/0234v1200086e4bg727E9_W_640_640_Q50.jpg"},

{"smallImage":"https://dimg04.c-ctrip.com/images/0236o1200086e5z2lB0AC_C_150_150_Q50.jpg",

"bigImage":"https://dimg04.c-ctrip.com/images/0236o1200086e5z2lB0AC_W_640_640_Q50.jpg"}],

"ratingPoint":5.0,

"ratingPointDesc":"超棒",

"travelType":"家庭亲子",

"userNickName":"西米露",

"userPicture":"/fd/headphoto/g6/M08/4A/0F/CggYtFbXq-qAVnCpAADk_ApbOhE423.jpg",

"commenterGrade":2,

"userCommentCount":10,

"userImageCount":24,

"userCommentUsefulCount":0,

"usefulNumber":0,

"canClickUseful":1,

"source":1,

"ctripBookRemark":"",

"orderId":"13431412171",

"isAnonymous":"N"}],

"hasHotelFeedback":1,

"isCanFeedback":0,

"imageList":[{"smallImage":"https://dimg04.c-ctrip.com/images/0232k1200086hx9grF96E_C_150_150_Q50.jpg","bigImage":"https://dimg04.c-ctrip.com/images/0232k1200086hx9grF96E_W_640_640_Q50.jpg"},

{"smallImage":"https://dimg04.c-ctrip.com/images/0233a1200086hx3sh92E0_C_150_150_Q50.jpg","bigImage":"https://dimg04.c-ctrip.com/images/0233a1200086hx3sh92E0_W_640_640_Q50.jpg"},

{"smallImage":"https://dimg04.c-ctrip.com/images/0236j1200086hx5a5E91F_C_150_150_Q50.jpg","bigImage":"https://dimg04.c-ctrip.com/images/0236j1200086hx5a5E91F_W_640_640_Q50.jpg"}],

"ratingPoint":5.0,

"ratingPointDesc":"超棒",

"travelType":"朋友出游",

"userNickName":"_WeChat26405",

"roomFilterList":[],

"basicRoomFilterList":[],

"totalCommentCount":0,

"enableShowCtripComment":false}

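The response also carries "ResponseStatus" with an "Ack" field and an "isLastPage" flag, which suggests a way to stop paging without hard-coding the number of pages. The sketch below rests on two assumptions that the sample does not confirm: that "Ack" is "Success" only for usable responses, and that a non-zero "isLastPage" marks the final page. fetch_page is a placeholder for whatever function returns the parsed response for a given page number.

```python
def iter_comments(fetch_page, max_pages=50):
    """Yield reviews page by page until the API reports the last page."""
    for page in range(1, max_pages + 1):
        response = fetch_page(page)
        # Assumption: anything other than "Success" means the page is unusable.
        if response.get('ResponseStatus', {}).get('Ack') != 'Success':
            break
        comments = response.get('othersCommentList', [])
        if not comments:
            break
        yield from comments
        # Assumption: a non-zero isLastPage marks the final page.
        if response.get('isLastPage'):
            break
```

max_pages is a safety cap so the loop still ends if the flag never changes.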