Python + Selenium: scraping user reviews, usernames, and ratings from a single Meituan shop page

1. How it works

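The idea behind Selenium is simple: it simulates a person operating a browser. You write the code by following the steps a person would take, which makes it easy to write and easy to debug.
The drawbacks are that it is slow, scraping takes a long time, and CAPTCHAs pop up frequently. I have not yet found a good way around them.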
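Following that human-style logic, the paging part of the script comes down to three steps: read the reviews on the current page, stop if the "next page" arrow is disabled, and otherwise click it and repeat. The browser-free sketch below shows only that control flow. `FakeListing` and its methods are hypothetical stand-ins for the Selenium element lookups and clicks; they are not part of any library.

```python
from dataclasses import dataclass


@dataclass
class FakeListing:
    """Hypothetical stand-in for the paged review list (not a Selenium API)."""
    pages: list
    current: int = 0

    def read_reviews(self):
        # Equivalent of collecting the review elements on the visible page
        return list(self.pages[self.current])

    def next_disabled(self):
        # Equivalent of checking whether the "next page" arrow is greyed out
        return self.current >= len(self.pages) - 1

    def click_next(self):
        # Equivalent of clicking the "next page" arrow
        self.current += 1


def scrape_all(listing, max_pages=350):
    """Read every page the way a person would, stopping at the last one."""
    collected = []
    for _ in range(max_pages):
        collected.extend(listing.read_reviews())  # read the current page first
        if listing.next_disabled():               # last page: stop here
            break
        listing.click_next()
    return collected
```

Reading the page before checking the arrow matters. If you check first, the last page is never read.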

2. Code

The code does not include any try…except statements. Add them yourself if you need them.
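Here is a minimal sketch of one way to add that error handling. It assumes you want to retry flaky steps a few times before giving up, such as an element that has not loaded yet or a click that fails. `with_retries` is a hypothetical helper, not a Selenium API.

```python
import random
import time


def with_retries(action, attempts=3, base_delay=1.0, exceptions=(Exception,)):
    """Call action(); on one of `exceptions`, wait and retry, up to `attempts` tries.

    Hypothetical helper: the last failure is re-raised so the caller still sees it.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions as exc:
            if attempt == attempts:
                raise
            # Back off a little longer each time, with random jitter
            delay = base_delay * attempt + random.uniform(0, base_delay)
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.1f}s")
            time.sleep(delay)
```

In the Selenium script you could wrap a lookup as `with_retries(lambda: web.find_element(By.XPATH, xpath), exceptions=(NoSuchElementException, ElementClickInterceptedException))`. Both exception classes can be imported from `selenium.common.exceptions`, and `xpath` stands for whichever locator you are retrying.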

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import re
import time
import random
import pandas as pd


def spiders():
    rand = random.randint(5, 10)  # random pause in seconds between page turns
    opt = Options()
    # Drop the "enable-automation" switch so the site is less likely to detect
    # Selenium through the value of window.navigator.webdriver
    opt.add_experimental_option("excludeSwitches", ["enable-automation"])
    # Send a desktop browser User-Agent (ASCII "--", no extra quotes around the value)
    opt.add_argument(
        "--user-agent=Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36"
    )
    opt.add_argument("--incognito")  # open the browser in incognito mode
    web = webdriver.Chrome(options=opt)
    web.get("https://www.meituan.com/meishi/2484377/")
    # The implicit wait is a session-wide timeout: each lookup polls for up to 10 s
    web.implicitly_wait(10)

    # Basic shop information (the slices cut off the label text before each value)
    shop_name = web.find_element(By.XPATH, '//*[@id="app"]/section/div/div[2]/div[1]/div[1]').text[6:]
    shop_score = web.find_element(By.XPATH, '//*[@id="app"]/section/div/div[2]/div[1]/div[2]/p').text[0]
    address = web.find_element(By.XPATH, '//*[@id="app"]/section/div/div[2]/div[1]/div[3]/p[1]').text[3:]
    time.sleep(3)
    target = web.find_element(By.XPATH, '//*[@id="app"]/section/div/div[3]/div[1]/div[3]/div[2]/div[2]/div[11]')
    # Scroll to the review area first, otherwise the page-turn button is covered and cannot be clicked
    web.execute_script("arguments[0].scrollIntoView();", target)

    names, comments, scores = [], [], []
    for page in range(1, 351):  # at most 350 pages
        print("Scraping page {}".format(page))
        name = web.find_elements(By.XPATH, '//div[@class="list clear"]/div[@class="info"]/div[@class="name"]')
        comment = web.find_elements(By.XPATH, '//div[@class="list clear"]/div[@class="info"]/div[@class="desc"]')
        star = web.find_elements(By.XPATH, '//div[@class="source"]//ul[@class="stars-ul stars-light"]')
        names.extend(u_name.text for u_name in name)
        comments.extend(u_commt.text for u_commt in comment)
        for stars in star:
            # The lit-star bar's inline width encodes the rating: width / 16.8 = stars
            found = re.findall(r" (\d{2,}\.\d{1,2})", stars.get_attribute("style") or "")
            scores.append(float(found[0]) / 16.8 if found else 5.0)

        # Read the page first, then stop if the "next page" arrow has the
        # "disabled" class, so the last page is not skipped
        arrow = web.find_element(By.XPATH, '//*[@id="app"]/section/div/div[3]/div[1]/div[3]/div[2]/div[2]/div[11]/ul/li[8]/span')
        if "disabled" in (arrow.get_attribute("class") or ""):
            print("Done")
            break
        time.sleep(rand)
        web.find_element(By.XPATH, '//span[@class = "iconfont icon-btn_right"]').click()
        time.sleep(rand)
    web.quit()

    # Store the data in a DataFrame; the list columns must all have the same length
    dataframe = pd.DataFrame({
        "shop_name": shop_name,
        "shop_score": shop_score,
        "address": address,
        "user_name": names,
        "rating": scores,
        "comment": comments,
    })
    # Save to Excel (to_excel takes no encoding argument in pandas 2.x). For CSV, use
    # dataframe.to_csv("meituan_reviews.csv", encoding="utf_8_sig") so Chinese text is not garbled
    dataframe.to_excel("meituan_reviews.xlsx", index=False)
    return dataframe


spiders()
```
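The rating step is the least obvious part of the script. It reads each rating from the inline style of the lit-star bar and divides the pixel width by 16.8. That divisor suggests each star is 16.8 px wide and a full five-star bar is 84 px, but this is an inference from the code, not something Meituan documents. The sketch below isolates that step and the saving step so they can be checked without a browser.

Its regex is an assumption: it expects a style such as `width: 67.2px;`. Unlike the original ` (\d{2,}\.\d{1,2})` pattern, it also accepts widths under 10 px, such as a half star. `build_rows` pairs each username with its comment and score, and fails loudly if the counts differ, instead of letting pandas raise later. `save_rows_csv` writes with `utf_8_sig`, whose byte-order mark helps Excel recognize UTF-8 Chinese text. All three function names are hypothetical.

```python
import csv
import re

# Assumption inferred from the original code: 16.8 px per star, 84 px for five stars
PX_PER_STAR = 16.8
WIDTH_PATTERN = re.compile(r"width:\s*(\d+(?:\.\d+)?)px")


def style_to_score(style, default=5.0):
    """Convert a style like 'width: 67.2px;' to a star score (hypothetical helper)."""
    match = WIDTH_PATTERN.search(style or "")
    if match is None:
        return default  # same fallback value as the original script
    return round(float(match.group(1)) / PX_PER_STAR, 1)


def build_rows(names, comments, scores):
    """Zip the three per-review lists into one dict per review."""
    if not (len(names) == len(comments) == len(scores)):
        raise ValueError(
            f"length mismatch: {len(names)} names, {len(comments)} comments, {len(scores)} scores"
        )
    return [
        {"user_name": n, "rating": s, "comment": c}
        for n, c, s in zip(names, comments, scores)
    ]


def save_rows_csv(path, rows):
    """Write rows to CSV as UTF-8 with a BOM ("utf_8_sig") for Excel."""
    with open(path, "w", newline="", encoding="utf_8_sig") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```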
