Pyppeteer + Python demo: scraping JD.com product details

Preface: people who are new to Pyppeteer have been asking me questions lately, so with some free time today I wrote this demo for reference.

Installing Pyppeteer is not covered here; please consult the relevant documentation.

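Scenario for the code below: the user supplies a keyword, and the script collects and stores the SKU, title, price, shop name, review count, promotions, and link of every product matching that keyword.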

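Note: review details are not collected. Since the script already obtains the SKU, fetching review details from there is straightforward, so I have left it out. (Paid service available.)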

Code:

# coding: utf-8
import asyncio
import tkinter
import random
import pymysql
from pyppeteer import launch
from bs4 import BeautifulSoup


def screen_size():
    # Read the screen resolution via tkinter so the viewport can match it.
    tk = tkinter.Tk()
    width = tk.winfo_screenwidth()
    height = tk.winfo_screenheight()
    tk.destroy()
    return width, height


async def awaiting(page):
    # Scroll down step by step so lazily loaded products get rendered.
    for i in range(6):
        await asyncio.sleep(1)
        await page.evaluate('window.scrollBy(2000, window.innerHeight)')


async def openPage(url, keyword):
    browser = await launch({
        "headless": False,
        'args': [
            '--disable-extensions',
            '--hide-scrollbars',
            '--disable-bundled-ppapi-flash',
            '--mute-audio',
            '--no-sandbox',
            '--start-maximized',
            '--disable-dev-shm-usage',
            '--disable-setuid-sandbox',
            '--disable-gpu',
            '--disable-infobars',
        ],
    })
    context = await browser.createIncognitoBrowserContext()
    print("Browser opened")
    page = await context.newPage()

    # Configure the page before navigating so the settings apply to the first request.
    width, height = screen_size()
    await page.setViewport({"width": width, "height": height})
    await page.setUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36")
    # Registered before navigation so the override also applies to the search results page.
    await page.evaluateOnNewDocument('''() =>{ Object.defineProperties(navigator,{ webdriver:{ get: () => false } }) }''')
    await page.goto(url)
    # await page.screenshot({'path': str(random.randint(10000, 99999)) + ".png"})

    # Type the keyword with a per-key delay of 50 to 100 ms, then submit the search.
    await page.type("#key", keyword, {"delay": random.randint(50, 100)})
    await page.click(".button")
    await asyncio.sleep(random.randint(1, 2))
    await awaiting(page)

    pageNumber = await page.evaluate("""document.querySelector('#J_bottomPage > span.p-skip > em:nth-child(1) > b').innerText;""")
    total_pages = int(pageNumber)
    print("Total pages: %s" % pageNumber)

    db = pymysql.connect(host='127.0.0.1', port=3306, user="your_mysql_user", password="your_mysql_password", database="your_database")
    cursor = db.cursor()
    # Parameterized statement: quotes inside titles can no longer break the SQL.
    sql = """insert into JDshoplist(title,shop,link,price,commet,tips,keyword) values (%s,%s,%s,%s,%s,%s,%s)"""

    for i in range(total_pages):
        await awaiting(page)
        count = await page.evaluate("""document.querySelector('.gl-warp.clearfix').childElementCount;""")
        print("Items on this page: %s" % count)
        # nth-child is 1-based, so iterate from 1 to count inclusive.
        for y in range(1, count + 1):
            # await page.evaluate('window.scrollBy(2000, window.innerHeight)')  # scroll to the bottom
            await asyncio.sleep(1)
            # The SKU is encoded in the class of the <strong> price element, e.g. "J_<sku>".
            sku = BeautifulSoup(await page.evaluate("""document.querySelector('#J_goodsList > ul > li:nth-child(%s) > div > div.p-price').innerHTML""" % y), "lxml")
            sku_ = sku.find('strong')['class'][0].strip("J_")
            price = await page.evaluate("""document.querySelector('#J_goodsList > ul > li:nth-child(%s) > div > div.p-price > strong > i').innerText;""" % y)
            soup = BeautifulSoup(await page.evaluate("""document.querySelector('#J_goodsList > ul > li:nth-child(%s) > div > div.p-name.p-name-type-2').innerHTML;""" % y), "lxml")
            link = soup.find('a')['href']
            title = await page.evaluate("""document.querySelector('#J_goodsList > ul > li:nth-child(%s) > div > div.p-name.p-name-type-2 > a > em').innerText;""" % y)
            commet = await page.evaluate("""document.querySelector('#J_goodsList > ul > li:nth-child(%s) > div > div.p-commit').innerText;""" % y)
            shop = await page.evaluate("""document.querySelector('#J_goodsList > ul > li:nth-child(%s) > div > div.p-shop > span > a').innerText;""" % y)
            tips = await page.evaluate("""document.querySelector('#J_pro_%s').innerText;""" % sku_)
            cursor.execute(sql, (title, shop, link, price, commet, tips, keyword))
            db.commit()
        print("Finished page {}: {} items collected.".format(i + 1, count))
        # Only click "next" when there is another page to visit.
        if i < total_pages - 1:
            await page.click(".pn-next")

    cursor.close()
    db.close()
    await browser.close()


if __name__ == '__main__':
    url = "https://www.jd.com/"
    keyword = "iphone13"
    loop = asyncio.get_event_loop()
    loop.run_until_complete(openPage(url, keyword))
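
The SKU step in the script reads the class of the strong element inside div.p-price and drops the "J_" prefix. Below is a minimal standalone sketch of that step using only the standard library. The sample HTML fragment and the helper name extract_sku are hypothetical, made up for illustration; the real JD markup may differ or change over time.

```python
import re
from typing import Optional

# Hypothetical fragment modelled on the div.p-price markup the demo reads.
SAMPLE_PRICE_HTML = (
    '<strong class="J_100012345678" data-done="1">'
    '<em>￥</em><i>5999.00</i></strong>'
)

# Match a class token of the form J_<digits> on a <strong> tag.
_SKU_CLASS = re.compile(r'<strong[^>]*\bclass="[^"]*\bJ_(\d+)\b')


def extract_sku(price_html: str) -> Optional[str]:
    """Return the numeric SKU from a 'J_<sku>' class, or None if absent."""
    match = _SKU_CLASS.search(price_html)
    return match.group(1) if match else None


sku = extract_sku(SAMPLE_PRICE_HTML)
```

Note that str.strip("J_"), as used in the script, removes any leading or trailing "J" and "_" characters rather than the literal prefix. That gives the same result for purely numeric SKUs, while the pattern above states the intent more directly and returns None when no SKU class is present.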

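Database: the script writes each product as one row into the JDshoplist table, with the columns title, shop, link, price, commet, tips, and keyword.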
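
As a rough sketch of what such a table could look like, the example below creates JDshoplist and inserts one row with a parameterized statement. It deliberately swaps in the standard-library sqlite3 module (in-memory) so that it runs without a MySQL server. The column names come from the INSERT statement in the script, but the column types, the id column, and the sample row are assumptions. With pymysql the placeholder is %s instead of ?.

```python
import sqlite3

# Column names mirror the script's INSERT; types and the id column are assumed.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS JDshoplist (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    title   TEXT,
    shop    TEXT,
    link    TEXT,
    price   TEXT,
    commet  TEXT,
    tips    TEXT,
    keyword TEXT
)
"""

INSERT_ROW = (
    "INSERT INTO JDshoplist (title, shop, link, price, commet, tips, keyword) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)"
)


def save_rows(conn, rows):
    """Create the table if needed, insert the rows, and return the row count."""
    conn.execute(CREATE_TABLE)
    conn.executemany(INSERT_ROW, rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM JDshoplist").fetchone()[0]


# Made-up sample row; the quoted word in the title shows why parameters matter.
rows = [(
    'Phone "Pro" 128GB', "Example Store", "//item.jd.com/100012345678.html",
    "5999.00", "100000+", "", "iphone13",
)]

conn = sqlite3.connect(":memory:")
stored = save_rows(conn, rows)
title_back = conn.execute("SELECT title FROM JDshoplist").fetchone()[0]
```

Passing values as parameters lets the driver handle quoting, so a product title containing double quotes is stored unchanged instead of breaking the statement.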
