Python + Selenium: Scraping JD Search Result Pages

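After downloading the source of a JD search result page with requests, I found that the response contained only 30 items. I suspected that the search page is rendered dynamically, so I decided to drive Chrome with Selenium to scrape the search page instead.

The code is as follows: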

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from lxml import etree
import time
import csv


class JDSelenium():
    def __init__(self, keyword, page, timeout=10):
        self.keyword = keyword
        self.timeout = timeout
        self.page = page
        self.chrome_options = webdriver.ChromeOptions()
        self.chrome_options.add_argument('--headless')
        # The original passed PhantomJS-style service_args
        # ('--load-images=false', '--disk-cache=true'), which are not Chrome
        # options; a commonly used Chrome switch disables images instead.
        self.chrome_options.add_argument('--blink-settings=imagesEnabled=false')
        # Recent Selenium releases expect options= rather than the deprecated chrome_options=
        self.browser = webdriver.Chrome(options=self.chrome_options)
        self.browser.set_page_load_timeout(self.timeout)
        # To watch the browser window instead of running headless, use:
        # self.browser = webdriver.Chrome()
        self.wait = WebDriverWait(self.browser, self.timeout)
        self.url = r'https://search.jd.com/Search?keyword={keyword}&enc=utf-8&page={page}'
        # An explicit encoding keeps Chinese text intact regardless of the platform default
        self.file = open('{keyword}.csv'.format(keyword=self.keyword), 'w', newline='', encoding='utf-8')
        self.write = self.create_writer()
        self.count = 0

    def close(self):
        # quit() ends the whole driver session; close() would only close the window
        self.browser.quit()
        self.file.close()

    def create_writer(self):
        fieldnames = ['Title', 'Store', 'Price', 'Comments']
        writer = csv.DictWriter(self.file, fieldnames=fieldnames)
        writer.writeheader()
        return writer

    def process_request(self, page):
        try:
            self.browser.get(self.url.format(keyword=self.keyword, page=page))
            # Wait for the "next page" button, then scroll down to trigger lazy loading
            self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, "pn-next")))
            self.browser.execute_script('window.scrollTo(0,document.body.scrollHeight)')
            time.sleep(5)
            return etree.HTML(self.browser.page_source)
        # On failure you can either stop (as here) or retry the request
        except Exception as e:
            print(e)
            # return self.process_request(page)
            return None

    def process_item(self, response):
        products = response.xpath('//*[@id="J_goodsList"]/ul//li[@class="gl-item"]')
        for product in products:
            # Only JD self-operated ("自营") listings are scraped
            if '自营' not in product.xpath('.//div[@class="p-icons"]//i//text()'):
                continue
            item = {}
            # Some self-operated listings show no store name, so join() is used
            # to avoid an error on an empty list
            item['Store'] = ''.join(product.xpath('.//div[@class="p-shop"]/span/a/text()'))
            item['Title'] = ''.join(product.xpath('.//div[contains(@class,"p-name")]/a/em//text()'))
            item['Price'] = product.xpath('.//div[@class="p-price"]/strong/i/text()')[0]
            item['Comments'] = product.xpath('.//div[@class="p-commit"]/strong/a/text()')[0]
            self.count += 1
            self.write.writerow(item)

    def run(self):
        for page in range(1, self.page + 1):
            print('------Scraping page {page}------'.format(page=page))
            # Visible result page n is requested as page=2n-1
            response = self.process_request(2 * page - 1)
            if response is None:
                # The request failed; stop instead of parsing a missing page
                break
            self.process_item(response)
        print('Data saved')
        self.close()
        print('Scraped {count} items in total'.format(count=self.count))


if __name__ == '__main__':
    # '电脑' means "computer"
    jd = JDSelenium(keyword='电脑', page=1)
    jd.run()
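The run loop above requests page 2*page-1 for each visible result page. The following is a minimal sketch of that mapping as a standalone helper. The names build_search_url and SEARCH_URL are hypothetical, and percent-encoding the keyword with urlencode is my own addition; the original script simply formats the raw keyword into the URL and lets the browser encode it.

```python
from urllib.parse import urlencode

# Base URL taken from the script above
SEARCH_URL = 'https://search.jd.com/Search'


def build_search_url(keyword, visible_page):
    """Return the search URL for the n-th visible result page (1-based)."""
    if visible_page < 1:
        raise ValueError('visible_page must be >= 1')
    params = {
        'keyword': keyword,
        'enc': 'utf-8',
        # Same mapping as run(): visible page n -> page parameter 2n - 1
        'page': 2 * visible_page - 1,
    }
    # urlencode percent-encodes the keyword as UTF-8
    return SEARCH_URL + '?' + urlencode(params)


if __name__ == '__main__':
    print(build_search_url('电脑', 2))
    # https://search.jd.com/Search?keyword=%E7%94%B5%E8%84%91&enc=utf-8&page=3
```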

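Note that the required information is not extracted through Selenium's own element lookups. Instead, once the page has finished loading, the data is parsed out of the page source. I originally meant to write this as Scrapy + Selenium, but the Scrapy framework felt like too much overhead, so everything is combined into a single script.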
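To make the "parse the page source" step concrete, here is a rough sketch that runs the same XPath expressions against a small inline HTML string, so it can be tried without a browser. SAMPLE_HTML is hypothetical, heavily simplified markup that only mirrors the structure the XPaths expect, and parse_items is an illustrative name, not part of the original script.

```python
from lxml import etree

# Hypothetical, simplified markup; the real search page is far larger.
SAMPLE_HTML = '''
<div id="J_goodsList"><ul>
  <li class="gl-item">
    <div class="p-price"><strong><i>4999.00</i></strong></div>
    <div class="p-name p-name-type-2"><a><em>Example <font>laptop</font> 14-inch</em></a></div>
    <div class="p-commit"><strong><a>2000+</a></strong></div>
    <div class="p-shop"><span><a>Example Store</a></span></div>
    <div class="p-icons"><i>自营</i></div>
  </li>
  <li class="gl-item">
    <div class="p-price"><strong><i>3999.00</i></strong></div>
    <div class="p-name"><a><em>Third-party item</em></a></div>
    <div class="p-commit"><strong><a>100+</a></strong></div>
    <div class="p-shop"><span><a>Other Store</a></span></div>
    <div class="p-icons"></div>
  </li>
</ul></div>
'''


def parse_items(html):
    """Extract self-operated ("自营") listings from page source."""
    tree = etree.HTML(html)
    items = []
    for product in tree.xpath('//*[@id="J_goodsList"]/ul//li[@class="gl-item"]'):
        # Skip listings that do not carry the self-operated badge
        if '自营' not in product.xpath('.//div[@class="p-icons"]//i//text()'):
            continue
        items.append({
            # //text() collects nested text nodes, so highlighted words
            # inside the title are joined back into one string
            'Title': ''.join(product.xpath('.//div[contains(@class,"p-name")]/a/em//text()')),
            'Store': ''.join(product.xpath('.//div[@class="p-shop"]/span/a/text()')),
            'Price': ''.join(product.xpath('.//div[@class="p-price"]/strong/i/text()')),
            'Comments': ''.join(product.xpath('.//div[@class="p-commit"]/strong/a/text()')),
        })
    return items


if __name__ == '__main__':
    for item in parse_items(SAMPLE_HTML):
        print(item)
```

Unlike the script, this sketch uses join() for every field, so a missing price or comment count yields an empty string rather than an IndexError. That is a design choice of the sketch, not something the original does.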

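P.S. After requesting the URL, the script has to wait for part of the page to load, scroll down, and then wait again for the remaining items to finish loading. This takes some time, so the scrape is fairly slow.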
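One possible way to shorten that wait is to replace the fixed five-second sleep with polling that stops as soon as the number of loaded items stops growing. The sketch below is an alternative I am suggesting, not what the script does; scroll_until_stable and its parameters are hypothetical names. It takes plain callables so that it can be tried without a browser.

```python
import time


def scroll_until_stable(get_count, scroll, max_wait=10.0, poll=0.5,
                        settle_polls=3, sleep=time.sleep):
    """Scroll repeatedly until the item count stops growing.

    get_count: callable returning the number of items currently loaded
    scroll: callable that scrolls the page to the bottom
    Returns the last observed item count.
    """
    deadline = time.monotonic() + max_wait
    last = get_count()
    unchanged = 0
    # Stop when the count has not grown for settle_polls polls in a row,
    # or when max_wait seconds have elapsed
    while time.monotonic() < deadline and unchanged < settle_polls:
        scroll()
        sleep(poll)
        current = get_count()
        if current > last:
            last = current
            unchanged = 0
        else:
            unchanged += 1
    return last


# Possible wiring inside process_request(), assuming the same selectors
# as the script (untested against the live site):
#
#   scroll_until_stable(
#       get_count=lambda: len(self.browser.find_elements(
#           By.CSS_SELECTOR, '#J_goodsList li.gl-item')),
#       scroll=lambda: self.browser.execute_script(
#           'window.scrollTo(0,document.body.scrollHeight)'),
#   )
```

With the default settings the helper returns after about 1.5 seconds once loading has settled, and never waits much longer than max_wait.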
