3. Daily Python Example: Scraping Taobao Product Pages

The code comes from https://github.com/gxcuizy/Python/tree/master/%E4%BB%8E%E9%9B%B6%E5%AD%A6Python-%E6%8E%98%E9%87%91%E6%B4%BB%E5%8A%A8/day21

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
Scrape Taobao product information by driving a browser with selenium
author: gxcuizy
date: 2018-11-13
"""

# Note: the find_element_by_* helpers used below are the Selenium 3 API;
# Selenium 4.3+ removed them in favor of find_element(By.<strategy>, value).
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from time import sleep


class GetTaobaoGoods(object):
    """Drive a browser and scrape Taobao products"""

    def __init__(self, goods):
        """Initialize variables"""
        self.driver = webdriver.Chrome()
        self.taobao_url = 'https://www.taobao.com'
        self.goods = goods

    def search_goods(self):
        """Open the browser and search for the product"""
        # Maximize the window
        self.driver.maximize_window()
        # Open the Taobao home page
        self.driver.get(self.taobao_url)
        # Type the product name and press Enter to search
        search = self.driver.find_element_by_name('q')
        search.send_keys(self.goods)
        search.send_keys(Keys.RETURN)
        # Log in to Taobao
        self.login()

    def login(self):
        """Log in to Taobao"""
        if 'login' in self.driver.current_url:
            print('Please scan the QR code to log in...')
            while True:
                if 'login' in self.driver.current_url:
                    sleep(1.5)
                else:
                    print('Login successful!')
                    break

    def scroll_to_button(self):
        # Scroll toward the bottom of the page
        for i in range(0, 4):
            # Scroll by 1000 * i pixels on each pass (0, 1000, 2000, 3000)
            height = 1000 * i
            js_code = "window.scrollBy(0," + str(height) + ")"
            self.driver.execute_script(js_code)
            sleep(2)

    def get_goods_info(self):
        # Get the list of products
        goods_list = self.driver.find_elements_by_class_name('J_MouserOnverReq')
        for goods_info in goods_list:
            goods = {}
            # Product image
            img_element = goods_info.find_element_by_class_name('img')
            goods_img = 'https:' + img_element.get_attribute('data-src')
            goods.update({'img': goods_img})
            # Product price
            price_element = goods_info.find_element_by_css_selector('.g_price strong')
            goods_price = price_element.text
            goods.update({'price': goods_price})
            # Number of buyers
            count_element = goods_info.find_element_by_class_name('deal-cnt')
            goods_sale = count_element.text
            goods.update({'sale': goods_sale})
            # Product title
            title_element = goods_info.find_element_by_class_name('title')
            goods_title = title_element.text
            goods.update({'title': goods_title})
            # Shop (the original read title_element.text here by mistake)
            shop_element = goods_info.find_element_by_class_name('shop')
            goods_shop = shop_element.text
            goods.update({'shop': goods_shop})
            # Location
            location_element = goods_info.find_element_by_class_name('location')
            goods_location = location_element.text
            goods.update({'location': goods_location})
            # Link
            href_element = goods_info.find_element_by_css_selector('.title a')
            goods_href = href_element.get_attribute('href')
            goods.update({'href': goods_href})
            print(goods)

    def run(self):
        """Run the script"""
        # Open the page and search for the product
        self.search_goods()
        page_num = 1
        # Leave time to confirm the QR-code login on the phone (see note 2)
        sleep(30)
        while True:
            print('Fetching products on page %s...' % page_num)
            # Scroll to the bottom
            self.scroll_to_button()
            # Scrape the products
            self.get_goods_info()
            print('Finished scraping page %s!' % page_num)
            # Find the next page
            next_element = self.driver.find_element_by_class_name('next')
            next_page = next_element.find_element_by_tag_name('a')
            if next_page:
                next_page.click()
                page_num += 1
                sleep(2)
            else:
                break
        # Quit the browser
        self.driver.quit()


# Program entry point
if __name__ == '__main__':
    # Search keyword: 三只松鼠 (a snack brand)
    search_goods = '三只松鼠'
    tao_bao = GetTaobaoGoods(search_goods)
    tao_bao.run()
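
The pagination step in the script looks up the pager by the single class name next. As note 2 below explains, the element's full attribute is class="item next", and find_element_by_class_name takes only one name. In CSS, a compound selector such as .item.next matches an element carrying both classes in any order, while [class='item next'] matches only that exact attribute string. The helper below is a small sketch of building the compound form. The function name compound_class_selector is hypothetical, and it assumes the class names are plain identifiers that need no CSS escaping.

```python
def compound_class_selector(class_attr):
    """Turn a class attribute value like 'item next' into '.item.next'.

    Assumption: every class name is a plain identifier, so no CSS
    escaping is applied.
    """
    names = class_attr.split()
    if not names:
        raise ValueError('class attribute is empty')
    # One ".name" per class, joined with no spaces: a compound selector
    return ''.join('.' + name for name in names)


print(compound_class_selector('item next'))  # .item.next
```

With the Selenium 3 API used in this post, the resulting string could be passed to self.driver.find_element_by_css_selector(...).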

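1. The script uses Google Chrome. To use Firefox instead, change webdriver.Chrome() to webdriver.Firefox().
2. My runs kept getting stuck at next_element = self.driver.find_element_by_class_name('next'). After some digging I learned that when a class attribute contains a space (that is, it holds several class names), find_element_by_class_name accepts only one of them. On this page the element has class="item next", so I tried a different form, next_element = self.driver.find_element_by_css_selector("[class='item next']"), but that failed as well. Only later did I find the real cause: after scanning the QR code with my phone, it takes a while to confirm the login, and the program started processing the page before I had finished confirming. In other words, it began scraping before the results page had appeared. So I added sleep(30) before while True: so that scraping starts only after I have confirmed, and with that the script produced results.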
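
A fixed sleep(30) is either longer than needed or too short when confirming takes longer. Polling the browser URL until the results page appears is one alternative. The sketch below is a minimal version under stated assumptions: the helper names wait_for_url and on_results_page are hypothetical, and the idea that search results are served from the host s.taobao.com is an assumption that the original script does not state.

```python
import time
from urllib.parse import urlparse


def wait_for_url(driver, predicate, timeout=120.0, poll=0.5):
    """Poll driver.current_url until predicate(url) is true.

    Returns the matching URL, or raises TimeoutError once `timeout`
    seconds have passed without a match.
    """
    deadline = time.monotonic() + timeout
    while True:
        url = driver.current_url
        if predicate(url):
            return url
        if time.monotonic() >= deadline:
            raise TimeoutError('gave up after %.1f s at %s' % (timeout, url))
        time.sleep(poll)


def on_results_page(url):
    # Assumption: search results are served from the host s.taobao.com.
    # Comparing the parsed hostname avoids matching a login URL that only
    # mentions the results page inside a redirect parameter.
    return urlparse(url).hostname == 's.taobao.com'
```

In run(), the call wait_for_url(self.driver, on_results_page) could stand in for sleep(30). Selenium's own WebDriverWait class offers the same polling pattern.
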
One small example a day. Keep going!
