Scraping a jewelry website with Python

GitHub: https://github.com/121812/python
Please credit the source when reposting.

For details, see: http://www.forever121.cn/?p=138

The source code follows.

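import os
import re
import time
import requests
import xlsxwriter
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://www.dior.cn/zh_cn/products/search?page=3&query=珠宝'
#url = 'http://www.baidu.com'
# Path to chromedriver; defined here but never passed to webdriver.Chrome below.
chrome = r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe"
options = webdriver.ChromeOptions()
options.add_argument('lang=zh_CN.UTF-8')
options.add_argument('user-agent="Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Mobile Safari/537.36"')
browser = webdriver.Chrome(options=options)  # options= replaces the deprecated chrome_options=
browser.maximize_window()
wait = WebDriverWait(browser, 2)


def get_source(url):
    """Open the page, click the button below up to 20 times, and return the HTML."""
    browser.get(url)
    num = 0
    try:
        while True:
            print(num)
            num += 1
            # Apparently the "load more" button (the original variable name was `login`).
            login = browser.find_element(By.XPATH, '//*[@id="main"]/div/div[2]/div[3]/button/span/span')
            login.click()
            time.sleep(1)
            if num == 20:
                break
    except Exception:
        # The button is gone or not clickable: stop clicking and use what has loaded.
        pass
    html = browser.page_source
    return html


def find_combination(html):
    """Return (title, image URL) tuples for each product found in the page source."""
    # The original call was cut off after the pattern; passing html and re.S is assumed.
    combination = re.findall('<div class="product-legend"><span class="title-with-level product-title century-std size-s"><span class="multiline-text multiline-text--is-china">(.*?)</span>.*?<img src="(.*?)"', html, re.S)
    return combination


def find_jpg(combination):
    """Download each product image and write titles and images to a workbook."""
    a = 0    # column for the full title
    b = 4    # column for the image
    e = 1    # current worksheet row (row 0 holds the headers)
    num = 1  # running image number, used as the file name
    file = xlsxwriter.Workbook('迪奥.xlsx')  # xlsxwriter writes .xlsx files, not .xls
    table = file.add_worksheet('迪奥')
    table.write(0, 0, 'Full title')
    table.write(0, 1, 'Name')
    table.write(0, 2, 'Quality')
    table.write(0, 3, 'Material')
    table.write(0, 4, 'Image')
    for i in combination:
        i = list(i)
        down = i[1]
        save_jpg = requests.get(down)
        time.sleep(1)
        print('Downloading: %s' % down)
        with open('F:\\python测试\\data\\迪奥\\%s' % num + '.jpg', 'wb') as save:
            save.write(save_jpg.content)
        # Split the title around the "750/1000" marker.
        name = re.findall('^(.*?)750', i[0])
        quality = re.findall('750/1000(.*?)$', i[0])
        try:
            table.write(e, a, '%s' % i[0])
            table.write(e, 1, '%s' % name[0])
            table.write(e, 2, '750/1000')
            table.write(e, 3, '%s' % quality[0])
        except IndexError:
            # The title has no "750/1000" marker: leave the remaining cells empty.
            pass
        table.insert_image(e, b, 'F:\\python测试\\data\\迪奥\\%s.jpg' % num)
        e += 1
        num += 1
    file.close()


find_jpg(find_combination(get_source(url)))
print('Download complete')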
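
The two regular-expression steps in the script (pulling each title and image URL out of the page source, then splitting the title around the 750/1000 marker) can be tried offline, without a browser. The following is only a minimal sketch under stated assumptions: the HTML fragment, the product title and the image URL are made up for illustration, the `re.S` flag is an assumption rather than something the source shows, and `split_title` is a hypothetical helper that also strips whitespace, which the script above does not do.

```python
import re

# Hypothetical fragment shaped like the markup the script's pattern targets;
# the product title and image URL are invented for illustration.
SAMPLE_HTML = (
    '<div class="product-legend">'
    '<span class="title-with-level product-title century-std size-s">'
    '<span class="multiline-text multiline-text--is-china">'
    'Sample ring 750/1000 yellow gold and diamonds</span></span></div>\n'
    '<img src="https://example.com/ring.jpg" alt="">'
)

PATTERN = re.compile(
    r'<div class="product-legend">'
    r'<span class="title-with-level product-title century-std size-s">'
    r'<span class="multiline-text multiline-text--is-china">(.*?)</span>'
    r'.*?<img src="(.*?)"',
    re.S,  # assumed: lets ".*?" cross line breaks between title and image
)


def find_combination(html):
    """Return a list of (title, image_url) tuples found in html."""
    return PATTERN.findall(html)


def split_title(title):
    """Split a title around the 750/1000 marker into (name, material)."""
    name = re.findall(r'^(.*?)750', title)
    material = re.findall(r'750/1000(.*?)$', title)
    if not name or not material:
        return None  # the title does not contain the marker
    return name[0].strip(), material[0].strip()


pairs = find_combination(SAMPLE_HTML)
for title, image_url in pairs:
    print(title, image_url, split_title(title))
```

Returning `None` for a title without the marker makes the missing-data case explicit, instead of relying on a caught exception when indexing an empty match list.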
