Python Crawler Series: Scraping Product Data from a Luxury-Goods Mini Program Store


The code is for learning and exchange only. Do not use it for illegal purposes.

1. Prepare the database

create database zr;
use zr;

# Product table
create table zr_goodslist(
    id int primary key auto_increment comment 'id',
    pid varchar(30) unique comment 'pid',
    sku varchar(30) default null comment 'sku',
    name varchar(50) default null comment 'name',
    sellingPoint varchar(200) default null comment 'sellingPoint',
    descption text default null comment 'desc',
    mainimg text default null comment 'mainimg',
    imageList text default null comment 'imageList',
    video text default null comment 'video',
    brand varchar(30) default null comment 'brand',
    status varchar(8) default null comment 'status',
    stock varchar(10) default null comment 'stock',
    source varchar(10) default null comment 'source',
    refDetail text default null comment 'refDetail',
    convert_size varchar(100) default null comment 'convert_size',
    marketPrice varchar(15) default null comment 'marketPrice',
    salePrice varchar(15) default null comment 'salePrice',
    price varchar(15) default null comment 'price',
    discount varchar(15) default null comment 'discount',
    marketingDesc varchar(300) default null comment 'marketingDesc',
    grade varchar(10) default null comment 'grade',
    brandType varchar(15) default null comment 'brandType',
    categoryOne varchar(20) default null comment 'categoryOne',
    categoryTwo varchar(20) default null comment 'categoryTwo',
    categoryThree varchar(20) default null comment 'categoryThree',
    viewNumStatus varchar(10) default null comment 'viewNumStatus',
    openBargain varchar(30) default null comment 'openBargain',
    directDesc text default null comment 'directDesc',
    degree text default null comment 'degree',
    degreeDesc text default null comment 'degreeDesc',
    degreeExt text default null comment 'degreeExt',
    coefficient text default null comment 'coefficient',
    firstPutOn varchar(50) default null comment 'firstPutOn',
    proc_view_num varchar(15) default null comment 'proc_view_num',
    correctNum varchar(15) default null comment 'correctNum',
    bargainBasePrice varchar(15) default null comment 'bargainBasePrice',
    onSale varchar(10) default null comment 'onSale',
    onSaleCountDown varchar(15) default null comment 'onSaleCountDown',
    bargainLock varchar(50) default null comment 'bargainLock',
    bargainDownTime varchar(35) default null comment 'bargainDownTime',
    isBargain varchar(10) default null comment 'isBargain',
    bargainPrice varchar(15) default null comment 'bargainPrice',
    bargainNum varchar(15) default null comment 'bargainNum',
    color_forming varchar(30) default null comment 'color_forming',
    tile_size varchar(30) default null comment 'tile_size',
    overall_weight varchar(30) default null comment 'overall_weight',
    size_prompt varchar(30) default null comment 'size_prompt',
    defect text default null comment 'defect',
    style text default null comment 'style',
    accessories text default null comment 'accessories',
    material text default null comment 'material',
    lengths text default null comment 'lengths',
    main_material text default null comment 'main_material',
    sizes text default null comment 'sizes',
    fabric text default null comment 'fabric'
) engine=INNODB charset=utf8;
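
Because pid carries a unique constraint, the "check whether the row exists, then insert or update" pattern used by the script could in principle be collapsed into a single MySQL INSERT ... ON DUPLICATE KEY UPDATE statement. The following is only a sketch of that alternative, not what the script does: build_upsert is a hypothetical helper, and only three of the table's columns are used to keep the output short. The VALUES() function in the update clause is deprecated since MySQL 8.0.20 but still accepted.

```python
def build_upsert(table, columns, key="pid"):
    """Return a parameterized MySQL upsert statement for the given columns.

    Hypothetical helper for illustration; values are bound later through
    cursor.execute(sql, params), never formatted into the string.
    """
    placeholders = ", ".join(["%s"] * len(columns))
    # Every column except the unique key is overwritten on conflict.
    updates = ", ".join(
        "{0} = VALUES({0})".format(col) for col in columns if col != key
    )
    return "INSERT INTO {} ({}) VALUES ({}) ON DUPLICATE KEY UPDATE {}".format(
        table, ", ".join(columns), placeholders, updates
    )


sql = build_upsert("zr_goodslist", ["pid", "sku", "name"])
print(sql)
```

With a statement like this, one cursor.execute(sql, params) call per product would replace the separate existence check, at the cost of tying the code to MySQL-specific syntax.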

2. Implementation
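
The script in this section reads its settings from config.ini through getConf, using the sections app-sys and Mysql-Database. The file itself is not shown, so the following is only a sketch of a layout that would satisfy those calls. Every value is a placeholder, and the file content is held in a string here so the example runs without touching the disk.

```python
import configparser

# Assumed layout of config.ini; section and option names are taken from the
# getConf calls in the script, the values are placeholders.
SAMPLE_CONFIG = """
[app-sys]
threadNums = 4

[Mysql-Database]
user = root
password = change-me
database = zr
table = zr_goodslist
"""

cf = configparser.ConfigParser()
cf.read_string(SAMPLE_CONFIG)

# Option names are case-insensitive by default; section names are not.
thread_nums = cf.getint("app-sys", "threadNums")
mysql_table = cf.get("Mysql-Database", "table")
print(thread_nums, mysql_table)
```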

# -*- coding:utf-8 -*-
import requests
from queue import Queue
import threading
import json
import MySQLdb
import configparser

totals = 0
cf = configparser.ConfigParser()
# ConfigParser.read() skips missing files silently and returns the list of
# files it parsed, so check the result instead of waiting for an exception.
if not cf.read("config.ini", encoding="utf-8"):
    print("config.ini was not found in the program directory")
    exit(0)


def getConf(sec, key):
    try:
        return cf.get(sec, key)
    except Exception as e:
        print("Missing configuration: " + sec + " - " + key)
        exit(0)


# -------------------------------------------------
threadNums = int(getConf("app-sys", "threadNums"))
retry = 3
timeout = 20
# Database user
mysql_user = getConf("Mysql-Database", "user")
# Database password
mysql_password = getConf("Mysql-Database", "password")
# Database name
mysql_database = getConf("Mysql-Database", "database")
# Table name
mysql_table = getConf("Mysql-Database", "table")
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 5.1.1; DUK-AL20 Build/LMY48Z; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/52.0.2743.100 Safari/537.36 MicroMessenger/7.0.10.1580(0x27000A59) Process/appbrand3 NetType/WIFI Language/zh_CN ABI/arm32",
    "content-type": "application/json;charset=utf-8",
}
host = "https://img.*******.com/"
attrsList = []

# Parameters sent with every request.
baseParams = {
    "version": "5.3.0",
    "debug": "false",
    "mt": "WX-micro",
    "inWechat": 1,
    "from": "micro",
    "deviceId": "deviceId",
}

# Keys of the detail payload in table-column order. "mainimg" and "imageList"
# are built from detail['imageList'] rather than read directly.
detailFields = [
    "id", "sku", "name", "sellingPoint", "desc", "mainimg", "imageList",
    "video", "brand", "status", "stock", "source", "refDetail",
    "convert_size", "marketPrice", "salePrice", "price", "discount",
    "marketingDesc", "grade", "brandType", "categoryOne", "categoryTwo",
    "categoryThree", "viewNumStatus", "openBargain", "directDesc", "degree",
    "degreeDesc", "degreeExt", "coefficient", "firstPutOn", "proc_view_num",
    "correctNum", "bargainBasePrice", "onSale", "onSaleCountDown",
    "bargainLock", "bargainDownTime", "isBargain", "bargainPrice",
    "bargainNum",
]

# Columns meant to hold productAttr values. How productAttr entries map to
# these columns is not shown, so they are stored as empty strings.
attrColumns = [
    "color_forming", "tile_size", "overall_weight", "size_prompt", "defect",
    "style", "accessories", "material", "lengths", "main_material", "sizes",
    "fabric",
]

# Column names of the table from section 1, without the auto-increment id.
# The payload's "id" is stored as pid and "desc" as descption.
columns = ["pid", "sku", "name", "sellingPoint", "descption"] + detailFields[5:] + attrColumns


class zrSpider(threading.Thread):
    def __init__(self, brandQueue, index, *args, **kwargs):
        super(zrSpider, self).__init__(*args, **kwargs)
        self.brandQueue = brandQueue
        self.index = index

    def listParams(self, brandId, page):
        # Request body shared by getGoodsList and getTotalPage.
        params = {
            "page": page,
            "pageSize": 20,
            "sort": "",
            "ppath": "4:" + str(brandId),
            "newShare": 0,
            "selfbiz": 1,
        }
        params.update(baseParams)
        return params

    def getGoodsList(self, brandId, page):
        url = "https://search.*******.com/V4.7.0/product/list"
        resp = postHtml(url, self.listParams(brandId, page))
        if resp:
            try:
                return resp['data']['list']
            except Exception as e:
                pass
        return

    def getGoodsDetail(self, id):
        url = "https://api.*******.com/V5.3.0/product/newDetail"
        data = dict(baseParams, id=str(id))
        resp = postHtml(url, data)
        if not resp:
            return
        try:
            if str(resp['code']) != "100000":
                return
            detail = resp['data']['detail']
            # Fetched as in the original flow; not used further here.
            productAttr = resp['data']['productAttr']
        except Exception as e:
            return
        if not isinstance(detail, dict):
            return
        # Prefix every image path with the image host.
        images = [host + image for image in (detail.get('imageList') or [])]
        goods = []
        for field in detailFields:
            if field == "mainimg":
                goods.append(images[0] if images else "")
            elif field == "imageList":
                goods.append(json.dumps(images) if images else "")
            else:
                # Missing keys become empty strings.
                goods.append(detail.get(field, ""))
        goods.extend([""] * len(attrColumns))
        return goods

    def execute(self, sql, params, fetch=False):
        # One connection per statement, closed when the statement is done.
        conn = None
        try:
            conn = MySQLdb.connect(user=mysql_user, password=mysql_password,
                                   database=mysql_database, charset='utf8')
            cursor = conn.cursor()
            # Values are bound by the driver, so quotes in scraped text
            # cannot break the statement.
            cursor.execute(sql, params)
            rows = cursor.fetchall() if fetch else None
            conn.commit()
            return rows
        except Exception as e:
            print(e)
        finally:
            if conn is not None:
                conn.close()

    def pipLine(self, data):
        print("------------------------- insert ------------------------- ")
        print(data)
        print("---------------------------------------------------------- ")
        sql = ("insert into " + mysql_table + " (" + ", ".join(columns) + ")"
               " values (" + ", ".join(["%s"] * len(columns)) + ")")
        self.execute(sql, [str(value) for value in data])

    def getTotalPage(self, brandId):
        url = "https://search.*******.com/V4.7.0/product/list"
        resp = postHtml(url, self.listParams(brandId, 1))
        if resp:
            try:
                count = int(resp['data']['count'])
                # Ceiling division by the page size of 20.
                return (count + 19) // 20
            except Exception as e:
                pass
        return 1

    def checkGoodsExists(self, pid):
        sql = "select 1 from " + mysql_table + " where pid = %s limit 1"
        rows = self.execute(sql, [str(pid)], fetch=True)
        return bool(rows)

    def update(self, data):
        print("------------------------- update ------------------------- ")
        print(data)
        print("---------------------------------------------------------- ")
        assignments = ", ".join(column + " = %s" for column in columns[1:])
        sql = "update " + mysql_table + " set " + assignments + " where pid = %s"
        # data[0] is the pid; it goes last to match the where clause.
        params = [str(value) for value in data[1:]] + [str(data[0])]
        self.execute(sql, params)

    def run(self):
        print("Thread %d started" % self.index)
        while True:
            try:
                # Non-blocking get: empty() followed by get() can hang when
                # another thread takes the last brand in between.
                brand = self.brandQueue.get(block=False)
            except Exception as e:
                break
            brand_id = str(brand['id'])
            totalPage = self.getTotalPage(brand_id)
            for page in range(1, totalPage + 1):
                goodsList = self.getGoodsList(brand_id, page)
                if not goodsList:
                    continue
                for goods in goodsList:
                    goodsId = goods['id']
                    datas = self.getGoodsDetail(goodsId)
                    if not datas:
                        continue
                    if self.checkGoodsExists(goodsId):
                        # Update the existing row
                        self.update(datas)
                    else:
                        self.pipLine(datas)


def postHtml(url, data):
    for i in range(retry):
        try:
            # requests ignores json= when data= is given, so only data= is passed.
            resp = requests.post(url, data=json.dumps(data), headers=headers, timeout=timeout)
            return json.loads(resp.content.decode("utf-8"))
        except Exception as e:
            pass
    return


def getHtml(url):
    for i in range(retry):
        try:
            resp = requests.get(url, headers=headers, timeout=timeout)
            return json.loads(resp.content.decode("utf-8"))
        except Exception as e:
            pass
    return


def getBrandQueue():
    brandQueue = Queue(0)
    url = "https://api.*******.com/V5.3.0/site/currentBrand"
    resp = postHtml(url, dict(baseParams))
    if resp:
        try:
            brandList = resp['data']['list']
        except Exception as e:
            brandList = []
        for brand in brandList or []:
            brandQueue.put(brand)
    # Always return a queue so the worker threads can start and exit cleanly.
    return brandQueue


def main():
    print("Initializing the crawler")
    brandQueue = getBrandQueue()
    print("Category list fetched")
    for i in range(threadNums):
        z = zrSpider(brandQueue, i)
        z.start()


if __name__ == '__main__':
    main()
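
Two small pieces of the script can be checked in isolation without any network or database access: the page-count calculation in getTotalPage and the fallback to an empty string for missing keys in getGoodsDetail. The sketch below uses hypothetical helper names (total_pages, pick_fields) and a made-up sample payload; it illustrates the logic only and is not taken from the script.

```python
def total_pages(count, page_size=20):
    """Ceiling division: number of pages needed to hold `count` items."""
    return (count + page_size - 1) // page_size


def pick_fields(detail, fields):
    """Return the values for `fields` in order, using "" for missing keys."""
    return [detail.get(field, "") for field in fields]


# Made-up payload; the real detail response has many more keys.
sample = {"id": 1001, "name": "demo bag", "price": "1999"}
row = pick_fields(sample, ["id", "sku", "name", "price"])
print(total_pages(41), row)
```

Keeping the row as a plain list in a fixed field order is what lets the insert and update statements bind values by position.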
