Python Crawler Series: Scraping Store Product Data from a Community Group-Buying WeChat Mini Program


The code is for learning and exchange only; do not use it for illegal purposes.

  • The database is used only for deduplication; the data itself is stored mainly in Excel.
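The crawler code in section 2 writes only to MySQL. As a minimal sketch of the split described above, assuming CSV (which Excel opens directly) in place of a real .xlsx writer, and a Python set standing in for the per-item MySQL lookup, the "skip what was already collected, export the rest" step could look like this. `FIELDS`, `export_new_goods` and `preview_csv` are hypothetical names.

```python
import csv
import io

# Hypothetical subset of goods_list columns, for illustration only
FIELDS = ["goods_id", "goods_name", "ysj_price"]


def export_new_goods(rows, seen_ids, out):
    """Write rows whose goods_id is not yet in seen_ids to `out` as CSV.

    seen_ids stands in for the MySQL lookup the crawler does per item; it is
    updated in place, so repeats inside `rows` are skipped as well.
    Returns the number of rows written.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for row in rows:
        if row["goods_id"] in seen_ids:
            continue  # collected before: skip it
        seen_ids.add(row["goods_id"])
        writer.writerow(row)
        written += 1
    return written


def preview_csv(rows, seen_ids):
    """Return the CSV text export_new_goods would produce."""
    buf = io.StringIO()
    export_new_goods(rows, seen_ids, buf)
    return buf.getvalue()
```

To write a real file, open it with `open("goods.csv", "w", newline="", encoding="utf-8-sig")`: `newline=""` is what the csv module expects, and the UTF-8 BOM added by `utf-8-sig` lets Excel detect the encoding so Chinese product names display correctly.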

1. Prepare the Database

```sql
set names utf8mb4;
drop database if exists sqt;
create database sqt;
use sqt;
CREATE TABLE `goods_list` (
  `id` int(10) NOT NULL AUTO_INCREMENT COMMENT 'ID',
  `goods_id` bigint(20) NOT NULL COMMENT 'Unique ID',
  `sj_area` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Listing region',
  `goods_brand` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Brand',
  `goods_code` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Product code',
  `spu_id` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'SPU-ID',
  `gys_code` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Supplier code (left blank)',
  `gys_name` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Supplier short name',
  `goods_name` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Product name',
  `attrs` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Specification',
  `sc_price` decimal(10,2) DEFAULT NULL COMMENT 'Market price',
  `ysj_price` decimal(10,2) DEFAULT NULL COMMENT 'Presale price',
  `pt_fei` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Platform fee (= presale price * 10%, 1 decimal place)',
  `bzj` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT '3年' COMMENT 'Deposit (left blank)',
  `gys_js_price` decimal(10,2) DEFAULT NULL COMMENT 'Supplier settlement price = presale price - store commission - platform fee',
  `shop_ghj_price` decimal(10,2) DEFAULT NULL COMMENT 'Store supply price = presale price - store commission',
  `GMV` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'GMV = presale price * quota',
  `sc_riqi` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT '2020年' COMMENT 'Production date (default 2020)',
  `zcfs` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Storage method (default: 01 room temperature)',
  `bzq` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Shelf life (default: 3 years)',
  `ghfs` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT '次日' COMMENT 'Supply method (default: next day)',
  `xd_num` int(10) DEFAULT NULL COMMENT 'Quota (limited quantity)',
  `xg_num` int(10) DEFAULT NULL COMMENT '',
  `sj_bq` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Design label (scraped category)',
  `cate1` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'First-level category (same as design label)',
  `cate2` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Second-level category',
  `guanzhu_num` int(10) DEFAULT NULL COMMENT 'Number of followers',
  `xs_nums` int(10) DEFAULT NULL COMMENT 'Units sold',
  `xs_e_price` decimal(10,2) DEFAULT NULL COMMENT 'Sales amount (units sold * presale price)',
  `sq_time` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Sell-out time (minutes)',
  `sj_time` int(10) DEFAULT NULL COMMENT 'Listing time',
  `xj_time` int(10) DEFAULT NULL COMMENT 'Delisting time',
  `start_time` int(10) DEFAULT NULL COMMENT 'Sale start time',
  `end_time` int(10) DEFAULT NULL COMMENT 'Sale end time',
  `qy_address` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Region',
  `imageb_url` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Detail page URL (domain + ID)',
  `sy_image` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Home page image',
  `haibao_image` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Poster image URLs (the 2 carousel images)',
  `images` text COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Detail images',
  `sp_image` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Home page video',
  `state` enum('0','1') COLLATE utf8mb4_unicode_ci DEFAULT '0' COMMENT 'Status: 0 = delisted, 1 = listed',
--  `prTitle` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'prTitle',
--  `prDetail` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'prDetail',
--  `tmBuyStart` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'tmBuyStart',
--  `tmPickUp` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'tmPickUp',
  `createtime` int(10) DEFAULT NULL COMMENT 'Creation time',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='Complete product table';
```
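The derived price columns are plain arithmetic on the presale price `ysj_price`, as the column comments state. A minimal sketch of those formulas follows; the store commission rate is an assumption (the schema only names it, and 10% is the rate the crawler code in section 2 uses), and `derive_prices` is a hypothetical helper, not part of the original script.

```python
def derive_prices(ysj_price, xd_num, xs_nums, commission_rate=0.1, fee_rate=0.1):
    """Compute goods_list's derived price columns from the presale price.

    The formulas mirror the schema's column comments; commission_rate (store
    commission) is an assumption, set to the 10% the crawler code uses.
    """
    shop_tc = round(ysj_price * commission_rate, 1)  # store commission, 1 decimal place
    pt_fei = round(ysj_price * fee_rate, 1)          # platform fee, 1 decimal place
    return {
        "shop_tc": shop_tc,
        "pt_fei": pt_fei,
        # supplier settlement price = presale - store commission - platform fee
        "gys_js_price": round(ysj_price - shop_tc - pt_fei, 2),
        # store supply price = presale - store commission
        "shop_ghj_price": round(ysj_price - shop_tc, 2),
        # GMV = presale price * quota (xd_num)
        "GMV": round(ysj_price * xd_num, 2),
        # sales amount = units sold * presale price
        "xs_e_price": round(ysj_price * xs_nums, 2),
    }
```

For example, `derive_prices(19.9, 100, 37)` gives a 2.0 store commission, a 2.0 platform fee, a 15.9 supplier settlement price and a 17.9 store supply price.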

2. Code Implementation
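The script reads its settings from `conf.ini` in the working directory: section `app-sys` (keys `keywords`, `threadNums`, `start`, `unexcept`) and section `Mysql-Database` (keys `user`, `password`, `database`, `host`, `port`). The file itself is not shown, so the sample below is an assumption: the section and key names come from the code, every value is a placeholder, and the keyword format region-latitude-longitude comes from the script's own error message. `load_settings` is a hypothetical validation helper.

```python
import configparser

# Hypothetical conf.ini: section/key names are those the crawler reads,
# every value is a placeholder to replace with your own.
SAMPLE_CONF = """\
[app-sys]
keywords = 长沙-28.23-112.94,武汉-30.59-114.31
threadNums = 4
start = 08:00,20:00
unexcept = category-to-skip-1,category-to-skip-2

[Mysql-Database]
user = root
password = your-password
database = sqt
host = 127.0.0.1
port = 3306
"""


def load_settings(text):
    """Parse conf.ini text and check the keyword format region-latitude-longitude."""
    cf = configparser.ConfigParser()
    cf.read_string(text)
    app = cf["app-sys"]
    places = []
    for item in app.get("keywords").split(","):
        city, lat, lng = item.strip().split("-")  # raises ValueError if malformed
        places.append((city, float(lat), float(lng)))
    return {
        "places": places,
        "threadNums": app.getint("threadNums"),
        "unexcepts": [t.strip() for t in app.get("unexcept", "").split(",") if t.strip()],
        "mysql_port": cf.getint("Mysql-Database", "port"),
    }
```

Because `ConfigParser` applies `%` interpolation, any value containing a literal `%` (a password, for instance) has to be written as `%%` in the file.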

```python
# -*- coding:utf-8 -*-
import configparser
import json
import os
import sys
import threading
import time
from queue import Empty, Queue

import MySQLdb
import requests
from bs4 import BeautifulSoup

retry = 3
timeout = 20
headers = {
    "content-type": "application/json",
    "authorization": "replace with your own authorization",
    "ver": "2.20.0",
    "user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 MicroMessenger/7.0.11(0x17000b21) NetType/WIFI Language/zh_CN",
    "referer": "https://servicewechat.com/wxbbdca62c011eeb38/202/page-frame.html",
    "x-tingyun-id": "replace with your own x-tingyun-id",
}
# Used below but never defined in the original; placeholders to set yourself:
# appflag is prepended to goods_id (digits only, or empty), detailPre is the detail-page URL prefix
appflag = ""
detailPre = ""

cf = configparser.ConfigParser()
# Interval between scheduled starts (not used in this script)
intervalStartTime = 29
# cf.read() does not raise for a missing file; it returns the list of files it parsed
if not cf.read(os.path.join(os.getcwd(), "conf.ini"), encoding="utf-8-sig"):
    print("conf.ini not found in the program directory~")
    sys.exit(0)


def getConf(sec, key):
    try:
        return cf.get(sec, key)
    except Exception as e:
        print(e)
        print("Missing configuration: " + sec + " - " + key)
        sys.exit(0)


try:
    keywords = getConf("app-sys", "keywords").split(",")
except Exception:
    print("Invalid keywords parameter!")
    sys.exit(0)
try:
    threadNums = int(getConf("app-sys", "threadNums"))
except Exception:
    print("Invalid threadNums parameter!")
    sys.exit(0)
# Scheduled start times (not used in this script)
startTimes = getConf("app-sys", "start").split(",")
# Category titles to skip
unexcepts = getConf("app-sys", "unexcept").split(",")
# Database account, password, database name, host address and port
mysql_user = getConf("Mysql-Database", "user")
mysql_password = getConf("Mysql-Database", "password")
mysql_database = getConf("Mysql-Database", "database")
mysql_host = getConf("Mysql-Database", "host")
mysql_port = int(getConf("Mysql-Database", "port"))

# goods_list columns that getGoodsDetail fills in (a subset of the schema)
COLUMNS = ["sj_area", "goods_name", "attrs", "sc_price", "ysj_price", "pt_fei",
           "gys_js_price", "shop_ghj_price", "sj_bq", "cate1", "xs_e_price",
           "sj_time", "xj_time", "start_time", "end_time", "qy_address", "imageb_url",
           "sy_image", "haibao_image", "images", "state", "createtime", "gys_name"]


def getConn():
    return MySQLdb.connect(user=mysql_user, password=mysql_password, host=mysql_host,
                           port=mysql_port, database=mysql_database, charset="utf8mb4")


def execSQl(sql, params=None):
    # Parameterized query: the driver escapes values (product titles may contain quotes)
    try:
        conn = getConn()
        try:
            conn.cursor().execute(sql, params)
            conn.commit()
        finally:
            conn.close()
        return True
    except Exception as e:
        print(e)
        return False


def querySQL(sql, params=None):
    try:
        conn = getConn()
        try:
            cursor = conn.cursor()
            cursor.execute(sql, params)
            return cursor.fetchall()
        finally:
            conn.close()
    except Exception as e:
        print(e)
        return ()


def getHtml(url):
    for i in range(retry):
        try:
            resp = requests.get(url, headers=headers, timeout=timeout)
            return json.loads(resp.content.decode("utf-8"))
        except Exception:
            continue
    return None


def postHtml(url, data):
    for i in range(retry):
        try:
            resp = requests.post(url, headers=headers, data=json.dumps(data), timeout=timeout)
            return json.loads(resp.content.decode("utf-8"))
        except Exception:
            continue
    return None


def getCurrDate():
    return time.strftime('%Y{y}%m{m}%d{d}').format(y='年', m='月', d='日')


def dateTots(s):
    """Convert 'YYYY/MM/DD HH:MM:SS' to a Unix timestamp (0 on failure)."""
    try:
        return int(time.mktime(time.strptime(s, "%Y/%m/%d %H:%M:%S")))
    except Exception:
        return 0


class shtSpider(threading.Thread):
    def __init__(self, categoryQueue, index, city, partnerId, grouponId, *args, **kwargs):
        super(shtSpider, self).__init__(*args, **kwargs)
        self.categoryQueue = categoryQueue
        self.index = index
        self.city = city
        # Missing in the original, although every request below uses them
        self.partnerId = partnerId
        self.grouponId = grouponId

    def getTotalPages(self, categoryId):
        url = "https://api.*****.net/mc/diamondV2/list-merchandise"
        data = {"diamondId": str(categoryId), "grouponId": self.grouponId,
                "partnerId": self.partnerId, "p": "1", "size": "10"}
        resp = postHtml(url, data)
        try:
            return int(resp['data']['totalPages'])
        except Exception:
            return None

    def getGoodsList(self, categoryId, page):
        url = "https://api.*****.net/mc/diamondV2/list-merchandise"
        data = {"diamondId": str(categoryId), "grouponId": self.grouponId,
                "partnerId": self.partnerId, "p": int(page), "size": 10}
        resp = postHtml(url, data)
        try:
            return resp['data']['grouponMerchandiseList']
        except Exception:
            return None

    def getGoodsDetail(self, merchandiseId, merchtypeId, categoryName):
        url = "https://api.*****.net/mc/merchandise/detail"
        payload = {"grouponId": self.grouponId, "partnerId": self.partnerId,
                   "merchandiseId": str(merchandiseId), "merchtypeId": str(merchtypeId)}
        resp = postHtml(url, payload)
        if not resp or not isinstance(resp.get('data'), dict):
            return None
        data = resp['data']
        datas = {}
        try:
            # Unique ID = appflag + merchtypeId + merchandiseId
            datas['goods_id'] = int(appflag + str(merchtypeId) + str(merchandiseId))
        except ValueError:
            return None
        datas['sj_area'] = self.city + "十荟团"
        datas['qy_address'] = self.city + "十荟团"
        # Wrap the first word of the title in 【】 unless it already starts with 【
        goods_name = str(data.get('title', ''))
        if goods_name and not goods_name.startswith("【"):
            pname = goods_name.split(" ")
            if len(pname) > 1:
                goods_name = goods_name.replace(pname[0], "【" + pname[0] + "】", 1)
        datas['goods_name'] = goods_name
        datas['attrs'] = ""
        datas['ckbz_dw'] = "大件"

        def toFloat(key):
            try:
                return float(data[key])
            except (KeyError, TypeError, ValueError):
                return 0.00

        datas['sc_price'] = toFloat('originprice')     # market price
        datas['ysj_price'] = toFloat('activityprice')  # presale price
        ysj = datas['ysj_price']
        # Store commission and platform fee: 10% of the presale price, 1 decimal place
        datas['shop_tc'] = str(float('%.1f' % (ysj * 0.1)))
        datas['pt_fei'] = str(float('%.1f' % (ysj * 0.1)))
        # Supplier settlement price = presale - store commission - platform fee
        datas['gys_js_price'] = float('%.2f' % (ysj - float(datas['shop_tc']) - float(datas['pt_fei'])))
        # Store supply price = presale - store commission
        datas['shop_ghj_price'] = float('%.2f' % (ysj - float(datas['shop_tc'])))
        # Sales amount = units sold (waterQuantity) * presale price
        try:
            datas['xs_e_price'] = float('%.2f' % (int(data['waterQuantity']) * ysj))
        except (KeyError, TypeError, ValueError):
            datas['xs_e_price'] = 0.00
        # The API returns one time window, used for both listing and sale times
        datas['sj_time'] = datas['start_time'] = dateTots(data.get('startTime'))
        datas['xj_time'] = datas['end_time'] = dateTots(data.get('endTime'))
        datas['imageb_url'] = detailPre + str(datas['goods_id'])
        datas['sy_image'] = data.get('itemimage', "")
        # Poster images: the first two carousel images, query string stripped
        imagesList = []
        for image in (data.get('carouselFileList') or [])[:2]:
            try:
                imagesList.append(image['url'].split("?", 1)[0])
            except (KeyError, TypeError, AttributeError):
                pass
        datas['haibao_image'] = ",".join(imagesList)
        # Detail images: every <img src> inside the HTML description
        soup = BeautifulSoup(data.get('description') or "", "html.parser")
        datas['images'] = ",".join(img['src'] for img in soup.find_all("img") if img.get('src'))
        # Design label and first-level category both take the scraped category name
        datas['sj_bq'] = datas['cate1'] = categoryName
        datas['state'] = "0"
        datas['createtime'] = int(time.time())
        datas['gys_name'] = data.get('supplierName', "")
        return datas

    def checkGoodsExists(self, pid):
        return len(querySQL("select id from goods_list where goods_id = %s", (int(pid),))) > 0

    def update(self, data):
        print("update ----------------------------------------------------")
        print(data)
        sets = ", ".join("`%s` = %%s" % c for c in COLUMNS)
        sql = "update goods_list set " + sets + " where goods_id = %s"
        execSQl(sql, tuple(data[c] for c in COLUMNS) + (data['goods_id'],))

    def add(self, data):
        # Called by run() but missing from the original
        cols = ["goods_id"] + COLUMNS
        sql = "insert into goods_list (%s) values (%s)" % (
            ", ".join("`%s`" % c for c in cols), ", ".join(["%s"] * len(cols)))
        execSQl(sql, tuple(data[c] for c in cols))

    def run(self):
        while True:
            try:
                # get_nowait() cannot block if another thread took the last category
                category = self.categoryQueue.get_nowait()
            except Empty:
                break
            categoryName = category['title']
            totalPage = self.getTotalPages(category['categoryId'])
            if not totalPage:
                continue
            for i in range(1, totalPage + 1):
                for goods in self.getGoodsList(category['categoryId'], i) or []:
                    data = self.getGoodsDetail(goods['merchandiseid'], goods['merchtypeid'], categoryName)
                    if data:
                        if self.checkGoodsExists(data['goods_id']):
                            self.update(data)
                        else:
                            self.add(data)


def getCategoryQueue(partnerId, grouponId):
    categoryQueue = Queue(0)
    url = "https://api.*****.net/mc/groupClassify/v3/categoryList"
    data = {"partnerId": str(partnerId), "grouponId": str(grouponId), "isPartner": 0}
    resp = postHtml(url, data)
    try:
        for category in resp['data']:
            if category['title'] not in unexcepts:
                categoryQueue.put(category)
    except Exception:
        print("Login expired~")
        time.sleep(10)
        sys.exit(0)
    return categoryQueue


def getKeysList():
    return list(keywords) if keywords else []


def getNearTeam(lat, lng):
    url = "https://api.*****.net/partner/near"
    res = postHtml(url, {"lat": lat, "lng": lng})
    try:
        team = res['data']['list'][0]
        return str(team['partnerId']), str(team['grouponId'])
    except Exception:
        return None


def parser():
    for key in getKeysList():
        try:
            city, lat, lng = key.split("-")
        except ValueError:
            print("Keyword %s is malformed; the correct format is region-latitude-longitude" % key)
            continue
        team = getNearTeam(lat, lng)
        if not team:
            print("No nearby group-buying store found for %s" % key)
            continue
        partnerId, grouponId = team
        categoryQueue = getCategoryQueue(partnerId, grouponId)
        # Local count, so a city with few categories does not shrink threadNums for the next one
        workers = min(threadNums, categoryQueue.qsize())
        ths = [shtSpider(categoryQueue, i, city, partnerId, grouponId) for i in range(workers)]
        for t in ths:
            t.start()
        for t in ths:
            t.join()


def getCurrTime():
    return time.strftime('%H:%M')


def main():
    print("Starting the crawler task!")
    parser()


if __name__ == '__main__':
    main()
```
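Each spider thread pulls categories from one shared `Queue` until nothing is left. The sketch below isolates that fan-out pattern so it runs without the API: a hypothetical `fetch_page_count` callback stands in for `getTotalPages`, and `crawl_categories` is an illustrative name. Using `get_nowait()` and catching `queue.Empty` ends a worker cleanly, whereas checking `empty()` and then calling a blocking `get()` can hang if another thread takes the last category between the two calls.

```python
import queue
import threading


def crawl_categories(categories, fetch_page_count, thread_nums):
    """Drain a queue of categories with up to thread_nums worker threads.

    fetch_page_count(category) stands in for getTotalPages and may return None.
    Returns a dict mapping each category to the page numbers a worker would visit.
    """
    q = queue.Queue()
    for c in categories:
        q.put(c)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                category = q.get_nowait()  # never blocks on an already-empty queue
            except queue.Empty:
                return
            pages = list(range(1, (fetch_page_count(category) or 0) + 1))
            with lock:
                results[category] = pages

    # Never start more threads than there are categories
    workers = [threading.Thread(target=worker) for _ in range(min(thread_nums, q.qsize()))]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return results
```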
