Python Crawler Series: Scraping Nationwide Product Data from a "Youxuan" WeChat Mini Program



This code is for learning and exchange only; do not use it for illegal purposes.

  • The database is used only for deduplication; the data itself is written to Excel.

1. Preparing the Database

```sql
drop database if exists shop;
create database shop default charset utf8;
use shop;

create table `store`(
    `id` int primary key auto_increment,
    `store_id` varchar(18) not null comment 'store_id',
    `area_id` varchar(18) not null comment 'area_id',
    UNIQUE KEY `area_id` (`area_id`, `store_id`)
)engine=INNODB charset=utf8;

create table `goods`(
    `id` int primary key auto_increment,
    `goods_id` varchar(18) not null unique comment 'goods_id'
)engine=INNODB charset=utf8;
```
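The deduplication relies entirely on these constraints: inserting an already-seen `(area_id, store_id)` pair or `goods_id` fails, and the crawler treats that failure as "already crawled, skip". A minimal sketch of the idea, with SQLite standing in for MySQL (the table is trimmed to what the dedup check needs):

```python
# Dedup via a UNIQUE constraint: the second insert of the same pair raises
# IntegrityError, which we turn into a "seen before" flag. SQLite stands in
# for MySQL here purely for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table store(
        id integer primary key autoincrement,
        store_id text not null,
        area_id text not null,
        unique(area_id, store_id)
    )""")

def add_store(store_id, area_id):
    """Return True on first sight, False if the pair was already inserted."""
    try:
        conn.execute("insert into store(store_id, area_id) values(?, ?)",
                     (store_id, area_id))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        return False

print(add_store("1001", "A1"))  # True  (new pair)
print(add_store("1001", "A1"))  # False (duplicate -> skip crawling)
print(add_store("1002", "A1"))  # True  (same area, different store)
```

The same pattern drives both `addStore` and `addGoods` in the crawler below: the database never stores crawled data, only the keys needed to reject repeats.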

2. Code Implementation

```python
import requests
import json
from queue import Queue
import threading
import time
import xlrd
import xlwt
from xlutils.copy import copy
import MySQLdb
import datetime

'''
@Author     : 王磊
@Date       : 2019/9/20
@Description: scrape product data from the Youxuan WeChat mini program's stores nationwide
'''

# -----------------------------------------------------------
threadNum = 1
excelPath = "c:/users/it1002/Desktop/data/excel"
imgPath = "c:/users/it1002/Desktop/data/img"
# database user
mysql_user = "root"
# database password
mysql_password = "root"
# database name
mysql_database = "shop"
# -----------------------------------------------------------

headers = {
    "content-type": "application/x-www-form-urlencoded",
    "User-Agent": "Mozilla/5.0 (Linux; Android 5.1.1; OPPO R11 Build/NMF26X; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/74.0.3729.136 Mobile Safari/537.36 MicroMessenger/7.0.6.1460(0x27000634) Process/appbrand0 NetType/WIFI Language/zh_CN",
    "Host": "mall-store.xsyxsc.com",
}
headers_ = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 5.1.1; OPPO R11 Build/NMF26X; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/74.0.3729.136 Mobile Safari/537.36 MicroMessenger/7.0.6.1460(0x27000634) Process/appbrand0 NetType/WIFI Language/zh_CN",
}
userKey = "8be3d5dd-23e1-4b04-bf12-efca98d69d12"
areaList = []


class xsyxSpider(threading.Thread):
    def __init__(self, loaQueue, *args, **kwargs):
        super(xsyxSpider, self).__init__(*args, **kwargs)
        self.excelPath = ""
        self.loaQueue = loaQueue
        # column order of the Excel sheet; getGoodsDetail fills fields in this order
        self.excelTitle = ['prId', 'acId', 'preId', 'sku', 'dailySaleTime', 'tmBuyStart',
                           'tmBuyEnd', 'tmShowStart', 'tmShowEnd', 'adUrl', 'prName',
                           'tmPickUp', 'limitQty', 'ulimitQty', 'marketAmt', 'saleAmt',
                           'prType', 'areaId', 'shelfLife', 'folQty', 'daySaleQty',
                           'saleQty', 'vesName', 'attrs', 'prDetail', 'shTitle',
                           'prBrief', 'primaryUrls', 'detailUrls', 'consumerNum',
                           'hasImgTxt', 'prTitle', 'yieldly', 'brName', 'specialSale',
                           'brId', 'status']

    def getDate(self):
        return str(datetime.date.today())

    def initExcel(self, areaName):
        # one .xls file per area, named <date>-<area>.xls, with a header row
        self.excelPath = excelPath + "/" + self.getDate() + "-" + str(areaName) + ".xls"
        f = xlwt.Workbook()
        sheet1 = f.add_sheet(u'double', cell_overwrite_ok=True)
        for i in range(0, len(self.excelTitle)):
            sheet1.write(0, i, self.excelTitle[i])
        f.save(self.excelPath)

    def writeExcel(self, data):
        # xlrd/xlwt cannot append in place, so copy the workbook and add one row
        workbook = xlrd.open_workbook(self.excelPath)
        sheets = workbook.sheet_names()
        worksheet = workbook.sheet_by_name(sheets[0])
        rows_old = worksheet.nrows
        new_workbook = copy(workbook)
        new_worksheet = new_workbook.get_sheet(0)
        for j in range(0, len(data)):
            try:
                new_worksheet.write(rows_old, j, data[j])
            except Exception:
                continue
        new_workbook.save(self.excelPath)

    def getShopList(self, mapX, mapY):
        url = "https://mall-store.xsyxsc.com/mall-store/store/getNearStoreList"
        data = {
            "mapX": str(mapX),
            "mapY": str(mapY),
            "userKey": userKey,
        }
        shopListResp = postHtml(url, data)
        try:
            return shopListResp['data']
        except Exception:
            return None

    def updateStore(self, storeId):
        # switch the account's current store before requesting its product list
        url = ("https://user.xsyxsc.com/api/user/user/updateCurrStoreId?userKey="
               + userKey + "&storeId=" + str(storeId))
        updateStoreResp = getHtml(url)
        print("store switch response: " + str(updateStoreResp))
        try:
            return updateStoreResp['rspCode'] == 'success'
        except Exception:
            pass
        return False

    def getTs(self):
        # millisecond-style timestamp: drop the dot, trim the trailing digits
        return str(time.time()).replace(".", "")[:-4]

    def getGoodsList(self, storeId, areaId):
        url = ("https://mall.xsyxsc.com/user/product/indexData?storeId=" + str(storeId)
               + "&areaId=" + str(areaId) + "&ts=" + str(self.getTs())
               + "&userKey=" + userKey)
        goodsListResp = getHtml(url)
        try:
            return goodsListResp['data']['products']
        except Exception:
            return None

    def saveImg(self, url, productId):
        try:
            img = requests.get(url)
            path = imgPath + "/" + str(productId) + ".jpg"
            with open(path, 'wb') as f:
                f.write(img.content)
            return path
        except Exception:
            return None

    def getGoodsDetail(self, productId, activityId, storeId, areaId):
        url = ("https://mall.xsyxsc.com/user/product/productInfo?productId=" + str(productId)
               + "&activityId=" + str(activityId) + "&storeId=" + str(storeId)
               + "&productType=CHOICE&areaId=" + str(areaId) + "&userKey=" + userKey)
        goodsDetailResp = getHtml(url)
        try:
            goods_ = goodsDetailResp['data']
            goods = []
            # fill every column in excelTitle order; a missing field becomes ""
            for field in self.excelTitle:
                try:
                    if field == 'primaryUrls':
                        # download the first primary image and record its local path
                        goods.append(self.saveImg(goods_['primaryUrls'][0], productId))
                    else:
                        goods.append(str(goods_[field]))
                except Exception:
                    goods.append("")
            return goods
        except Exception:
            pass
        return None

    def addStore(self, store):
        # the UNIQUE KEY on (area_id, store_id) makes this fail for stores already seen
        try:
            conn = MySQLdb.connect(user=mysql_user, password=mysql_password,
                                   database=mysql_database, charset='utf8')
            cursor = conn.cursor()
            cursor.execute("insert into store(store_id, area_id) values('%s', '%s')"
                           % (store['store_id'], store['area_id']))
            conn.commit()
            return True
        except Exception:
            return False

    def addGoods(self, goods):
        # goods_id is UNIQUE, so re-inserting a known product fails -> skip it
        try:
            conn = MySQLdb.connect(user=mysql_user, password=mysql_password,
                                   database=mysql_database, charset='utf8')
            cursor = conn.cursor()
            cursor.execute("insert into goods(goods_id) values('%s')"
                           % (goods['goods_id']))
            conn.commit()
            return True
        except Exception:
            return False

    def initDb(self):
        # clear the dedup tables so every run starts fresh
        try:
            conn = MySQLdb.connect(user=mysql_user, password=mysql_password,
                                   database=mysql_database, charset='utf8')
            cursor = conn.cursor()
            cursor.execute("delete from goods")
            conn.commit()
            cursor.execute("delete from store")
            conn.commit()
            return True
        except Exception:
            return False

    def run(self):
        dbStatus = self.initDb()
        if dbStatus:
            while True:
                if self.loaQueue.empty():
                    break
                loa = self.loaQueue.get()
                area = loa[0]
                mapX = loa[1]
                mapY = loa[2]
                self.initExcel(area)
                print("current area: " + area + ", mapX: " + str(mapX) + ", mapY: " + str(mapY))
                shopList = self.getShopList(mapX, mapY)
                if not shopList:
                    print("no data for this area")
                    continue  # move on instead of iterating over None
                for shop in shopList:
                    storeId = shop['storeId']
                    areaId = shop['areaId']
                    if areaId not in areaList:
                        areaList.append(areaId)
                    store = {'store_id': storeId, 'area_id': areaId}
                    storeStatus = self.addStore(store)
                    if storeStatus:
                        print("current area id: " + str(areaId))
                        updateStatus = self.updateStore(storeId)
                        if updateStatus:
                            goodsList = self.getGoodsList(storeId, areaId)
                            if not goodsList:
                                continue
                            for goods in goodsList:
                                productId = goods['prId']
                                activityId = goods['acId']
                                goodsBean = {'goods_id': productId}
                                goodsStatus = self.addGoods(goodsBean)
                                if goodsStatus:
                                    goods_ = self.getGoodsDetail(productId, activityId, storeId, areaId)
                                    if goods_:
                                        self.writeExcel(goods_)
                                    time.sleep(1)


def getLoaQueue():
    # loas_.txt holds one "area,lng,lat" record per line
    loaQueue = Queue(0)
    with open("loas_.txt", "r", encoding="utf-8") as f:
        for line in f:
            line = line.replace("\n", "").replace(" ", "").split(",")
            area = line[0]
            lng = line[1]
            lat = line[2]
            loaQueue.put([area, lng, lat])
    return loaQueue


def postHtml(url, data):
    # retry forever on network or JSON errors
    while True:
        try:
            resp = requests.post(url, data=data, headers=headers, timeout=10)
            return json.loads(resp.content.decode("utf-8"))
        except Exception:
            continue


def getHtml(url):
    while True:
        try:
            resp = requests.get(url, headers=headers_, timeout=10)
            return json.loads(resp.content.decode("utf-8"))
        except Exception:
            continue


def main():
    loaQueue = getLoaQueue()
    for i in range(threadNum):
        x = xsyxSpider(loaQueue)
        x.start()


if __name__ == '__main__':
    main()
```
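The script will not run without a `loas_.txt` next to it: `getLoaQueue` expects one `area,lng,lat` record per line and feeds them into a queue that the worker threads drain. A minimal sketch of that parsing, using made-up sample coordinates rather than real data from the site:

```python
from queue import Queue

# getLoaQueue() reads loas_.txt, one "area,lng,lat" record per line.
# This builds the same queue from in-memory sample lines; the city names
# and coordinates below are illustrative values, not crawled data.
sample_lines = [
    "长沙市, 112.938814, 28.228209\n",
    "株洲市, 113.134002, 27.827433\n",
]
loaQueue = Queue(0)
for line in sample_lines:
    # strip the newline and any stray spaces, then split on commas
    area, lng, lat = line.replace("\n", "").replace(" ", "").split(",")
    loaQueue.put([area, lng, lat])

print(loaQueue.qsize())   # 2
print(loaQueue.get()[0])  # 长沙市
```

Because `Queue` is thread-safe, raising `threadNum` above 1 lets several `xsyxSpider` threads pull areas from this queue concurrently without further locking.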


