Python Crawler Series: Scraping Data from the Duoduo Maicai (多多买菜) Mini Program


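This code is for learning and discussion only. Do not use it for any illegal purpose. If it infringes on your rights, contact me and it will be removed.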

Straight to the code:

# -*- coding:utf-8 -*-
import requests
import json
import time
from general import getAntiContent
import random
import configparser
import MySQLdb
import os

# NOTE: the post omits several names that the code below relies on:
# getConf, postHtml, searchStore, getSleepTime, appflag, detailPre and sellNum.
# They must be defined before this script can run.

accesstoken = ""
headers = {
    "content-type": "application/json;charset=UTF-8",
    "accesstoken": accesstoken,
    "referer": "https://servicewechat.com/wxd9813e0a0d4d4156/49/page-frame.html",
    "user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 MicroMessenger/7.0.17(0x17001124) NetType/WIFI Language/zh_CN",
    "code-version": "0.0.43",
    "verifyauthtoken": "",
    "p-appname": "mobile-xcx-vegetable",
}
retry = 3
timeout = 20
provinceMap = {}
cf = configparser.ConfigParser()
try:
    # cf.read() silently skips a missing file and returns the files it did read,
    # so an empty result is treated as "file not found".
    if not cf.read(os.getcwd() + "/conf.ini", encoding="utf-8-sig"):
        raise FileNotFoundError("conf.ini")
except Exception as e:
    print("conf.ini was not found in the program directory")
    exit(0)

keywords = ""
try:
    keywords = getConf("app-sys", "keywords").split(",")
except Exception as e:
    print("Invalid keywords parameter!")
    exit(0)
# Scheduled start times
startTime = getConf("app-sys", "start")
startTimes = []
try:
    startTimes = startTime.split(",")
    if startTimes is not None and len(startTimes) == 1 and startTimes[0] == "":
        startTimes = []
except Exception as e:
    pass
# Database user
mysql_user = getConf("Mysql-Database", "user")
# Database password
mysql_password = getConf("Mysql-Database", "password")
# Database name
mysql_database = getConf("Mysql-Database", "database")
# Host address
mysql_host = getConf("Mysql-Database", "host")
# Port (read here, but not passed to MySQLdb.connect below)
mysql_port = getConf("Mysql-Database", "port")


def querySQL(sql):
    """Run a query and return all rows, or False on any error."""
    try:
        conn = MySQLdb.connect(user=mysql_user, password=mysql_password, host=mysql_host,
                               database=mysql_database, charset='utf8')
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    except Exception as e:
        return False


def getCurrDate():
    return str(time.strftime('%Y{y}%m{m}%d{d}').format(y='年', m='月', d='日'))


def tsToDate(ts):
    """Convert a Unix timestamp to 'YYYY-MM-DD HH:MM:SS' (empty string if falsy)."""
    if ts:
        timeArray = time.localtime(int(ts))
        return str(time.strftime("%Y-%m-%d %H:%M:%S", timeArray))
    return ""


def getCurrentTime():
    return str(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())))


def getCityMaps():
    """Parse 'province-city-scity-key' keyword entries into a dict keyed by province."""
    cityMaps = {}
    if keywords and isinstance(keywords, list) and len(keywords) > 0:
        for keyword in keywords:
            try:
                arr = keyword.split("-")
                cityMaps[arr[0]] = {"city": arr[1], "scity": arr[2], "key": arr[3], }
            except Exception as e:
                pass
    return cityMaps


def iniProvinceMap():
    """Load the province list into provinceMap; return True on success."""
    global provinceMap
    url = "https://api.pinduoduo.com/api/mc/v1/user/regions"
    data = {
        "open_app_source": 1089,
        "anti_content": getAntiContent(),
        "region_id": 1,
        "xcx_version": "0.0.64"
    }
    res = postHtml(url, json.dumps(data))
    try:
        regions = res['regions']
        for region in regions:
            try:
                provinceMap[region['region_name']] = region
            except Exception as e:
                pass
        return True
    except Exception as e:
        pass
    return False


def searchCity(region_id, cityName):
    """Return the first region under region_id whose name contains cityName."""
    url = "https://api.pinduoduo.com/api/mc/v1/user/regions"
    data = {
        "open_app_source": 1089,
        "anti_content": getAntiContent(),
        "region_id": int(region_id),
        "xcx_version": "0.0.64"
    }
    res = postHtml(url, json.dumps(data))
    try:
        regions = res['regions']
        for region in regions:
            try:
                if cityName in region['region_name']:
                    return region
            except Exception as e:
                pass
    except Exception as e:
        pass


def searchPoi(provinceId, cityId, districtId, key):
    """Search points of interest in a district by keyword."""
    url = "https://api.pinduoduo.com/api/mc/v1/search_poi"
    data = {
        "open_app_source": 1089,
        "anti_content": getAntiContent(),
        "provinceId": int(provinceId),
        "query": str(key),
        "cityId": int(cityId),
        "districtId": int(districtId),
        "xcx_version": "0.0.64"
    }
    res = postHtml(url, json.dumps(data))
    try:
        poi_list = res['poi_list']
        return poi_list
    except Exception as e:
        pass


def getStore(provinceId, cityId, key):
    """Walk every district of the city and return the first store found for key."""
    url = "https://api.pinduoduo.com/api/mc/v1/user/regions"
    data = {
        "open_app_source": 1089,
        "anti_content": getAntiContent(),
        "region_id": int(cityId),
        "xcx_version": "0.0.64"
    }
    res = postHtml(url, json.dumps(data))
    try:
        regions = res['regions']
        for region in regions:
            try:
                districtId = region['region_id']
                poiList = searchPoi(provinceId, cityId, districtId, key)
                if poiList and isinstance(poiList, list) and len(poiList) > 0:
                    for poi in poiList:
                        try:
                            poiId = poi['poi_id']
                            store = searchStore(poiId)
                            if store:
                                return store
                        except Exception as e:
                            pass
            except Exception as e:
                pass
    except Exception as e:
        pass


def getGoodsDetail(store_id, goods_id, city):
    """Fetch one product and map it to the row layout used by goods_list."""
    url = "https://api.pinduoduo.com/api/mc/v0/goods_detail"
    data = {
        "open_app_source": 1089,
        "anti_content": getAntiContent(),
        "store_id": str(store_id),
        "goods_id": str(goods_id),
        "xcx_version": "0.0.64"
    }
    res = postHtml(url, json.dumps(data))
    try:
        datas = {}
        try:
            # appflag is not defined in the post
            datas['goods_id'] = int(appflag + str(res['goods_id']))
        except Exception as e:
            return
        try:
            datas['area'] = city
        except Exception as e:
            datas['area'] = ""
        try:
            goods_name = str(res['goods_name'])
            # Wrap the first word in 【】 when the name has no bracketed prefix yet
            if "【" not in goods_name and "】" not in goods_name:
                pname = goods_name.split(" ")
                if len(pname) > 1:
                    goods_name = goods_name.replace(pname[0], "【" + pname[0] + "】")
            datas['goods_name'] = goods_name
        except Exception as e:
            datas['goods_name'] = ""
        try:
            # The code divides both prices by 100 and keeps two decimals
            datas['sc_price'] = float("%.2f" % (float(res['market_price']) / 100))
        except Exception as e:
            datas['sc_price'] = 0.00
        try:
            datas['ysj_price'] = float("%.2f" % (float(res['price']) / 100))
        except Exception as e:
            datas['ysj_price'] = 0.00
        try:
            datas['xg_num'] = res['regular_limit']
        except Exception as e:
            datas['xg_num'] = 0
        try:
            # sellNum is not defined in the post, so this falls back to 0
            datas['xs_nums'] = sellNum
        except Exception as e:
            datas['xs_nums'] = 0
        try:
            datas['start_time'] = int(res['pre_sale_time'])
        except Exception as e:
            datas['start_time'] = 0
        try:
            datas['end_time'] = int(res['end_sale_time'])
        except Exception as e:
            datas['end_time'] = 0
        try:
            datas['qy_address'] = city + "多多买菜"
        except Exception as e:
            datas['qy_address'] = ""
        try:
            # detailPre is not defined in the post
            datas['imageb_url'] = detailPre + str(datas['goods_id'])
        except Exception as e:
            datas['imageb_url'] = ""
        try:
            sy_image = res['image_url']
            # Drop the query string from the image URL
            if "?" in sy_image:
                sy_image = sy_image[:sy_image.find("?")]
            datas['sy_image'] = sy_image
        except Exception as e:
            datas['sy_image'] = ""
        return datas
    except Exception as e:
        pass


def checkGoodsExists(pid):
    """Return True if goods_list already contains a row with this goods_id."""
    try:
        conn = MySQLdb.connect(user=mysql_user, password=mysql_password, database=mysql_database,
                               charset='utf8', host=mysql_host)
        cursor = conn.cursor()
        cursor.execute("select * from goods_list where goods_id = %s", (int(pid),))
        return len(cursor.fetchall()) > 0
    except Exception as e:
        return False


def add(data):
    print("insert ----------------------------------------------------")
    print(data)
    try:
        conn = MySQLdb.connect(user=mysql_user, host=mysql_host, password=mysql_password,
                               database=mysql_database, charset='utf8')
        cursor = conn.cursor()
        sql = ""  # the INSERT statement is left empty in the post
        cursor.execute(sql)
        conn.commit()
    except Exception as e:
        pass


def update(data):
    print("update ----------------------------------------------------")
    print(data)
    try:
        conn = MySQLdb.connect(user=mysql_user, host=mysql_host, password=mysql_password,
                               database=mysql_database, charset='utf8')
        cursor = conn.cursor()
        sql = ""  # the UPDATE statement is left empty in the post
        cursor.execute(sql)
        conn.commit()
    except Exception as e:
        pass


def parser(storeId, city):
    """Page through a store's goods list (10 per page) and insert or update each item."""
    page = 0
    url = "https://api.pinduoduo.com/api/mc/v0/goods_list"
    while True:
        try:
            data = {
                "open_app_source": 1089,
                "anti_content": getAntiContent(),
                "store_id": int(storeId),
                "list_id": "0d95f10a-620f-4d29-a087-894ff90239a4",
                "offset": page * 10,
                "count": 10,
                "xcx_version": "0.0.64"
            }
            res = postHtml(url, json.dumps(data))
            has_more = res['has_more']
            goods_list = res['goods_list']
            for goods in goods_list:
                try:
                    goodsId = goods['goods_id']
                    datas = getGoodsDetail(storeId, goodsId, city)
                    existsStatus = checkGoodsExists(datas['goods_id'])
                    if existsStatus:
                        update(datas)
                    else:
                        add(datas)
                except Exception as e:
                    pass
            if has_more:
                page += 1
                time.sleep(getSleepTime())
            else:
                break
        except Exception as e:
            break


def main():
    global provinceMap
    # The post never calls iniProvinceMap(), yet main() reads provinceMap and has
    # two else branches; this guard is a reconstruction of the likely original.
    if iniProvinceMap():
        cityMaps = getCityMaps()
        if cityMaps:
            for cityMap in cityMaps:
                try:
                    province = provinceMap[cityMap]
                    provinceId = province['region_id']
                    bcity = cityMaps[cityMap]
                    cityName = bcity['city']
                    key = bcity['key']
                    scity = bcity['scity']
                    acity = searchCity(provinceId, cityName)
                    cityId = acity['region_id']
                    store = getStore(provinceId, cityId, key)
                    if store:
                        storeId = store['store_id']
                        parser(storeId, scity)
                    else:
                        print("Keyword group %s: no store found!" % (cityMap + " - " + cityName + " - " + key))
                except Exception as e:
                    pass
        else:
            print("Failed to get the city list!")
    else:
        print("Login expired!")


if __name__ == '__main__':
    main()
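The script calls `getConf` without defining it, and it expects a `conf.ini` next to the script. Below is a minimal sketch of what that helper and config file might look like, assuming `getConf` is a thin wrapper over `configparser`. The section and option names come from the `getConf(...)` calls in the script; every value, including the province and city names, is a placeholder. `parseKeywords` is a hypothetical name for a stricter version of the parsing that `getCityMaps` performs.

```python
import configparser

# Hypothetical conf.ini content. Section and option names match the
# getConf(...) calls in the script; all values are placeholders.
SAMPLE_CONF = """
[app-sys]
keywords = 广东省-深圳-深圳市-科技园,浙江省-杭州-杭州市-西湖
start =

[Mysql-Database]
user = root
password = secret
database = spider
host = 127.0.0.1
port = 3306
"""

cf = configparser.ConfigParser()
cf.read_string(SAMPLE_CONF)  # the script itself reads conf.ini from disk


def getConf(section, key):
    """Assumed helper: return one option from the parsed config as a string."""
    return cf.get(section, key)


def parseKeywords(raw):
    """Split 'province-city-scity-key' entries the way getCityMaps does."""
    cityMaps = {}
    for keyword in raw.split(","):
        arr = keyword.strip().split("-")
        if len(arr) < 4:
            continue  # skip malformed entries instead of raising
        cityMaps[arr[0]] = {"city": arr[1], "scity": arr[2], "key": arr[3]}
    return cityMaps


cityMaps = parseKeywords(getConf("app-sys", "keywords"))
print(cityMaps["广东省"])
```

Because the result is keyed by province, two entries for the same province overwrite each other; that matches the behavior of `getCityMaps` in the script.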

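`postHtml` and `getSleepTime` are also used throughout the script but never shown. Given the module-level `headers`, `retry = 3`, and `timeout = 20`, `postHtml` is presumably a POST wrapper that retries and returns the decoded JSON body. The sketch below is written under that assumption and is not the author's implementation. The `post` and `pause` parameters are illustrative additions so the function can be exercised without network access, and the delay range in `getSleepTime` is a guess.

```python
import json
import random
import time

retry = 3
timeout = 20
headers = {"content-type": "application/json;charset=UTF-8"}  # trimmed for brevity


def getSleepTime(low=1.0, high=3.0):
    """Assumed helper: a random pause in seconds between page requests."""
    return random.uniform(low, high)


def postHtml(url, data, post=None, pause=0.0):
    """POST `data` to `url`, trying up to `retry` times.

    Returns the parsed JSON body, or None if every attempt fails.
    `post` and `pause` are illustrative parameters, not part of the original.
    """
    if post is None:
        import requests  # imported lazily so a stub can stand in for it
        post = requests.post
    for attempt in range(retry):
        try:
            resp = post(url, data=data, headers=headers, timeout=timeout)
            if resp.status_code == 200:
                return json.loads(resp.text)
        except Exception as e:
            print("request failed (attempt %d): %s" % (attempt + 1, e))
        time.sleep(pause)
    return None
```

Returning `None` on failure fits how the script consumes the result: an expression such as `res['regions']` then raises `TypeError`, which the surrounding `try`/`except` blocks absorb.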