Python Web Scraping: A Taobao Case Study

1. Import the third-party library, set the URL, and define headers so the crawler looks like a browser

import requests
url = 'https://s.taobao.com/search?q=%E6%98%BE%E5%8D%A1&imgfile=&js=1&stats_click=search_radio_all%3A1&initiative_id=staobaoz_20220811&ie=utf8'
headers = {
    'cookie': 'cna=zIsvG8QofGgCAXAc0HQF5jMC; ariaDefaultTheme=undefined; t=9ac1f71719420207d1f87d27eb676a4c; _m_h5_tk=adcc3c021e3b87caf717886de2956b4f_1660197714179; _m_h5_tk_enc=1af4dc9e2bf60884ef3d0e255253f6b2; xlly_s=1; cookie2=16aa0d04efd876db9a0a6ea3a6201798; _tb_token_=e8f30e5eeeaee; _samesite_flag_=true; sgcookie=E100lJaxeK%2FAPyj3QKfLcL9nnFAvbSQ1NVa%2Fj5KnkOmbyuRuRVi5UIhuo%2F950QL5HA5pu7UW1W7o5e1gKyskjeASeiG%2Fu8b%2Bx2w%2BNK1TNfbC3%2BY%3D; unb=3403337303; uc1=cookie15=Vq8l%2BKCLz3%2F65A%3D%3D&cookie14=UoeyDt7VJs5rtg%3D%3D&existShop=false&pas=0&cookie16=W5iHLLyFPlMGbLDwA%2BdvAGZqLg%3D%3D&cookie21=VFC%2FuZ9ainBZ; uc3=lg2=W5iHLLyFOGW7aA%3D%3D&id2=UNQ3HL3rNGIh9Q%3D%3D&vt3=F8dCv4G1KArg9Z5EDnI%3D&nk2=py7xJGsI3wn8W4Q%3D; csg=abea7184; lgc=%5Cu9006%5Cu98CE%5Cu8FFD%5Cu98CEzgf; cancelledSubSites=empty; cookie17=UNQ3HL3rNGIh9Q%3D%3D; dnk=%5Cu9006%5Cu98CE%5Cu8FFD%5Cu98CEzgf; skt=7eb00df2545b28f1; existShop=MTY2MDE4Nzg5Mw%3D%3D; uc4=id4=0%40UgP8IaO4dk7rKbnRwpAL1RCASure&nk4=0%40pRj%2BYG91XDR4VZfDtp5sZkTvbfnKjg%3D%3D; tracknick=%5Cu9006%5Cu98CE%5Cu8FFD%5Cu98CEzgf; _cc_=UIHiLt3xSw%3D%3D; _l_g_=Ug%3D%3D; sg=f3f; _nk_=%5Cu9006%5Cu98CE%5Cu8FFD%5Cu98CEzgf; cookie1=URmvlmqe9vvqj4%2FetXdyS32Np7aof75Ji3WJNOrxmAo%3D; enc=Wc21Ym4ZtT2bAKugjrg4mga24om36KJRqmV58dwu1eCI9NiOMGxoPn%2BuEfXDf82wAhxp6sq2XAkI8TAxsuD0CQ%3D%3D; JSESSIONID=110B64FBCE3C522DA285BDE7FEF11591; tfstk=cun5BPOtj_fSjuRbgz928VtWelqCZadghwVxFImyTdyXp5M5i5ja1Iq4G_qUp-1..; l=eB_Q_LVPLdI5ulzEBOfwnurza77tsIRAguPzaNbMiOCPO-1p5S3FW6YRMrT9CnGVh6kvR3k0hWaBBeYBqIv4n5U62j-lasDmn; isg=BBoasw0KLOE0w6BNINhb8iDla8A8S54lfo04kySTwK14l7rRDNnmNRflY2MLRxa9',
    'referer': 'https://s.taobao.com/search?q=%E6%98%BE%E5%8D%A1&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.jianhua.201856-taobao-item.2&ie=utf8&initiative_id=tbindexz_20170306',
    'sec-ch-ua': '"Chromium";v="104", " Not A;Brand";v="99", "Microsoft Edge";v="104"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.81 Safari/537.36 Edg/104.0.1293.47',
}

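Note: select the pasted header text, press Ctrl+R, switch on regular-expression mode, and replace all matches of (.*?):(.*) with '$1':'$2', which wraps every field name and value in single quotes.
After the replacement, be sure to delete any extra spaces; otherwise an error will be raised.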
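The same conversion can also be done in Python rather than with an editor find-and-replace. The following is a minimal sketch, not the author's method: the helper name headers_from_raw and the shortened sample text are assumptions made for illustration. It splits each "name: value" line on the first colon after the first character and strips the surrounding spaces, so no manual cleanup is needed.

```python
def headers_from_raw(raw: str) -> dict:
    """Turn 'name: value' lines copied from the browser into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Search from index 1 so a name that itself starts with ':' is kept whole.
        idx = line.find(':', 1)
        if idx == -1:
            continue  # not a header line
        name, value = line[:idx], line[idx + 1:]
        headers[name.strip()] = value.strip()
    return headers


# Shortened sample of header text (hypothetical, for illustration only).
raw_text = """
referer: https://s.taobao.com/search?q=%E6%98%BE%E5%8D%A1
sec-ch-ua-mobile: ?0
upgrade-insecure-requests: 1
"""

headers = headers_from_raw(raw_text)
print(headers)
```

Splitting only on the first colon keeps values that contain colons themselves, such as the https:// in referer, in one piece.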

2. Version 1: complete code (data saved to a CSV file)

import re
import json
import pprint
import requests
import csv

url = 'https://s.taobao.com/search?q=%E6%98%BE%E5%8D%A1&imgfile=&js=1&stats_click=search_radio_all%3A1&initiative_id=staobaoz_20220811&ie=utf8'
headers = {
    'cookie': 'cna=zIsvG8QofGgCAXAc0HQF5jMC; ariaDefaultTheme=undefined; t=9ac1f71719420207d1f87d27eb676a4c; _m_h5_tk=adcc3c021e3b87caf717886de2956b4f_1660197714179; _m_h5_tk_enc=1af4dc9e2bf60884ef3d0e255253f6b2; xlly_s=1; cookie2=16aa0d04efd876db9a0a6ea3a6201798; _tb_token_=e8f30e5eeeaee; _samesite_flag_=true; sgcookie=E100lJaxeK%2FAPyj3QKfLcL9nnFAvbSQ1NVa%2Fj5KnkOmbyuRuRVi5UIhuo%2F950QL5HA5pu7UW1W7o5e1gKyskjeASeiG%2Fu8b%2Bx2w%2BNK1TNfbC3%2BY%3D; unb=3403337303; uc1=cookie15=Vq8l%2BKCLz3%2F65A%3D%3D&cookie14=UoeyDt7VJs5rtg%3D%3D&existShop=false&pas=0&cookie16=W5iHLLyFPlMGbLDwA%2BdvAGZqLg%3D%3D&cookie21=VFC%2FuZ9ainBZ; uc3=lg2=W5iHLLyFOGW7aA%3D%3D&id2=UNQ3HL3rNGIh9Q%3D%3D&vt3=F8dCv4G1KArg9Z5EDnI%3D&nk2=py7xJGsI3wn8W4Q%3D; csg=abea7184; lgc=%5Cu9006%5Cu98CE%5Cu8FFD%5Cu98CEzgf; cancelledSubSites=empty; cookie17=UNQ3HL3rNGIh9Q%3D%3D; dnk=%5Cu9006%5Cu98CE%5Cu8FFD%5Cu98CEzgf; skt=7eb00df2545b28f1; existShop=MTY2MDE4Nzg5Mw%3D%3D; uc4=id4=0%40UgP8IaO4dk7rKbnRwpAL1RCASure&nk4=0%40pRj%2BYG91XDR4VZfDtp5sZkTvbfnKjg%3D%3D; tracknick=%5Cu9006%5Cu98CE%5Cu8FFD%5Cu98CEzgf; _cc_=UIHiLt3xSw%3D%3D; _l_g_=Ug%3D%3D; sg=f3f; _nk_=%5Cu9006%5Cu98CE%5Cu8FFD%5Cu98CEzgf; cookie1=URmvlmqe9vvqj4%2FetXdyS32Np7aof75Ji3WJNOrxmAo%3D; enc=Wc21Ym4ZtT2bAKugjrg4mga24om36KJRqmV58dwu1eCI9NiOMGxoPn%2BuEfXDf82wAhxp6sq2XAkI8TAxsuD0CQ%3D%3D; JSESSIONID=110B64FBCE3C522DA285BDE7FEF11591; tfstk=cun5BPOtj_fSjuRbgz928VtWelqCZadghwVxFImyTdyXp5M5i5ja1Iq4G_qUp-1..; l=eB_Q_LVPLdI5ulzEBOfwnurza77tsIRAguPzaNbMiOCPO-1p5S3FW6YRMrT9CnGVh6kvR3k0hWaBBeYBqIv4n5U62j-lasDmn; isg=BBoasw0KLOE0w6BNINhb8iDla8A8S54lfo04kySTwK14l7rRDNnmNRflY2MLRxa9',
    'referer': 'https://s.taobao.com/search?q=%E6%98%BE%E5%8D%A1&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.jianhua.201856-taobao-item.2&ie=utf8&initiative_id=tbindexz_20170306',
    'sec-ch-ua': '"Chromium";v="104", " Not A;Brand";v="99", "Microsoft Edge";v="104"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.81 Safari/537.36 Edg/104.0.1293.47',
}

# Write the results to a CSV file.
# 'utf-8-sig' replaces the original 'ANSI', a codec name that only exists on Windows;
# the byte-order mark also lets Excel detect the encoding.
with open('taobao.csv', 'w', encoding='utf-8-sig', newline='') as f:
    # Define the header row
    csvwriter = csv.DictWriter(f, fieldnames=['title', 'price', 'shop', 'buyers', 'location', 'detail_url', 'shop_url', 'image_url'])
    # Write the header row
    csvwriter.writeheader()

    response = requests.get(url=url, headers=headers)
    # print(response.text)
    html_data = re.findall('g_page_config = (.*);', response.text)[0]
    # print(html_data)
    json_data = json.loads(html_data)  # parse the JSON text into a Python dict
    # pprint.pprint(json_data)
    # The product title raw_title sits under the keys 'mods' -> 'itemlist' -> 'data' -> 'auctions'
    data = json_data['mods']['itemlist']['data']['auctions']
    for item in data:
        # "row" instead of "dict" so the built-in type is not shadowed
        row = {
            'title': item['raw_title'],  # take the title and put it in the dict
            'price': item['view_price'],
            'shop': item['nick'],
            'buyers': item['view_sales'],
            'location': item['item_loc'],
            'detail_url': 'https:' + item['detail_url'],
            'shop_url': item['shopLink'],
            'image_url': 'https:' + item['pic_url'],
        }
        csvwriter.writerow(row)  # write the row to the CSV file
        print(row)
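The extraction step in the code above can be tried without sending any request. The sketch below is an illustration under stated assumptions: sample_html is a made-up stand-in for response.text, and extract_auctions is a hypothetical helper name. It applies the same g_page_config pattern, and returns an empty list when the pattern is missing (for instance if the site serves a login or verification page), whereas indexing re.findall(...)[0] raises IndexError in that case.

```python
import json
import re

# Synthetic stand-in for response.text; the real page is far larger.
sample_html = '''
<script>
    g_page_config = {"mods": {"itemlist": {"data": {"auctions": [{"raw_title": "demo card", "view_price": "1999.00"}]}}}};
    var other = 1;
</script>
'''


def extract_auctions(html):
    """Return the list under mods/itemlist/data/auctions, or [] if it is absent."""
    # '.' does not match newlines, so the greedy group stops at the last ';' on the same line.
    match = re.search(r'g_page_config = (.*);', html)
    if match is None:
        return []
    config = json.loads(match.group(1))
    return config.get('mods', {}).get('itemlist', {}).get('data', {}).get('auctions', [])


auctions = extract_auctions(sample_html)
print(auctions[0]['raw_title'])
```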

3. Version 2: complete code (data saved to a SQLite3 database)

import re
import json
import pprint
import requests
import sqlite3  # SQLite database operations

dbpath = 'taobao.db'


def getdata():
    init_db(dbpath)
    conn = sqlite3.connect(dbpath)
    cur = conn.cursor()  # get a cursor
    url = 'https://s.taobao.com/search?q=%E6%98%BE%E5%8D%A1&imgfile=&js=1&stats_click=search_radio_all%3A1&initiative_id=staobaoz_20220811&ie=utf8'
    headers = {
        'cookie': 'cna=zIsvG8QofGgCAXAc0HQF5jMC; ariaDefaultTheme=undefined; t=9ac1f71719420207d1f87d27eb676a4c; _m_h5_tk=adcc3c021e3b87caf717886de2956b4f_1660197714179; _m_h5_tk_enc=1af4dc9e2bf60884ef3d0e255253f6b2; xlly_s=1; cookie2=16aa0d04efd876db9a0a6ea3a6201798; _tb_token_=e8f30e5eeeaee; _samesite_flag_=true; sgcookie=E100lJaxeK%2FAPyj3QKfLcL9nnFAvbSQ1NVa%2Fj5KnkOmbyuRuRVi5UIhuo%2F950QL5HA5pu7UW1W7o5e1gKyskjeASeiG%2Fu8b%2Bx2w%2BNK1TNfbC3%2BY%3D; unb=3403337303; uc1=cookie15=Vq8l%2BKCLz3%2F65A%3D%3D&cookie14=UoeyDt7VJs5rtg%3D%3D&existShop=false&pas=0&cookie16=W5iHLLyFPlMGbLDwA%2BdvAGZqLg%3D%3D&cookie21=VFC%2FuZ9ainBZ; uc3=lg2=W5iHLLyFOGW7aA%3D%3D&id2=UNQ3HL3rNGIh9Q%3D%3D&vt3=F8dCv4G1KArg9Z5EDnI%3D&nk2=py7xJGsI3wn8W4Q%3D; csg=abea7184; lgc=%5Cu9006%5Cu98CE%5Cu8FFD%5Cu98CEzgf; cancelledSubSites=empty; cookie17=UNQ3HL3rNGIh9Q%3D%3D; dnk=%5Cu9006%5Cu98CE%5Cu8FFD%5Cu98CEzgf; skt=7eb00df2545b28f1; existShop=MTY2MDE4Nzg5Mw%3D%3D; uc4=id4=0%40UgP8IaO4dk7rKbnRwpAL1RCASure&nk4=0%40pRj%2BYG91XDR4VZfDtp5sZkTvbfnKjg%3D%3D; tracknick=%5Cu9006%5Cu98CE%5Cu8FFD%5Cu98CEzgf; _cc_=UIHiLt3xSw%3D%3D; _l_g_=Ug%3D%3D; sg=f3f; _nk_=%5Cu9006%5Cu98CE%5Cu8FFD%5Cu98CEzgf; cookie1=URmvlmqe9vvqj4%2FetXdyS32Np7aof75Ji3WJNOrxmAo%3D; enc=Wc21Ym4ZtT2bAKugjrg4mga24om36KJRqmV58dwu1eCI9NiOMGxoPn%2BuEfXDf82wAhxp6sq2XAkI8TAxsuD0CQ%3D%3D; JSESSIONID=110B64FBCE3C522DA285BDE7FEF11591; tfstk=cun5BPOtj_fSjuRbgz928VtWelqCZadghwVxFImyTdyXp5M5i5ja1Iq4G_qUp-1..; l=eB_Q_LVPLdI5ulzEBOfwnurza77tsIRAguPzaNbMiOCPO-1p5S3FW6YRMrT9CnGVh6kvR3k0hWaBBeYBqIv4n5U62j-lasDmn; isg=BBoasw0KLOE0w6BNINhb8iDla8A8S54lfo04kySTwK14l7rRDNnmNRflY2MLRxa9',
        'referer': 'https://s.taobao.com/search?q=%E6%98%BE%E5%8D%A1&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.jianhua.201856-taobao-item.2&ie=utf8&initiative_id=tbindexz_20170306',
        'sec-ch-ua': '"Chromium";v="104", " Not A;Brand";v="99", "Microsoft Edge";v="104"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'sec-fetch-dest': 'document',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-user': '?1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.81 Safari/537.36 Edg/104.0.1293.47',
    }
    response = requests.get(url=url, headers=headers)
    # print(response.text)
    html_data = re.findall('g_page_config = (.*);', response.text)[0]
    # print(html_data)
    json_data = json.loads(html_data)  # parse the JSON text into a Python dict
    # pprint.pprint(json_data)
    # The product title raw_title sits under the keys 'mods' -> 'itemlist' -> 'data' -> 'auctions'
    data = json_data['mods']['itemlist']['data']['auctions']
    # Use ? placeholders instead of building the SQL with %:
    # a title containing a single quote would otherwise break the statement.
    sql = '''insert into taobao(rawtitle,viewprie,nick,viewsales,itemloc,detailurl,shoplink,picurl)
             values(?,?,?,?,?,?,?,?)'''
    for value in data:
        cur.execute(sql, (value['raw_title'], value['view_price'], value['nick'], value['view_sales'],
                          value['item_loc'], value['detail_url'], value['shopLink'], value['pic_url']))
    conn.commit()
    cur.close()
    conn.close()


# Initialization: create the database table
def init_db(dbpath):
    # "if not exists" lets the script be rerun without failing on an existing table
    sql = '''create table if not exists taobao(
        id integer primary key autoincrement,
        rawtitle varchar,
        viewprie numeric,
        nick varchar,
        viewsales varchar,
        itemloc varchar,
        detailurl text,
        shoplink text,
        picurl text) '''
    conn = sqlite3.connect(dbpath)
    cursor = conn.cursor()  # get a database cursor
    cursor.execute(sql)
    conn.commit()
    conn.close()


getdata()
print("Saved!")
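To check the database step on its own, the sketch below inserts two made-up rows into an in-memory SQLite database and reads them back. It is an illustration only: the rows are invented, and the table is cut down to two of the columns used above. The values are passed through ? placeholders, so a title containing a single quote is stored intact instead of breaking the SQL statement.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(':memory:')
conn.execute(
    'create table taobao('
    'id integer primary key autoincrement, rawtitle varchar, viewprie numeric)'
)

# Made-up rows for illustration; the first title contains single quotes.
rows = [
    ("12GB 'OC' graphics card", '1999.00'),
    ('plain title', '899.50'),
]
conn.executemany('insert into taobao(rawtitle, viewprie) values(?, ?)', rows)
conn.commit()

# Read the titles back in insertion order.
saved = conn.execute('select rawtitle from taobao order by id').fetchall()
print(saved)
conn.close()
```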
