Scraping Taobao Pages with a Python Crawler

First, some analysis.

To scrape the information we want, we need to know two things:

1. The endpoint to request

2. How pagination works

Start by running a search, which gives this URL: https://s.taobao.com/search?q=书包&imgfile=&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.50862.201856-taobao-item.1&ie=utf8&initiative_id=tbindexz_20170501

Then look at the URL of the second page: https://s.taobao.com/search?q=书包&imgfile=&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.50862.201856-taobao-item.1&ie=utf8&initiative_id=tbindexz_20170501&bcoffset=4&ntoffset=4&p4ppushleft=1%2C48&s=44

And the URL of the third page: https://s.taobao.com/search?q=书包&imgfile=&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.50862.201856-taobao-item.1&ie=utf8&initiative_id=tbindexz_20170501&bcoffset=4&ntoffset=4&p4ppushleft=1%2C48&s=88

Comparing them closely reveals the trick: q carries the search keyword, and between the second and third pages only the s parameter changes, from 44 to 88. So s is the offset of the first item on the page, each page holds 44 items, and the first page corresponds to s=0.
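The pagination rule can be sketched in a few lines. This is a minimal illustration, not Taobao's documented API: the names BASE_URL, ITEMS_PER_PAGE and build_page_url are hypothetical, the page size of 44 is only inferred from the s=44 and s=88 values in the URLs above, and it assumes the other query parameters can be dropped (as the program later in this post does).

```python
from urllib.parse import urlencode

BASE_URL = "https://s.taobao.com/search"
ITEMS_PER_PAGE = 44  # assumption: inferred from s=44 and s=88 in the URLs above


def build_page_url(keyword, page):
    """Return the search URL for a 1-based page number (hypothetical helper)."""
    params = {"q": keyword, "s": ITEMS_PER_PAGE * (page - 1)}
    # urlencode percent-encodes non-ASCII keywords as UTF-8
    return BASE_URL + "?" + urlencode(params)


for page in range(1, 4):
    print(build_page_url("书包", page))
```

The three printed URLs end in s=0, s=44 and s=88, matching the first three result pages observed above.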

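So the overall structure of the program is:

1. Submit the search request and loop over the result pages

2. For each page, extract the item titles and prices

3. Print the information to the screen

The main functions are:

1. Fetching a page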

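def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        # Guess the encoding from the response body
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        # Return an empty string so the caller can still run its regexes
        return ""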

2. Extracting the information

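def parsePage(ilt, html):
    try:
        # Find every price field in the downloaded page
        plt = re.findall(r'\"view_price\"\:\"[\d\.]*\"', html)
        # Find every item title in the downloaded page
        tlt = re.findall(r'\"raw_title\"\:\".*?\"', html)
        for i in range(len(plt)):
            # Split on the first colon only, because a title may contain ':';
            # eval() then strips the surrounding quotes
            price = eval(plt[i].split(':', 1)[1])
            title = eval(tlt[i].split(':', 1)[1])
            ilt.append([price, title])
    except Exception:
        print("Failed to parse page")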
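The extraction step can also be written without eval(), which runs text taken from the page as Python code. The sketch below is an alternative, not the code used in this post: it puts capture groups in the same two regular expressions so that findall returns the values directly. The name parse_items and the sample string are made up for illustration, and the approach assumes the page still embeds "view_price" and "raw_title" fields in this form. Escape sequences inside a title (such as \u4e66) are left undecoded.

```python
import re

# Same patterns as above, with a capture group around the value
PRICE_RE = re.compile(r'"view_price":"([\d.]*)"')
TITLE_RE = re.compile(r'"raw_title":"(.*?)"')


def parse_items(html):
    """Return [price, title] pairs found in the page text (hypothetical helper)."""
    prices = PRICE_RE.findall(html)
    titles = TITLE_RE.findall(html)
    # zip stops at the shorter list, so a missing field cannot raise IndexError
    return [[price, title] for price, title in zip(prices, titles)]


# Made-up sample text in the shape the regexes expect
sample = (
    '{"raw_title":"书包 A: 大容量","view_price":"59.00"},'
    '{"raw_title":"Backpack B","view_price":"128.50"}'
)
print(parse_items(sample))
```

Because the value is captured by the regex itself, a colon inside a title needs no special handling.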

3. Printing the results

def printGoodsList(ilt):
    # Column widths: 4 for the index, 8 for the price, 16 for the title
    tplt = "{:4}\t{:8}\t{:16}"
    print(tplt.format("No.", "Price", "Title"))
    count = 0
    for g in ilt:
        count = count + 1
        print(tplt.format(count, g[0], g[1]))
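To see what the template does, here is a small sketch that builds the rows as strings instead of printing them. ROW_TEMPLATE and format_rows are hypothetical names, and the item is made up. With a bare width such as {:4}, str.format right-aligns numbers and left-aligns strings, which is why the index column and the header line up differently.

```python
ROW_TEMPLATE = "{:4}\t{:8}\t{:16}"


def format_rows(items):
    """Return the header and one formatted line per [price, title] pair."""
    rows = [ROW_TEMPLATE.format("No.", "Price", "Title")]
    for count, (price, title) in enumerate(items, start=1):
        # count is an int, so it is right-aligned; price and title are strings
        rows.append(ROW_TEMPLATE.format(count, price, title))
    return rows


for row in format_rows([["59.00", "bag"]]):
    print(repr(row))
```

Printing with repr makes the padding spaces and tab characters visible.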

4. The main function

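def main():
    goods = '书包'  # search keyword ("schoolbag")
    depth = 2  # number of result pages to fetch
    start_url = 'https://s.taobao.com/search?q=' + goods
    infoList = []
    for i in range(depth):
        try:
            # Each page holds 44 items; str() converts the offset to a string
            url = start_url + '&s=' + str(44 * i)
            html = getHTMLText(url)
            parsePage(infoList, html)
        except Exception:
            continue
    printGoodsList(infoList)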

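Here is the complete code:

import re

import requests


# Fetch a page
def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        # Guess the encoding from the response body
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        # Return an empty string so the caller can still run its regexes
        return ""


# Parse a fetched page
def parsePage(ilt, html):
    try:
        # Find every price field in the downloaded page
        plt = re.findall(r'\"view_price\"\:\"[\d\.]*\"', html)
        # Find every item title in the downloaded page
        tlt = re.findall(r'\"raw_title\"\:\".*?\"', html)
        for i in range(len(plt)):
            # Split on the first colon only, because a title may contain ':';
            # eval() then strips the surrounding quotes
            price = eval(plt[i].split(':', 1)[1])
            title = eval(tlt[i].split(':', 1)[1])
            ilt.append([price, title])
    except Exception:
        print("Failed to parse page")


# Print the results
def printGoodsList(ilt):
    # Column widths: 4 for the index, 8 for the price, 16 for the title
    tplt = "{:4}\t{:8}\t{:16}"
    print(tplt.format("No.", "Price", "Title"))
    count = 0
    for g in ilt:
        count = count + 1
        print(tplt.format(count, g[0], g[1]))


def main():
    goods = '书包'  # search keyword ("schoolbag")
    depth = 2  # number of result pages to fetch
    start_url = 'https://s.taobao.com/search?q=' + goods
    infoList = []
    for i in range(depth):
        try:
            # Each page holds 44 items; str() converts the offset to a string
            url = start_url + '&s=' + str(44 * i)
            html = getHTMLText(url)
            parsePage(infoList, html)
        except Exception:
            continue
    printGoodsList(infoList)


main()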
