Downloading JD.com Product Images with a Python Crawler

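Start by opening the mobile phone category on JD.com: https://list.jd.com/list.html?cat=9987,653,655&page=1&sort=sort_rank_asc&trans=1&JL=6_0_0#J_main

With the page open, work out how to extract the image URLs, then save each image to disk with urllib.request.urlretrieve().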
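
As a quick illustration of the save step, the sketch below calls urllib.request.urlretrieve() on a URL and writes the result to a local path. To keep it runnable offline, it uses a temporary file exposed through a file:// URL as a stand-in for a real image URL; the file names are made up for the example.

```python
import tempfile
import urllib.request
from pathlib import Path

# Stand-in for a remote image: a small local file exposed as a file:// URL.
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.jpg"      # hypothetical file name
source.write_bytes(b"fake image bytes")

target = workdir / "saved.jpg"       # hypothetical destination

# urlretrieve(url, filename) copies the resource at `url` into `filename`
# and returns a (local_path, headers) tuple.
saved_path, headers = urllib.request.urlretrieve(source.as_uri(), filename=str(target))

print(saved_path, target.stat().st_size)
```

With a real page, the first argument would be the full image URL (scheme included) and the second the path where the file should be written.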

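Right-click the page and choose "View Page Source".

Press Ctrl+F and search for the name of the first product image: Apple iPhone 7 (A1660) 128G 黑色移动联通电信4G手机

Jump to the first image and look just above it for a marker that is unique in the page, to anchor the extraction: <div id="plist"

Then go to the last image and look just below it for another unique marker: <div class="page clearfix">

So the first pattern is pat='<div id="plist".+? <div class="page clearfix">'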
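
A minimal sketch of how that first pattern could be applied, using a made-up HTML fragment in place of the real page. Two details are my assumptions rather than the article's: the re.S flag is added so that . also matches line breaks, and the literal space before the closing marker is replaced with \s* so the match does not depend on exact whitespace.

```python
import re

# Made-up fragment standing in for the downloaded page source.
sample_html = """
<div id="header"><img src="//example.com/logo.jpg"></div>
<div id="plist" class="goods-list">
  <img width="220" height="220" data-img="1" src="//example.com/a.jpg">
  <img width="220" height="220" data-img="1" src="//example.com/b.jpg">
</div>
<div class="page clearfix">1 2 3</div>
"""

# Non-greedy match from the start marker to the end marker.
# re.S lets "." span line breaks.
region_pat = re.compile(r'<div id="plist".+?\s*<div class="page clearfix">', re.S)

regions = region_pat.findall(sample_html)
product_region = regions[0] if regions else ""

# The header image lies outside the region; only the product images remain.
print("logo.jpg" in product_region)
print(product_region.count("<img"))
```

Narrowing the search to this region first means that later image patterns cannot pick up unrelated images elsewhere on the page.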

Next comes the most important part: analyzing the image tags in the page source.

<img width="220" height="220" data-img="1" data-lazy-img="//img13.360buyimg.com/n7/jfs/t3961/190/2233466155/341332/2e3803d1/58a55d2aN18488958.jpg">

<img width="220" height="220" data-img="1" data-lazy-img="//img10.360buyimg.com/n7/jfs/t19855/305/881807243/378198/d7130fdd/5b0d0fd8N88e7901d.jpg">

<img width="220" height="220" data-img="1" src="//img12.360buyimg.com/n7/jfs/t2611/360/858752078/90212/68466704/5728910cNd55ac232.jpg">

<img width="220" height="220" data-img="1" src="//img10.360buyimg.com/n7/jfs/t18406/198/1607027948/289456/3e86953e/5acdb065N8f39d863.jpg">

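Comparing these tags shows that the image URL sits in either the src or the data-lazy-img attribute, which gives the following two patterns:

pat2=r'<img width="220" height="220" data-img="1" src="//(.+?\.jpg)">'

pat3=r'<img width="220" height="220" data-img="1" data-lazy-img="//(.+?\.jpg)">'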
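
To check the two patterns, the sketch below runs them against the four sample tags shown earlier and merges the results. The combined pattern at the end is my own variation, not part of the original code: it matches either attribute in a single pass, which also keeps the images in page order.

```python
import re

# The four sample <img> tags from the page source.
sample_tags = """
<img width="220" height="220" data-img="1" data-lazy-img="//img13.360buyimg.com/n7/jfs/t3961/190/2233466155/341332/2e3803d1/58a55d2aN18488958.jpg">
<img width="220" height="220" data-img="1" data-lazy-img="//img10.360buyimg.com/n7/jfs/t19855/305/881807243/378198/d7130fdd/5b0d0fd8N88e7901d.jpg">
<img width="220" height="220" data-img="1" src="//img12.360buyimg.com/n7/jfs/t2611/360/858752078/90212/68466704/5728910cNd55ac232.jpg">
<img width="220" height="220" data-img="1" src="//img10.360buyimg.com/n7/jfs/t18406/198/1607027948/289456/3e86953e/5acdb065N8f39d863.jpg">
"""

pat2 = r'<img width="220" height="220" data-img="1" src="//(.+?\.jpg)">'
pat3 = r'<img width="220" height="220" data-img="1" data-lazy-img="//(.+?\.jpg)">'

from_src = re.findall(pat2, sample_tags)
from_lazy = re.findall(pat3, sample_tags)
merged = from_src + from_lazy

# Variation (an assumption, not the article's pattern): one regex for both attributes.
combined = r'<img width="220" height="220" data-img="1" (?:src|data-lazy-img)="//(.+?\.jpg)">'
in_page_order = re.findall(combined, sample_tags)

print(len(from_src), len(from_lazy), len(merged))
for url in in_page_order:
    print(url)
```

Each captured string lacks the leading //, so a scheme such as http:// has to be prepended before downloading.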

With the analysis done, here is the complete code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re
import urllib.error
import urllib.request


def craw(url, page):
    req = urllib.request.Request(url)
    req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0")
    html1 = urllib.request.urlopen(req).read()
    # Decode the response bytes (str(bytes) would give the "b'...'" repr instead).
    # UTF-8 is assumed; undecodable bytes are dropped, which does not affect the ASCII image URLs.
    html1 = html1.decode("utf-8", errors="ignore")

    # Match element 1: the parent node (disabled)
    # pat1 = '<div id="plist".+? <div class="page clearfix">'
    # result1 = re.compile(pat1).findall(html1)
    # result1 = result1[0]

    # Match element 2: the child nodes
    pat2 = r'<img width="220" height="220" data-img="1" src="//(.+?\.jpg)">'
    pat3 = r'<img width="220" height="220" data-img="1" data-lazy-img="//(.+?\.jpg)">'

    # Extract the two URL lists and merge them
    imagelist = re.compile(pat2).findall(html1)
    imagelist1 = re.compile(pat3).findall(html1)
    imagelist2 = imagelist + imagelist1

    x = 1
    for imgurl in imagelist2:
        # Build the local file name and the full image URL.
        # The target directory must already exist.
        imagename = "/home/urllib/test/image/" + str(page) + str(x) + ".jpg"
        imgurl = "http://" + imgurl
        print(imgurl)
        try:
            # Save the image
            urllib.request.urlretrieve(imgurl, filename=imagename)
        except urllib.error.URLError as e:
            # Report the failure and move on to the next image
            if hasattr(e, "code"):
                print(e.code)
            if hasattr(e, "reason"):
                print(e.reason)
        x += 1


for i in range(1, 3):
    url = 'https://list.jd.com/list.html?cat=9987,653,655&page=' + str(i)
    craw(url, i)

Run the script with python test.py.

The images are downloaded and saved under the target path, so the crawl succeeded.
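
One detail of the file naming is worth a sketch. Concatenating str(page) and str(x) is fine for the two pages crawled here, but on a longer crawl page 1, image 11 and page 11, image 1 would both become 111.jpg. The helper below is a hypothetical alternative (the function name and the zero-padded format are my assumptions, not the article's): it separates and pads the two numbers and builds the path with os.path.join.

```python
import os


def image_path(directory, page, index):
    """Build a file path such as <directory>/p001_011.jpg."""
    filename = "p{:03d}_{:03d}.jpg".format(page, index)
    return os.path.join(directory, filename)


# The concatenation scheme maps two different images to the same name:
print(str(1) + str(11) == str(11) + str(1))

# The padded scheme keeps them apart:
print(image_path("/home/urllib/test/image", 1, 11))
print(image_path("/home/urllib/test/image", 11, 1))
```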

Another kind of URL, here the search results page, can be crawled with similar code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re
import urllib.error
import urllib.request


def craw(url, page):
    req = urllib.request.Request(url)
    req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36")
    html1 = urllib.request.urlopen(req).read()
    # Decode the response bytes (UTF-8 assumed; undecodable bytes are dropped)
    html1 = html1.decode("utf-8", errors="ignore")

    # pat1 = '<div id="plist".+? <div class="page clearfix">'
    # result1 = re.compile(pat1).findall(html1)
    # result1 = result1[0]

    # Patterns used for the category list page (disabled here)
    # pat1 = '<img width="220" height="220" data-img="1" src="//(.+?\.jpg)">'
    # pat2 = '<img width="220" height="220" data-img="1" data-lazy-img="//(.+?\.jpg)">'
    # imagelist = re.compile(pat).findall(html1)

    # Pattern for the search results page
    pat1 = r'<img width="220" height="220" class="err-product" data-img="1" source-data-lazy-img="//(.+?\.jpg)" />'
    # pat2 = '<img width="220" height="220" class="" data-img="1" source-data-lazy-img="" data-lazy-img="done" src="//(.+?\.jpg)">'
    imagelist = re.compile(pat1).findall(html1)
    # imagelist2 = re.compile(pat2).findall(html1)
    # imagelist = imagelist1 + imagelist2

    x = 1
    for imageurl in imagelist:
        # The target directory must already exist.
        imagename = "/home/urllib/test/image/" + str(page) + str(x) + ".jpg"
        imageurl = "http://" + imageurl
        print(imageurl)
        try:
            urllib.request.urlretrieve(imageurl, filename=imagename)
        except urllib.error.URLError as e:
            # Report the failure and move on to the next image
            if hasattr(e, "code"):
                print(e.code)
            if hasattr(e, "reason"):
                print(e.reason)
        x += 1


# for i in range(1, 3):
#     url = "https://list.jd.com/list.html?cat=670,671,672&page=" + str(i) + "&sort=sort_totalsales15_desc&trans=1&JL=6_0_0#J_main"
#     craw(url, i)

for i in range(1, 4):
    if (i % 2) != 0:
        # URL of the search page; page takes odd values
        url = "https://search.jd.com/Search?keyword=%E6%89%8B%E6%9C%BA&enc=utf-8&page=" + str(i)
        craw(url, i)
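
The final loop only requests odd page values, as the comment in the code notes for the search page. Here is a small sketch of generating those URLs, assuming the same three query parameters as in the code; the helper name search_urls is hypothetical. urllib.parse.urlencode takes care of percent-encoding the keyword 手机.

```python
from urllib.parse import urlencode


def search_urls(keyword, count):
    """Return `count` search-page URLs using odd page numbers 1, 3, 5, ..."""
    base = "https://search.jd.com/Search?"
    urls = []
    for page in range(1, 2 * count, 2):  # odd numbers only
        query = urlencode({"keyword": keyword, "enc": "utf-8", "page": page})
        urls.append(base + query)
    return urls


for url in search_urls("手机", 2):
    print(url)
```

Stepping the range by 2 replaces the i % 2 check, and building the query from a dictionary avoids hand-encoding the keyword.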

Reposted from: https://blog.51cto.com/superleedo/2123315
