Batch-downloading images from an orchid website with Python and renaming the files

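This program is a small crawler that downloads the images from the orchid website http://www.orchidspecies.com/ and names each image after the species it shows.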
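As a rough illustration of the naming step, here is a minimal Python 3 sketch of how a link captured from an index page can be turned into a species name and an image file name. The helper names `parse_species_link` and `image_filename` are hypothetical, and the sample fragments in the usage note are made up for illustration; the logic mirrors what `load_picture` does in the script but is not a drop-in replacement for it.

```python
def parse_species_link(fragment):
    """Split a captured 'page.htm">Genus species ...' fragment into (page, name).

    Returns None when the fragment has no '">' separator.
    """
    if '">' not in fragment:
        return None
    page, text = fragment.split('">', 1)
    # Collapse runs of whitespace, then drop markup and characters
    # that are not allowed in Windows file names.
    name = " ".join(text.split())
    name = name.replace("<P>", "").replace("?", "").replace("!", "")
    # Mark hybrids with a real multiplication sign.
    name = name.replace(" x ", " × ")
    words = name.split()
    # Hybrids ("Genus × epithet") keep three tokens, plain species keep two.
    keep = 3 if " × " in name else 2
    return page, " ".join(words[:keep])


def image_filename(name, species_no, image_no):
    """Build '<name> <species number>-<image number>.jpg'."""
    return "%s %d-%d.jpg" % (name, species_no, image_no)
```

For example, `parse_species_link('page.htm">Laelia anceps Lindl. 1835')` gives `('page.htm', 'Laelia anceps')`, and `image_filename('Laelia anceps', 12, 3)` gives `'Laelia anceps 12-3.jpg'`.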

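The third-party modules requests and chardet are not part of the standard library and have to be installed separately (for example with pip install requests chardet). The script is written for Python 2: it relies on urllib2, reload(sys) and print statements.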
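chardet is needed because one index page comes back garbled unless its bytes are decoded with the detected encoding. The following is a minimal Python 3 sketch of that idea, not the script's own code: `decode_page` and its `fallback` parameter are hypothetical names, and the fallback to latin-1 when chardet is missing or has no usable guess is my own assumption.

```python
def decode_page(raw, fallback="latin-1"):
    """Decode raw page bytes, using chardet's guess when it is available."""
    try:
        import chardet  # third-party; treated as optional in this sketch
        guess = chardet.detect(raw)["encoding"]
    except ImportError:
        guess = None
    try:
        # chardet may return None, or a name Python has no codec for.
        return raw.decode(guess or fallback, errors="replace")
    except LookupError:
        return raw.decode(fallback, errors="replace")
```

Decoding with `errors="replace"` trades exactness for robustness: an odd byte becomes a replacement character instead of aborting the crawl.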

# -*- coding: utf-8 -*-
import re, os, requests, urllib2, chardet, time, sys

# Python 2 only: make UTF-8 the default encoding, keeping the std streams intact
stdi, stdo, stde = sys.stdin, sys.stdout, sys.stderr
reload(sys)
sys.stdin, sys.stdout, sys.stderr = stdi, stdo, stde
sys.setdefaultencoding('utf-8')

HEADERS = {'User-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.48'}

# Icons (scent, light, temperature, season) that appear on every species page
SKIP_IMAGES = ('orphotdir/scent.jpg', 'orphotdir/deepshade.jpg', 'orphotdir/partialshade.jpg',
               'orphotdir/partialsun.jpg', 'orphotdir/sun.jpg', 'orphotdir/tempcold.jpg',
               'orphotdir/tempcool.jpg', 'orphotdir/tempint.jpg', 'orphotdir/temphot.jpg',
               'orphotdir/spring.jpg', 'orphotdir/summer.jpg', 'orphotdir/fall.jpg',
               'orphotdir/winter.jpg')

# Fetch the page source and extract the wanted content
def get_content(url, reg):
    request = requests.get(url, timeout=20, headers=HEADERS)
    content = request.text
    want = reg.findall(content)
    return want

# Fetch the page source and transcode it; works around the garbled text on
# http://www.orchidspecies.com/indexcattleyo.htm
def for_change(url, reg):
    request = urllib2.Request(url, headers=HEADERS)
    req = urllib2.urlopen(request, timeout=20)
    res = req.read()
    enc = chardet.detect(res)['encoding']
    print u'Page encoding detected as ' + enc
    content = res.decode(enc).encode('utf-8')
    want = reg.findall(content)
    return want

# Create a folder if it does not exist yet
def create_folder(path):
    if not os.path.exists(path):
        os.mkdir(path)

# Save an image to disk
def download_image(imageurl, imagename):
    data = requests.get(imageurl, timeout=20).content
    with open(imagename, 'wb') as f:
        f.write(data)

# Append a record to a text file as a backup log
def create_txt(txtname, data):
    with open(txtname, 'a') as f:
        f.write(data)

# Download all images of one species
def load_picture(everyurl, url, path, n):
    p3 = True
    x = 1
    a3 = re.compile(r'src="(.+?\.\w{3})"', re.I)
    # Get the page URL and the name of the species
    if everyurl.find('">') != -1:
        picurl = everyurl.split('">')[0]
        name = ' '.join(everyurl.split('">')[1].strip().split())
        name = name.replace(' x ', u' × ').replace('<P>', '').replace("?", "").replace("!", "")
        if name.find(u' × ') != -1:
            name = name.split()[0] + ' ' + name.split()[1] + ' ' + name.split()[2]
        else:
            name = name.split()[0] + ' ' + name.split()[1]
        # Create the folder for this species
        create_folder(path + name)
        print name
        # Get the image URLs and download them
        while p3:
            try:
                u4 = get_content(url + '/' + picurl, a3)
                for u5 in u4:
                    p4 = True
                    if u5 not in SKIP_IMAGES:
                        while p4:
                            try:
                                imageurl = url + '/' + u5
                                imagename = path + name + "\\%s %s-%s.jpg" % (name, str(n), str(x))
                                download_image(imageurl, imagename)
                                print str(n) + '-' + str(x)
                                x += 1
                                p4 = False
                            except Exception:
                                print str(n) + '-' + str(x) + ' was not downloaded, retrying in 10 seconds'
                                time.sleep(10)
                p3 = False
            except Exception:
                # Log the failed page, then retry
                txtname = u'errors.txt'
                data = url + '/' + picurl + '    ' + name + '    ' + time.strftime('%Y-%m-%d %X', time.localtime()) + '\n'
                create_txt(txtname, data)
                print u'Failed to fetch the page of species ' + str(n) + u', retrying in 10 seconds'
                time.sleep(10)

if __name__ == '__main__':
    path = 'D:\\orchid_only\\'
    create_folder(path)
    n = 0      # species counter
    alll = []  # links to all species pages
    # Extract the first-level (index) URLs
    url = "http://www.orchidspecies.com"
    a1 = re.compile(r'SIZE=2><A href="(index\w.+?)">', re.I)
    p1 = True
    print url
    while p1:
        try:
            u1 = get_content(url, a1)
            u1 = list(set(u1))
            print u'First-level URLs fetched, extracting second-level URLs'
            p1 = False
        except Exception:
            print u'Failed to fetch first-level URLs, reconnecting in 10 seconds'
            time.sleep(10)
    # Extract the second-level (species) URLs
    a2 = re.compile(r'<P><LI><a href="(.+?)</A>', re.I)
    for u2 in u1:
        u2 = url + '/' + u2
        p2 = True
        print u2
        while p2:
            try:
                u3 = get_content(u2, a2)
                print len(u3)
                if len(u3) == 0:
                    # Nothing matched: retry with explicit transcoding
                    u3 = for_change(u2, a2)
                    print len(u3)
                alll.extend(u3)
                print u'Second-level URLs fetched and stored'
                p2 = False
            except Exception:
                print u'Failed to fetch second-level URLs, reconnecting in 10 seconds'
                time.sleep(10)
    # Download the images of every species
    for everyurl in alll:
        n += 1
        print u'Downloading species ' + str(n)
        load_picture(everyurl, url, path, n)
    print 'Done, downloaded ' + str(n) + ' orchid species in total'
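The retry loops in the script keep trying every 10 seconds for as long as a request fails, so a permanently broken link would stall the run. One possible alternative is a bounded retry helper. This is a minimal Python 3 sketch under my own assumptions: the name `retry` and the parameters `attempts`, `delay` and `sleep` are hypothetical and do not appear in the script.

```python
import time


def retry(func, attempts=5, delay=10, sleep=time.sleep):
    """Call func() until it succeeds, at most `attempts` times.

    Waits `delay` seconds between tries and re-raises the last error
    when every attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:  # unlike a bare except, Ctrl-C still interrupts
            last_error = exc
            if attempt < attempts:
                sleep(delay)
    raise last_error
```

A call such as `retry(lambda: download_image(imageurl, imagename))` would then give up after five failures, and the caller can log the failed URL and move on to the next image. The `sleep` parameter is only there so that the waiting can be replaced in tests.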
