07_Python crawler: content and introduction
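Sometimes you come across GIFs you like, but saving them one by one is tedious, and some sites do not even allow right-click saving. Fetching the GIFs with Python instead makes for a fun little exercise.
The site crawled this time is 居然搞笑网 (zbjuran.com).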
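Approach:
Fetch the content of the current page
Find the URLs of the GIFs in the page
Save the content at each URL to local disk
To crawl multiple pages, wrap the steps above in a loop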
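The steps above can be sketched in Python 3 using only the standard library. This is a minimal illustration rather than the script's own code: `GifLinkParser`, `extract_gif_urls`, `page_urls`, and `SAMPLE_HTML` are hypothetical names, the sample markup is invented, and only the list URL pattern is taken from the script. The network fetch is left out so the sketch runs offline.

```python
from html.parser import HTMLParser

# List-page URL pattern as used in the script; %s is the page number
LIST_URL = "http://www.zbjuran.com/dongtai/list_4_%s.html"


class GifLinkParser(HTMLParser):
    """Hypothetical helper: collects the src of every <img> ending in .gif."""

    def __init__(self):
        super().__init__()
        self.gif_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # attrs is a list of (name, value) pairs; value may be None
            src = dict(attrs).get("src") or ""
            if src.lower().endswith(".gif"):
                self.gif_urls.append(src)


def extract_gif_urls(html):
    """Step 2: find the GIF URLs in one page of HTML."""
    parser = GifLinkParser()
    parser.feed(html)
    return parser.gif_urls


def page_urls(first, last):
    """Step 4: build the list-page URLs for pages first..last (inclusive)."""
    return [LIST_URL % n for n in range(first, last + 1)]


# Invented markup for illustration only; the real page layout may differ
SAMPLE_HTML = """
<div class="main">
  <div class="item"><h3>demo</h3>
    <div class="text"><img src="http://example.com/a.gif"></div></div>
  <div class="item"><img src="http://example.com/ad.png"></div>
</div>
"""

if __name__ == "__main__":
    print(extract_gif_urls(SAMPLE_HTML))
    print(page_urls(1, 3))
```

Filtering on the `.gif` suffix is a simplification chosen for this sketch; the script instead filters out ads by checking whether an item contains a `div` with class `text`.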
Code:
- #!/usr/bin/python
- #coding:utf-8
- # Python 2 only: relies on urllib2, raw_input and the print statement
- import urllib2,time,uuid,urllib,os,sys
- from bs4 import BeautifulSoup
- # Python 2 workaround: lets UTF-8 byte strings and unicode be mixed (e.g. in file paths) without UnicodeDecodeError
- reload(sys)
- sys.setdefaultencoding('utf-8')
- # Fetch the page content; return None if the request fails
- def getHtml(url):
-     try:
-         print url
-         html = urllib2.urlopen(url).read()
-     except Exception as e:
-         print 'request failed: %s' % e
-         return None
-     return html
- # Collect the URL and title of every GIF on the page
- def getImagUrl(html):
-     ImagUrlList=[]
-     if not html:
-         print 'nothing can be found'
-         return ImagUrlList
-     soup=BeautifulSoup(html,'lxml')
-     main=soup.find("div",{"class":"main"})
-     if main is None:
-         print 'nothing can be found'
-         return ImagUrlList
-     # Get the list of items
-     items=main.find_all('div',{'class':'item'})
-     for item in items:
-         # Items without a "text" div are ads; skip them
-         text=item.find('div',{"class":"text"})
-         if text is None:
-             continue
-         img=text.find('img')
-         title=item.find('h3')
-         if img is None or title is None:
-             continue
-         # Store the image URL and its title
-         ImagUrlList.append({'url':img.get('src'),'name':title.text})
-     return ImagUrlList
- # Download one image to a local folder
- def download(author,imgurl,typename,pageNo):
-     # The folder name is today's date, e.g. 2017-3-16
-     x = time.localtime()
-     foldername = "%d-%d-%d" % (x.tm_year,x.tm_mon,x.tm_mday)
-     picpath = 'Jimy/%s/%s/%s' % (foldername,typename,str(pageNo))
-     # A UUID suffix keeps file names unique
-     filename = author+str(uuid.uuid1())
-     # File extension taken from the URL, including the dot (e.g. ".gif")
-     pic_type=os.path.splitext(imgurl)[1]
-     if not os.path.exists(picpath):
-         os.makedirs(picpath)
-     target = picpath+"/%s%s" % (filename,pic_type)
-     print "GIF saved to: "+target
-     download_img = urllib.urlretrieve(imgurl, target)  # download the image to the target path
-     print "Image source: "+imgurl
-     return download_img
- # Exit the program
- def myquit():
-     print "Bye Bye!"
-     sys.exit(0)
- # Crawl one list page and download every GIF on it
- def start(pageNo):
-     targeturl="http://www.zbjuran.com/dongtai/list_4_%s.html" % str(pageNo)
-     html = getHtml(targeturl)
-     urllist=getImagUrl(html)
-     for imgurl in urllist:
-         # '搞笑动图' ("funny GIFs") is the category folder name
-         download(imgurl['name'],imgurl['url'],'搞笑动图',pageNo)
- # Prompt until the user enters a number in 1..maxNo; entering Q quits
- def askNumber(prompt,maxNo):
-     pageNo = raw_input(prompt)
-     while not pageNo.isdigit() or int(pageNo) > maxNo or int(pageNo) < 1:
-         if pageNo == 'Q':
-             myquit()
-         print "Param is invalid , please try again."
-         pageNo = raw_input("Enter the page number to scrape >")
-     return pageNo
- if __name__ == '__main__':
-     print '''
- *****************************************
- ** Welcome to Spider of GIF **
- ** Created on 2017-3-16 **
- ** @author: Jimy **
- *****************************************'''
-     pageNo = askNumber("Enter the page number to scrape (1-50), or Q to quit\n>",50)
-     print pageNo
-     start(pageNo)
-     # First crawl finished; now ask how many pages to crawl in total
-     pageNo = askNumber("Enter the total number of pages to scrape (1-5000), or Q to quit\n>",5000)
-     # Loop over pages 1..pageNo
-     for num in xrange(int(pageNo)):
-         start(str(num+1))
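The file naming used by `download()` above can also be sketched in Python 3. This is only a sketch under stated assumptions: `build_target` and its `today` parameter are hypothetical names, and replacing path separators in the title is an added precaution that the original script does not take.

```python
import os
import uuid
from datetime import date
from urllib.parse import urlsplit


def build_target(title, img_url, typename, page_no, today=None):
    """Return the local path for one image, following the
    Jimy/<date>/<type>/<page>/<title><uuid><ext> scheme."""
    today = today or date.today()
    # Unpadded date, matching folder names such as 2017-3-16
    folder = "%d-%d-%d" % (today.year, today.month, today.day)
    # Take the extension from the URL path only, ignoring any ?query part
    ext = os.path.splitext(urlsplit(img_url).path)[1]
    # Assumption: titles may contain path separators, so replace them
    safe_title = title.replace("/", "_").replace("\\", "_")
    filename = "%s%s%s" % (safe_title, uuid.uuid1(), ext)
    return "/".join(["Jimy", folder, typename, str(page_no), filename])


if __name__ == "__main__":
    print(build_target(
        "demo",
        "http://www.zbjuran.com/uploads/allimg/170206/10-1F206135ZHJ.gif",
        "gif", 1))
```

In Python 3 the download step itself would then be `os.makedirs(os.path.dirname(path), exist_ok=True)` followed by `urllib.request.urlretrieve(img_url, path)`.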
The output looks like this:
- *****************************************
- ** Welcome to Spider of GIF **
- ** Created on 2017-3-16 **
- ** @author: Jimy **
- *****************************************
- Enter the page number to scrape (1-50), or Q to quit
- >1
- 1
- http://www.zbjuran.com/dongtai/list_4_1.html
- GIF saved to: Jimy/2017-3-16/搞笑动图/1/真是艰难的选择。3f0fe8f6-09f8-11e7-9161-f8bc12753d1e.gif
- Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F206135ZHJ.gif
- GIF saved to: Jimy/2017-3-16/搞笑动图/1/这么贱会被打死吧……3fa9da88-09f8-11e7-9161-f8bc12753d1e.gif
- Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F206135H35U.gif
- GIF saved to: Jimy/2017-3-16/搞笑动图/1/一看就是印度……4064e60c-09f8-11e7-9161-f8bc12753d1e.gif
- Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F20613543c50.gif
- GIF saved to: Jimy/2017-3-16/搞笑动图/1/新垣结衣的正经工作脸414b4f52-09f8-11e7-9161-f8bc12753d1e.gif
- Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F206135250553.gif
- GIF saved to: Jimy/2017-3-16/搞笑动图/1/妹子这是在摇什么的421afa86-09f8-11e7-9161-f8bc12753d1e.gif
- Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F20613493N03.gif
- Enter the total number of pages to scrape (1-5000), or Q to quit
- >Q
- Bye Bye!
In the end, the GIFs are saved to local disk.
(End)