py3+urllib+re，爬虫下载捧腹网图片

实现原理及思路请参考我的另外几篇爬虫实践博客

py3+urllib+bs4+反爬，20+行代码教你爬取豆瓣妹子图：http://www.cnblogs.com/UncleYong/p/6892688.html
py3+requests+json+xlwt，爬取拉勾招聘信息：http://www.cnblogs.com/UncleYong/p/6960044.html
py3+urllib+re，轻轻松松爬取双色球最近100期中奖号码：http://www.cnblogs.com/UncleYong/p/6958242.html

实现代码如下：

import urllib.request, re# 获取网页源码
def page(pg):url = 'https://www.pengfu.com/index_%s.html'%pg# 页面是utf8编码，所有解码成unicodehtml = urllib.request.urlopen(url).read().decode('utf8') # <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /># print(html)return html# 获取标题
def title(html):reg = re.compile(r'<h1 class="dp-b"><a href=".*?" target="_blank">(.*?)</a>') # r表示防止转义item = re.findall(reg, html)# print(item)return item# 获取图片url
def content(html):# html = page(1)reg = r'<img src="(.*?)" width='item = re.findall(reg, html)# print(item)return itemdef download(url, name):path = 'image\%s.jpg'%name#.decode('utf-8').encode('gbk') # win下只识别gbkurllib.request.urlretrieve(url, path)for i in range(5,9):html = page(i)title_list = title(html)content_list = content(html)for m, n in zip(title_list, content_list): # 把标题和图片对个对应print('正在下载>>>>>：' + m, n)download(n, m)

转载于:https://www.cnblogs.com/uncleyong/p/6973887.html

py3+urllib+re，爬虫下载捧腹网图片相关推荐

Golang实现并发版网络爬虫：捧腹网段子爬取并保存文件
爬取捧腹网段子 url分页分析 https://www.pengfu.com/xiaohua_1.html 1 下一页+1 https://www.pengfu.com/xiaohua_2.html ...
python3制作捧腹网段子页爬虫
0x01 春节闲着没事(是有多闲),就写了个简单的程序,来爬点笑话看,顺带记录下写程序的过程.第一次接触爬虫是看了这么一个帖子,一个逗逼,爬取煎蛋网上妹子的照片,简直不要太方便.于是乎就自己照猫画虎, ...
Go语言段子爬虫--捧腹网
最后我们来进行一次网络段子的爬虫,爬取捧腹网的段子数据 1.爬取网页的段子链接: 程序代码: package mainimport ("fmt""net/http&quo ...
python爬虫之爬取捧腹网段子
原文链接:http://www.nicemxp.com/articles/12 背景:抓取捧腹网首页的段子和搞笑图片链接如图: 地址:https://www.pengfu.com/ 首页中有很多子页 ...
python爬取捧腹网gif图片
#_*_coding:utf-8_*_ #爬取捧腹网GIF图片 import urllib,re import urllib.request import chardet #需要导入这个模块,检测编码 ...
Android实战：手把手实现“捧腹网”APP（三）-----UI实现，逻辑实现
APP页面实现根据原型图,我们可以看出,UI分为两部分,底部Tab导航+上方列表显示. 所以此处,我们通过 FragmentTabHost+Fragment,来实现底部的导航页面,通过Recycle ...
Android实战：手把手实现“捧腹网”APP（二）-----捧腹APP原型设计、实现框架选取
APP原型设计在APP的开发过程中,原型设计是必不可少的.用户界面原型必须在先启阶段的初期或在精化阶段一开始建立.整个系统(包括它的"实际"用户界面)的分析.设计和实施必须在原型 ...
java爬取捧腹网段子
先上效果图: 准备工作: /*** 建立http连接*/ public static String Connect(String address) {HttpURLConnection conn = ...
golang实现捧腹网爬取笑话
爬虫的步骤见:here 以下golang代码实现对捧腹网笑话的爬取,并保存到本地的joy文件夹(程序会自行创建)内 package mainimport ("fmt""n ...

py3+urllib+re，爬虫下载捧腹网图片

py3+urllib+re，爬虫下载捧腹网图片相关推荐

最新文章

热门文章