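Scraping photography images from duitang with Python

With some free time over the past few days, I wrote a Python crawler that downloads the photography-category images from duitang (堆糖).

Without further ado, here is the code. It should be readable without much explanation.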

#coding: utf-8
# Scrape photography images from duitang
from html.parser import HTMLParser
import urllib.request
import queue
from datetime import datetime
import os

queue_url = queue.Queue()
site = 'http://www.duitang.com'
savePath = '/home/michael/Pictures/'  # prefix of the directory where images are saved

# logs
url_log = 'urls.log'
img_log = 'imgs.log'
err_log = 'errors.log'

action = ("connecting", "downloading", "parsing")


class MyHTMLParser(HTMLParser):
    # follow controls whether links are followed to further pages;
    # strict is accepted for compatibility with the call sites but is not used
    def __init__(self, strict, follow=True):
        super(MyHTMLParser, self).__init__()
        self.follow = follow

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # imgurl = [x[1] for x in attrs if x[0] == 'src'][0]
            imgurl = None
            width = 200
            height = 200
            for x in attrs:
                if x[0] == 'src':
                    imgurl = x[1]
                elif x[0] == 'width':
                    width = x[1]
                    print('width=%s' % width)
                elif x[0] == 'height':
                    height = x[1]
                    print('height=%s' % height)
            # only keep images larger than 300x300
            if imgurl and float(width) > 300 and float(height) > 300:
                print(imgurl)
                r = imgurl.rfind("/")
                # download the image to local disk
                urllib.request.urlretrieve(imgurl, '%s%s' % (savePath, imgurl[r:]))
                # write a log entry
                img_file.write("%s\t%s\t%s\t%s\t%s\n" % (datetime.now(), current_url,
                                                         imgurl[r:], width, height))
        if tag == "a" and self.follow:
            href = [x[1] for x in attrs if x[0] == 'href']
            if href:
                if href[0].startswith("/people/mblog/"):  # fetch the full-size image
                    get_img_in_url("%s%s" % (site, href[0]))
                elif href[0].startswith("/category/photography/"):
                    url = href[0]
                    url = '%s%s' % (site, url)
                    queue_url.put(url)

    def handle_endtag(self, tag):
        # print("Encountered an end tag :", tag)
        pass

    def handle_data(self, data):
        # print("Encountered some data  :", data)
        pass


ua = {'User-agent': 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'}


def get_html(url_address):
    '''open url and read it; returns None on failure'''
    try:
        url_file.write("%s\t%s\t%s\t" % (datetime.now(), url_address, action[2]))
        req = urllib.request.Request(url_address, headers=ua)
        # the original opened the URL twice; a single request is enough
        html = urllib.request.urlopen(req).read().decode('utf-8')
        url_file.write("%s\n" % ("YES"))
        return html
    except Exception:
        url_file.write("%s\n" % ("NO"))


def get_img_in_url(url_address):
    html = get_html(url_address)
    if html:
        p = MyHTMLParser(strict=False, follow=False)
        p.feed(html)


parser = MyHTMLParser(strict=False)
url = "http://www.duitang.com/category/photography/"
queue_url.put(url)

# create the log files (write a header row only when the file is new)
if os.path.isfile(url_log):
    url_file = open(url_log, 'a+')
else:
    url_file = open(url_log, 'w+')
    url_file.write("%s\t%s\t%s\t%s\n" % ("time", "url", "action", "success"))

if os.path.isfile(img_log):
    img_file = open(img_log, 'a+')
else:
    img_file = open(img_log, 'w+')
    img_file.write("%s\t%s\t%s\t%s\t%s\n" % ("time", "url", "name", "width", "height"))

current_url = None
while not queue_url.empty():
    current_url = queue_url.get()
    html = get_html(current_url)
    if html:
        parser.feed(html)
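
Two things in the script above are worth hardening. Every `/category/photography/` link is put on `queue_url` with no record of what was already visited, so the same page can be queued again and again. Also, `float(width)` raises an error if an attribute holds something like `100%`. The sketch below shows one possible way to handle both. `Frontier` and `parse_dimension` are hypothetical names introduced here for illustration and are not part of the original script; `SITE` mirrors the script's `site` value.

```python
import queue
from urllib.parse import urljoin, urldefrag

SITE = 'http://www.duitang.com'  # same base URL as `site` in the script above


class Frontier:
    """FIFO queue of URLs that accepts each URL at most once (hypothetical helper)."""

    def __init__(self):
        self._queue = queue.Queue()
        self._seen = set()

    def put(self, href, base=SITE):
        # Resolve relative links against the base and drop any #fragment,
        # so that equivalent URLs compare equal.
        url, _fragment = urldefrag(urljoin(base, href))
        if url in self._seen:
            return False  # already queued or visited
        self._seen.add(url)
        self._queue.put(url)
        return True

    def get(self):
        return self._queue.get()

    def empty(self):
        return self._queue.empty()


def parse_dimension(value, default=0.0):
    """Convert a width/height attribute to float; use default for '100%' or None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


if __name__ == '__main__':
    frontier = Frontier()
    frontier.put('/category/photography/')
    frontier.put('/category/photography/#top')     # same page, skipped
    frontier.put('/category/photography/?page=2')  # assumed example link
    while not frontier.empty():
        print(frontier.get())
    print(parse_dimension('640'), parse_dimension('100%'))
```

In the crawler, `queue_url.put(url)` could be replaced by a call to `frontier.put(href[0])`, and the size check could compare `parse_dimension(width)` against 300 instead of calling `float` directly. The `?page=2` link is only an assumed example; the real pagination URLs on the site may look different.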
