Scraping the 妹子图 (girl pictures) photos with Python and Scrapy

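This post describes how to use Scrapy on Windows to crawl and download every image from the 妹子图 board of jandan.net (煎蛋网).
Software needed: WinPython. Enough said; IPython is very pleasant to use.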

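Install Scrapy: go into the WinPython directory and run scripts\env.bat. After that you can run pip install scrapy directly. Watch the install log: it looks like a package called service_identity also has to be installed. I am not sure exactly what it does yet; something to look into later.
Create the project: scrapy startproject myscrapy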

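Create the spider: scrapy genspider spider jandan.net (run this from inside the myscrapy directory; genspider needs the target domain as well as the spider name)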

To use ImagesPipeline, add the following settings to settings.py

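# Enable ImagesPipeline. Before Scrapy 1.0 the path was
# 'scrapy.contrib.pipeline.images.ImagesPipeline'.
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}

# Download directory (raw string so the backslash is taken literally)
IMAGES_STORE = r'E:\download'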

The pipeline can also filter images by size and so on; that is not needed for now.
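For reference, the size filter is controlled by a few extra settings. A minimal sketch of a settings.py fragment, assuming the standard Scrapy setting names IMAGES_MIN_WIDTH, IMAGES_MIN_HEIGHT and IMAGES_EXPIRES; the numeric values are arbitrary examples and are not taken from this post:

```python
# settings.py (fragment); all values below are illustrative assumptions
IMAGES_MIN_WIDTH = 200   # drop images narrower than 200 px
IMAGES_MIN_HEIGHT = 200  # drop images shorter than 200 px
IMAGES_EXPIRES = 30      # skip re-downloading images fetched in the last 30 days
```

Images that fall below either minimum are dropped by the pipeline instead of being saved under IMAGES_STORE.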

Scrapy ships with ImagesPipeline, which handles the image downloading; all you have to do is supply the URLs.
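It helps to know where the downloaded files end up. As far as I understand the default behaviour, ImagesPipeline stores each image under IMAGES_STORE in a full/ subdirectory, named after the SHA-1 hash of its URL. The sketch below reproduces that naming with the standard library only; the function name expected_image_path and the example URL are hypothetical, and the exact scheme may differ between Scrapy versions:

```python
import hashlib


def expected_image_path(url):
    """Return the relative path the default ImagesPipeline is assumed to use."""
    # Assumption: file name = SHA-1 hex digest of the image URL, with a .jpg suffix.
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "full/%s.jpg" % image_guid


if __name__ == "__main__":
    # Hypothetical image URL, used only for illustration.
    print(expected_image_path("http://example.com/pic/001.jpg"))
```

Because the name depends only on the URL, the same image reached from two pages maps to a single file.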

Define the item class

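import scrapy


class BlueItem(scrapy.Item):
    # Input for ImagesPipeline: the list of image URLs to download
    image_urls = scrapy.Field()
    # Filled in by ImagesPipeline with the download results
    images = scrapy.Field()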

The spider.py file

import scrapy

# Assumes the item class is defined in myscrapy/items.py
from myscrapy.items import BlueItem


class JiandanSpider(scrapy.Spider):
    name = "jandan"
    allowed_domains = ["jandan.net"]
    start_urls = (
        'http://jandan.net/ooxx',
    )

    def parse(self, response):
        # Extract the src of every image and hand the list straight to ImagesPipeline.
        t = response.xpath('//div[1]/div/div[2]/p/img')
        # urljoin turns relative or protocol-relative src values into absolute URLs.
        img_urls = [response.urljoin(src) for src in t.xpath('@src').extract()]
        newItem = BlueItem(image_urls=img_urls)
        yield newItem

        # Find the link to the next page and issue a new request for it.
        sel_next_url = response.xpath('//div[2]/div/a')
        for item in sel_next_url:
            classname = item.xpath('@class').extract()
            if len(classname) > 0:
                if "previous-comment" in classname[0]:
                    urls = item.xpath('@href').extract()
                    for url in urls:
                        if "comments" in url:
                            self.logger.info("--> %s <--", url)
                            yield scrapy.Request(response.urljoin(url), callback=self.parse)
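The pagination rule in parse can be tried out without running a crawl. The sketch below is a stand-in that uses only the standard library (html.parser instead of Scrapy selectors) and applies the same two checks: the link's class contains "previous-comment" and its href contains "comments". The class NextPageFinder and the sample markup are hypothetical and are not copied from the real site:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextPageFinder(HTMLParser):
    """Collect hrefs of <a> tags whose class contains 'previous-comment'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.next_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        css_class = attrs.get("class") or ""
        href = attrs.get("href") or ""
        # Same two conditions as the spider's parse method.
        if "previous-comment" in css_class and "comments" in href:
            # Resolve relative links against the page URL.
            self.next_urls.append(urljoin(self.base_url, href))


# Hypothetical markup, made up for this example.
SAMPLE_HTML = """
<div class="cp-pagenavi">
  <a class="previous-comment-page" href="/ooxx/page-99#comments">older</a>
  <a class="next-comment-page" href="/ooxx/page-101#comments">newer</a>
</div>
"""

finder = NextPageFinder("http://jandan.net/ooxx")
finder.feed(SAMPLE_HTML)
print(finder.next_urls)  # ['http://jandan.net/ooxx/page-99#comments']
```

Only the "previous" link is followed, so the crawl walks from the newest page back toward the oldest one and stops when no such link is left.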

Run scrapy crawl jandan from the project directory, then sit back and wait for the pictures to pour in.
