Scrapy: crawling mzitu (妹子图) and saving the images into separate directories

Configuring settings

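# Enable the image pipeline
ITEM_PIPELINES = {
    'mztu.pipelines.ImagesPipelinse': 300,
}
# Directory where downloaded images are stored. Note: IMAGES_STORE must be set, or no images are downloaded!
# A raw string keeps the backslashes in the Windows path from being read as escape sequences.
IMAGES_STORE = r"E:\study\Python\scrapy\mztu\imges"
# Number of days before a downloaded image expires and is fetched again
IMAGES_EXPIRES = 90
# Thumbnail generation
#IMAGES_THUMBS = {
#    'small': (50, 50),
#    'big': (270, 270),
#}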

The spider (spiders directory)

# -*- coding: utf-8 -*-
import scrapy
from mztu.items import MztuItem


class ZimdgSpider(scrapy.Spider):
    name = 'zimdg'
    allowed_domains = ['mzitu.com']
    # Build the list of listing-page URLs
    start_urls = ['http://www.mzitu.com/xinggan/page/{}/'.format(str(x)) for x in range(118)]

    def parse(self, response):
        # Extract the gallery links from the listing page
        set_li = response.xpath("//div[@class='postlist']/ul/li")
        for ecth in set_li:
            ed = ecth.xpath('./a/@href').extract()
            # Skip entries without a link, then parse the gallery in a second step
            if ed:
                yield scrapy.Request(ed[0], callback=self.parse_item)

    def parse_item(self, response):
        # Read the number of pages in this gallery from the pagination bar
        offset = int(response.xpath('//div[@class="pagenavi"]/a/span/text()')[4].extract())
        # Build the URL of every page in the gallery and visit each one
        for i in [response.url + "/{}".format(str(x)) for x in range(1, offset + 1)]:
            # Create a fresh item per request; a single shared item would be
            # overwritten by later iterations before the callbacks run
            itme = MztuItem()
            itme['Referer'] = i
            # Pass the item to the next callback through meta
            yield scrapy.Request(itme['Referer'], meta={'meta_1': itme}, callback=self.parse_ponse)

    def parse_ponse(self, response):
        # Retrieve the item passed through meta
        itme = response.meta['meta_1']
        # Image URL
        imgs = response.xpath('//div[@class="main-image"]/p/a/img/@src')[0].extract()
        # Image title, used as the directory name
        title = response.xpath('//div[@class="main-image"]/p/a/img/@alt')[0].extract()
        itme["title"] = title
        itme["imge_url"] = imgs
        #itme["nickname"] = itme["Referer"][itme["Referer"].rfind("/"):]+itme["imge_url"][itme["imge_url"].rfind('/')+1:itme["imge_url"].rfind('.')]
        #itme["nickname"] = itme["imge_url"][itme["imge_url"].rfind('/')+1:itme["imge_url"].rfind('.')]
        yield itme
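
The per-page URL construction in parse_item can be tried outside Scrapy. The following is a minimal sketch, assuming gallery pages follow the "<gallery url>/<n>" pattern that the spider uses. build_page_urls is a hypothetical helper name (not part of Scrapy or of the original project), and example.com stands in for a real gallery URL.

```python
def build_page_urls(gallery_url, page_count):
    """Return the URLs of pages 1..page_count of one gallery.

    Hypothetical helper: it mirrors the list comprehension in parse_item,
    but strips a trailing slash first so the result never contains '//'
    after the gallery path.
    """
    base = gallery_url.rstrip("/")
    return ["{}/{}".format(base, n) for n in range(1, page_count + 1)]


if __name__ == "__main__":
    # example.com is a placeholder, not a real gallery address.
    for url in build_page_urls("http://example.com/12345/", 3):
        print(url)
```

Stripping the trailing slash is a small deviation from the spider, which concatenates response.url directly; it only matters if the gallery URL ends with "/".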

items

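import scrapy


class MztuItem(scrapy.Item):
    # Directory name (the image title)
    title = scrapy.Field()
    # Image URL
    imge_url = scrapy.Field()
    # Page URL sent as the Referer request header
    Referer = scrapy.Field()
    # Final path of the saved image
    image_Path = scrapy.Field()
    # Image name
    # nickname = scrapy.Field()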

The pipeline (pipelines)

# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
# os builds the paths; shutil moves the downloaded files
import os
import shutil

import scrapy
# Raised to discard an item whose image could not be downloaded
from scrapy.exceptions import DropItem
# Scrapy's built-in image download pipeline
from scrapy.pipelines.images import ImagesPipeline
# Gives access to the project settings
from scrapy.utils.project import get_project_settings


class ImagesPipelinse(ImagesPipeline):
    # Read the value of IMAGES_STORE from the settings file
    IMAGES_STORE = get_project_settings().get("IMAGES_STORE")

    # Override this ImagesPipeline method to send the image download request
    def get_media_requests(self, item, info):
        image_url = item["imge_url"]
        # The Referer header is sent mainly to get past the site's anti-crawler check
        yield scrapy.Request(image_url, headers={'Referer': item['Referer']})

    def item_completed(self, result, item, info):
        image_path = [x["path"] for ok, x in result if ok]
        # Nothing to move if the download failed
        if not image_path:
            raise DropItem("Image download failed: %s" % item["imge_url"])
        # Per-title directory in which the image is saved
        img_path = os.path.join(self.IMAGES_STORE, item['title'])
        # Create the directory if it does not exist
        os.makedirs(img_path, exist_ok=True)
        # Move the file from the default download path to the per-title directory,
        # keeping only the file name and dropping the leading "full/" folder
        target = os.path.join(img_path, os.path.basename(image_path[0]))
        shutil.move(os.path.join(self.IMAGES_STORE, image_path[0]), target)
        item['image_Path'] = target
        return item

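This is what saves the images into different directories. The key function is shutil.move(), which moves each image from the default download path to the target directory.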
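
The move step can be exercised on its own, without Scrapy. The following is a minimal sketch under stated assumptions: the downloaded file sits at "full/<name>" under the store directory, and the title may contain characters that Windows does not allow in directory names. safe_dir_name and move_to_title_dir are hypothetical helper names introduced for illustration; neither is part of Scrapy or of the original project.

```python
import os
import re
import shutil
import tempfile


def safe_dir_name(title):
    """Replace characters that Windows forbids in file names.

    Assumption: the title comes from page content and may contain them.
    """
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip(" .")
    return cleaned or "untitled"


def move_to_title_dir(store, relative_path, title):
    """Move store/relative_path into store/<title>/ and return the new path."""
    target_dir = os.path.join(store, safe_dir_name(title))
    os.makedirs(target_dir, exist_ok=True)
    # Keep only the file name, dropping the leading "full/" folder.
    target = os.path.join(target_dir, os.path.basename(relative_path))
    shutil.move(os.path.join(store, relative_path), target)
    return target


if __name__ == "__main__":
    # Simulate a finished download inside a temporary store directory.
    store = tempfile.mkdtemp()
    os.makedirs(os.path.join(store, "full"))
    with open(os.path.join(store, "full", "abc.jpg"), "wb") as handle:
        handle.write(b"fake image bytes")
    print(move_to_title_dir(store, "full/abc.jpg", "some gallery: part 1"))
```

Sanitizing the title is an addition to the original pipeline, which uses item['title'] as the directory name unchanged; a title containing a character such as ":" or "?" would make directory creation fail on Windows.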

Reposted from: https://www.cnblogs.com/contiune/p/9384973.html
