Scrapy framework learning: demo2

In this part we add the item pipeline module.
Straight to the code:
qsbk.py

```python
# -*- coding: utf-8 -*-
import scrapy

from tutorial.items import TutorialItem


class QsbkSpider(scrapy.Spider):
    name = 'qsbk'
    allowed_domains = ['qiushibaike.com']
    start_urls = ['https://www.qiushibaike.com/text/page/1/']

    def parse(self, response):
        # Author names and joke bodies are selected separately and paired by position
        authors = response.xpath('//div[@class="col1 old-style-col1"]//h2/text()')
        contents = response.xpath('//div[@class="col1 old-style-col1"]//div[@class="content"]/span')
        for v, c in zip(authors, contents):
            author = v.get().strip()
            # A joke body can consist of several text nodes, so join them
            content = "".join(c.xpath('./text()').extract()).strip()
            item = TutorialItem(author=author, content=content)
            yield item  # handed to the item pipeline
```
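The `"".join(...)` call in `parse` is there because a joke body is usually several text nodes separated by `<br>` tags, so `./text()` returns a list rather than one string. The sketch below reproduces that on hypothetical markup using the standard library's `xml.etree.ElementTree` instead of Scrapy selectors (an assumption made only so it runs without Scrapy); `direct_text_nodes` is a hypothetical helper. It also shows that `zip` pairs purely by position and stops at the shorter list, so if the two XPath queries ever return different counts, extra entries are dropped and pairs after a missing element are mismatched, with no error raised.

```python
import xml.etree.ElementTree as ET


def direct_text_nodes(elem):
    """Return an element's own text nodes, roughly what XPath './text()' selects.

    ElementTree keeps the first text node in elem.text and the text that
    follows each child element (here each <br/>) in child.tail.
    """
    nodes = [elem.text] + [child.tail for child in elem]
    return [n for n in nodes if n is not None]


# Hypothetical markup for one joke body, split by <br/> tags
span = ET.fromstring("<span>\nfirst line<br/>second line<br/>third line\n</span>")

nodes = direct_text_nodes(span)
print(nodes)  # ['\nfirst line', 'second line', 'third line\n']

# What the spider does: concatenate, then strip outer whitespace
joined = "".join(nodes).strip()
print(joined)  # first linesecond linethird line

# Alternative: keep the original line breaks
kept = "\n".join(n.strip() for n in nodes)

# zip() pairs by position and silently stops at the shorter list
pairs = list(zip(["a", "b", "c"], ["x", "y"]))
print(pairs)  # [('a', 'x'), ('b', 'y')]
```

Joining with `""` glues the lines of a multi-line joke together; joining with `"\n"` keeps them apart, which may read better in the output file.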

items.py

```python
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class TutorialItem(scrapy.Item):
    # define the fields for your item here like:
    author = scrapy.Field()
    content = scrapy.Field()
```

pipelines.py

```python
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

import json


class TutorialPipeline(object):
    def __init__(self):
        self.file = open("spider.txt", "w", encoding="utf-8")  # open the output file

    def open_spider(self, spider):
        print("begin spider")

    def process_item(self, item, spider):
        print(dict(item))
        # ensure_ascii=False writes Chinese characters as-is instead of \uXXXX escapes
        line = json.dumps(dict(item), ensure_ascii=False) + "\n"
        print("*" * 40)
        print(line)
        self.file.write(line)
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        self.file.close()  # close the output file
        print("close spider")
```
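The pipeline above opens `spider.txt` in `__init__`, i.e. as soon as the pipeline object is built. A common alternative is to open the file in `open_spider`, so the hooks pair up cleanly: `open_spider` once at the start, `process_item` once per item, `close_spider` once at the end. The sketch below is such a variant (`JsonLinesPipeline` is a hypothetical name, and the sample item is made up). A pipeline is a plain class, so no Scrapy import is needed; here its hooks are called by hand with `spider=None` to mimic a crawl, and the file is read back to show what `ensure_ascii=False` produces.

```python
import json
import os
import tempfile


class JsonLinesPipeline:
    """Hypothetical variant of TutorialPipeline that opens its file in open_spider."""

    def __init__(self, path="spider.txt"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # One JSON object per line; ensure_ascii=False keeps Chinese text readable
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item

    def close_spider(self, spider):
        self.file.close()


# Call the hooks by hand in the order a crawl would (no real Scrapy run here)
path = os.path.join(tempfile.mkdtemp(), "spider.txt")
pipeline = JsonLinesPipeline(path)
pipeline.open_spider(spider=None)
for item in [{"author": "小明", "content": "今天天气不错"}]:  # made-up sample item
    pipeline.process_item(item, spider=None)
pipeline.close_spider(spider=None)

# Each line of the output file is a standalone JSON object
with open(path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
print(records)  # [{'author': '小明', 'content': '今天天气不错'}]

# For comparison: the default ensure_ascii=True escapes every non-ASCII character
escaped = json.dumps({"author": "小明"})
print(escaped)
```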

settings.py
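The settings listing itself did not survive here. The header comment in pipelines.py says the pipeline must be added to the `ITEM_PIPELINES` setting, so a minimal sketch of that entry follows. It assumes the project package is `tutorial` (as the spider's `from tutorial.items import TutorialItem` implies) and uses 300, the value in the settings template Scrapy generates; other settings the author may have changed are unknown.

```python
# settings.py (excerpt, assumed): register the pipeline so Scrapy calls it.
ITEM_PIPELINES = {
    # Key: import path of the pipeline class.
    # Value: order in the 0-1000 range; items pass through lower values first.
    "tutorial.pipelines.TutorialPipeline": 300,
}
```

With this in place, run `scrapy crawl qsbk` from the project directory. `spider.txt` is created in the directory the command is run from, because the pipeline opens it with a relative path.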
