Scraping novels from 88读书网 with the Scrapy framework

Links:

88读书网

Source code

Tools

Python 3.7

PyCharm

Scrapy framework

Tutorial

spider:

# -*- coding: utf-8 -*-
import scrapy
from dushu.items import DushuItem


class BookSpider(scrapy.Spider):
    name = 'book'
    # allowed_domains = ['xdushu.com']
    start_urls = ['https://www.x88dushu.com/xiaoshuo/111/111516/']

    def parse(self, response):
        if response.url == self.start_urls[0]:
            # Table-of-contents page: follow every chapter link.
            self.logger.info('Visiting table of contents: ' + response.url)
            li_list = response.css("div.mulu ul li a")
            for li in li_list:
                link = li.css('a::attr(href)').extract_first()
                if link:
                    # urljoin handles both relative and absolute hrefs.
                    yield scrapy.Request(response.urljoin(link))
        else:
            # Chapter page: extract the title and the body text.
            self.logger.info('Visiting chapter: ' + response.url)
            novel = response.css('div.novel')
            item = DushuItem()
            item['chapterName'] = novel.css('h1::text').extract_first()
            item['text'] = novel.css('div.yd_text2::text').extract()
            yield item
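The spider builds each chapter URL from the table-of-contents URL and the href of each link. The sketch below shows how such links resolve with `urllib.parse.urljoin` from the standard library, the same function Scrapy's `response.urljoin` relies on. The two href values are made-up examples, since the actual markup of the site is not shown here.

```python
from urllib.parse import urljoin

# Table-of-contents URL, taken from the spider's start_urls.
base = 'https://www.x88dushu.com/xiaoshuo/111/111516/'

# Hypothetical href values; the real ones on the site may differ.
relative_href = '12345.html'
absolute_path_href = '/xiaoshuo/111/111516/12345.html'

# Plain string concatenation is only correct for the relative form.
concatenated = base + absolute_path_href

# urljoin resolves both forms to the same chapter URL.
from_relative = urljoin(base, relative_href)
from_absolute = urljoin(base, absolute_path_href)

print(from_relative)
print(from_absolute)
print(concatenated)
```

If the site uses relative hrefs, concatenation and `urljoin` give the same result; `urljoin` simply stays correct if the hrefs turn out to be absolute paths.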

items.py:

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html
import scrapy


class DushuItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    # Chapter name
    chapterName = scrapy.Field()
    # Chapter text (a list of text nodes)
    text = scrapy.Field()

pipelines.py:

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import os


class DushuPipeline(object):
    def process_item(self, item, spider):
        # Create the output directory on first use; open() does not create it.
        os.makedirs('mulu', exist_ok=True)
        # Write one .txt file per chapter, one text node per line.
        path = os.path.join('mulu', item['chapterName'] + '.txt')
        with open(path, 'w', encoding='utf-8') as file:
            for text in item['text']:
                file.write(text + '\n')
        return item
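The pipeline uses the chapter title directly as a file name, which can fail if a title contains characters such as `?` or `:` that some file systems reject. The following is a minimal standalone sketch of one way to guard against that. The helpers `safe_filename` and `save_chapter` are hypothetical names introduced here for illustration, not part of the original project, and the chapter data is made up.

```python
import re
import tempfile
from pathlib import Path


def safe_filename(chapter_name, fallback='untitled'):
    """Replace characters that are commonly invalid in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', '_', chapter_name or '').strip()
    return cleaned or fallback


def save_chapter(out_dir, chapter_name, lines):
    """Write one chapter to out_dir/<chapter name>.txt and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / (safe_filename(chapter_name) + '.txt')
    with path.open('w', encoding='utf-8') as f:
        for line in lines:
            f.write(line + '\n')
    return path


# Usage with a temporary directory and made-up chapter data.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_chapter(Path(tmp) / 'mulu', 'Chapter 1: Who?',
                         ['first line', 'second line'])
    saved_name = saved.name
    saved_text = saved.read_text(encoding='utf-8')

print(saved_name)
```

Inside `process_item`, the same idea would amount to passing `item['chapterName']` through `safe_filename` before building the path.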

settings.py:

BOT_NAME = 'dushu'

SPIDER_MODULES = ['dushu.spiders']
NEWSPIDER_MODULE = 'dushu.spiders'

ROBOTSTXT_OBEY = False

ITEM_PIPELINES = {
    'dushu.pipelines.DushuPipeline': 300,
}
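Because the spider requests every chapter of the novel in one run, it may be worth slowing the crawl down. The fragment below is an optional sketch of settings that could be appended to the same file. The setting names are standard Scrapy settings, but the values are illustrative assumptions and are not from the original project.

```python
# Optional throttling settings; the values are illustrative assumptions.
DOWNLOAD_DELAY = 1                  # seconds to wait between requests to the same site
CONCURRENT_REQUESTS_PER_DOMAIN = 4  # cap on parallel requests per domain
AUTOTHROTTLE_ENABLED = True         # let Scrapy adapt the delay to server latency
```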

Running the program:

URL of the novel to scrape:

start_urls = ['https://www.x88dushu.com/xiaoshuo/111/111516/']

Run from the command line, inside the project directory:

scrapy crawl book

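Result: each chapter is written to its own .txt file, named after the chapter title, in the mulu/ directory.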
