Fixing the Scrapy crawl error AttributeError: 'xxxSpider' object has no attribute '_rules'

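I wrote a demo following the official documentation, with one addition: an __init__ method. When I ran it, it failed, complaining that the _rules attribute does not exist. The error log was: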

 ......
  File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\spiders\crawl.py", line 82, in _parse_response
    for request_or_item in self._requests_to_follow(response):
  File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\spiders\crawl.py", line 60, in _requests_to_follow
    for n, rule in enumerate(self._rules):
AttributeError: 'TestSpider' object has no attribute '_rules'

The spider code that triggers the problem:

# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from newtest.items import NewtestItem


class TestSpider(CrawlSpider):

    def __init__(self, *args, **kwargs):
        self.headers = {
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'Accept-Encoding': 'gzip, deflate',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
        }

    name = 'test'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com']

    rules = (
        # Extract links matching 'category.php' (but not matching 'subsection.php')
        # and follow links from them (since no callback means follow=True by default).
        Rule(LinkExtractor(allow=(r'category\.php', ), deny=(r'subsection\.php', ))),

        # Extract links matching 'item.php' and parse them with the spider's method parse_item
        Rule(LinkExtractor(allow=(r'item\.php', )), callback='parse_item'),
    )

    def parse_item(self, response):
        self.logger.info('Hi, this is an item page! %s', response.url)
        item = scrapy.Item()
        item['id'] = response.xpath('//td[@id="item_id"]/text()').re(r'ID: (\d+)')
        item['name'] = response.xpath('//td[@id="item_name"]/text()').extract()
        item['description'] = response.xpath('//td[@id="item_description"]/text()').extract()
        return item

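Looking more closely, the only difference from the official example is that I overrode __init__. Judging by the log, my __init__ replaced CrawlSpider's __init__ without calling it, so the _rules attribute was never created. A look at the CrawlSpider source confirms this: _rules is built inside CrawlSpider's own __init__ (via _compile_rules), so it never exists if that __init__ is skipped.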
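As a rough illustration of that mechanism, here is a minimal sketch that runs without Scrapy installed. It assumes the layout of scrapy/spiders/crawl.py in the version used here (an __init__ that calls _compile_rules, which fills self._rules). The classes Spider, CrawlSpider, BrokenSpider and FixedSpider and the helper try_follow are simplified stand-ins written for this example, not the real Scrapy code, and the string rules replace real Rule objects.

```python
import copy


class Spider:
    """Stand-in for scrapy.Spider (hypothetical, heavily simplified)."""

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        self.__dict__.update(kwargs)


class CrawlSpider(Spider):
    """Simplified model of scrapy.spiders.CrawlSpider, not the real source."""

    rules = ()

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # _rules is created here, per instance, from the class-level rules.
        self._compile_rules()

    def _compile_rules(self):
        self._rules = [copy.copy(rule) for rule in self.rules]

    def _requests_to_follow(self, response):
        # The traceback ends in the real counterpart of this method,
        # which iterates over self._rules.
        return [(n, rule) for n, rule in enumerate(self._rules)]


class BrokenSpider(CrawlSpider):
    rules = ("rule-a", "rule-b")

    def __init__(self, *args, **kwargs):
        # The parent __init__ is never called, so _rules is never set.
        self.headers = {"Accept-Encoding": "gzip, deflate"}


class FixedSpider(CrawlSpider):
    rules = ("rule-a", "rule-b")

    def __init__(self, *args, **kwargs):
        # Run the parent initialisation first, then add our own attributes.
        super().__init__(*args, **kwargs)
        self.headers = {"Accept-Encoding": "gzip, deflate"}


def try_follow(spider):
    """Return the followed rules, or the AttributeError message on failure."""
    try:
        return spider._requests_to_follow(response=None)
    except AttributeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_follow(BrokenSpider()))
    print(try_follow(FixedSpider()))
```

Running the script prints the AttributeError message for BrokenSpider and the enumerated rules for FixedSpider, which mirrors the failure in the log above.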

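So if your spider inherits from CrawlSpider and needs its own __init__ method, remember to call the parent class's __init__ so that CrawlSpider's attributes are initialized properly.
The corrected code is as follows: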

The key is the super(TestSpider, self).__init__(*args, **kwargs) call at the top of __init__ (line 11):

# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from newtest.items import NewtestItem


class TestSpider(CrawlSpider):

    def __init__(self, *args, **kwargs):
        super(TestSpider, self).__init__(*args, **kwargs)  # this is the key line
        self.headers = {
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'Accept-Encoding': 'gzip, deflate',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
        }

    name = 'test'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com']

    rules = (
        # Extract links matching 'category.php' (but not matching 'subsection.php')
        # and follow links from them (since no callback means follow=True by default).
        Rule(LinkExtractor(allow=(r'category\.php', ), deny=(r'subsection\.php', ))),

        # Extract links matching 'item.php' and parse them with the spider's method parse_item
        Rule(LinkExtractor(allow=(r'item\.php', )), callback='parse_item'),
    )

    def parse_item(self, response):
        self.logger.info('Hi, this is an item page! %s', response.url)
        item = scrapy.Item()
        item['id'] = response.xpath('//td[@id="item_id"]/text()').re(r'ID: (\d+)')
        item['name'] = response.xpath('//td[@id="item_name"]/text()').extract()
        item['description'] = response.xpath('//td[@id="item_description"]/text()').extract()
        return item

Reposted from: https://www.cnblogs.com/xiaocy66/p/10589277.html
