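Creating the project

For a web crawler, the first thing to do is create the project package from the command line. In this article the project is named cast.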


scrapy startproject cast
cd cast
scrapy genspider ast itcast.cn

The exact steps are shown in the screenshot below:

[Screenshot of the terminal session, 2019-05-12]

Edit the generated items file (items.py):


import scrapy


class CastItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()
    position = scrapy.Field()
    detail = scrapy.Field()

Modify the ast spider file (spiders/ast.py):

# -*- coding: utf-8 -*-
import scrapy
from cast.items import CastItem


class AstSpider(scrapy.Spider):
    name = 'ast'
    allowed_domains = ['itcast.cn']
    start_urls = ['http://www.itcast.cn/channel/teacher.shtml']

    def parse(self, response):
        # Each div.li_txt block holds one teacher's details
        node_list = response.xpath('//div[@class="li_txt"]')
        for node in node_list:
            item = CastItem()
            # Query relative to the current node; an absolute '//h3' on the
            # response would return the same first match on every iteration
            name = node.xpath('./h3/text()').extract()
            position = node.xpath('./h4/text()').extract()
            detail = node.xpath('./p/text()').extract()
            # Under Python 3 these are already text, so no .encode() is needed
            item['name'] = name[0]
            item['position'] = position[0]
            item['detail'] = detail[0]
            yield item
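The loop in parse handles one teacher node at a time, so it matters whether a query is evaluated relative to that node or against the whole document. The following is a minimal sketch of the difference, using the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors. The sample markup, the teacher names, and the helper functions are all made up for illustration and are not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup imitating the structure the spider expects:
# one <div class="li_txt"> per teacher, holding <h3>, <h4> and <p>.
SAMPLE = """
<root>
  <div class="li_txt"><h3>Teacher A</h3><h4>Lecturer</h4><p>Teaches Python.</p></div>
  <div class="li_txt"><h3>Teacher B</h3><h4>Senior Lecturer</h4><p>Teaches Java.</p></div>
</root>
""".strip()


def extract_teachers(markup):
    """Return one dict per teacher, querying relative to each node."""
    root = ET.fromstring(markup)
    teachers = []
    for node in root.findall(".//div[@class='li_txt']"):
        teachers.append({
            "name": node.findtext("h3"),
            "position": node.findtext("h4"),
            "detail": node.findtext("p"),
        })
    return teachers


def first_name_in_document(markup):
    """Document-wide query: yields the first <h3> no matter which node is current."""
    root = ET.fromstring(markup)
    return root.find(".//h3").text


teachers = extract_teachers(SAMPLE)
print(teachers)
print(first_name_in_document(SAMPLE))
```

With node-relative queries each teacher gets their own record, whereas the document-wide query keeps returning the first heading on the page.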

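Modify the pipeline file (pipelines.py):

import json


class CastPipeline(object):
    def __init__(self):
        # Open the output file once; UTF-8 so non-ASCII text is written as-is
        self.f = open("1.json", "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Decode any bytes values so the record can be serialized as JSON
        record = {k: v.decode("utf-8") if isinstance(v, bytes) else v
                  for k, v in dict(item).items()}
        # Serialize the dict itself, not its str() form, to get a JSON object
        content = json.dumps(record, ensure_ascii=False) + ',\n'
        self.f.write(content)
        return item

    def close_spider(self, spider):
        self.f.close()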
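What ends up in 1.json depends on what is handed to json.dumps. Here is a small sketch comparing three options: serializing the dict itself, serializing its str() form, and leaving ensure_ascii at its default. The record is a made-up stand-in for dict(item), not real scraped data.

```python
import json

# Made-up record standing in for dict(item); not real scraped data.
record = {"name": "张老师", "position": "讲师"}

# Serializing the dict gives a JSON object that can be loaded back as a dict.
as_object = json.dumps(record, ensure_ascii=False)

# Serializing str(dict) gives a single JSON string holding Python repr text.
as_text = json.dumps(str(record), ensure_ascii=False)

# Without ensure_ascii=False, non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

print(as_object)
print(as_text)
print(escaped)

restored = json.loads(as_object)
restored_text = json.loads(as_text)
```

Only the first form round-trips back to a dict, which is usually what a downstream consumer of the file expects.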

Enable the pipeline in settings.py, and you are done.
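Enabling the pipeline normally means registering it under ITEM_PIPELINES in the project's settings.py. The fragment below is a sketch that assumes the default layout generated by startproject, so the class path is cast.pipelines.CastPipeline. The priority value 300 is the one commonly seen in Scrapy's generated template, not something this article specifies.

```python
# settings.py (fragment)
# Maps the pipeline's import path to its priority; lower numbers run first.
ITEM_PIPELINES = {
    "cast.pipelines.CastPipeline": 300,
}
```

With this in place, running scrapy crawl ast from the project directory should start the spider and pass each yielded item through the pipeline.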
