Tanzhou Classroom Class 25: Ph201805201 Advanced Crawler, Lesson 3: Scrapy framework, Tencent recruitment case (class notes)

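Go to the target directory and create a project.

Change into the spiders directory, then create the spider file and give it a name.

Run and debug.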
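The notes do not preserve the commands for these three steps. As a rough sketch, they map onto the standard Scrapy command-line calls listed below. The project name `tenxun_project` is a hypothetical placeholder, and the helper `scrapy_commands` exists only for illustration; the spider name and domain are taken from the spider code in these notes.

```python
import shlex


def scrapy_commands(project, spider, domain):
    """Return the Scrapy CLI calls for the three steps as argv lists."""
    return [
        ["scrapy", "startproject", project],      # create the project
        ["scrapy", "genspider", spider, domain],  # create the spider file
        ["scrapy", "crawl", spider],              # run the spider
    ]


# "tenxun_project" is a hypothetical project name; the notes do not state it.
for argv in scrapy_commands("tenxun_project", "tenxun", "hr.tencent.com"):
    print(shlex.join(argv))
```

Each printed line can be typed into a terminal as is; `genspider` and `crawl` are meant to be run from inside the project directory.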

Spider code:

# -*- coding: utf-8 -*-
import scrapy
from ..items import TenXunItem


class TenxunSpider(scrapy.Spider):
    name = 'tenxun'
    # allowed_domains = ['tenxun.com']  # domain range
    start_urls = ['https://hr.tencent.com/position.php?lid=&tid=87&keywords']
    burl = 'https://hr.tencent.com/'

    def parse(self, response):
        tr_list = response.xpath('//table[@class="tablelist"]/tr')
        # Skip the first and last rows of the table (header and footer).
        for tr in tr_list[1:-1]:
            item = TenXunItem()
            item['position_name'] = tr.xpath('./td[1]/a/text()').extract()[0]
            item['position_link'] = self.burl + tr.xpath('./td[1]/a/@href').extract()[0]
            item['position_type'] = tr.xpath('./td[2]/text()').extract()[0]
            item['position_num'] = tr.xpath('./td[3]/text()').extract()[0]
            item['position_addr'] = tr.xpath('./td[4]/text()').extract()[0]
            item['position_time'] = tr.xpath('./td[5]/text()').extract()[0]
            # yield item

            # The job description lives on the detail page, so a new request is needed:
            # callback names the function that handles the response,
            # meta passes the partly filled item along to that callback.
            yield scrapy.Request(url=item['position_link'],
                                 callback=self.detail_tent,
                                 meta={'items': item})

        # Match the next page. One request per listing page is enough,
        # so this sits after the loop rather than inside it.
        next_url = self.burl + response.xpath('//div[@class="pagenav"]/a[11]/@href').extract()[0]
        yield scrapy.Request(url=next_url, callback=self.parse)

    def detail_tent(self, response):
        # Retrieve the item passed in through meta
        item = response.meta.get('items')
        item['position_con'] = ''.join(response.xpath('//ul[@class="squareli"]//text()').extract())
        yield item

        # # name
        # position_name_list = response.xpath('//td[@class="l square"]/a/text()').extract()
        # # link
        # position_link_list = response.xpath('//td[@class="l square"]/a/@href').extract()
        # # type
        # position_type_list = response.xpath('//table[@class="tablelist"]/tr/td[2]/text()').extract()
        # # number of openings
        # position_num_list = response.xpath('//table[@class="tablelist"]/tr/td[3]/text()').extract()
        # print('====================')
        # print('====================')
        # print(self.burl + tr_list[2].xpath('./td[1]/a/@href').extract()[0])
        # print('====================')
        # print('====================')
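The spider builds absolute links by concatenating `burl` with the extracted `href`. A minimal sketch of the same step using the standard library's `urljoin`, which also copes with hrefs that begin with a slash. The sample hrefs and the helper names `absolute_link` and `is_http_url` are made up for illustration, and whether the site ever returns a non-HTTP href such as `javascript:;` is an assumption.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = "https://hr.tencent.com/"  # same value as burl in the spider


def absolute_link(href, base=BASE_URL):
    """Resolve a relative href against the base URL."""
    return urljoin(base, href)


def is_http_url(url):
    """True only for http/https URLs, so other schemes are never requested."""
    return urlparse(url).scheme in ("http", "https")


# Hypothetical hrefs, not taken from the real page.
for href in ("position_detail.php?id=1", "/position.php?start=10", "javascript:;"):
    url = absolute_link(href)
    print(url, is_http_url(url))
```

Inside a Scrapy callback, `response.urljoin(href)` does the same resolution against the URL of the current response.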

pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html

import json


class TenXunPipeline(object):
    def open_spider(self, spider):
        # Called once when the spider starts: open the output file
        self.f = open('tenxun.json', 'w', encoding='utf8')

    def process_item(self, item, spider):
        # Write each item as one JSON object per line, keeping non-ASCII text readable
        conn = json.dumps(dict(item), ensure_ascii=False) + '\n'
        self.f.write(conn)
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: close the output file
        self.f.close()
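As the comment at the top of pipelines.py says, the pipeline only runs once it is listed in the `ITEM_PIPELINES` setting. A minimal sketch of the matching entry in settings.py, assuming the project package is called `tenxun_project` (a hypothetical name; the notes do not state it):

```python
# settings.py (fragment)
# "tenxun_project" is an assumed package name; replace it with the real one.
ITEM_PIPELINES = {
    "tenxun_project.pipelines.TenXunPipeline": 300,  # lower numbers run first
}
```

The number is the pipeline's order value; Scrapy's documentation suggests keeping it in the 0 to 1000 range.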

items.py

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class TenXunItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()

    # Debug print: runs once, when the class body is executed at import time
    print('00000000000000001111111111111111')

    # name
    position_name = scrapy.Field()
    # link
    position_link = scrapy.Field()
    # type
    position_type = scrapy.Field()
    # number of openings
    position_num = scrapy.Field()
    # location
    position_addr = scrapy.Field()
    # publication date
    position_time = scrapy.Field()
    # requirements
    position_con = scrapy.Field()

Storing into a database:
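The notes do not show which database was used or how the items were written to it. The following is only a sketch of one possible approach, assuming SQLite from the standard library. The class name `SQLitePipeline`, the table name `positions`, and the default file name `tenxun.db` are all hypothetical; the column names follow the fields of `TenXunItem`.

```python
import sqlite3


class SQLitePipeline:
    """Hypothetical pipeline: stores each item as one row in a SQLite table."""

    FIELDS = (
        "position_name", "position_link", "position_type",
        "position_num", "position_addr", "position_time", "position_con",
    )

    def __init__(self, db_path="tenxun.db"):  # assumed default file name
        self.db_path = db_path

    def open_spider(self, spider):
        # Open the connection and create the table on first use
        self.conn = sqlite3.connect(self.db_path)
        columns = ", ".join(f"{name} TEXT" for name in self.FIELDS)
        self.conn.execute(f"CREATE TABLE IF NOT EXISTS positions ({columns})")

    def process_item(self, item, spider):
        # Missing fields are stored as NULL
        row = dict(item)
        placeholders = ", ".join("?" for _ in self.FIELDS)
        self.conn.execute(
            f"INSERT INTO positions VALUES ({placeholders})",
            [row.get(name) for name in self.FIELDS],
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```

Like the JSON pipeline, this class would only take effect after being added to `ITEM_PIPELINES`. Committing after every item keeps the sketch simple; batching commits would be faster on large crawls.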

Reposted from: https://www.cnblogs.com/gdwz922/p/9719704.html
