Scraping job postings from Aibee (爱笔智能)

http://aibee.com/cn/joinus.aspx

import json
from urllib.parse import urlencode

import requests
from pymongo import MongoClient
from pyquery import PyQuery as pq

# Job details come back as JSON from this endpoint; the job id is appended to the query string.
base_url = 'http://aibee.com/cn/joinus.aspx?action=jobinfo&'

headers = {
    'Host': 'aibee.com',
    'Referer': 'http://aibee.com/cn/joinus.aspx',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}

client = MongoClient()
db = client['aibee']
collection = db['aibee']
max_id = 50


def get_page(job_id):
    """Fetch the JSON for one job id; return None on a connection error or a non-200 status."""
    data = urlencode({'id': job_id})
    url = base_url + data
    try:
        # As in the original, the encoded id is also sent as the request body.
        response = requests.get(url, data=data, headers=headers)
        if response.status_code == 200:
            return response.json()
    except requests.ConnectionError as e:
        print('Error', e.args)
    return None


def parse_page(json_1, job_id):
    """Yield one record per entry in the 'shuzu' list of the response."""
    # The original tested the global loop variable `id` here; it is now passed in explicitly.
    if not json_1 or job_id == 1:
        return
    for item in json_1.get('shuzu') or []:
        yield {
            'id': item.get('id'),
            'title': item.get('title'),
            # 'zhize' and 'yaoqiu' hold HTML fragments; keep only their text.
            'zhize': pq(item.get('zhize')).text(),
            'yaoqiu': pq(item.get('yaoqiu')).text(),
            'dtt': item.get('dtt'),
            'emailaddr': item.get('emailaddr'),
        }


def write_to_file(content):
    """Append one record to aibee.json as a single line of JSON."""
    # The with-block closes the file, so no explicit close() is needed.
    with open('aibee.json', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def save_to_mongo(result):
    # insert_one replaces Collection.insert, which was removed in PyMongo 4.
    if collection.insert_one(result).inserted_id:
        print('Saved to Mongo')


if __name__ == '__main__':
    for job_id in range(1, max_id + 1):
        json_1 = get_page(job_id)
        for result in parse_page(json_1, job_id):
            print(result)
            write_to_file(result)
            save_to_mongo(result)
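
Because write_to_file appends one JSON object per line, the output file can be read back line by line. The following is a minimal sketch of that round trip, not part of the original post: the helper names (append_record, read_records), the temporary file path, and the sample records are all assumptions made for illustration.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single JSON line, keeping non-ASCII text readable."""
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(record, ensure_ascii=False) + '\n')


def read_records(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Hypothetical sample records, written to a temporary file for illustration only.
demo_path = os.path.join(tempfile.mkdtemp(), 'aibee_demo.json')
append_record(demo_path, {'id': 2, 'title': '算法工程师'})
append_record(demo_path, {'id': 3, 'title': 'Backend Engineer'})

records = list(read_records(demo_path))
print(records)
```

Since the file is opened in append mode, running the crawler twice writes every record twice; delete or rotate aibee.json between runs if that matters.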

Alternatively, send the id as POST form data:

import json

import requests
from pymongo import MongoClient
from pyquery import PyQuery as pq

url = 'http://aibee.com/cn/joinus.aspx?action=jobinfo'

headers = {
    'Host': 'aibee.com',
    'Referer': 'http://aibee.com/cn/joinus.aspx',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}

client = MongoClient()
db = client['aibee']
collection = db['aibee']
max_id = 50


def get_page(job_id):
    """POST the job id as form data; return None on a connection error or a non-200 status."""
    form_data = {'id': job_id}
    try:
        r = requests.post(url, data=form_data, headers=headers)
        if r.status_code == 200:
            return r.json()
    except requests.ConnectionError as e:
        print('Error', e.args)
    return None


def parse_page(json_1, job_id):
    """Yield one record per entry in the 'shuzu' list of the response."""
    # The original tested the global loop variable `id` here; it is now passed in explicitly.
    if not json_1 or job_id == 1:
        return
    for item in json_1.get('shuzu') or []:
        yield {
            'id': item.get('id'),
            'title': item.get('title'),
            # 'zhize' and 'yaoqiu' hold HTML fragments; keep only their text.
            'zhize': pq(item.get('zhize')).text(),
            'yaoqiu': pq(item.get('yaoqiu')).text(),
            'dtt': item.get('dtt'),
            'emailaddr': item.get('emailaddr'),
        }


def write_to_file(content):
    """Append one record to aibee.json as a single line of JSON."""
    # The with-block closes the file, so no explicit close() is needed.
    with open('aibee.json', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def save_to_mongo(result):
    # insert_one replaces Collection.insert, which was removed in PyMongo 4.
    if collection.insert_one(result).inserted_id:
        print('Saved to Mongo')


if __name__ == '__main__':
    for job_id in range(1, max_id + 1):
        json_1 = get_page(job_id)
        for result in parse_page(json_1, job_id):
            print(result)
            write_to_file(result)
            save_to_mongo(result)
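
The field mapping in parse_page can be tried offline, without contacting the site or running MongoDB. The sketch below is an illustration under stated assumptions: the sample payload is invented (only the key names 'shuzu', 'id', 'title', 'zhize', 'yaoqiu', 'dtt', and 'emailaddr' come from the code above), and the standard library's html.parser is swapped in for pyquery so the example has no third-party dependencies. Its whitespace handling may differ from what pq(...).text() returns.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Rough stand-in for pq(fragment).text(): drop tags, collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(fragment or '')
    parser.close()
    return ' '.join(' '.join(parser.parts).split())


def parse_sample(payload):
    """Yield one flat record per entry of the 'shuzu' list (hypothetical helper)."""
    for item in (payload or {}).get('shuzu') or []:
        yield {
            'id': item.get('id'),
            'title': item.get('title'),
            'zhize': html_to_text(item.get('zhize')),
            'yaoqiu': html_to_text(item.get('yaoqiu')),
            'dtt': item.get('dtt'),
            'emailaddr': item.get('emailaddr'),
        }


# Hypothetical payload for illustration; the real response may be shaped differently.
sample = {
    'shuzu': [{
        'id': 7,
        'title': 'Computer Vision Intern',
        'zhize': '<p>Train models</p><p>Write reports</p>',
        'yaoqiu': '<ul><li>Python</li><li>PyTorch</li></ul>',
        'dtt': '2018-06-25',
        'emailaddr': 'hr@example.com',
    }]
}

parsed = list(parse_sample(sample))
print(parsed[0]['zhize'])
print(parsed[0]['yaoqiu'])
```

A missing or empty 'shuzu' key yields no records, which lets the main loop skip ids that have no posting without special-casing them.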

Reposted from: https://www.cnblogs.com/wanglinjie/p/9226880.html
