Python Notes: Fetching Python Job Listings for Nanjing from Lagou (拉钩网)

Fiddler packet capture:

Program output:

Source code:

import re
import requests


class HandleLaGou(object):
    def __init__(self):
        # One session for every request, so cookies set by earlier pages are reused
        self.laGou_session = requests.session()
        self.header = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
        }
        self.city_list = ""

    # Fetch the list of all cities nationwide
    def handle_city(self):
        city_search = re.compile(r'zhaopin/">(.*?)</a>')
        city_url = "https://www.lagou.com/jobs/allCity.html"
        city_result = self.handle_request(method="GET", url=city_url)
        self.city_list = city_search.findall(city_result)
        self.laGou_session.cookies.clear()

    # Fetch every result page of python jobs for one city
    def handle_city_job(self, city):
        first_request_url = "https://www.lagou.com/jobs/list_python?city=%s&cl=false&fromSearch=true&labelWords=&suginput=" % city
        # Visiting the list page first lets the site set the cookies the Ajax call needs
        first_response = self.handle_request(method="GET", url=first_request_url)
        total_page_search = re.compile(r'class="span\stotalNum">(\d+)</span>')
        try:
            total_page = total_page_search.search(first_response).group(1)
        except AttributeError:
            # search() returned None: no page count on this page, so skip the city
            return
        else:
            for i in range(1, int(total_page) + 1):
                data = {"pn": i, "kd": "python"}
                page_url = "https://www.lagou.com/jobs/positionAjax.json?city=%s&needAddtionalResult=false" % city
                referer_url = "https://www.lagou.com/jobs/list_python?city=%s&cl=false&fromSearch=true&labelWords=&suginput=" % city
                # Encoded to bytes because the city name may contain non-ASCII characters
                self.header['Referer'] = referer_url.encode()
                response = self.handle_request(method="POST", url=page_url, data=data)
                print(response)

    def handle_request(self, method, url, data=None, info=None):
        # Route traffic through the local Fiddler proxy and trust Fiddler's root certificate
        proxies = {"http": "http://127.0.0.1:8888", "https": "http://127.0.0.1:8888"}
        if method == "GET":
            response = self.laGou_session.get(url=url, headers=self.header, proxies=proxies, verify=r"D:/Fiddler/FiddlerRoot.pem")
        elif method == "POST":
            response = self.laGou_session.post(url=url, headers=self.header, data=data, proxies=proxies, verify=r"D:/Fiddler/FiddlerRoot.pem")
        response.encoding = 'utf-8'
        return response.text


if __name__ == '__main__':
    laGou = HandleLaGou()
    laGou.handle_city()
    for city in laGou.city_list:
        laGou.handle_city_job(city)
        # Stop after the first city
        break
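
To see what the two regular expressions in the source pick out, here is a minimal sketch that runs them against small HTML fragments. The fragments are made up for illustration (they are only shaped like what the patterns expect, not copied from the real pages), and the helper names extract_cities and extract_total_pages are hypothetical.

```python
import re

# Same patterns as in the source code above
CITY_PATTERN = re.compile(r'zhaopin/">(.*?)</a>')
TOTAL_PAGE_PATTERN = re.compile(r'class="span\stotalNum">(\d+)</span>')


def extract_cities(html):
    """Return every city name found in the link text of a '...zhaopin/' anchor."""
    return CITY_PATTERN.findall(html)


def extract_total_pages(html):
    """Return the page count as an int, or None when the page has no count."""
    match = TOTAL_PAGE_PATTERN.search(html)
    return int(match.group(1)) if match else None


# Made-up fragments, only for illustration
sample_city_html = (
    '<a href="https://www.lagou.com/nanjing-zhaopin/">南京</a>'
    '<a href="https://www.lagou.com/beijing-zhaopin/">北京</a>'
)
sample_list_html = '<span class="span totalNum">30</span>'

if __name__ == '__main__':
    print(extract_cities(sample_city_html))
    print(extract_total_pages(sample_list_html))
    print(extract_total_pages('<div>no pager here</div>'))
```

Returning None when the pattern is missing plays the same role as the try/except in handle_city_job: a city without a page count is simply skipped.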

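There is a small trick here.

I used to write crawlers in C++, and it was exhausting. Python is so much nicer: it takes care of a lot of the work for you!

The trick is the session. When scraping, the site may expect you to hit one page first so that it can set a cookie, and only after that will it let you fetch the data. A requests session keeps the cookies it receives and sends them along with the following requests automatically.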
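
The cookie behaviour described above can be tried out without touching the real site. The sketch below is only an illustration under assumptions: it starts a tiny local server whose /list page hands out a cookie and whose /data page refuses requests without it. The handler, the paths and the cookie name visited are all invented for the demo and say nothing about how Lagou actually works.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site that gates its data behind a cookie."""

    def do_GET(self):
        if self.path == "/list":
            # The "landing" page hands out the cookie
            self._reply(200, b"list page", set_cookie="visited=1; Path=/")
        elif self.path == "/data" and "visited=1" in self.headers.get("Cookie", ""):
            self._reply(200, b"job data")
        else:
            self._reply(403, b"cookie required")

    def _reply(self, status, body, set_cookie=None):
        self.send_response(status)
        if set_cookie:
            self.send_header("Set-Cookie", set_cookie)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def run_demo():
    """Return the status of /data before and after visiting /list."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        session = requests.session()
        session.trust_env = False  # ignore system proxy settings for this local demo
        before = session.get(base + "/data").status_code  # no cookie yet
        session.get(base + "/list")                       # server sets the cookie
        after = session.get(base + "/data").status_code   # session sends it back
        return before, after
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    print(run_demo())
```

The first request to /data is rejected, and the same request succeeds once the session has visited /list. That is the same order handle_city_job follows: a GET on the list page first, then the POST to the Ajax URL through the same session.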
