Scraping 51job (前程无忧) job postings with Python

###################### First, use requests to fetch the first-level (search results) pages of 51job
import requests
from lxml.etree import HTML
import re
import time
page = 1
total = 1  # replaced by the real page count once a results page parses
url = 'https://search.51job.com/list/020000,000000,0000,01,2,99,%2B,2,{}.html?lang=c&stype=1&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=5&dibiaoid=0&address=&line=&specialarea=00&from=&welfare='
while True:
    try:
        rst = requests.get(url.format(page))
        rst.encoding = 'GB2312'
        rst = rst.text
        html = HTML(rst)
        # XPath: collect the URL of every job posting on this page
        ls = html.xpath('//*[@id="resultList"]/div/p/span/a/@href')
        # Regex: read the total page count from text such as "共12页"
        total = int(re.compile(r'共(\d+)页').findall(rst)[0])
        ################################ Save the second-level URLs (one per job posting)
        with open('sh1.txt', 'a', encoding='utf-8') as f:
            for i in ls:
                f.write(str(i) + '\n')
                f.flush()
                print(i)
    except Exception:
        print('failed')
        # Log the URL of the page that failed, one per line
        with open('shibai1.txt', 'a', encoding='utf-8') as s:
            s.write(url.format(page) + '\n')
    if page < total:
        page = page + 1
    else:
        break
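
The loop stops based on a page count read from the results page with a regular expression. The following is a minimal sketch of that lookup on its own, assuming the page contains a phrase of the form 共N页; the `extract_total_pages` helper and the sample string are hypothetical and not part of the original script. Capturing digits with `\d+` keeps multi-digit counts intact, and the fallback value avoids an exception when the phrase is missing.

```python
import re

def extract_total_pages(text, default=1):
    """Return the page count found in text such as '共12页', else default."""
    match = re.search(r'共(\d+)页', text)
    return int(match.group(1)) if match else default

# Hypothetical fragment of a results page, for illustration only
sample = '<span class="td">共12页，到第</span>'
print(extract_total_pages(sample))
print(extract_total_pages('no pager here'))
```
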

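############################ Process the information collected above
with open('sh1.txt', 'r', encoding='utf-8') as e:  # read the saved URLs
    ls1 = e.readlines()
n = 1
for j in ls1:
    j = j.strip()  # drop the trailing newline left by readlines()
    if not j:
        continue
    try:
        rst = requests.get(j, timeout=10)  # give up on a page after 10 seconds
        rst.encoding = 'gbk'
        rst = rst.text
        ############### Download every job's second-level page and save it locally
        with open('D:\\ClassWork\\Python\\前程无忧1\\' + str(n) + '.html', 'a', encoding='gbk') as h:
            h.write(rst)
        print('Page ' + str(n))
        n = n + 1
    except Exception:
        print('failed')
        # Log the URL that failed so it can be retried later
        with open('失败1.txt', 'a', encoding='gbk') as g:
            g.write(str(j) + '\n')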
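
Building the output path by string concatenation is easy to get wrong on Windows, because a backslash directly before the closing quote escapes that quote. Below is a minimal sketch of one alternative using `pathlib`; the `save_page` helper is hypothetical, and the temporary directory stands in for the script's own output folder. The `errors='replace'` setting is also an assumption, chosen so that characters the target encoding cannot represent do not abort the write.

```python
import tempfile
from pathlib import Path

def save_page(text, directory, n, encoding='gbk'):
    """Write one downloaded page to <directory>/<n>.html and return the path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    path = directory / f'{n}.html'
    # Assumption: replace unencodable characters instead of raising an error
    path.write_text(text, encoding=encoding, errors='replace')
    return path

# Usage with a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    saved = save_page('<html>职位信息</html>', tmp, 1)
    print(saved.name, saved.read_text(encoding='gbk'))
```
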
