Python Web Scraping: Scraping Job Postings from a Recruitment Website (Part 1)

These are study notes, based on material found through Baidu searches.

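Tools and environment

1. Python version: Python 3.7.3

2. Installed packages: beautifulsoup4, plus lxml, which the code below uses as the parser

3. Operating system: Windows 10

4. Browser: Chrome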

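Page analysis

[Screenshot: image.png]

Press F12 to open the browser's developer tools, inspect the page's front-end source, and locate the links that lead to the content you want to scrape.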

Key steps

html = getHtml("http://www.zhrczp.com/jobs/jobs_list/key/%E5%BB%BA%E6%98%8E%E9%95%87/page/1.html")
soup = BeautifulSoup(html, 'lxml')  # create the BeautifulSoup object

# Each div with class td-j-name wraps the link to one job detail page
hrefbox = soup.find_all("div", "td-j-name", True)
links = []
for box in hrefbox:
    links.append("http://www.zhrczp.com" + box.contents[0].get('href'))  # build the absolute link
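The list URL and the link concatenation above can also be produced with the standard library instead of by hand. The following is a minimal sketch, not the original author's code: the helper names list_url and absolute are hypothetical, the href passed to absolute is made up for illustration, and the assumption that later result pages follow the same /page/N.html pattern is a guess based on the single URL shown above.

```python
from urllib.parse import quote, urljoin

BASE = "http://www.zhrczp.com"


def list_url(keyword, page=1):
    """Build a search-result URL (hypothetical helper).

    Assumes every result page follows the /page/N.html pattern
    seen in the one URL used in this article.
    """
    # quote() percent-encodes the UTF-8 bytes of the keyword
    return f"{BASE}/jobs/jobs_list/key/{quote(keyword)}/page/{page}.html"


def absolute(href):
    """Resolve an href against the site root (hypothetical helper)."""
    # urljoin handles both site-relative and already-absolute links
    return urljoin(BASE, href)


first_page = list_url("建明镇")
detail = absolute("/jobs/example.html")  # made-up href, for illustration only
print(first_page)
print(detail)
```

Using urljoin rather than string concatenation means a link that is already absolute is left unchanged instead of being prefixed a second time.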

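Inspecting the detail page shows that everything of interest sits inside div tags, so it can be located with the find_all function that BeautifulSoup provides.

main = soup.find_all("div", "main", True) finds every div tag whose class is main. The second argument is matched against the class attribute, and the third (recursive=True) searches all descendants.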
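To make the find_all call concrete, and to show why the source code in this article indexes .contents with mostly odd numbers, here is a small sketch that runs on an inline HTML string. The markup is hypothetical and only loosely modeled on a detail page, and it uses the built-in html.parser instead of lxml so that it needs no extra install.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup, not copied from the real site
SAMPLE = """
<div class="main">
  <h1>Job title</h1>
  <p>Salary</p>
</div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# The positional form used in this article ...
by_position = soup.find_all("div", "main", True)
# ... selects the same element as the keyword form
by_keyword = soup.find_all("div", class_="main")
main_div = by_position[0]

# .contents also holds the whitespace between tags as text nodes,
# so the element children land on the odd indices 1, 3, ...
for index, child in enumerate(main_div.contents):
    print(index, repr(child))

# Keeping only Tag objects gives indices that ignore the whitespace
element_children = [c for c in main_div.contents if isinstance(c, Tag)]
print([t.name for t in element_children])
```

Because whitespace text nodes count as children, any change in the page's formatting shifts the positional indices; filtering to Tag objects, or searching by class, is less sensitive to that.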

Source code

#!/usr/bin/python
# -*- coding: utf-8 -*-

import urllib.request

from bs4 import BeautifulSoup


def getHtml(url):
    """Download a page and return its raw bytes."""
    with urllib.request.urlopen(url) as page:
        return page.read()


# The keyword 建明镇 is URL-encoded as %E5%BB%BA%E6%98%8E%E9%95%87
html = getHtml("http://www.zhrczp.com/jobs/jobs_list/key/%E5%BB%BA%E6%98%8E%E9%95%87/page/1.html")
soup = BeautifulSoup(html, 'lxml')  # create the BeautifulSoup object

# Each div with class td-j-name wraps the link to one job detail page
hrefbox = soup.find_all("div", "td-j-name", True)
links = []
for box in hrefbox:
    links.append("http://www.zhrczp.com" + box.contents[0].get('href'))  # build the absolute link

with open('a.txt', 'w', encoding='utf-8') as f:
    for link in links:
        print(link)
        html = getHtml(link)
        soup = BeautifulSoup(html, 'lxml')  # parse the detail page
        main = soup.find_all("div", "main", True)
        header = main[0].contents[1]
        f.write("\n***************************Job posting*************************************\n")
        f.write("Job title: " + header.contents[5].contents[1].contents[0] + "\n")  # job title
        f.write("Posted: " + header.contents[3].contents[1].contents[0] + "\n")  # publication time

        f.write("\n--------------------Compensation--------------------\n")
        f.write("Salary: " + header.contents[7].contents[0] + "\n")  # wage
        f.write("Benefits: ")
        for i in range(1, len(header.contents[9].contents) - 3):
            f.write(header.contents[9].contents[i].contents[0] + " ")

        f.write("\n--------------------Contact--------------------\n")
        contact = main[0].contents[5]
        f.write(contact.contents[3].contents[0].strip() + "\n")  # contact person, whitespace stripped
        f.write(contact.contents[7].contents[0] + contact.contents[7].contents[1].contents[0] + "\n")  # phone number

        f.write("\n--------------------Job description--------------------\n")
        describe = main[0].contents[7].contents
        f.write(describe[1].contents[0] + describe[3].contents[0] + "\n")  # job description

        item = soup.find_all("div", "item", True)
        f.write("\n--------------------Requirements--------------------\n")
        # Rows, in order: employment type, job category, number of openings,
        # education, work experience, gender, age, hiring department, and the
        # row at index 25. Each label is read from the page itself.
        for idx in (3, 5, 7, 11, 13, 15, 19, 21, 25):
            row = item[0].contents[idx]
            f.write(row.contents[0].contents[0] + ":" + row.contents[1] + "\n")

        company = soup.find_all("div", "cominfo link_gray6", True)
        f.write("\n--------------------Company--------------------\n")
        f.write(company[0].contents[3].contents[1].contents[0] + "\n")  # company name
        # Rows, in order: company type, industry, size, region
        for idx in (5, 7, 9, 11):
            row = company[0].contents[idx]
            f.write(row.contents[0].contents[0] + ":" + row.contents[1] + "\n")

        f.write("\n***************************End of job posting*************************************\n")
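The getHtml function above sends no request headers, sets no timeout, and stops the whole run if a single page fails to load. One possible hardening is sketched below using only the standard library. The names build_request and get_html, the User-Agent string, and the 10-second timeout are all assumptions chosen for illustration; whether the site needs or accepts a particular User-Agent is not something this article establishes.

```python
import urllib.request

# Hypothetical identifier; replace with something that describes your script
USER_AGENT = "Mozilla/5.0 (compatible; job-list-study-script)"


def build_request(url):
    """Wrap the URL in a Request that carries a User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def get_html(url, timeout=10):
    """Return the page bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as page:
            return page.read()
    except OSError as err:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        print(f"skipping {url}: {err}")
        return None
```

With this variant, the loop over links can check for None and continue with the next posting instead of aborting, for example: html = get_html(link), followed by a continue when html is None.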

Output

[Screenshot: image.png]
