The code uses try/except to handle exceptions.
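As a rough sketch of the try/except idea, the helper below retries a call a fixed number of times and catches only the exception types it is told to expect, rather than using a bare `except`. The names `call_with_retry`, `attempts`, `delay`, and `flaky_fetch` are hypothetical and do not come from the script; with `requests`, the expected type would typically be `requests.RequestException`.

```python
import time


def call_with_retry(func, expected, attempts=3, delay=0.0):
    """Call func(); retry only when it raises one of the expected exceptions.

    Hypothetical helper for illustration. Any other exception propagates
    immediately, so genuine bugs are not silently swallowed.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except expected as err:
            last_error = err
            print('attempt %d failed: %s' % (attempt, err))
            if attempt < attempts:
                time.sleep(delay)
    # Every attempt failed: re-raise the last expected error.
    raise last_error


# Stand-in for a network call: fails twice, then succeeds.
calls = {'count': 0}


def flaky_fetch():
    calls['count'] += 1
    if calls['count'] < 3:
        raise ConnectionError('simulated timeout')
    return '<html>ok</html>'


result = call_with_retry(flaky_fetch, ConnectionError, attempts=3)
```

Naming the exception type keeps unrelated failures, such as a typo in the parsing code, from being mistaken for a network problem.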

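It picks a proxy IP at random and sleeps 15 seconds between pages to mimic a human clicking through the results, so as to get around the anti-scraping mechanism.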
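The proxy-plus-pause idea might look like the sketch below. `pick_proxy`, `polite_pause`, the `jitter` parameter, and the placeholder addresses are assumptions made for illustration; the returned dict has the same shape the script passes to `proxies=`.

```python
import random
import time

# Placeholder addresses from a documentation-only IP range, not real proxies.
PROXIES = ['192.0.2.10:8118', '192.0.2.11:8118', '192.0.2.12:9999']


def pick_proxy(pool):
    """Return a proxies dict with one randomly chosen entry from the pool."""
    return {'http': random.choice(pool)}


def polite_pause(base=15.0, jitter=0.0, sleep=time.sleep):
    """Sleep for `base` seconds plus up to `jitter` random seconds.

    The `sleep` argument exists only so the pause can be replaced in tests.
    Returns the number of seconds waited.
    """
    wait = base + random.uniform(0, jitter)
    sleep(wait)
    return wait


proxy = pick_proxy(PROXIES)
```

`random.choice` picks directly from the list, so there is no index arithmetic to get wrong. A non-zero `jitter` would make the delay less regular than a fixed 15 seconds.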

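# coding=utf-8
from bs4 import BeautifulSoup
import requests
import time
import random
import MySQLdb


def getpage():
    pg = 1
    h1 = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
    o_g = ['222.33.192.238:8118', '121.61.17.36:8118', '113.200.214.164:9999', '222.33.192.238:8118']
    # random.choice avoids the out-of-range index that randint(0, 4) can produce
    pro = {'http': random.choice(o_g)}
    base_url = 'http://sou.zhaopin.com/jobs/searchresult.ashx?in=180000&pd=30&jl=%E4%B8%8A%E6%B5%B7&kw=%E5%9F%BA%E9%87%91&sm=0&sf=0&el=4&isfilter=0&fl=538&isadv=1&sb=1&p='
    while pg < 10:
        # Build the URL inside the loop so the page number actually changes
        url = base_url + str(pg)
        try:
            resp = requests.get(url, timeout=20, headers=h1, proxies=pro)
            resp.encoding = "utf-8"
            html = resp.text
            print('Fetched')
            re(html)
        except requests.RequestException as e:  # network-level failures only
            print('Request failed:', e)
            xx = input('again?')
            if xx == 'yes':
                continue  # retry the same page
            else:
                print('ERROR INPUT !')
        print('Waiting 15 seconds before the next page')
        time.sleep(15)
        pg = pg + 1


def re(html):
    try:
        l1 = []  # job titles
        l2 = []  # job links
        l3 = []  # job detail blocks (raw HTML)
        soup = BeautifulSoup(html, 'lxml')
        con = soup.find_all('a', style="font-weight: bold")
        for item in con:
            l1.append(item.get_text())
            l2.append(item.attrs['href'])
        con2 = soup.find_all('li', class_="newlist_deatil_two")
        for item2 in con2:
            l3.append(str(item2))
        print(l1)
        print(l2)
        print(l3)
        # Open one connection per page instead of one per row
        conn = MySQLdb.connect(host='localhost', port=3306, user='root', passwd='******', db='zlzp', charset='utf8')
        try:
            cur = conn.cursor()
            # zip stops at the shortest list, so a short page cannot raise IndexError
            for row in zip(l1, l2, l3):
                # Parameterized query: the driver escapes quotes in the scraped text
                cur.execute("insert into zlzp VALUES (NULL,%s,%s,%s)", row)
            cur.close()
            conn.commit()
        finally:
            conn.close()
        print('Saved')
    except (KeyError, MySQLdb.Error) as e:
        # Re-parsing the same HTML would fail the same way, so report and move on
        print('Parsing or saving failed:', e)


if __name__ == '__main__':
    getpage()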
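The storage step can be tried out without a MySQL server. The sketch below swaps in the standard-library `sqlite3` module with an in-memory database as a stand-in; the table name `jobs`, its columns, the function `save_jobs`, and the sample rows are all made up for illustration. Note that `sqlite3` uses `?` placeholders, whereas `MySQLdb` uses `%s`.

```python
import sqlite3


def save_jobs(conn, titles, links, details):
    """Insert one row per job and return the number of rows written."""
    rows = list(zip(titles, links, details))  # zip stops at the shortest list
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            'INSERT INTO jobs (title, link, detail) VALUES (?, ?, ?)', rows)
    return len(rows)


conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, link TEXT, detail TEXT)')

# Made-up sample data. The apostrophe in the first title would break a query
# assembled with string formatting, but is safe with placeholders.
written = save_jobs(
    conn,
    ["Fund Manager's Assistant", 'Fund Analyst'],
    ['http://example.com/1', 'http://example.com/2'],
    ['detail one'],  # shorter list: only one complete row exists
)
stored = conn.execute('SELECT title, link, detail FROM jobs').fetchall()
```

Passing the values separately from the SQL text lets the driver handle quoting, which matters here because scraped titles and HTML fragments routinely contain quotes.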

