F12 developer tools

Open Google Chrome, press F12 to open the developer tools, and look at the Network panel.

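How to find the request we need in the Network panel

If the content only shows up after an F5 refresh, the request is usually under the Doc filter.
If the content is loaded by clicking a "load more" button, the request is under the XHR filter.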

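How to verify that the request is the right one

Copy some text from the page and search for it with Ctrl+F in the request's Response tab. If the text is found, you have the right request.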
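
The same check can be repeated in code once the response body is available as a string. The following is a minimal sketch; the helper name response_contains and the sample strings are made up for illustration and are not part of the original post.

```python
def response_contains(body, snippet):
    """Return True if text copied from the page appears in the response body."""
    # Collapse runs of whitespace on both sides so that line breaks in the
    # HTML source do not hide a match.
    normalized_body = " ".join(body.split())
    normalized_snippet = " ".join(snippet.split())
    return normalized_snippet in normalized_body


# Hypothetical response body and page text, used only for this demonstration.
sample_body = "<html><body><p>Admission   scores\nfor 2014</p></body></html>"
found = response_contains(sample_body, "Admission scores for 2014")
missing = response_contains(sample_body, "Admission scores for 2015")
```

Text that is split across several tags in the source will still not match, so it is safest to copy a short phrase from inside a single element.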

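Simulating the request

Find the URL, the request method, and the headers.

Copy every header from accept through user-agent. The cookie can be left empty in the program.

Capitalize the first letter of each header name and separate the entries with commas.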

import requests

# Headers copied from the browser's Network panel; the cookie is left empty.
requests_headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-encoding': 'gzip, deflate, br',
    'Accept-language': 'zh-CN,zh;q=0.9',
    'Cache-control': 'max-age=0',
    'Cookie': '',
    'Referer': 'https://www.zhihu.com/search?q=vczh&type=content',
    'Upgrade-insecure-requests': '1',
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'
}
url = 'https://www.zhihu.com/topic/20004648/hot'
z = requests.get(url, headers=requests_headers)
print(z.content)
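
Typing the headers dict by hand is error-prone, so it can help to paste the raw "name: value" lines from the Network panel and convert them. This is a rough sketch under that assumption; parse_raw_headers is a hypothetical helper, and the pasted header values are shortened examples.

```python
def parse_raw_headers(raw):
    """Turn 'name: value' lines copied from the browser into a headers dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority".
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        name, _, value = line.partition(":")
        # Capitalize only the first letter of the header name.
        headers[name.strip().capitalize()] = value.strip()
    return headers


# Shortened example of text pasted from the Network panel.
raw = """
accept: text/html,application/xhtml+xml
accept-language: zh-CN,zh;q=0.9
referer: https://www.zhihu.com/search?q=vczh&type=content
user-agent: Mozilla/5.0 (Windows NT 10.0; WOW64)
cookie:
"""
headers = parse_raw_headers(raw)
```

HTTP header names are case-insensitive, so the capitalization is a readability convention rather than something the server requires.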

Using headers to get past anti-scraping checks

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
}

response.status_code

A value of 200 means the server answered the request successfully.
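
One way to make this check hard to forget is to wrap it in a small guard that raises when the status is not 200. The sketch below is only an illustration: ensure_ok is a hypothetical helper, and the two SimpleNamespace objects are stand-ins for real responses so the example runs without network access.

```python
from types import SimpleNamespace


def ensure_ok(response):
    """Return the response unchanged if its status code is 200, else raise."""
    if response.status_code != 200:
        raise RuntimeError(
            "unexpected HTTP status: {}".format(response.status_code)
        )
    return response


# Stand-in objects that only carry a status_code attribute.
ok = SimpleNamespace(status_code=200)
blocked = SimpleNamespace(status_code=403)

checked = ensure_ok(ok)
try:
    ensure_ok(blocked)
    blocked_raised = False
except RuntimeError:
    blocked_raised = True
```

With requests, the call would look like ensure_ok(requests.get(url, headers=headers)). The library's own response.raise_for_status() is a related option, although it only raises for 4xx and 5xx codes.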

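etree and xpath

etree makes it easy to pull the content you want out of the page source text.
etree comes from the lxml module.

etree.HTML(text)

Parses the HTML document held in the string text into an element tree that can be queried.

html.xpath returns the elements that match a given tag path.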

html = etree.HTML(wb_data)
html_data = html.xpath('/html/body/p/ul/li/a')
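
To see what these two calls return, here is a small self-contained sketch. The inline page stored in wb_data is invented for the example; real code would pass response.text instead.

```python
from lxml import etree

# A small stand-in page; real code would pass response.text instead.
wb_data = """
<html>
  <body>
    <ul>
      <li><a href="/a">First</a></li>
      <li><a href="/b">Second</a></li>
    </ul>
  </body>
</html>
"""

html = etree.HTML(wb_data)

# An absolute path returns a list of the matching <a> elements.
links = html.xpath('/html/body/ul/li/a')
link_texts = [a.text for a in links]

# Ending the path with /@href (or /text()) returns strings instead of elements.
hrefs = html.xpath('/html/body/ul/li/a/@href')
```

xpath always returns a list, even for a single match, which is why scraping code usually takes the first element after checking that the list is not empty.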

Reference

.xpath('div[contains(@class, "sline")]')

This matches any div whose class attribute contains sline, so the class does not have to be exactly sline.
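
The difference from an exact @class match shows up when an element carries more than one class. In the sketch below the markup is invented for illustration, and the extra class name clearfix is an assumption, not something taken from the target site.

```python
from lxml import etree

# Invented markup: the first inner div has two classes.
sample = """
<div id="hub">
  <div class="sline clearfix">years</div>
  <div class="tline">tables</div>
</div>
"""
root = etree.HTML(sample)
city = root.xpath('//div[@id="hub"]')[0]

# An exact comparison fails because the attribute value is "sline clearfix".
exact = city.xpath('div[@class="sline"]')

# contains() succeeds because "sline" appears somewhere in the attribute value.
partial = city.xpath('div[contains(@class, "sline")]')
```

Note that contains() is a plain substring test, so it would also match a class such as "sline2". That is usually acceptable for scraping, but worth keeping in mind when class names overlap.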

Reference code

import requests
import pandas as pd
from lxml import etree


def extract_first(selectors):
    # Return the first XPath match, or None when nothing matched.
    if len(selectors) <= 0:
        return None
    return selectors[0]


class EduScore:
    def __init__(self):
        self.score_url = 'http://www.eol.cn/html/g/fsx/index.shtml'

    def basic(self):
        # Collect the id and name of every region block on the page.
        response = requests.get(self.score_url)
        response.encoding = 'utf-8'
        if response.status_code != 200:
            raise Exception('http status code not 200')
        html = response.text
        selector = etree.HTML(html)
        cities = selector.xpath('//div[@class="fsshowli"]')
        items = []
        for city in cities:
            items.append({
                'code': extract_first(city.xpath('@id')),
                'name': extract_first(city.xpath('div[@class="topline"]/div[@class="city"]/text()'))
            })
        return pd.DataFrame(items)

    def scores(self, code, year):
        # The page labels each year as, for example, '2014年'.
        year = str(year) + '年'
        response = requests.get(self.score_url)
        if response.status_code != 200:
            raise Exception('http status code not 200')
        response.encoding = 'utf-8'
        html = response.text
        selector = etree.HTML(html)
        city = extract_first(selector.xpath('//div[@id="{}"]'.format(code)))
        if city is None:
            return
        s_line = extract_first(city.xpath('div[contains(@class, "sline")]'))
        t_line = extract_first(city.xpath('div[contains(@class, "tline")]'))
        if s_line is None or t_line is None:
            return
        years = []
        for x in s_line.xpath('div[contains(@class, "year")]'):
            y = extract_first(x.xpath('text()'))
            if y is None:
                continue
            years.append(y)
        if str(year) not in years:
            return None
        # The position of the year label selects the table for that year.
        index = years.index(str(year))
        tables = t_line.xpath('div/table')
        if len(tables) < index + 1:
            return
        table = tables[index]
        items = []
        for tr in table.xpath('tr'):
            items.append([
                extract_first(tr.xpath('td[1]/text()')).strip(),
                extract_first(tr.xpath('td[2]/text()')).strip(),
                extract_first(tr.xpath('td[3]/text()')).strip()
            ])
        # The first row holds the column headers.
        return pd.DataFrame(items[1:], columns=items[0])


if __name__ == '__main__':
    edu_score = EduScore()
    print(edu_score.basic())
    print(edu_score.scores('hub', 2014))


