[icourse163] Code notes from learning Python web scraping

The code and the ideas behind it follow this course: http://www.icourse163.org/learn/BIT-1001870001?tid=1002236011

1 Fetching an ordinary web page

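import requests

def getHTML(url):
    try:
        response = requests.get(url, timeout=30)
        # A 4xx/5xx status raises HTTPError, so the function returns the error string
        response.raise_for_status()
        # Use the encoding guessed from the body rather than the one from the headers
        response.encoding = response.apparent_encoding
        return response.text
    except requests.RequestException:
        return "An exception occurred"

if __name__ == "__main__":
    url = "https://item.jd.com/5181380.html"
    print(getHTML(url))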
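
The assignment `response.encoding = response.apparent_encoding` in `getHTML` matters because the bytes of a page only become readable text when they are decoded with the right codec. The sketch below illustrates this with the standard library alone and no network access. The sample string and the helper name `decode_body` are made up for illustration, and ISO-8859-1 stands in for a wrong guess (to my knowledge it is the codec requests falls back to when a text response declares no charset).

```python
def decode_body(raw, encoding):
    """Decode raw response bytes with the given codec, replacing undecodable bytes."""
    return raw.decode(encoding, errors="replace")

# Pretend these bytes arrived as response.content (hypothetical sample data)
raw = "爬虫".encode("utf-8")

wrong = decode_body(raw, "iso-8859-1")  # wrong codec: one character per byte
right = decode_body(raw, "utf-8")       # right codec: the original text

print(repr(wrong))
print(right)
```

Decoding never fails with ISO-8859-1, which is why a wrong guess shows up as garbled characters rather than as an exception.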

2 The Amazon example

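Amazon inspects where requests come from and rejects those whose User-Agent header identifies a Python client (by default requests sends python-requests/<version>), so the request below sets a browser-like User-Agent to fetch an Amazon product page: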

import requests

def getHTMLAmazon(url):
    try:
        # Present a browser-like User-Agent instead of the requests default
        kv = {'User-Agent': 'Mozilla/5.0'}
        response = requests.get(url, headers=kv)
        response.raise_for_status()
        response.encoding = response.apparent_encoding
        return response.text
    except requests.RequestException:
        return "Scraping failed"

if __name__ == "__main__":
    url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
    print(getHTMLAmazon(url))

3 Using the Baidu search interface

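import requests

def getBaiduso(keyword):
    try:
        kv = {"wd": keyword}
        response = requests.get("http://www.baidu.com/s", params=kv)
        # The URL actually requested, with the query string appended
        print(response.request.url)
        response.raise_for_status()
        return len(response.text)
    except requests.RequestException:
        return "Scraping failed"

keyword = "Python"
print(getBaiduso(keyword))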
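
The `params` argument is what turns the dictionary into the query string visible in `response.request.url`. As a rough offline illustration, the standard library's `urllib.parse.urlencode` performs the same kind of encoding. `build_search_url` is a hypothetical helper written for this note, not part of requests.

```python
from urllib.parse import urlencode

def build_search_url(keyword, base="http://www.baidu.com/s"):
    """Append the keyword to the base URL as the wd query parameter."""
    return base + "?" + urlencode({"wd": keyword})

print(build_search_url("Python"))
print(build_search_url("web scraping"))  # the space is encoded as "+"
```

Passing the keyword through `params` (or `urlencode`) instead of concatenating it by hand keeps spaces and non-ASCII characters correctly escaped.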

4 Downloading and saving an image

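import os
import requests

def getPic(url, root):
    # Name the file after the last segment of the URL
    path = root + url.split('/')[-1]
    try:
        if not os.path.exists(root):
            os.mkdir(root)
        if not os.path.exists(path):
            response = requests.get(url)
            response.raise_for_status()
            # The with block closes the file, so no explicit close() is needed
            with open(path, "wb") as f:
                f.write(response.content)
            print("File saved")
        else:
            print("File already exists")
    except (requests.RequestException, OSError):
        print("Scraping failed")

url = "http://image.nationalgeographic.com.cn/2017/0526/20170526025441983.jpg"
root = "E://Python dir//"
getPic(url, root)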
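
`getPic` names the file after the last `/`-separated segment of the URL. A slightly more defensive variant, sketched here on the assumption that a URL may also carry a query string, parses the URL first and joins directory and file name with `os.path.join`. `filename_from_url` and the `images` directory are hypothetical names chosen for this example.

```python
import os
from urllib.parse import urlsplit

def filename_from_url(url):
    """Return the last path segment of the URL, ignoring any query string."""
    return os.path.basename(urlsplit(url).path)

url = "http://image.nationalgeographic.com.cn/2017/0526/20170526025441983.jpg"
name = filename_from_url(url)
path = os.path.join("images", name)  # "images" is a placeholder directory

print(name)
print(path)
```

With plain `url.split('/')[-1]`, a URL ending in `?w=800` would put the query string into the file name; parsing the URL first avoids that.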

5 Looking up the location of an IP address

import requests

def getIP(ipaddress):
    try:
        kv = {"ip": ipaddress}
        response = requests.get("http://m.ip138.com/ip.asp", params=kv)
        print(response.request.url)
        response.raise_for_status()
        # Return only the last 500 characters of the page
        return response.text[-500:]
    except requests.RequestException:
        return "Scraping failed"

ipaddress = "202.204.80.112"
print(getIP(ipaddress))
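
`getIP` sends whatever string it receives to the lookup service. One optional refinement, shown here only as a sketch, is to check the input locally with the standard library's `ipaddress` module before making the request. `is_valid_ip` is a hypothetical helper and not part of the course code.

```python
import ipaddress

def is_valid_ip(text):
    """Return True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False

print(is_valid_ip("202.204.80.112"))
print(is_valid_ip("202.204.80.999"))
```

The standard-library module shares its name with the `ipaddress` variable used in the script above, so one of the two would need renaming if they were combined in a single file.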
