A first look at BeautifulSoup

Basic usage:

Demo page used in the examples:

https://python123.io/ws/demo.html

Code:

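import requests
from bs4 import BeautifulSoup
# Fetch the demo page and keep its HTML source as a string
r = requests.get("https://python123.io/ws/demo.html")
print(r.text)
demo = r.text
# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(demo, "html.parser")
print(soup)             # the parsed document, serialized back to HTML
print(soup.prettify())  # the same document, one node per line, indented by depth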

Output:

D:\python_install\python.exe D:/pycharmworkspace/temp1/crawler_1.py
<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>
<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
</body></html>
<html>
 <head>
  <title>
   This is a python demo page
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    The demo python introduces several python courses.
   </b>
  </p>
  <p class="course">
   Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
   <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">
    Basic Python
   </a>
   and
   <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">
    Advanced Python
   </a>
   .
  </p>
 </body>
</html>
Process finished with exit code 0
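
The difference between the last two printouts can also be checked without a network connection. The following is a minimal sketch, assuming a shortened, hard-coded copy of the demo page; the `html_doc` string is a stand-in made up for illustration, not the full page.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the demo page (an assumption, not the real page).
html_doc = (
    '<html><head><title>This is a python demo page</title></head>'
    '<body><p class="title"><b>Demo</b></p></body></html>'
)

soup = BeautifulSoup(html_doc, "html.parser")

compact = str(soup)       # what print(soup) shows: the markup without added whitespace
pretty = soup.prettify()  # one tag or string per line, indented by nesting depth

print(compact)
print(pretty)
```

`prettify()` is meant for reading the structure by eye. Because it adds whitespace, `str(soup)` is usually the better choice when the markup has to be passed on unchanged.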

Getting the tag names of a tag's parent and grandparent:

import requests
from bs4 import BeautifulSoup
r = requests.get("https://python123.io/ws/demo.html")
print("\n")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
tag_a = soup.a  # the first <a> tag in the document
print(tag_a.parent.name)  # tag name of its parent
print("\n")
print(tag_a.parent.parent.name)  # tag name of its parent's parent (the grandparent)

Output:

D:\python_install\python.exe D:/pycharmworkspace/temp1/crawler_1.py
p
body
Process finished with exit code 0
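
Chaining `.parent` works for one or two levels. To walk all the way up, a tag's `.parents` generator can be used instead. Below is a minimal sketch, again assuming a shortened inline copy of the demo page (`html_doc` is a made-up stand-in) so that it runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the demo page (an assumption).
html_doc = (
    '<html><head><title>demo</title></head><body>'
    '<p class="course">Courses: '
    '<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">'
    'Basic Python</a></p></body></html>'
)

soup = BeautifulSoup(html_doc, "html.parser")

# .parents yields every ancestor, starting with the direct parent and
# ending with the BeautifulSoup object itself, whose name is "[document]".
ancestor_names = [parent.name for parent in soup.a.parents]
print(ancestor_names)  # ['p', 'body', 'html', '[document]']
```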

Reading values from a tag's attribute dictionary:

Code:

import requests
from bs4 import BeautifulSoup
r = requests.get("https://python123.io/ws/demo.html")
print("\n")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
print(soup.a)  # soup.<tag> returns the first tag of that type; here, the first <a> element
tag_a = soup.a
print("\n")
print(tag_a.attrs)  # attrs: the tag's attributes as a dictionary
print("\n")
print(tag_a.attrs['id'])  # value of the id attribute
print("\n")
print(tag_a.attrs['href'])  # value of the href attribute
print("\n")

Output:

D:\python_install\python.exe D:/pycharmworkspace/temp1/crawler_1.py
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>
{'href': 'http://www.icourse163.org/course/BIT-268001', 'class': ['py1'], 'id': 'link1'}
link1
http://www.icourse163.org/course/BIT-268001
Process finished with exit code 0
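
Indexing `attrs` raises a `KeyError` when the attribute is missing. A tag also supports dictionary-style access directly, and `get()` returns `None` for an absent attribute. The sketch below assumes a single inline `<a>` element copied from the demo page; the `title` attribute it looks up is deliberately one that does not exist.

```python
from bs4 import BeautifulSoup

# One link copied from the demo page, wrapped in a <p> (inline stand-in, an assumption).
html_doc = (
    '<p><a href="http://www.icourse163.org/course/BIT-268001" '
    'class="py1" id="link1">Basic Python</a></p>'
)

soup = BeautifulSoup(html_doc, "html.parser")
tag_a = soup.a

href = tag_a["href"]        # shorthand for tag_a.attrs["href"]
title = tag_a.get("title")  # attribute not present: None instead of a KeyError
classes = tag_a["class"]    # class can hold several values, so it comes back as a list

print(href)
print(title)
print(classes)
```

This also explains why the output above shows `'class': ['py1']` as a list while `id` and `href` are plain strings.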

Getting the text inside a tag, that is, the string between the markup:

Code:

import requests
from bs4 import BeautifulSoup
r = requests.get("https://python123.io/ws/demo.html")
print("\n")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
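print(soup.a)  # soup.<tag> returns the first tag of that type; here, the first <a> element
tag_a = soup.a
print("\n")
print(tag_a.string)  # the text inside the first <a> tag
print("\n")
print(soup.p)  # the first <p> tag, including its nested <b> tag
print("\n")
print(soup.p.string)  # the text inside the first <p> tag, without the <b> markup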

Output:

D:\python_install\python.exe D:/pycharmworkspace/temp1/crawler_1.py
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>
Basic Python
<p class="title"><b>The demo python introduces several python courses.</b></p>
The demo python introduces several python courses.
Process finished with exit code 0
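
`soup.p.string` works here because the first `<p>` has exactly one child, the `<b>` tag, which in turn holds a single string. When a tag contains several children, `.string` is `None`, and `get_text()` is the usual way to collect all the text. The sketch below illustrates this on a shortened inline copy of the page body; the `html_doc` string is a made-up stand-in.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the body of the demo page (an assumption).
html_doc = (
    '<body><p class="title"><b>The demo python introduces several python courses.</b></p>'
    '<p class="course">Learn from <a id="link1">Basic Python</a> and '
    '<a id="link2">Advanced Python</a>.</p></body>'
)

soup = BeautifulSoup(html_doc, "html.parser")
title_p, course_p = soup.find_all("p")

title_text = title_p.string        # single nested child, so its string is returned
course_string = course_p.string    # several children, so .string is None
course_text = course_p.get_text()  # all strings inside the tag, joined together

print(title_text)
print(course_string)
print(course_text)
```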
