Traversing the Document Tree and Working with Tags in BeautifulSoup, Python's Scraping Library (A Beginner's Guide)

This article walks through the attributes and methods that BeautifulSoup provides for traversing a document tree and working with its tags. The examples below cover only the basics.


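html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""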

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc,'lxml')

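I. Child nodes

A Tag may contain strings and other Tags; all of these are the Tag's children. BeautifulSoup provides a number of attributes for navigating and iterating over them.

1. Getting a tag by its name

print(soup.head)
print(soup.title)

Output:

<head><title>The Dormouse's story</title></head>
<title>The Dormouse's story</title>

Accessing a tag by name returns only the first tag with that name. To get every tag of a given kind, use the find_all method:

soup.find_all('a')

Output:

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]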
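The difference between name access and find_all can be sketched as follows. This is a minimal illustration, not the article's own code: the markup is a hypothetical sample, and it uses Python's built-in "html.parser" rather than lxml so that it runs without the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption, not the article's html_doc).
html = """
<p>
  <a href="http://example.com/elsie" id="link1">Elsie</a>
  <a href="http://example.com/lacie" id="link2">Lacie</a>
  <a href="http://example.com/tillie" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

first_link = soup.a              # attribute access: only the first <a>
all_links = soup.find_all("a")   # every <a>, in document order
missing = soup.table             # no <table> in the document, so this is None

first_id = first_link["id"]
all_hrefs = [a["href"] for a in all_links]

print(first_id)
print(all_hrefs)
print(missing)
```

Because name access returns None when no such tag exists, chaining further attribute lookups onto it (for example soup.table.tr) would raise an AttributeError, so it is worth checking for None first.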

2. .contents: returns a tag's direct children as a list

head_tag = soup.head
head_tag.contents

Output:

[<title>The Dormouse's story</title>]

title_tag = head_tag.contents[0]
title_tag

Output:

<title>The Dormouse's story</title>

title_tag.contents

Output:

["The Dormouse's story"]

3. .children: iterate over a tag's direct children

for child in title_tag.children:
    print(child)

Output:

The Dormouse's story
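A small sketch of how .contents and .children relate, assuming a hypothetical three-item list as input and the built-in "html.parser": .contents is an ordinary list that supports len() and indexing, while .children is meant for looping.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption for illustration).
soup = BeautifulSoup("<ul><li>a</li><li>b</li><li>c</li></ul>", "html.parser")
ul = soup.ul

child_count = len(ul.contents)   # .contents is a list, so len() works
last_item = ul.contents[-1]      # ... and so does indexing

# .children yields the same direct children one at a time.
item_texts = [str(li.string) for li in ul.children]

print(child_count, last_item, item_texts)
```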

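4. .descendants: both .contents and .children return only a tag's direct children, whereas .descendants iterates recursively over all of its descendants (children, grandchildren, and so on)

for child in head_tag.children:
    print(child)

Output:

<title>The Dormouse's story</title>

for child in head_tag.descendants:
    print(child)

Output:

<title>The Dormouse's story</title>
The Dormouse's story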
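To make the difference concrete, the sketch below counts direct children and all descendants of the same tag. The markup is a hypothetical sample and "html.parser" is used instead of lxml; filtering with isinstance(node, Tag) is one possible way to keep only tags and skip strings.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup (an assumption for illustration).
soup = BeautifulSoup("<div><p>Hello <b>world</b></p><p>Bye</p></div>", "html.parser")
div = soup.div

direct = list(div.children)         # the two <p> tags only
everything = list(div.descendants)  # <p>, 'Hello ', <b>, 'world', <p>, 'Bye'

# Keep only the tags among the descendants, dropping the strings.
tag_names = [node.name for node in div.descendants if isinstance(node, Tag)]

print(len(direct), len(everything), tag_names)
```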

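5. .string: if a tag has exactly one child and that child is a NavigableString, .string returns it

title_tag.string

Output:

"The Dormouse's story"

If a tag's only child is another tag, and that child has a .string, the parent's .string returns that same NavigableString:

head_tag.string

Output:

"The Dormouse's story"

If a tag has more than one child, it is ambiguous which child .string should refer to, so it returns None:

print(soup.html.string)

Output:

None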
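The three cases can be checked side by side. This is a sketch on hypothetical markup using "html.parser"; get_text() is shown as one alternative when .string would be None, since it concatenates all the text inside a tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption for illustration).
soup = BeautifulSoup(
    "<div><p><b>only</b></p><p>one <i>two</i></p></div>", "html.parser"
)
first_p, second_p = soup.find_all("p")

single = first_p.string     # one child tag with one string: passes through
multi = second_p.string     # two children ('one ' and <i>): None
text = second_p.get_text()  # all text inside the tag, joined together

print(repr(single), multi, repr(text))
```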

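6. .strings and .stripped_strings

If a tag contains more than one string, use .strings to iterate over them:

for string in soup.strings:
    print(string)

Output (blank lines omitted):

The Dormouse's story
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie
,
Lacie
 and
Tillie
;
and they lived at the bottom of a well.
...

.strings yields many strings that consist only of spaces or line breaks. Use .stripped_strings to skip those and to strip leading and trailing whitespace from the rest: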

for string in soup.stripped_strings:
    print(string)

Output:

The Dormouse's story
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie
,
Lacie
and
Tillie
;
and they lived at the bottom of a well.
...
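As a rough illustration of what .stripped_strings filters out, the sketch below collects both iterators into lists for a small, hypothetical snippet (parsed with "html.parser"; the variable names are illustrative).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with indentation and padding (an assumption).
html = "<div>\n  <h1> Title </h1>\n  <p> Body text </p>\n</div>"
soup = BeautifulSoup(html, "html.parser")

raw = list(soup.strings)             # includes whitespace-only strings
clean = list(soup.stripped_strings)  # whitespace-only strings dropped, rest stripped

joined = " | ".join(clean)

print(raw)
print(clean)
print(joined)
```

Joining the cleaned strings with a separator is a common way to flatten a block of markup into a single line of text.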

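II. Parent nodes

1. .parent: get an element's parent

title_tag = soup.title
title_tag.parent

Output:

<head><title>The Dormouse's story</title></head>

Strings have a parent too:

title_tag.string.parent

Output:

<title>The Dormouse's story</title>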

2. .parents: iterate over all of an element's ancestors, from its direct parent up to the top of the document

link = soup.a
for parent in link.parents:
    if parent is None:
        print(parent)
    else:
        print(parent.name)

Output:

p
body
html
[document]
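The ancestor chain can also be collected into a list, and find_parent can jump straight to the nearest ancestor with a given name. The sketch below assumes a hypothetical document, the "html.parser" parser, and a recent bs4 release in which .parents ends at the BeautifulSoup object itself.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption for illustration).
html = '<html><body><div id="box"><p><a href="#">link</a></p></div></body></html>'
soup = BeautifulSoup(html, "html.parser")
link = soup.a

# Walk upward from the direct parent to the BeautifulSoup object.
ancestors = [parent.name for parent in link.parents]

# find_parent returns the closest ancestor matching the given tag name.
nearest_div = link.find_parent("div")

print(ancestors)
print(nearest_div["id"])
```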

III. Sibling nodes

sibling_soup = BeautifulSoup("<a><b>text1</b><c>text2</c></a>", 'lxml')
print(sibling_soup.prettify())

Output:

<html>
 <body>
  <a>
   <b>
    text1
   </b>
   <c>
    text2
   </c>
  </a>
 </body>
</html>

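1. .next_sibling and .previous_sibling

sibling_soup.b.next_sibling

Output:

<c>text2</c>

sibling_soup.c.previous_sibling

Output:

<b>text1</b>

In real documents, .next_sibling and .previous_sibling are usually strings or whitespace:

soup.find_all('a')

Output:

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.a.next_sibling  # the first <a>'s next_sibling is the string ',\n', not the second <a>

Output:

',\n'

soup.a.next_sibling.next_sibling

Output:

<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>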

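2. .next_siblings and .previous_siblings

for sibling in soup.a.next_siblings:
    print(repr(sibling))

Output:

',\n'
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
' and\n'
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
';\nand they lived at the bottom of a well.'

for sibling in soup.find(id="link3").previous_siblings:
    print(repr(sibling))

Output:

' and\n'
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
',\n'
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
'Once upon a time there were three little sisters; and their names were\n'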
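Since the strings between tags also count as siblings, it is often handy to skip them. The sketch below shows two possible approaches on hypothetical markup parsed with "html.parser": find_next_sibling with a tag name, and filtering .next_siblings with isinstance(..., Tag).

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup (an assumption for illustration).
html = (
    '<p><a id="link1">Elsie</a>,\n'
    '<a id="link2">Lacie</a> and\n'
    '<a id="link3">Tillie</a></p>'
)
soup = BeautifulSoup(html, "html.parser")
first = soup.a

raw_next = first.next_sibling            # the string ',\n' between the tags
next_tag = first.find_next_sibling("a")  # skips strings, returns the next <a>

# Keep only the tag siblings that follow the first <a>.
later_ids = [s["id"] for s in first.next_siblings if isinstance(s, Tag)]

print(repr(raw_next), next_tag["id"], later_ids)
```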

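IV. Going back and forth

1. .next_element and .previous_element

These point to whatever was parsed immediately after or immediately before an object (a string or a tag), that is, its successor and predecessor in a depth-first (pre-order) traversal of the tree. This is not always the same as the next or previous sibling.

last_a_tag = soup.find("a", id="link3")
print(last_a_tag.next_sibling)
print(last_a_tag.next_element)

Output:

;
and they lived at the bottom of a well.
Tillie

last_a_tag.previous_element

Output:

' and\n'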

2. .next_elements and .previous_elements

These iterators move forward or backward through the document in the order it was parsed, as if it were being parsed again:

for element in last_a_tag.next_elements:
    print(repr(element))

Output:

'Tillie'
';\nand they lived at the bottom of a well.'
'\n'
<p class="story">...</p>
'...'
'\n'
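The contrast between sibling navigation and element navigation can be sketched on a small, hypothetical fragment (parsed with "html.parser"): .next_sibling stays on the same level of the tree, while .next_element descends into the tag first.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup (an assumption for illustration).
soup = BeautifulSoup("<div><p>one <b>two</b></p><p>three</p></div>", "html.parser")
first_p = soup.p

sibling = first_p.next_sibling   # the second <p>, on the same level
element = first_p.next_element   # the string 'one ', inside the first <p>

# Everything parsed after the first <p>: tag names for tags, text for strings.
order = [
    node.name if isinstance(node, Tag) else str(node)
    for node in first_p.next_elements
]

print(sibling, repr(element), order)
```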


