Learning Web Scraping from Scratch, Part 7: A Brief Introduction to the BeautifulSoup Module

Reference documentation:

https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/

# Installing beautifulsoup4

(pytools) D:\python\pytools>pip install beautifulsoup4
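A quick way to confirm the install worked is to import the package and parse a tiny snippet with Python's built-in `html.parser`. This is a minimal sketch; the `<p>hello</p>` snippet is made up for illustration.

```python
import bs4
from bs4 import BeautifulSoup

# Print the installed version; bs4 exposes it as bs4.__version__
print(bs4.__version__)

# Parse a tiny, made-up document with Python's built-in parser
soup = BeautifulSoup("<p>hello</p>", "html.parser")
print(soup.p.string)  # hello
```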

# coding=utf-8
from bs4 import BeautifulSoup as bs
import re

html_doc = """
<html><head><title>The Dormouse's story</title></head><p class="title"><b>The Dormouse's story</b></p><p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p><p class="story">...</p>
"""
soup = bs(html_doc, "html.parser")

# print(soup.prettify())
# print(soup.title.string)
# print(soup.a)
# print(soup.find(id='link2'))
# print(soup.find(id='link2').string)
# print(soup.find(id='link2').get_text())

# Get the text of every <a> tag
# for link in soup.find_all('a'):
#     print(link.get_text())

# .string returns None for the <p> tag because it has several children; use get_text() instead
# print(soup.find("p", {"class": "story"}).get_text())

# Find all tags whose names start with "b"
# for tag in soup.find_all(re.compile("^b")):
#     print(tag.get_text())

# Find all <a> tags whose href starts with http://example.com/. In a regular expression
# "." matches any character, so escape it as "\." to match a literal dot:
# href=re.compile(r"^http://example\.com/")
data = soup.find_all("a", href=re.compile(r"^http://example\.com/"))
print(data)
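The code above makes two points worth checking directly: `.string` gives `None` for a tag with several children while `get_text()` joins all the nested text, and an unescaped `.` in a regex matches any character, not just a dot. The small sketch below demonstrates both. The helper names `string_vs_text` and `matching_hrefs`, and the extra `http://exampleXcom/...` link, are hypothetical and exist only for illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample: the third link's host is "exampleXcom", which an
# unescaped "." in the pattern would still match.
HTML = """
<p class="story">Three sisters:
<a href="http://example.com/elsie" id="link1">Elsie</a>,
<a href="http://example.com/lacie" id="link2">Lacie</a> and
<a href="http://exampleXcom/tillie" id="link3">Tillie</a>.</p>
"""


def string_vs_text(soup):
    """Return (.string, .get_text()) for the <p class="story"> tag."""
    p = soup.find("p", class_="story")
    return p.string, p.get_text()


def matching_hrefs(soup, pattern):
    """Return the href of every <a> whose href matches the regex pattern."""
    return [a["href"] for a in soup.find_all("a", href=re.compile(pattern))]


if __name__ == "__main__":
    soup = BeautifulSoup(HTML, "html.parser")

    string, text = string_vs_text(soup)
    print(string)        # None, because <p> has several children
    print(text.split())  # all nested text, split into words

    # Unescaped ".": also matches the "exampleXcom" link
    print(matching_hrefs(soup, r"^http://example.com/"))
    # Escaped "\.": matches only a literal dot
    print(matching_hrefs(soup, r"^http://example\.com/"))
```

With the unescaped pattern all three links come back; with `\.` only the two real `example.com` links do, which is why the escaped form is the safer choice.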
