Crawler 5: A Brief Guide to the BeautifulSoup Module

1. Understanding HTML markup

<html>
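<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<h1>My Motherland</h1>
<h1 align="center">My Motherland</h1>
<!-- h1: the tag -->
<!-- align: an attribute -->
<!-- center: the attribute's value -->
<!-- general form: <tag attribute="value">the marked-up content</tag> -->
<img src="xxx.jpg"/>
<a href="http://www.baidu.com">Baidu</a>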
</html>
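To see how a parser breaks markup into tags, attributes and values, here is a minimal sketch that uses Python's built-in html.parser module. This is the same parser BeautifulSoup is told to use when it is given "html.parser". The TagLister class and the SNIPPET string are hypothetical names chosen for illustration. The snippet is a trimmed copy of the HTML above.

```python
from html.parser import HTMLParser

# A trimmed copy of the HTML snippet above (hypothetical name SNIPPET)
SNIPPET = """
<html>
<h1 align="center">My Motherland</h1>
<img src="xxx.jpg"/>
<a href="http://www.baidu.com">Baidu</a>
</html>
"""


class TagLister(HTMLParser):
    """Hypothetical helper: records every start tag with its attributes,
    plus the non-blank text found between tags."""

    def __init__(self):
        super().__init__()
        self.tags = []   # list of (tag, {attribute: value})
        self.texts = []  # non-blank text content, in document order

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        # Self-closing tags such as <img .../> also end up here, because the
        # default handle_startendtag calls handle_starttag.
        self.tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        if data.strip():
            self.texts.append(data.strip())


parser = TagLister()
parser.feed(SNIPPET)
for tag, attrs in parser.tags:
    print(tag, attrs)
print(parser.texts)
```

Each printed line pairs a tag with its attribute/value dictionary. For example, h1 comes with {'align': 'center'}. BeautifulSoup builds its searchable tree on top of this kind of event stream.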

2. Introduction to the BeautifulSoup module

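# 1. Fetch the page source
# 2. Parse it with bs4 and extract the data
import csv

import requests
from bs4 import BeautifulSoup

url = "http://www.xinfadi.com.cn/marketanalysis/0/list/1.shtml"
resp = requests.get(url, timeout=10)
# Parse the data
# 1. Hand the page source to BeautifulSoup to build a bs object.
#    page = BeautifulSoup(resp.text) also works, but bs4 then picks a parser
#    on its own and emits a warning; naming "html.parser" is explicit.
page = BeautifulSoup(resp.text, "html.parser")
# 2. Search the bs object
#    find(tag_name, attribute=value)      -> the first match, or None
#    find_all(tag_name, attribute=value)  -> a list of all matches
table = page.find("table", class_="hq_table")  # class is a Python keyword, hence class_
# table = page.find("table", attrs={"class": "hq_table"})  # equivalent; avoids class_ altogether
# print(table)
if table is None:
    raise SystemExit("table.hq_table not found; the page layout may have changed")
# Get all data rows; [1:] skips the header row
trs = table.find_all("tr")[1:]
# 菜价.csv = "vegetable prices"; newline="" stops csv from writing blank lines on Windows
with open("菜价.csv", mode="w", encoding="utf-8", newline="") as f:
    csvwriter = csv.writer(f)
    for tr in trs:
        tds = tr.find_all("td")  # the <td> cells of this row
        # print(tds)
        name = tds[0].text
        low = tds[1].text
        average = tds[2].text
        high = tds[3].text
        gui = tds[4].text
        kind = tds[5].text
        date = tds[6].text  # the date is the seventh cell, index 6
        print(name, low, average, high, gui, kind, date)
        csvwriter.writerow([name, low, average, high, gui, kind, date])
resp.close()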
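The script above depends on a live page, so here is an offline sketch of the same find/find_all pattern. The HTML string, its placeholder rows and the column names are made-up assumptions, not real market data. An in-memory io.StringIO buffer stands in for the 菜价.csv file. The sketch also shows that class_ and attrs={"class": ...} locate the same tag, and that get_text(strip=True) trims surrounding whitespace.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup shaped like the "hq_table" the script expects;
# every value below is a placeholder.
HTML = """
<table class="hq_table">
  <tr><td>Name</td><td>Low</td><td>Average</td><td>High</td>
      <td>Spec</td><td>Unit</td><td>Date</td></tr>
  <tr><td>item-a</td><td>1.0</td><td>1.5</td><td>2.0</td>
      <td>spec-a</td><td>unit-a</td><td>2000-01-01</td></tr>
  <tr><td>item-b</td><td>3.0</td><td>3.5</td><td>4.0</td>
      <td>spec-b</td><td>unit-b</td><td>2000-01-02</td></tr>
</table>
"""

page = BeautifulSoup(HTML, "html.parser")

# class_ (trailing underscore) and attrs={"class": ...} match the same tag
table = page.find("table", class_="hq_table")
same_table = page.find("table", attrs={"class": "hq_table"})

# find() returns None when nothing matches, so check before using it
if table is None:
    raise ValueError("table.hq_table not found")

buffer = io.StringIO()  # stands in for the CSV file on disk
writer = csv.writer(buffer)
for tr in table.find_all("tr")[1:]:  # [1:] skips the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    writer.writerow(cells)

print(buffer.getvalue())
```

Swapping io.StringIO for open("菜价.csv", "w", encoding="utf-8", newline="") would write the same rows to disk.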
