BeautifulSoup HTML Parser

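import requests
from bs4 import BeautifulSoup

# Fetch the page as raw bytes and parse it with the lxml backend
url = "http://www.baidu.com"
contents = requests.get(url).content
soup = BeautifulSoup(contents, 'lxml')
# Re-serialize the parsed tree as indented HTML
contents = soup.prettify()

>>> soup.link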
<link href="http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css" rel="stylesheet" type="text/css"/>
>>> soup.title
<title>百度一下，你就知道</title>
>>> soup.link.name
'link'
>>> soup.link.attrs
{'href': 'http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css', 'type': 'text/css', 'rel': ['stylesheet']}
>>> soup.link.attrs.get('type')
'text/css'
>>> soup.link.attrs['href']
'http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css'
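
The same attribute access can be tried without a network connection. This is a minimal sketch, assuming a made-up inline HTML string (not the real Baidu page) and the standard-library `html.parser` backend in place of `lxml`:

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; it only mirrors the shape of the <link> tag above.
SAMPLE_HTML = """
<html><head>
<link href="http://example.com/site.css" rel="stylesheet" type="text/css"/>
<title>Sample page</title>
</head><body></body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

link = soup.link                     # first <link> tag in the document
tag_name = link.name                 # the tag's name as a string
link_type = link.attrs.get("type")   # dict-style .get(): None if the attribute is absent
link_href = link["href"]             # shorthand for link.attrs["href"]; KeyError if absent
link_rel = link["rel"]               # rel is multi-valued, so it comes back as a list
missing = link.get("media")          # attribute not present in the sample

print(tag_name, link_type, link_href, link_rel, missing)
```

Using `.get()` is the safer choice when an attribute may be missing, since indexing with `[]` raises `KeyError`.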

>>> for node in soup.head.contents:
...     print(node)
...
...
<meta content="text/html;charset=utf-8" http-equiv="content-type"/>
<meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
<meta content="always" name="referrer"/>
<link href="http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css" rel="stylesheet" type=""/>
<title>百度一下，你就知道</title>

>>> for node in soup.head.contents:
...     if node.name == "meta":
...         print(node)
...     if node.name == "title":
...         print(node.string)
...
<meta content="text/html;charset=utf-8" http-equiv="content-type"/>
<meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
<meta content="always" name="referrer"/>
百度一下，你就知道
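
On pages that are not minified, `.contents` can also yield whitespace text nodes between tags. A small sketch of the same loop that skips them, assuming a made-up HTML string and the `html.parser` backend (neither comes from the original session):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample document with newlines between the tags in <head>.
SAMPLE_HTML = """<html><head>
<meta content="always" name="referrer"/>
<link href="http://example.com/site.css" rel="stylesheet"/>
<title>Sample page</title>
</head><body></body></html>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

meta_tags = []
title_text = None
for node in soup.head.contents:
    # Keep only real tags; the newline text nodes are not Tag instances.
    if not isinstance(node, Tag):
        continue
    if node.name == "meta":
        meta_tags.append(node)
    elif node.name == "title":
        title_text = node.string

print(meta_tags)
print(title_text)
```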
>>> tags = soup.find_all(name='meta')
>>> tags
[<meta content="text/html;charset=unicode-escape" http-equiv="content-type"/>, <meta content="IE=Edge" http-equiv="X-UA-Compatible"/>, <meta content="always" name="referrer"/>]

>>> import re
>>> tags = soup.find_all(re.compile('^me.*'))
>>> tags
[<meta content="text/html;charset=unicode-escape" http-equiv="content-type"/>, <meta content="IE=Edge" http-equiv="X-UA-Compatible"/>, <meta content="always" name="referrer"/>]
>>> for tag in tags:
...     print(tag)
...
...
<meta content="text/html;charset=utf-8" http-equiv="content-type"/>
<meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
<meta content="always" name="referrer"/>
>>> tags = soup.find_all(re.compile('^me.*'), content="always")
>>> for tag in tags:
...     print(tag)
...
...
<meta content="always" name="referrer"/>
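
The `find_all` variants used above can be compared side by side. This is a sketch under stated assumptions: the HTML string is made up (a `<menu>` tag is added to show what the regex also catches), and `html.parser` stands in for `lxml`:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample document; <menu> is included to show the regex is broader than "meta".
SAMPLE_HTML = """<html><head>
<meta content="text/html;charset=utf-8" http-equiv="content-type"/>
<meta content="always" name="referrer"/>
<title>Sample page</title>
</head><body><menu></menu></body></html>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact tag name.
by_name = soup.find_all("meta")

# Compiled regex matched against each tag name; "^me" matches both meta and menu.
by_regex = soup.find_all(re.compile(r"^me"))

# Keyword arguments filter on attribute values.
filtered = soup.find_all(re.compile(r"^me"), content="always")

# Attribute names that are not valid Python identifiers go in the attrs dict.
by_attrs = soup.find_all("meta", attrs={"http-equiv": "content-type"})

print([tag.name for tag in by_regex])
print(filtered)
```

Passing a plain string is the tighter match when only one tag name is wanted; the regex form is useful when several related tag names should be collected in one pass.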
