Reading HTML Pages with Python

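BeautifulSoup is a Python library for parsing HTML. With it you can look up the values of HTML tags and extract specific data, such as the page title or the list of headings on a page.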

Installing BeautifulSoup

Use the Anaconda package manager to install the package together with its dependencies.

conda install beautifulsoup4

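Reading an HTML file

In the example below, we request a URL and load the response into Python. We then pass the 'html.parser' argument to BeautifulSoup to parse the whole HTML document, and finally print the beginning of the formatted page.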

import urllib3
from bs4 import BeautifulSoup

# Fetch the HTML file
http = urllib3.PoolManager()
response = http.request('GET', 'http://www.zyiz.net/python/features.html')
html_doc = response.data

# Parse the HTML file
soup = BeautifulSoup(html_doc, 'html.parser')

# Format the parsed HTML
strhtm = soup.prettify()

# Print the first 225 characters
print(strhtm[:225])

Running the example above prints the first 225 characters of the formatted HTML.
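The parse-and-format step can also be tried without a network connection. The sketch below is only an illustration: SAMPLE_HTML is a made-up page that stands in for the downloaded document, and everything else uses the same BeautifulSoup calls as the example above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used here instead of a downloaded document.
SAMPLE_HTML = (
    "<html><head><title>Sample page</title></head>"
    "<body><h1>Hello</h1></body></html>"
)

# Parse the string with Python's built-in HTML parser.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# prettify() returns the document as a string with one tag per line,
# indented by nesting depth.
pretty = soup.prettify()

# Print only the beginning of the formatted document.
print(pretty[:60])
```

Because prettify() returns an ordinary string, slicing it (as in strhtm[:225]) cuts by characters rather than by lines.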

Extracting tag values

The following code extracts the value from the first instance of a tag.

import urllib3
from bs4 import BeautifulSoup

# Fetch the HTML file
http = urllib3.PoolManager()
response = http.request('GET', 'http://www.zyiz.net/python/features.html')
html_doc = response.data

# Parse the HTML file
soup = BeautifulSoup(html_doc, 'html.parser')

# Print the first <title> tag, then the text of the first <title>, <a> and <b> tags
print(soup.title)
print(soup.title.string)
print(soup.a.string)
print(soup.b.string)

Running the example above produces the following output:

找一找教程网教程? - 专注于IT教程和实例

None

友情链接:
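A None in output like this is expected behaviour rather than a failure: .string only returns text when the tag boils down to a single text node, and returns None otherwise (for example, when the tag wraps an image or mixes text with other tags). The following minimal sketch shows the difference on a made-up fragment; SAMPLE_HTML is hypothetical and not taken from the page above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: <a> holds one text node, <b> mixes text with an <i> tag.
SAMPLE_HTML = '<p><a href="/home">Home</a> <b>Links: <i>more</i></b></p>'

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A tag with exactly one text child: .string returns that text.
single_child = soup.a.string

# A tag with several children: .string returns None.
mixed_children = soup.b.string

# get_text() concatenates all text nested inside the tag instead.
all_text = soup.b.get_text()

print(single_child)
print(mixed_children)
print(all_text)
```

When the structure of a tag is not known in advance, get_text() is usually the safer way to read its text.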

Extracting all tags

The following code extracts the values from every instance of a tag.

import urllib3
from bs4 import BeautifulSoup

# Fetch the HTML file
http = urllib3.PoolManager()
response = http.request('GET', 'http://www.zyiz.net/python/features.html')
html_doc = response.data

# Parse the HTML file
soup = BeautifulSoup(html_doc, 'html.parser')

# Print the text of every <h1> tag
for x in soup.find_all('h1'):
    print(x.string)

Running the example above produces the following output:

None

Python功能特点
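As the None in this output suggests, some headings do not contain a single plain text node. One possible way to collect only the readable heading text is to combine find_all() with get_text() and drop the empty results. This is a sketch on a hypothetical fragment (SAMPLE_HTML is invented for the illustration), not the markup of the page fetched above.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with three <h1> tags of different shapes.
SAMPLE_HTML = """
<h1><img src="logo.png" alt="logo"/></h1>
<h1>Python <span>features</span></h1>
<h1>Plain heading</h1>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# .string is None for the image-only heading and for the heading with mixed children.
strings = [h.string for h in soup.find_all("h1")]

# get_text(" ", strip=True) joins all nested text pieces with a space
# after stripping surrounding whitespace from each piece.
texts = [h.get_text(" ", strip=True) for h in soup.find_all("h1")]

# Keep only headings that actually contain text.
non_empty = [t for t in texts if t]

print(strings)
print(non_empty)
```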
