Python describe function: using the pandas.DataFrame.describe method

DataFrame.describe(self, percentiles=None, include=None, exclude=None)

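Generates descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset's distribution, excluding NaN values.

Analyzes both numeric and object Series, as well as DataFrame column sets of mixed data types. The output varies depending on what is provided. See the Notes below for more detail.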

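Parameters: percentiles : list-like of numbers, optional

The percentiles to include in the output. All should fall between 0 and 1.

The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.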

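include : 'all', list-like of dtypes, or None (default), optional

A whitelist of data types to include in the result. Ignored for Series.

The options are:

'all' : All columns of the input will be included in the output.

A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types, submit numpy.number. To limit it instead to object columns, submit the numpy.object data type (the numpy.object alias was removed in NumPy 1.24; the built-in object is accepted in its place). Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'.

None (default) : The result will include all numeric columns.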

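exclude : list-like of dtypes or None (default), optional

A blacklist of data types to omit from the result. Ignored for Series.

The options are:

A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types, submit numpy.number. To exclude object columns, submit the numpy.object data type (the numpy.object alias was removed in NumPy 1.24; the built-in object is accepted in its place). Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])). To exclude pandas categorical columns, use 'category'.

None (default) : The result will exclude nothing.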

Returns: Series or DataFrame

Summary statistics of the Series or DataFrame provided.
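As a minimal sketch of the percentiles parameter described above, the following requests the 10th and 90th percentiles in place of the defaults. The sample values are hypothetical and chosen only so the quantiles are easy to check by hand.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
s = pd.Series([1, 2, 3, 4, 5])

# Ask for the 10th and 90th percentiles; each value must lie between 0 and 1.
summary = s.describe(percentiles=[0.1, 0.9])

# The requested percentiles appear as index labels formatted as percentages.
print(summary["10%"], summary["90%"])
```

With the default linear interpolation, the 10th percentile of these five values falls 40% of the way from 1 to 2, and the 90th percentile falls 60% of the way from 4 to 5.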

Notes

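For numeric data, the result's index will include count, mean, std, min, and max, as well as the lower, 50, and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50th percentile is the same as the median.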
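The numeric case can be checked with a short sketch. The Series below is a hypothetical example containing one NaN, which shows both that missing values are excluded from count and that the 50% row matches the median.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
s = pd.Series([1.0, 2.0, 3.0, 4.0, np.nan])

summary = s.describe()

# count reflects only the non-NaN observations.
print(summary["count"])

# The 50% row and Series.median() are computed from the same non-NaN values.
print(summary["50%"] == s.median())
```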

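For object data (e.g. strings or timestamps), the result's index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value's frequency. Timestamps also include the first and last items.

If multiple object values share the highest count, the top result is chosen arbitrarily from among them (count and freq are unaffected by the tie).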
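To make the object-data rows concrete, the sketch below compares describe() against values computed by hand with value_counts() and nunique(). The Series is hypothetical, is created with an explicit object dtype, and has a single most common value so that no tie is involved.

```python
import pandas as pd

# Hypothetical object-dtype data with one clear most common value.
s = pd.Series(["a", "a", "b", "c"], dtype=object)

summary = s.describe()

# value_counts() sorts by frequency, so its first entry is the most common value.
counts = s.value_counts()
manual_top = counts.index[0]
manual_freq = counts.iloc[0]

print(summary["top"] == manual_top, summary["freq"] == manual_freq)
print(summary["unique"] == s.nunique())
```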

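For mixed data types provided via a DataFrame, the default is to return only an analysis of the numeric columns. If the DataFrame consists only of object and categorical data, without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of the attributes of each type.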
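The three cases in the paragraph above can be sketched side by side. The frame names df_text and df_mixed are hypothetical, and the object column is created with an explicit object dtype so the example does not depend on how a given pandas version infers string columns.

```python
import pandas as pd

# A frame with no numeric columns: only categorical and object data.
df_text = pd.DataFrame({
    "categorical": pd.Categorical(["d", "e", "f"]),
    "object": pd.Series(["a", "b", "c"], dtype=object),
})

# With no numeric columns present, the default describes every column.
default_summary = df_text.describe()

# Adding a numeric column switches the default to numeric-only output.
df_mixed = df_text.assign(numeric=[1, 2, 3])
numeric_only = df_mixed.describe()

# include='all' returns the union of the numeric and non-numeric rows.
everything = df_mixed.describe(include="all")

print(list(default_summary.columns))
print(list(numeric_only.columns))
print(list(everything.columns))
```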

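The include and exclude parameters can be used to limit which columns of a DataFrame are analyzed for the output. These parameters are ignored when analyzing a Series.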
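One way to read include and exclude is as a select_dtypes filter applied before the summary is computed. The sketch below illustrates that reading on a small hypothetical frame and also shows a Series ignoring the exclude argument; it is an illustration of the documented behavior, not a description of the internal implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one column of each kind.
df = pd.DataFrame({
    "categorical": pd.Categorical(["d", "e", "f"]),
    "numeric": [1, 2, 3],
    "object": pd.Series(["a", "b", "c"], dtype=object),
})

# Filtering through describe() and filtering first with select_dtypes()
# produce the same numeric summary.
via_include = df.describe(include=[np.number])
via_select = df.select_dtypes(include=[np.number]).describe()

# Excluding numeric types leaves the categorical and object columns.
no_numbers = df.describe(exclude=[np.number])

# On a Series the argument is ignored, so the numeric summary is still returned.
series_summary = df["numeric"].describe(exclude=[np.number])

print(via_include.equals(via_select))
print(list(no_numbers.columns))
```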

Examples

Describing a numeric Series.

>>> import pandas as pd
>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64

Describing a categorical Series.

>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object

Describing a timestamp Series.

>>> import numpy as np
>>> s = pd.Series([
...     np.datetime64("2000-01-01"),
...     np.datetime64("2010-01-01"),
...     np.datetime64("2010-01-01")
... ])
>>> s.describe()
count                       3
unique                      2
top       2010-01-01 00:00:00
freq                        2
first     2000-01-01 00:00:00
last      2010-01-01 00:00:00
dtype: object

Describing a DataFrame. By default only numeric fields are returned.

>>> df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']),
...                    'numeric': [1, 2, 3],
...                    'object': ['a', 'b', 'c']
...                   })
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      c
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[object])
       object
count       3
unique      3
top         c
freq        1

Including only categorical columns in a DataFrame description.

>>> df.describe(include=['category'])
       categorical
count            3
unique           3
top              f
freq             1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])
       categorical object
count            3      3
unique           3      3
top              f      c
freq             1      1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[object])
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
