
A GroupBy object is created by calling pandas.DataFrame.groupby() or pandas.Series.groupby().

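GroupBy = DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)
Parameter Description
by mapping, function, str, or iterable (what to group on)
axis int, default 0
level int, level name, or sequence of such, default None (for a MultiIndex, the index level(s) to group by)
as_index boolean, default True (use the `by` columns as the index of the result)
sort boolean, default True (sort the group keys)
group_keys boolean, default True (when calling apply, add the group keys to the index to identify each piece)
squeeze boolean, default False (reduce the dimensionality of the return type if possible; removed in pandas 2.0)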
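
To make the `sort` and `level` parameters concrete, here is a minimal sketch. The column names `key`, `val` and `x` and the data are made up for illustration and do not come from the original example.

```python
import pandas as pd

df = pd.DataFrame({"key": ["b", "a", "b", "a"], "val": [1, 2, 3, 4]})

# sort=True (default): group keys come back sorted.
sorted_sum = df.groupby("key").sum()

# sort=False: groups keep their order of first appearance.
unsorted_sum = df.groupby("key", sort=False).sum()

# level: group on a level of a MultiIndex instead of a column.
indexed = df.set_index(["key", "val"]).assign(x=[10, 20, 30, 40])
by_level = indexed.groupby(level="key").sum()

print(sorted_sum.index.tolist())    # ['a', 'b']
print(unsorted_sum.index.tolist())  # ['b', 'a']
print(by_level["x"].to_dict())      # {'a': 60, 'b': 40}
```

Passing `sort=False` can also save a little time on large frames when the order of the groups does not matter.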


import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 1, 2, 2, 1, 2, 2, 2], 'B': [15, 14, 15, 12, 13, 14, 15, 16]})
df.groupby("A", as_index=False).sum()
df.groupby("A").sum().reset_index()  # equivalent to as_index=False
out:
   A   B
0  1  42
1  2  72



Attribute Description
GroupBy.__iter__() Groupby iterator
GroupBy.groups dict {group name -> group labels}
GroupBy.indices dict {group name -> group indices}
GroupBy.get_group(name[, obj]) Constructs NDFrame from group with provided name
Grouper([key, level, freq, axis, sort]) A Grouper allows the user to specify a groupby instruction for a target
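
The attributes above can be tried out with a short sketch like the following. The frame and its string index labels are assumptions chosen so that labels (`groups`) and integer positions (`indices`) are easy to tell apart.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 1, 2, 2, 1], "B": [15, 14, 15, 12, 13]},
    index=list("abcde"),
)
grouped = df.groupby("A")

# Iterating yields (group name, sub-DataFrame) pairs.
group_sizes = {int(name): len(sub_df) for name, sub_df in grouped}

# groups maps each group name to the index labels of its rows.
labels_of_1 = grouped.groups[1].tolist()

# indices maps each group name to the integer positions of its rows.
positions_of_1 = grouped.indices[1].tolist()

# get_group returns the rows of a single group.
group_2 = grouped.get_group(2)

print(group_sizes)            # {1: 3, 2: 2}
print(labels_of_1)            # ['a', 'b', 'e']
print(positions_of_1)         # [0, 1, 4]
print(group_2["B"].tolist())  # [15, 12]
```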

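Function application

  1. GroupBy.apply(func, *args, **kwargs): apply calls func on the sub-DataFrame of each group in turn, so func can use any DataFrame method.
  2. GroupBy.aggregate(func, *args, **kwargs): aggregate accepts an aggregation such as np.sum or the string "sum". The descriptive-statistics functions below behave like shortcuts for the corresponding agg call (agg is the short form of aggregate).
  3. GroupBy.transform(func, *args, **kwargs): transform applies func group by group and returns a result with the same index as the original object.
  4. filter: keeps or drops whole groups depending on whether func returns True for the group.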
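
The four methods differ mainly in the shape of what they return. The sketch below reuses the A/B data from the earlier example; the lambdas and the threshold of 50 are arbitrary choices for illustration. The string "sum" is used rather than np.sum because recent pandas versions warn when a NumPy function is passed.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2, 1, 2, 2, 2],
                   "B": [15, 14, 15, 12, 13, 14, 15, 16]})
g = df.groupby("A")["B"]

# apply: arbitrary function per group, here one value per group.
spread = g.apply(lambda s: s.max() - s.min())

# agg: one or several named aggregations per group.
totals = g.agg("sum")
stats = g.agg(["min", "max"])

# transform: result is aligned with the original rows.
centered = g.transform(lambda s: s - s.mean())

# filter: keep only the rows of groups that pass the test.
big = df.groupby("A").filter(lambda sub: sub["B"].sum() > 50)

print(spread.to_dict())   # {1: 2, 2: 4}
print(totals.to_dict())   # {1: 42, 2: 72}
print(len(centered))      # 8
print(len(big))           # 5
```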



Function Description
GroupBy.sum() Sum of each group
GroupBy.ohlc() Compute open, high, low and close values of each group, excluding missing values
GroupBy.cumcount([ascending]) Number each item in each group from 0 to the length of that group - 1.
GroupBy.mean(*args, **kwargs) Mean of each group, excluding missing values
GroupBy.prod() Compute prod of group values
GroupBy.var([ddof]) Variance of each group, excluding missing values
GroupBy.std([ddof]) Standard deviation of each group, excluding missing values
GroupBy.sem([ddof]) Standard error of the mean of each group, excluding missing values
GroupBy.size() Size of each group
GroupBy.count() Number of elements in each group, excluding missing values
GroupBy.max() Maximum of each group
GroupBy.min() Minimum of each group
GroupBy.median() Median of each group
GroupBy.first() Compute first of group values
GroupBy.head([n]) Returns first n rows of each group.
GroupBy.last() Compute last of group values
GroupBy.tail([n]) Returns last n rows of each group
GroupBy.nth(n[, dropna]) The n-th row of each group
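
The difference between size() and count(), and the "excluding missing values" remarks, are easiest to see on data with a NaN in it. This is a small sketch with invented data, not an example from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "x", "y", "y"],
                   "B": [1.0, np.nan, 3.0, 4.0, 6.0]})
g = df.groupby("A")["B"]

sizes = g.size()        # counts every row, NaN included
counts = g.count()      # counts only non-missing values
means = g.mean()        # NaN is skipped
numbering = g.cumcount()  # position of each row within its group
firsts = g.head(1)      # first row of each group, original index kept
lasts = g.last()        # last non-missing value of each group

print(sizes.to_dict())     # {'x': 3, 'y': 2}
print(counts.to_dict())    # {'x': 2, 'y': 2}
print(means.to_dict())     # {'x': 2.0, 'y': 5.0}
print(numbering.tolist())  # [0, 1, 2, 0, 1]
print(firsts.tolist())     # [1.0, 4.0]
print(lasts.to_dict())     # {'x': 3.0, 'y': 6.0}
```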


Function Description
DataFrameGroupBy.agg(arg, *args, **kwargs) Aggregate using input function or dict of {column ->
DataFrameGroupBy.all([axis, bool_only, ..]) Return whether all elements are True over requested axis
DataFrameGroupBy.any([axis, bool_only, ..]) Return whether any element is True over requested axis
DataFrameGroupBy.bfill([limit]) Backward fill the values
DataFrameGroupBy.corr([method, min_periods]) Compute pairwise correlation of columns, excluding NA/null values
DataFrameGroupBy.count() Compute count of group, excluding missing values
DataFrameGroupBy.cov([min_periods]) Compute pairwise covariance of columns, excluding NA/null values
DataFrameGroupBy.cummax([axis, skipna]) Return cumulative max over requested axis.
DataFrameGroupBy.cummin([axis, skipna]) Return cumulative minimum over requested axis.
DataFrameGroupBy.cumprod([axis]) Cumulative product for each group
DataFrameGroupBy.cumsum([axis]) Cumulative sum for each group
DataFrameGroupBy.describe([percentiles, ..]) Generate various summary statistics, excluding NaN values.
DataFrameGroupBy.diff([periods, axis]) 1st discrete difference of object
DataFrameGroupBy.ffill([limit]) Forward fill the values
DataFrameGroupBy.fillna([value, method, ..]) Fill NA/NaN values using the specified method
DataFrameGroupBy.hist(data[, column, by, ..]) Draw histogram of the DataFrame’s series using matplotlib / pylab.
DataFrameGroupBy.idxmax([axis, skipna]) Return index of first occurrence of maximum over requested axis.
DataFrameGroupBy.idxmin([axis, skipna]) Return index of first occurrence of minimum over requested axis.
DataFrameGroupBy.mad([axis, skipna, level]) Return the mean absolute deviation of the values for the requested axis (removed in pandas 2.0)
DataFrameGroupBy.pct_change([periods, ..]) Percent change over given number of periods.
DataFrameGroupBy.plot Class implementing the .plot attribute for groupby objects
DataFrameGroupBy.quantile([q, axis, ..]) Return values at the given quantile over requested axis, a la numpy.percentile.
DataFrameGroupBy.rank([axis, method, ..]) Compute numerical data ranks (1 through n) along axis.
DataFrameGroupBy.resample(rule, *args, **kwargs) Provide resampling when using a TimeGrouper
DataFrameGroupBy.shift([periods, freq, axis]) Shift each group by periods observations
DataFrameGroupBy.size() Compute group sizes
DataFrameGroupBy.skew([axis, skipna, level, ..]) Return unbiased skew over requested axis
DataFrameGroupBy.take(indices[, axis, ..]) Analogous to ndarray.take
DataFrameGroupBy.tshift([periods, freq, axis]) Shift the time index, using the index’s frequency if available. (removed in pandas 2.0)
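
Several of the methods in this table are row-wise: they return one value per original row, computed inside each group, rather than one value per group. The following sketch illustrates a few of them on invented data; all of them are called with default arguments.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "y", "x", "y", "x"],
                   "B": [1, 10, 3, 25, 7]})
g = df.groupby("A")

running = g.cumsum()["B"]   # cumulative sum within each group
previous = g.shift(1)["B"]  # previous value within each group
change = g.diff()["B"]      # difference to the previous value in the group
ranks = g.rank()["B"]       # rank within each group, starting at 1
argmax = g.idxmax()["B"]    # index label of the maximum per group

print(running.tolist())               # [1, 10, 4, 35, 11]
print(previous.fillna(-1).tolist())   # [-1.0, -1.0, 1.0, 10.0, 3.0]
print(change.fillna(-1).tolist())     # [-1.0, -1.0, 2.0, 15.0, 4.0]
print(ranks.tolist())                 # [1.0, 1.0, 2.0, 2.0, 3.0]
print(argmax.to_dict())               # {'x': 4, 'y': 3}
```

The `fillna(-1)` calls are only there to make the printed lists easy to read; the first row of each group has no predecessor, so shift and diff return NaN for it.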


Function Description
SeriesGroupBy.nlargest(*args, **kwargs) Return the largest n elements.
SeriesGroupBy.nsmallest(*args, **kwargs) Return the smallest n elements.
SeriesGroupBy.nunique([dropna]) Returns number of unique elements in the group
SeriesGroupBy.unique() Return np.ndarray of unique values in the object.
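
These methods exist only on a SeriesGroupBy, which is what you get after selecting a single column from the grouped frame. A minimal sketch with invented data:

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "x", "y", "y"],
                   "B": [3, 1, 2, 5, 5]})
s = df.groupby("A")["B"]  # selecting one column gives a SeriesGroupBy

top2 = s.nlargest(2)       # two largest values per group
n_distinct = s.nunique()   # number of distinct values per group
distinct = s.unique()      # array of distinct values per group

print(top2.tolist())            # [3, 2, 5, 5]
print(n_distinct.to_dict())     # {'x': 3, 'y': 1}
print(distinct["y"].tolist())   # [5]
```

The result of nlargest carries a two-level index (group key, original row label), which is why only the values are printed here.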


Function Description
DataFrameGroupBy.corrwith(other[, axis, drop]) Compute pairwise correlation between rows or columns of two DataFrame objects.
DataFrameGroupBy.boxplot(grouped[, ..]) Make box plots from DataFrameGroupBy data.



