Multi-level indexing in pandas DataFrames

1. Generating sample data with two-level row and column indexes

1) Create a DataFrame

import pandas as pd
import numpy as np

pd.set_option('display.max_columns', 1000)
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth', 1000)

df = pd.DataFrame(
    np.random.randint(80, 150, size=(9, 9)),
    columns=pd.MultiIndex.from_product([['last year', 'this year', 'next year'], ['price', 'height', 'weight']]),
    index=pd.MultiIndex.from_product([['xiaoming', 'lili', 'xiaohong'], ['chicken', 'dog', 'cat']]),
)
df

Out[67]:
                 last year               this year               next year
                     price height weight     price height weight     price height weight
xiaoming chicken       148    137    121       119     95     98        90    118    110
         dog            88    143    117       100     86     95        82    122    142
         cat           120     93    145       137    148    136       110     99     88
lili     chicken       146     86     90        91    116    134       124    116     94
         dog           103    144    141       131    104    108        90     87    121
         cat           120    119     88       102    129    113       131    118     98
xiaohong chicken       132    146    103       128     98    143       126     81    136
         dog           129     92     99       103     84    116        99    100     85
         cat           131    125    129       146    104    119       135    115    117

Because the values are random, the numbers in the outputs below differ from one run to the next.

2) Create a Series

ser = pd.Series(np.random.randint(0,150,size=6),index=pd.MultiIndex.from_product([['xiaoming', 'lili'],['chicken', 'dog', 'cat']]))
ser

Out[23]:
xiaoming  chicken    135
          dog         94
          cat         74
lili      chicken     24
          dog        142
          cat          4
dtype: int32
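Since the sample data above is unseeded, a reproducible variant can be handy when comparing results. The following is a minimal sketch, assuming a fixed seed of 0 and NumPy's Generator API (neither is in the original); it also checks the shape and the number of index levels.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed so that every run produces the same numbers.
rng = np.random.default_rng(0)

rows = pd.MultiIndex.from_product([['xiaoming', 'lili', 'xiaohong'], ['chicken', 'dog', 'cat']])
cols = pd.MultiIndex.from_product([['last year', 'this year', 'next year'], ['price', 'height', 'weight']])

# integers(80, 150) draws from the half-open range [80, 150), like randint
df = pd.DataFrame(rng.integers(80, 150, size=(9, 9)), index=rows, columns=cols)

print(df.shape)            # (9, 9)
print(df.index.nlevels)    # 2
print(df.columns.nlevels)  # 2
```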

2. General principles for using the index

1) Sort the index before performing multi-index operations

df2 = df.sort_index()
df2

Out[16]:
                 last year               this year               next year
                     price height weight     price height weight     price height weight
lili     cat           120    119     88       102    129    113       131    118     98
         chicken       146     86     90        91    116    134       124    116     94
         dog           103    144    141       131    104    108        90     87    121
xiaohong cat           131    125    129       146    104    119       135    115    117
         chicken       132    146    103       128     98    143       126     81    136
         dog           129     92     99       103     84    116        99    100     85
xiaoming cat           120     93    145       137    148    136       110     99     88
         chicken       148    137    121       119     95     98        90    118    110
         dog            88    143    117       100     86     95        82    122    142
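Why sorting matters can be seen by trying a label slice before and after sort_index(). This is a rough sketch that uses deterministic values (np.arange, an assumption made so the result is reproducible) in place of the random data. In the pandas versions I am aware of, slicing the unsorted index raises UnsortedIndexError, which is a subclass of KeyError, so the example catches KeyError.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([['xiaoming', 'lili', 'xiaohong'], ['chicken', 'dog', 'cat']])
cols = pd.MultiIndex.from_product([['last year', 'this year', 'next year'], ['price', 'height', 'weight']])
# Assumption: deterministic values instead of random ones
df = pd.DataFrame(np.arange(81).reshape(9, 9), index=rows, columns=cols)

# The outer labels were given as xiaoming, lili, xiaohong, which is not sorted.
print(df.index.is_monotonic_increasing)    # False

try:
    df.loc['lili':'xiaohong']
except KeyError as exc:
    # Reached only if pandas refuses to slice the unsorted index
    print('slice on the unsorted index failed:', type(exc).__name__)

df2 = df.sort_index()
print(df2.index.is_monotonic_increasing)   # True
print(df2.loc['lili':'xiaohong'].shape)    # (6, 9)
```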

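2) Index from the outer level to the inner level

3. Indexing and slicing: Series

1) Explicit (label-based)

Start indexing from the outermost level. A bare inner-level label such as ser['dog'] is looked up in the outermost level and fails; to select on the inner level alone, pass a slice for the outer level, as in ser.loc[:, 'dog'].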

ser['lili', 'dog']

Out[27]: 142

Or, to select everything under one outer label:

ser.loc['lili', :]

Out[28]:
chicken     24
dog        142
cat          4
dtype: int32
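The label-based forms can be compared side by side. The sketch below assumes deterministic values (range(6) in place of the random integers) and a sorted index; xs() is used as a cross-check for the inner-level selection.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['xiaoming', 'lili'], ['chicken', 'dog', 'cat']])
# Assumption: values 0..5 instead of random numbers, then sorted by index
ser = pd.Series(range(6), index=index).sort_index()

both_levels = ser.loc[('lili', 'dog')]   # outer label, then inner label: a scalar
outer_only = ser.loc['lili']             # everything under 'lili', indexed by the inner level
inner_only = ser.loc[:, 'dog']           # slice the outer level, pick one inner label
inner_xs = ser.xs('dog', level=1)        # the same selection through xs()

print(both_levels)            # 4
print(outer_only.to_dict())   # {'cat': 5, 'chicken': 3, 'dog': 4}
print(inner_only.tolist())    # [4, 1]
print(inner_xs.to_dict())     # {'lili': 4, 'xiaoming': 1}
```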

2) Implicit (position-based)

Positional indexing with iloc ignores the index levels.

ser.iloc[[1,3,4]]

Out[29]:
xiaoming  dog         94
lili      chicken     24
          dog        142
dtype: int32

4. Indexing and slicing: DataFrame

1) Column indexing: index directly by column name

# The following forms all return the same column
df['last year']['price']
df['last year', 'price']
df[('last year', 'price')]
df.iloc[:, 0]

Out[30]:
lili      cat        120
          chicken    146
          dog        103
xiaohong  cat        131
          chicken    132
          dog        129
xiaoming  cat        120
          chicken    148
          dog         88
Name: (last year, price), dtype: int32
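The four forms return the same values but are not identical in every respect. A small sketch, again assuming deterministic values via np.arange, shows that the chained form performs two lookups and ends up with a different Series name, and that the iloc form matches only because the column happens to be first.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([['xiaoming', 'lili', 'xiaohong'], ['chicken', 'dog', 'cat']])
cols = pd.MultiIndex.from_product([['last year', 'this year', 'next year'], ['price', 'height', 'weight']])
# Assumption: deterministic values instead of random ones
df = pd.DataFrame(np.arange(81).reshape(9, 9), index=rows, columns=cols)

chained = df['last year']['price']      # two lookups: sub-frame first, then a column
direct = df['last year', 'price']       # one lookup with both labels
as_tuple = df[('last year', 'price')]   # same as direct, with the tuple written out
by_position = df.iloc[:, 0]             # matches only because this column is first

same_values = chained.tolist() == direct.tolist() == as_tuple.tolist() == by_position.tolist()
print(same_values)    # True
print(direct.name)    # ('last year', 'price')
print(chained.name)   # price
```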

2) Column slicing

# Explicit (by label)
df.loc[:, 'this year']
# Implicit (by position)
df.iloc[:, 3:6]

Out[45]:
                 this year
                     price height weight
lili     cat           109    138     91
         chicken       106    111    103
         dog           119    106     84
xiaohong cat           112    125    135
         chicken       119     85    129
         dog           114     90    102
xiaoming cat           111    117     89
         chicken        95     99    113
         dog           135     90    136
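The two slicing forms select the same cells, but the label form drops the outer column level from the result while the positional form keeps both levels. A brief sketch under the same deterministic-data assumption:

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([['xiaoming', 'lili', 'xiaohong'], ['chicken', 'dog', 'cat']])
cols = pd.MultiIndex.from_product([['last year', 'this year', 'next year'], ['price', 'height', 'weight']])
# Assumption: deterministic values instead of random ones
df = pd.DataFrame(np.arange(81).reshape(9, 9), index=rows, columns=cols)

by_label = df.loc[:, 'this year']   # the outer level 'this year' is dropped
by_position = df.iloc[:, 3:6]       # both column levels are kept

print(by_label.columns.tolist())     # ['price', 'height', 'weight']
print(by_position.columns.nlevels)   # 2
print(bool((by_label.to_numpy() == by_position.to_numpy()).all()))  # True
```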

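3) Row indexing

# The outer [ ] returns the original type (DataFrame); without it, a Series is returned
# Explicit (by label)
df.loc[[('lili', 'dog')]]
# Implicit (by position; selects the first row)
df.iloc[[0]]

Out[31]:
         last year               this year
             price height weight     price height weight
lili dog       140    147     92       135     92     94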
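The effect of the extra brackets can be checked directly. In this sketch (deterministic data and a sorted index are assumptions), a single tuple key returns a Series, a list containing that tuple returns a one-row DataFrame, and get_loc gives the position that iloc would need for the same row.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([['xiaoming', 'lili', 'xiaohong'], ['chicken', 'dog', 'cat']])
cols = pd.MultiIndex.from_product([['last year', 'this year', 'next year'], ['price', 'height', 'weight']])
# Assumption: deterministic values, sorted by row index
df2 = pd.DataFrame(np.arange(81).reshape(9, 9), index=rows, columns=cols).sort_index()

as_series = df2.loc[('lili', 'dog'), :]    # one key: the row comes back as a Series
as_frame = df2.loc[[('lili', 'dog')], :]   # list of keys: the result stays a DataFrame

print(type(as_series).__name__, as_series.shape)   # Series (9,)
print(type(as_frame).__name__, as_frame.shape)     # DataFrame (1, 9)

position = df2.index.get_loc(('lili', 'dog'))      # position of that row in the sorted index
print(position)                                     # 2
print(df2.iloc[[position]].to_numpy().tolist() == as_frame.to_numpy().tolist())  # True
```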

4) Row slicing

df.loc['lili':'xiaohong']

Out[32]:
                 last year               this year               next year
                     price height weight     price height weight     price height weight
lili     cat           120    121    106       109    138     91        85    111    114
         chicken       117    124     93       106    111    103       133    115    140
         dog           107    112    141       119    106     84       138    119     93
xiaohong cat           102     93     80       112    125    135       101    115     94
         chicken        83    107     86       119     85    129        85    127    139
         dog           116    110    103       114     90    102        90    130    117

5. Using pd.IndexSlice

Slicing a multi-level index differs from slicing a single-level index. For example, a bare colon is not valid inside a list:

In[45]: df.loc[[:, 'dog'], [:, 'price']]
  File "<ipython-input-117-9275508ae997>", line 1
    df.loc[[:, 'dog'], [:, 'price']]
            ^
SyntaxError: invalid syntax

In this case, use IndexSlice to get the slicing syntax familiar from single-level indexes.

In[46]:
idx = pd.IndexSlice
df.loc[idx[:, 'dog'], idx[:, 'price']]

Out[46]:
             last year this year next year
                 price     price     price
lili     dog       107       119       138
xiaohong dog       116       114        90
xiaoming dog       115       135        83
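pd.IndexSlice is a convenience for writing slice objects: idx[:, 'dog'] produces the tuple (slice(None), 'dog'). The sketch below (deterministic data and sorting on both axes are assumptions) confirms that the two spellings select the same cells, and that a selection built this way can also be assigned to.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([['xiaoming', 'lili', 'xiaohong'], ['chicken', 'dog', 'cat']])
cols = pd.MultiIndex.from_product([['last year', 'this year', 'next year'], ['price', 'height', 'weight']])
# Assumption: deterministic values, sorted on both axes
df2 = pd.DataFrame(np.arange(81).reshape(9, 9), index=rows, columns=cols)
df2 = df2.sort_index(axis=0).sort_index(axis=1)

idx = pd.IndexSlice
with_idx = df2.loc[idx[:, 'dog'], idx[:, 'price']]
with_slice = df2.loc[(slice(None), 'dog'), (slice(None), 'price')]

print(idx[:, 'dog'] == (slice(None), 'dog'))   # True
print(with_idx.equals(with_slice))             # True
print(with_idx.shape)                          # (3, 3)

# Assign through the same selection: set every 'dog' price to 0
df2.loc[idx[:, 'dog'], idx[:, 'price']] = 0
total_after = int(df2.loc[idx[:, 'dog'], idx[:, 'price']].to_numpy().sum())
print(total_after)                             # 0
```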

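6. Indexing and slicing with xs()

Note that xs() is a method of DataFrame and Series; pandas has no top-level pd.xs function.

Advantage: xs() can skip the outermost level and select data by label directly from a specified level.
Disadvantages:
1) It cannot be used to set values.
2) Only a single label can be selected per level.

The signature of xs() is:

DataFrame.xs(key, axis=0, level=None, drop_level=True)

where:
key : a label or a tuple of labels
axis : {0 or 'index', 1 or 'columns'}, default 0
level : the level(s) that the key refers to; defaults to the first n levels (n=1 or len(key)). If the key is only partially contained in the MultiIndex, indicate which levels are used. Levels can be referred to by label or by position.
drop_level : bool, default True. If False, the returned object keeps the same levels as the original.

Returns:
The cross-section of the original Series or DataFrame at the specified index (also a Series or DataFrame).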

# Basic usage
In[137]: df.xs(('xiaoming', 'cat'))

Out[137]:
last year  price     115
           height    134
           weight    100
this year  price     111
           height    117
           weight     89
next year  price     133
           height     85
           weight     83
Name: (xiaoming, cat), dtype: int32

# Only one label per level can be selected; passing several labels for the same level raises an error
In[46]: df.xs(('xiaoming', 'lili'))
KeyError: ('xiaoming', 'lili')

# Select all rows whose inner-level label is 'cat'
In[47]: df.xs('cat', axis=0, level=1)

Out[47]:
         last year               this year               next year
             price height weight     price height weight     price height weight
lili           107    112    141       119    106     84       138    119     93
xiaohong       116    110    103       114     90    102        90    130    117
xiaoming       115    140    121       135     90    136        83     88    127

# Select the rows whose inner-level label is 'cat' and the columns whose inner-level label is 'height'
In[130]: df.xs('cat', axis=0, level=1).xs('height', axis=1, level=1)

Out[130]:
          last year  this year  next year
lili            121        138        111
xiaohong         93        125        115
xiaoming        134        117         85
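The parameters described above can be exercised on a small deterministic frame. This is only a sketch (np.arange data and a sorted row index are assumptions): it shows the effect of drop_level, a column-axis cross-section, and a .loc selection with pd.IndexSlice as one way around the single-label-per-level limit.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([['xiaoming', 'lili', 'xiaohong'], ['chicken', 'dog', 'cat']])
cols = pd.MultiIndex.from_product([['last year', 'this year', 'next year'], ['price', 'height', 'weight']])
# Assumption: deterministic values, sorted by row index
df2 = pd.DataFrame(np.arange(81).reshape(9, 9), index=rows, columns=cols).sort_index()

cats = df2.xs('cat', level=1)                          # the selected level is dropped
cats_kept = df2.xs('cat', level=1, drop_level=False)   # both row levels are kept
heights = df2.xs('height', axis=1, level=1)            # cross-section on the columns
one_row = df2.xs(('lili', 'cat'))                      # full key: a single row as a Series

print(cats.index.tolist())        # ['lili', 'xiaohong', 'xiaoming']
print(cats_kept.index.nlevels)    # 2
print(heights.shape)              # (9, 3)
print(one_row.shape)              # (9,)

# Several labels on one level: use .loc with IndexSlice instead of xs()
idx = pd.IndexSlice
two_people = df2.loc[idx[['lili', 'xiaoming'], :], :]
print(two_people.shape)           # (6, 9)
```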

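7. Index conversion

stack() moves a column index level to the row index. The level parameter specifies which level to move; the default is the innermost level.
df.stack()

unstack() moves a row index level to the column index. The level parameter specifies which level to move; the default is the innermost level.
df.unstack()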
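How the two methods reshape the sample frame can be seen from the shapes alone. In this sketch (deterministic np.arange data is an assumption), one cell is looked up before and after stacking to show that only its address changes. Recent pandas releases may print a FutureWarning for stack() about its new implementation; the shapes shown here do not depend on it.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([['xiaoming', 'lili', 'xiaohong'], ['chicken', 'dog', 'cat']])
cols = pd.MultiIndex.from_product([['last year', 'this year', 'next year'], ['price', 'height', 'weight']])
# Assumption: deterministic values instead of random ones
df = pd.DataFrame(np.arange(81).reshape(9, 9), index=rows, columns=cols)

stacked = df.stack()     # innermost column level (price/height/weight) moves to the rows
wide = df.unstack()      # innermost row level (chicken/dog/cat) moves to the columns

print(stacked.shape, stacked.index.nlevels)   # (27, 3) 3
print(wide.shape, wide.columns.nlevels)       # (3, 27) 3

# The same cell, addressed before and after stacking
before = df.loc[('lili', 'cat'), ('this year', 'height')]
after = stacked.loc[('lili', 'cat', 'height'), 'this year']
print(int(before), int(after))                # 49 49
```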

1) Series to DataFrame

In[131]: ser.unstack()

Out[131]:
          cat  chicken  dog
lili        4       24  142
xiaoming   74      135   94

2) DataFrame to Series

In[136]: df.stack().stack()

Out[136]:
lili      cat  height  last year    121
                       next year    111
                       this year    138
               price   last year    120
                       next year     85
                                   ...
xiaoming  dog  price   next year     83
                       this year    135
               weight  last year    121
                       next year    127
                       this year    136
Length: 81, dtype: int32

3) Multi-level index to single-level index

In[146]: df.stack().stack().reset_index()

Out[146]:
     level_0 level_1 level_2    level_3    0
0       lili     cat  height  last year  121
1       lili     cat  height  next year  111
2       lili     cat  height  this year  138
3       lili     cat   price  last year  120
4       lili     cat   price  next year   85
..       ...     ...     ...        ...  ...
76  xiaoming     dog   price  next year   83
77  xiaoming     dog   price  this year  135
78  xiaoming     dog  weight  last year  121
79  xiaoming     dog  weight  next year  127
80  xiaoming     dog  weight  this year  136
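The generic column names level_0 ... level_3 and 0 appear because the index levels are unnamed. One possible variation, sketched here with hypothetical names (person, animal, metric, year, value are my own labels, not the original's) and deterministic data, names the levels before flattening.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([['xiaoming', 'lili', 'xiaohong'], ['chicken', 'dog', 'cat']])
cols = pd.MultiIndex.from_product([['last year', 'this year', 'next year'], ['price', 'height', 'weight']])
# Assumption: deterministic values instead of random ones
df = pd.DataFrame(np.arange(81).reshape(9, 9), index=rows, columns=cols)

flat = (
    df.stack().stack()                                    # Series with a 4-level index
      .rename_axis(['person', 'animal', 'metric', 'year'])  # hypothetical level names
      .reset_index(name='value')                          # levels become ordinary columns
)

print(flat.columns.tolist())   # ['person', 'animal', 'metric', 'year', 'value']
print(len(flat))               # 81
```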

4) Moving index levels between axes

# Move the outermost column level to the row index
In[156]: df.stack(level=0)

Out[156]:
                            height  price  weight
lili     cat     last year     121    120     106
                 next year     111     85     114
                 this year     138    109      91
         chicken last year     124    117      93
                 next year     115    133     140
                 this year     111    106     103
         dog     last year     112    107     141
                 next year     119    138      93
                 this year     106    119      84
xiaohong cat     last year      93    102      80
                 next year     115    101      94
                 this year     125    112     135
...

# Move the outermost row level to the column index
In[159]: df.unstack(level=0)

Out[159]:
        last year                                    ... next year
            price                   height           ...    height          weight
             lili xiaohong xiaoming   lili xiaohong  ...  xiaohong xiaoming   lili xiaohong xiaoming
cat           120      102      115    121       93  ...       115       85    114       94       83
chicken       117       83      138    124      107  ...       127       89    140      139      149
dog           107      116      115    112      110  ...       130       88     93      117      127
[3 rows x 27 columns]

5) Swapping index levels within one axis

df.swaplevel(axis=0)

Out[148]:
                 last year               this year               next year
                     price height weight     price height weight     price height weight
cat     lili           120    121    106       109    138     91        85    111    114
chicken lili           117    124     93       106    111    103       133    115    140
dog     lili           107    112    141       119    106     84       138    119     93
cat     xiaohong       102     93     80       112    125    135       101    115     94
chicken xiaohong        83    107     86       119     85    129        85    127    139
dog     xiaohong       116    110    103       114     90    102        90    130    117
cat     xiaoming       115    134    100       111    117     89       133     85     83
chicken xiaoming       138     96     82        95     99    113        99     89    149
dog     xiaoming       115    140    121       135     90    136        83     88    127

df.swaplevel(axis=1)

Out[150]:
                     price    height    weight     price    height    weight     price    height    weight
                 last year last year last year this year this year this year next year next year next year
lili     cat           120       121       106       109       138        91        85       111       114
         chicken       117       124        93       106       111       103       133       115       140
         dog           107       112       141       119       106        84       138       119        93
xiaohong cat           102        93        80       112       125       135       101       115        94
         chicken        83       107        86       119        85       129        85       127       139
         dog           116       110       103       114        90       102        90       130       117
xiaoming cat           115       134       100       111       117        89       133        85        83
         chicken       138        96        82        95        99       113        99        89       149
         dog           115       140       121       135        90       136        83        88       127
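As the outputs above suggest, swaplevel() only exchanges the labels; it does not reorder the rows, so the result is usually sorted afterwards. A minimal sketch with deterministic data (an assumption):

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([['xiaoming', 'lili', 'xiaohong'], ['chicken', 'dog', 'cat']])
cols = pd.MultiIndex.from_product([['last year', 'this year', 'next year'], ['price', 'height', 'weight']])
# Assumption: deterministic values instead of random ones
df = pd.DataFrame(np.arange(81).reshape(9, 9), index=rows, columns=cols)

swapped = df.swaplevel(axis=0)    # labels change places; the row order is untouched
print(swapped.index[0])           # ('chicken', 'xiaoming')
print(bool((swapped.to_numpy() == df.to_numpy()).all()))   # True

regrouped = swapped.sort_index()  # group the rows by the new outer level
print(regrouped.index.get_level_values(0).unique().tolist())   # ['cat', 'chicken', 'dog']
print(regrouped.loc['cat'].index.tolist())                     # ['lili', 'xiaohong', 'xiaoming']
```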

8. Operations on the index

1) Naming the index levels

In[151]: df.index.names
Out[151]: FrozenList([None, None])

In[152]:
df.index.names = ['puple', 'animal']
df

Out[152]:
                 last year               this year               next year
                     price height weight     price height weight     price height weight
puple    animal
lili     cat           120    121    106       109    138     91        85    111    114
         chicken       117    124     93       106    111    103       133    115    140
         dog           107    112    141       119    106     84       138    119     93
xiaohong cat           102     93     80       112    125    135       101    115     94
         chicken        83    107     86       119     85    129        85    127    139
         dog           116    110    103       114     90    102        90    130    117
xiaoming cat           115    134    100       111    117     89       133     85     83
         chicken       138     96     82        95     99    113        99     89    149
         dog           115    140    121       135     90    136        83     88    127

2) Getting the values of a given level

In[160]: df.index.get_level_values(0)
or, by level name once the levels have been named:
In[160]: df.index.get_level_values('puple')

Out[160]: Index(['lili', 'lili', 'lili', 'xiaohong', 'xiaohong', 'xiaohong', 'xiaoming', 'xiaoming', 'xiaoming'], dtype='object')
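Once the levels are named, the name can stand in for the position wherever a level is expected. The sketch below reuses the names from the text on deterministic data (an assumption) and checks that lookups by name and by position agree.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([['xiaoming', 'lili', 'xiaohong'], ['chicken', 'dog', 'cat']])
cols = pd.MultiIndex.from_product([['last year', 'this year', 'next year'], ['price', 'height', 'weight']])
# Assumption: deterministic values instead of random ones
df = pd.DataFrame(np.arange(81).reshape(9, 9), index=rows, columns=cols)
df.index.names = ['puple', 'animal']   # same names as in the text

by_name = df.index.get_level_values('animal')
by_position = df.index.get_level_values(1)

print(by_name.equals(by_position))     # True
print(by_name.unique().tolist())       # ['chicken', 'dog', 'cat']

# A level name also works as the level argument of xs()
dogs = df.xs('dog', level='animal')
print(dogs.index.name)                 # puple
```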

3) Sorting the index

# Sort the second column level in descending order
In[166]: df.sort_index(axis=1, level=1, ascending=False)

Out[166]:
                 this year next year last year this year next year last year this year next year last year
                    weight    weight    weight     price     price     price    height    height    height
puple    animal
lili     cat            91       114       106       109        85       120       138       111       121
         chicken       103       140        93       106       133       117       111       115       124
         dog            84        93       141       119       138       107       106       119       112
xiaohong cat           135        94        80       112       101       102       125       115        93
         chicken       129       139        86       119        85        83        85       127       107
         dog           102       117       103       114        90       116        90       130       110
xiaoming cat            89        83       100       111       133       115       117        85       134
         chicken       113       149        82        95        99       138        99        89        96
         dog           136       127       121       135        83       115        90        88       140
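Sorting by a level reorders whole columns, so each value stays under its own labels. A short sketch on deterministic data (an assumption) checks the order of the second column level after the sort and confirms that one column still holds the same values.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([['xiaoming', 'lili', 'xiaohong'], ['chicken', 'dog', 'cat']])
cols = pd.MultiIndex.from_product([['last year', 'this year', 'next year'], ['price', 'height', 'weight']])
# Assumption: deterministic values instead of random ones
df = pd.DataFrame(np.arange(81).reshape(9, 9), index=rows, columns=cols)

by_metric = df.sort_index(axis=1, level=1, ascending=False)

# Second column level in descending alphabetical order
print(by_metric.columns.get_level_values(1).unique().tolist())   # ['weight', 'price', 'height']

# Columns moved as a whole: the data under a given label pair is unchanged
unchanged = by_metric[('this year', 'weight')].tolist() == df[('this year', 'weight')].tolist()
print(unchanged)   # True
```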

