pandas(二) -- Dataframe创建及索引

Dataframe创建

由数组/list组成的字典

data1 = {'a':[1,2,3],'b':[3,4,5],'c':[5,6,7]}
df1 = pd.DataFrame(data1)
print(df1)

输出

添加索引

df1 = pd.DataFrame(data1,index = ['f1','f2','f3'])

2. 由Series组成的字典

data1 = {'one':pd.Series(np.random.rand(2)),'two':pd.Series(np.random.rand(3))}  # 没有设置index的Series
df1 = pd.DataFrame(data1,index = ['a','b','c'])
print(df1)

较短的序列补0

        one       two
a  0.065605  0.217466
b  0.973106  0.908904
c       NaN  0.663079

通过二维数组直接创建

ar = np.random.rand(9).reshape(3,3)
print(ar)
df1 = pd.DataFrame(ar)
df2 = pd.DataFrame(ar, index = ['a', 'b', 'c'], columns = ['one','two','three'])

          0         1         2
0  0.339401  0.773847  0.253083
1  0.281513  0.028760  0.751607
2  0.347467  0.252451  0.689796one       two     three
a  0.339401  0.773847  0.253083
b  0.281513  0.028760  0.751607
c  0.347467  0.252451  0.689796

Dataframe索引与切片

列索引和行索引
df

                a          b          c          d
one    94.473099  30.077407  70.953102   9.416436
two    41.958628  15.709462  47.400670  56.909647
three  14.539075   8.398997  80.139084  83.250374

1. 列索引
按照列名选择列，只选择一列输出Series，选择多列输出Dataframe

data1 = df['a']
data2 = df[['a','c']]

data1 = df['a']

one      94.473099
two      41.958628
three    14.539075
Name: a, dtype: float64 <class 'pandas.core.series.Series'>

data2 = df[['a','c']]

              a          c
one    94.473099  70.953102
two    41.958628  47.400670
three  14.539075  80.139084 <class 'pandas.core.frame.DataFrame'>

2. 行索引
按照index选择行，只选择一行输出Series，选择多行输出Dataframe
df.loc['one'] – 按标签索引
df.iloc[0] – 按位置索引

data3 = df.loc['one']#单标签索引
data4 = df.loc[['one','two']]#逐个选择--多标签索引
data5 = df.loc['one':'three']#范围--切片索引

data3

a    94.473099
b    30.077407
c    70.953102
d     9.416436
Name: one, dtype: float64 <class 'pandas.core.series.Series'>

data4

    a          b          c          d
one  94.473099  30.077407  70.953102   9.416436
two  41.958628  15.709462  47.400670  56.909647 <class 'pandas.core.frame.DataFrame'>

data5

  a          b          c          d
one    94.473099  30.077407  70.953102   9.416436
two    41.958628  15.709462  47.400670  56.909647

df.iloc[] - 按照整数位置做行索引
- 单位置索引 df.iloc[0] ———与 df.loc['one']相同
- 多位置索引 df.iloc[[2,1]] ———与 df.loc['three','two']相同
- 切片索引 df.iloc[0:2]———与 df.loc['one':'three']相同

3. 布尔型索引
df

     a          b          c          d
one    52.462365  92.336489  95.512607  85.587735
two    34.853185  12.887189  69.575950  79.705655
three  90.755125  98.826032  12.686749  99.404063
four   75.758254  97.520349  36.782117  18.956917

布尔型矩阵索引
b1 = df < 20得到与df同型的矩阵

 a      b      c      d
one    False  False  False  False
two    False   True  False  False
three  False  False   True  False
four   False  False  False   True

通过布尔型矩阵索引df[b1]，False处的值为NaN,True的为相应值

 a          b          c          d
one   NaN        NaN        NaN        NaN
two   NaN  12.887189        NaN        NaN
three NaN        NaN  12.686749        NaN
four  NaN        NaN        NaN  18.956917

布尔型序列/行索引
b2 = df['a'] > 50,得到布尔型序列

one       True
two      False
three     True
four      True
Name: a, dtype: bool <class 'pandas.core.series.Series'>

df[b2]

布尔型序列索引，得到的是布尔型序列为true所在的行

 a          b          c          d
one    52.462365  92.336489  95.512607  85.587735
three  90.755125  98.826032  12.686749  99.404063
four   75.758254  97.520349  36.782117  18.956917

不能通过相似的方法对行进行判断。列得到的是列名，与原始的DataFrame的index['one','two','three','four'],无法匹配

错误提示Boolean Series key will be reindexed to match DataFrame index.
通过行索引方式为。df在行索引为真时，为原始值，其余地方为NaN

多列索引和多行索引

b3 = df[['a','b']] > 50 #两列

              a      bone     True   Truetwo    False  Falsethree   True   Truefour    True   True <class 'pandas.core.frame.DataFrame'>

df[b3] 布尔型DataFrame中为True的地方有值，其余的均为NaN.多行索引也是同样的效果

 a          b   c   d
one    52.462365  92.336489 NaN NaN
two          NaN        NaN NaN NaN
three  90.755125  98.826032 NaN NaN
four   75.758254  97.520349 NaN NaN

多重索引：比如同时索引行和列
df['a'].loc[['one','three']]

one      52.462365
three    90.755125
Name: a, dtype: float64

print(df[['b','c','d']].iloc[::2])   # 选择b，c，d列的one，three行

          b          c          d
one      92.336489  95.512607  85.587735
three    98.826032  12.686749  99.404063

pandas(二) -- Dataframe创建及索引相关推荐

Pandas(二)--DataFrame结构
DataFrame结构 DataFrame 一个表格型的数据结构,既有行标签(index),又有列标签(columns),它也被称异构数据表,所谓异构,指的是表格中每列的数据类型可以不同,比如可以是字 ...
python dataframe去掉索引_python中pandas.DataFrame（创建、索引、增添与删除）的简单操作方法介绍...
这篇文章python中pandas.DataFrame(创建.索引.增添与删除)的简单操作方法介绍,其中包括创建.索引.增添与删除等的相关资料,文中介绍的非常详细,需要的朋友可以参考借鉴,下面来一起看 ...
dataframe两个表合并_Part25:Pandas基础(Series,DataFrame类的创建、索引、切片、算术方法)...
一.为什么学习pandas numpy已经可以帮助我们进行数据的处理了,那么学习pandas的目的是什么呢? numpy能够帮助我们处理的是数值型的数据,当然在数据分析中除了数值型的数据还有好多其他类 ...
python pandas dataframe 不显示索引_python中pandas.DataFrame的简单操作方法（创建、索引、增添与删除）...
前言最近在网上搜了许多关于pandas.DataFrame的操作说明,都是一些基础的操作,但是这些操作组合起来还是比较费时间去正确操作DataFrame,花了我挺长时间去调整BUG的.我在这里做一些 ...
python dataframe去掉索引_python中pandas.DataFrame的简单操作方法（创建、索引、增添与删除）...
这篇文章主要介绍了python中pandas.DataFrame的简单操作方法,其中包括创建.索引.增添与删除等的相关资料,文中介绍的非常详细,需要的朋友可以参考借鉴,下面来一起看看吧. 前言最近在 ...
python删除列索引_python中pandas.DataFrame的简单操作方法（创建、索引、增添与删除）...
前言最近在网上搜了许多关于pandas.DataFrame的操作说明,都是一些基础的操作,但是这些操作组合起来还是比较费时间去正确操作DataFrame,花了我挺长时间去调整BUG的.我在这里做一些 ...
pandas创建复合索引dataframe仿真数据集实战（create a multiIndex dataframe）
pandas创建复合索引dataframe仿真数据集实战(create a multiIndex dataframe) 目录 pandas创建复合索引dataframe仿真数据集实战(create a ...
pandas使用pd.MultiIndex.from_product函数和pd.MultiIndex.from_tuples函数创建复合索引dataframe数据实战
pandas使用pd.MultiIndex.from_product函数和pd.MultiIndex.from_tuples函数创建复合索引dataframe数据实战目录 pandas使用pd.Mu ...
pandas使用datetime创建示例dataframe：包含完整的时分秒信息、创建时间索引、使用不同的时间频率、date_range中参数freq的偏移量别名
pandas使用datetime创建示例dataframe:包含完整的时分秒信息.创建时间索引.使用不同的时间频率.date_range中参数freq的偏移量别名目录

pandas(二) -- Dataframe创建及索引

Dataframe创建

Dataframe索引与切片

pandas(二) -- Dataframe创建及索引相关推荐

最新文章

热门文章