一、简介

封装了Numpy和Matplotlib，具有便捷的数据处理能力，读取文件方便，专门用于数据挖掘。

二、三大数据结构

2.1 DataFrame

2.1.1 结构

既有行索引，又有列索引的二维数组

1. 生成DataFrame

pd.DataFrame(ndarrary)

import numpy as np
import pandas as pd
#生成二维数组
stock_change = np.random.normal(0,1,(10,5))
#创建DataFrame
pd.DataFrame(stock_change)0              1           2            3        4
0   -0.268469   1.730915    -1.009817   -0.724267   1.070360
1   -1.774932   -1.477198   1.293078    0.111610    0.422240
2   -0.339734   -0.327221   2.087621    -1.105055   0.372288
3   -1.511658   -0.192350   1.115869    -0.668625   0.223991
4   1.234608    -0.030666   -1.296268   0.676021    0.229242
5   -1.829124   -1.455717   1.119276    -0.493858   0.429053
6   1.849348    -0.155816   0.195674    -1.608004   -0.456923
7   -0.833554   -0.301686   0.189869    0.664212    -0.364294
8   -0.524520   0.328688    1.453038    -0.950180   1.066928
9   -0.014796   0.046543    0.194047    -0.753442   -1.235741

2. 添加行索引

pd.DataFrame(ndarrary,index= )

#1.生成行索引
stock = ["股票{}".format(i) for i in range(10)]
['股票0', '股票1', '股票2', '股票3', '股票4', '股票5', '股票6', '股票7', '股票8', '股票9']#2.添加行索引
pd.DataFrame(stock_change,index=stock)0                    1             2           3         4
股票0 -0.268469   1.730915    -1.009817   -0.724267   1.070360
股票1 -1.774932   -1.477198   1.293078    0.111610    0.422240
股票2 -0.339734   -0.327221   2.087621    -1.105055   0.372288
股票3 -1.511658   -0.192350   1.115869    -0.668625   0.223991
股票4 1.234608    -0.030666   -1.296268   0.676021    0.229242
股票5 -1.829124   -1.455717   1.119276    -0.493858   0.429053
股票6 1.849348    -0.155816   0.195674    -1.608004   -0.456923
股票7 -0.833554   -0.301686   0.189869    0.664212    -0.364294
股票8 -0.524520   0.328688    1.453038    -0.950180   1.066928
股票9 -0.014796   0.046543    0.194047    -0.753442   -1.235741

3. 添加列索引

pd.DataFrame(ndarrary,columns= )

#生成列索引
date = pd.date_range(start="20210101",periods=5,freq="B")
DatetimeIndex(['2021-01-01', '2021-01-04', '2021-01-05', '2021-01-06','2021-01-07'],dtype='datetime64[ns]', freq='B')
#添加列索引
pd.DataFrame(stock_change,index=stock,columns=date)2021-01-01 2021-01-04  2021-01-05  2021-01-06  2021-01-07
股票0 -0.268469   1.730915    -1.009817   -0.724267   1.070360
股票1 -1.774932   -1.477198   1.293078    0.111610    0.422240
股票2 -0.339734   -0.327221   2.087621    -1.105055   0.372288
股票3 -1.511658   -0.192350   1.115869    -0.668625   0.223991
股票4 1.234608    -0.030666   -1.296268   0.676021    0.229242
股票5 -1.829124   -1.455717   1.119276    -0.493858   0.429053
股票6 1.849348    -0.155816   0.195674    -1.608004   -0.456923
股票7 -0.833554   -0.301686   0.189869    0.664212    -0.364294
股票8 -0.524520   0.328688    1.453038    -0.950180   1.066928
股票9 -0.014796   0.046543    0.194047    -0.753442   -1.235741

2.1.2 属性

1. 常用属性

shape 矩阵形状

index 行索引列表

columns 列索引列表

values 显示ndarrary，不显示行列索引

T 行列转置

data.shape
(10, 5)data.index
Index(['股票0', '股票1', '股票2', '股票3', '股票4', '股票5', '股票6', '股票7', '股票8', '股票9'], dtype='object')data.columns
DatetimeIndex(['2021-01-01', '2021-01-04', '2021-01-05', '2021-01-06','2021-01-07'],dtype='datetime64[ns]', freq='B')data.values
array([[-0.26846949,  1.73091461, -1.00981674, -0.72426739,  1.0703596 ],[-1.77493189, -1.4771981 ,  1.29307836,  0.11161021,  0.42224003],[-0.3397336 , -0.32722064,  2.08762105, -1.10505512,  0.37228785],[-1.51165829, -0.19235048,  1.11586897, -0.66862519,  0.22399066],[ 1.23460754, -0.03066557, -1.29626776,  0.67602135,  0.22924198],[-1.8291236 , -1.45571725,  1.11927613, -0.49385831,  0.42905348],[ 1.8493478 , -0.15581612,  0.19567402, -1.60800405, -0.45692313],[-0.83355434, -0.30168627,  0.18986934,  0.66421236, -0.36429427],[-0.52451968,  0.32868791,  1.45303844, -0.95018047,  1.06692751],[-0.01479553,  0.04654277,  0.19404728, -0.75344158, -1.23574094]])data.T股票0   股票1 股票2 股票3 股票4 股票5 股票6 股票7 股票8 股票9
2021-01-01  -0.268469   -1.774932   -0.339734   -1.511658   1.234608    -1.829124                   1.849348    -0.833554   -0.524520   -0.014796
2021-01-04  1.730915    -1.477198   -0.327221   -0.192350   -0.030666   -1.455717                  -0.155816    -0.301686   0.328688    0.046543
2021-01-05  -1.009817   1.293078    2.087621    1.115869    -1.296268   1.119276                    0.195674    0.189869    1.453038    0.194047
2021-01-06  -0.724267   0.111610    -1.105055   -0.668625   0.676021    -0.493858                   -1.608004   0.664212    -0.950180   -0.753442
2021-01-07  1.070360    0.422240    0.372288    0.223991    0.229242    0.429053                     -0.456923  -0.364294   1.066928    -1.235741

2. 方法

head( ) 显示开头几行数据，默认为5，可以指定

tail( ) 显示尾部几行数据，默认为5，可以指定

#显示开头三行
data.head(3)2021-01-01  2021-01-04  2021-01-05  2021-01-06  2021-01-07
股票0 -0.268469   1.730915    -1.009817   -0.724267   1.070360
股票1 -1.774932   -1.477198   1.293078    0.111610    0.422240
股票2 -0.339734   -0.327221   2.087621    -1.105055   0.372288#显示尾部两行
data.tail(2)2021-01-01  2021-01-04  2021-01-05  2021-01-06  2021-01-07
股票8 -0.524520   0.328688    1.453038    -0.950180   1.066928
股票9 -0.014796   0.046543    0.194047    -0.753442   -1.235741

2.1.3 索引的设置

1. 修改行列索引

不能单个修改，只能全部重新生成

stock_code = ["股票_"+ str(i) for i in range(10)]
data.index = stock_code
2021-01-01  2021-01-04  2021-01-05  2021-01-06  2021-01-07
股票_0    -0.268469   1.730915    -1.009817   -0.724267   1.070360
股票_1    -1.774932   -1.477198   1.293078    0.111610    0.422240
股票_2    -0.339734   -0.327221   2.087621    -1.105055   0.372288
股票_3    -1.511658   -0.192350   1.115869    -0.668625   0.223991
股票_4    1.234608    -0.030666   -1.296268   0.676021    0.229242
股票_5    -1.829124   -1.455717   1.119276    -0.493858   0.429053
股票_6    1.849348    -0.155816   0.195674    -1.608004   -0.456923
股票_7    -0.833554   -0.301686   0.189869    0.664212    -0.364294
股票_8    -0.524520   0.328688    1.453038    -0.950180   1.066928
股票_9    -0.014796   0.046543    0.194047    -0.753442   -1.235741

2. 重设索引

rest_index(drop=False)

设置新的下标索引
drop:默认为False，不删除原来的索引，如果为True，则会删除原来的索引值

data.reset_index()index  2021-01-01  2021-01-04 2021-01-05   2021-01-06  2021-01-07
0   股票_0    -0.268469   1.730915    -1.009817   -0.724267   1.070360
1   股票_1    -1.774932   -1.477198   1.293078    0.111610    0.422240
2   股票_2    -0.339734   -0.327221   2.087621    -1.105055   0.372288
3   股票_3    -1.511658   -0.192350   1.115869    -0.668625   0.223991
4   股票_4    1.234608    -0.030666   -1.296268   0.676021    0.229242
5   股票_5    -1.829124   -1.455717   1.119276    -0.493858   0.429053

3.设置新索引

set_index(keys,drop=True)

keys：列索引名或者列索引名列表
drop：默认为True，当作新的索引，删除原来的列

df = pd.DataFrame({ 'month' : [1,4,7,10],'year':[2012,2014,2013,2014],'sale':[55,40,84,31]})month year sale
0   1   2012    55
1   4   2014    40
2   7   2013    84
3   10  2014    31# 将月份设置为新的索引
df.set_index("month")year sale
month
1   2012    55
4   2014    40
7   2013    84
10  2014    31df.set_index("month",False)
month   year    sale
month
1   1   2012    55
4   4   2014    40
7   7   2013    84
10  10  2014    31#将year和month设置为索引
new_df = df.set_index(['year','month'])
#注:通过刚才的设置,这样 DataFrame就变成了一个具有 MultiIndex的 DataFrame。
new_df.index.names
MultiIndex([(2012,  1),(2014,  4),(2013,  7),(2014, 10)],names=['year', 'month'])

2.1.4 MultiIndex和Panel

1.Multilndex

用来储存三维数组

属性：

names：levels名称
levels：每个level的元组值

new_df.index.names
FrozenList(['year', 'month'])new_df.index.levels
FrozenList([[2012, 2013, 2014], [1, 4, 7, 10]])

2.Panel

和Multilndex相同，但0.20.0弃用

2.2 Series

带索引的一维数组

2.2.1 创建

1.指定内容，索引为默认

pd.Series(np.arange(3,9,2))0    3
1    5
2    7
dtype: int32

2.指定索引

pd.Series(np.arange(3,9,2),index=["a","b","c"])a    3
b    5
c    7
dtype: int32

3.通过字典创建

pd.Series({'red':100,'bule':200,'green':500,'yellow':1000})red        100
bule       200
green      500
yellow    1000
dtype: int64

2.2.2 属性

index 行索引列表

values 显示ndarrary，不显示行索引

三、基本数据操作

3.1 索引操作

3.1.1 直接使用索引操作

data.head()open  high    close   low volume  price_change    p_change    turnover
2018-02-27  23.53   25.88   24.16   23.53   95578.03    0.63    2.68    2.39
2018-02-26  22.80   23.78   23.53   22.80   60985.11    0.69    3.02    1.53
2018-02-23  22.88   23.37   22.82   22.71   52914.01    0.54    2.42    1.32
2018-02-22  22.25   22.76   22.28   22.02   36105.01    0.36    1.64    0.90
2018-02-14  21.49   21.99   21.92   21.48   23331.04    0.44    2.05    0.58#必须先列后行
data["open"]["2018-02-06"]
22.8

3.1.2 按行列名进行索引

ndarrary.loc

data.loc["2018-02-06"]["open"]
22.8
data.loc["2018-02-06","open"]
22.8

3.1.3 按数组位置进行索引

ndarrary.iloc

data.iloc[1,0]
22.8

3.1.4 组合索引

ndarrary.ix (已经失效)

data.loc[data.index[:4],['open','close','high','low']]
data iloc[0:4, data.columns.get_indexer(['open','close','high','low'])]open close   high    low
2018-02-27  23.53   24.16   25.88   23.53
2018-02-26  22.80   23.53   23.78   22.80
2018-02-23  22.88   22.82   23.37   22.71
2018-02-22  22.25   22.28   22.76   22.02

3.2 赋值操作

data.open = 100open high    close   low volume  price_change    p_change    turnover
2018-02-27  100     25.88   24.16   23.53   95578.03    0.63    2.68    2.39
2018-02-26  100     23.78   23.53   22.80   60985.11    0.69    3.02    1.53
2018-02-23  100     23.37   22.82   22.71   52914.01    0.54    2.42    1.32
2018-02-22  100     22.76   22.28   22.02   36105.01    0.36    1.64    0.90
2018-02-14  100     21.99   21.92   21.48   23331.04    0.44    2.05    0.58data.iloc[1,0] = 222open   high    close   low volume  price_change    p_change    turnover
2018-02-27  100     25.88   24.16   23.53   95578.03    0.63    2.68    2.39
2018-02-26  222     23.78   23.53   22.80   60985.11    0.69    3.02    1.53
2018-02-23  100     23.37   22.82   22.71   52914.01    0.54    2.42    1.32
2018-02-22  100     22.76   22.28   22.02   36105.01    0.36    1.64    0.90
2018-02-14  100     21.99   21.92   21.48   23331.04    0.44    2.05    0.58

3.3 排序

3.3.1 对内容进行排序

1.DataFrame

data.sort_values(key=,ascending=)

单个键或者多个键，进行排序，默认升序
ascending=False：降序
ascending=True：升序

#按high列数据，升序排列
data.sort_values(by="high")open  high    close   low volume  price_change    p_change    turnover
2015-03-02  12.25   12.67   12.52   12.20   96291.73    0.32    2.62    3.30
2015-03-04  12.80   12.92   12.90   12.61   67075.44    0.20    1.57    2.30
2015-03-03  12.52   13.06   12.70   12.52   139071.61   0.18    1.44    4.76
2015-09-07  12.92   13.38   12.77   12.63   52490.04    0.37    2.98    1.80
2015-03-05  12.88   13.45   13.16   12.87   93180.39    0.26    2.02    3.19#high数据相同情况下，按price_change大小排序
data.sort_values(["high","price_change"],ascending = False)open    high    close   low volume  price_change    p_change    turnover
2015-06-10  34.10   36.35   33.85   32.23   269033.12   0.51    1.53    9.21
2015-06-12  34.69   35.98   35.21   34.01   159825.88   0.82    2.38    5.47
2017-10-31  32.62   35.22   34.44   32.20   361660.88   2.38    7.42    9.05
2015-06-15  34.99   34.99   31.69   31.69   199369.53   -3.52   -10.00  6.82
2015-06-11  33.17   34.98   34.39   32.51   173075.73   0.54    1.59    5.92

2. Series

sr.sort_values(ascending=False)
2015-06-09    3.03
2017-10-26    2.68
2015-05-21    2.57
2017-10-31    2.38
2017-06-22    2.36
Name: price_change, dtype: float64

3.3.2 对索引排序

1.DataFrame

data.sort_index( )

data.sort_index()open    high    close   low volume  price_change    p_change    turnover
2015-03-02  12.25   12.67   12.52   12.20   96291.73    0.32    2.62    3.30
2015-03-03  12.52   13.06   12.70   12.52   139071.61   0.18    1.44    4.76
2015-03-04  12.80   12.92   12.90   12.61   67075.44    0.20    1.57    2.30
2015-03-05  12.88   13.45   13.16   12.87   93180.39    0.26    2.02    3.19
2015-03-06  13.17   14.48   14.28   13.13   179831.72   1.12    8.51    6.16

2.Series

sr.sort_index()

sr.sort_index()
2015-03-02    0.32
2015-03-03    0.18
2015-03-04    0.20
2015-03-05    0.26
2015-03-06    1.12...
2018-02-14    0.44
2018-02-22    0.36
2018-02-23    0.54
2018-02-26    0.69
2018-02-27    0.63
Name: price_change, Length: 643, dtype: float64

四、DataFrame运算

4.1 算术运算

1.add( )

dataopen high    close   low volume  price_change    p_change    turnover
2018-02-27  23.53   25.88   24.16   23.53   95578.03    0.63    2.68    2.39
2018-02-26  22.80   23.78   23.53   22.80   60985.11    0.69    3.02    1.53
2018-02-23  22.88   23.37   22.82   22.71   52914.01    0.54    2.42    1.32
2018-02-22  22.25   22.76   22.28   22.02   36105.01    0.36    1.64    0.90
2018-02-14  21.49   21.99   21.92   21.48   23331.04    0.44    2.05    0.58#两种写法都可以
data.open + 3
data.open.add(3)2018-02-27    26.53
2018-02-26    25.80
2018-02-23    25.88
2018-02-22    25.25
2018-02-14    24.49
Name: open, Length: 643, dtype: float64data + 10open   high    close   low volume  price_change    p_change    turnover
2018-02-27  33.53   35.88   34.16   33.53   95588.03    10.63   12.68   12.39
2018-02-26  32.80   33.78   33.53   32.80   60995.11    10.69   13.02   11.53
2018-02-23  32.88   33.37   32.82   32.71   52924.01    10.54   12.42   11.32
2018-02-22  32.25   32.76   32.28   32.02   36115.01    10.36   11.64   10.90
2018-02-14  31.49   31.99   31.92   31.48   23341.04    10.44   12.05   10.58

2.sub( )

data.sub(10)open high    close   low volume  price_change    p_change    turnover
2018-02-27  13.53   15.88   14.16   13.53   95568.03    -9.37   -7.32   -7.61
2018-02-26  12.80   13.78   13.53   12.80   60975.11    -9.31   -6.98   -8.47
2018-02-23  12.88   13.37   12.82   12.71   52904.01    -9.46   -7.58   -8.68
2018-02-22  12.25   12.76   12.28   12.02   36095.01    -9.64   -8.36   -9.10
2018-02-14  11.49   11.99   11.92   11.48   23321.04    -9.56   -7.95   -9.42#close列数据减去open列数据
data["close"].sub(data.open)2018-02-27    0.63
2018-02-26    0.73
2018-02-23   -0.06
2018-02-22    0.03
2018-02-14    0.43
Length: 643, dtype: float64

4.2 逻辑运算

4.2.1 逻辑运算符

<、>、 |、 &

data["p_change"] > 22018-02-27     True
2018-02-26     True
2018-02-23     True
2018-02-22    False
2018-02-14     True
Name: p_change, Length: 643, dtype: bool

布尔索引

#显示p_change列数值大于2的数据
data[data.p_change>2]open    high    close   low volume  price_change    p_change    turnover
2018-02-27  23.53   25.88   24.16   23.53   95578.03    0.63    2.68    2.39
2018-02-26  22.80   23.78   23.53   22.80   60985.11    0.69    3.02    1.53
2018-02-23  22.88   23.37   22.82   22.71   52914.01    0.54    2.42    1.32
2018-02-14  21.49   21.99   21.92   21.48   23331.04    0.44    2.05    0.58
2018-02-12  20.70   21.40   21.19   20.63   32445.39    0.82    4.03    0.81#显示p_change>2并且low>15的数据
data[(data.p_change>2) & (data.low>15)]open   high    close   low volume  price_change    p_change    turnover
2018-02-27  23.53   25.88   24.16   23.53   95578.03    0.63    2.68    2.39
2018-02-26  22.80   23.78   23.53   22.80   60985.11    0.69    3.02    1.53
2018-02-23  22.88   23.37   22.82   22.71   52914.01    0.54    2.42    1.32
2018-02-14  21.49   21.99   21.92   21.48   23331.04    0.44    2.05    0.58
2018-02-12  20.70   21.40   21.19   20.63   32445.39    0.82    4.03    0.81

4.2.2 逻辑运算函数

1.query( )

指令语句执行函数，直接指定列名即可

data.query("p_change>2 & low>15")open    high    close   low volume  price_change    p_change    turnover
2018-02-27  23.53   25.88   24.16   23.53   95578.03    0.63    2.68    2.39
2018-02-26  22.80   23.78   23.53   22.80   60985.11    0.69    3.02    1.53
2018-02-23  22.88   23.37   22.82   22.71   52914.01    0.54    2.42    1.32
2018-02-14  21.49   21.99   21.92   21.48   23331.04    0.44    2.05    0.58
2018-02-12  20.70   21.40   21.19   20.63   32445.39    0.82    4.03    0.81

2.isin( )

筛选数值范围之内的数据

#筛选turnover列数值介于4.19和2.39之间的数据
data.turnover.isin([4.19,2.39])
2018-02-27     True
2018-02-26    False
2018-02-23    False
2018-02-22    False
2018-02-14    False
Name: turnover, Length: 643, dtype: booldata[data.turnover.isin([4.19,2.39])]open   high    close   low volume  price_change    p_change    turnover
2018-02-27  23.53   25.88   24.16   23.53   95578.03    0.63    2.68    2.39
2017-07-25  23.07   24.20   23.70   22.64   167489.48   0.67    2.91    4.19
2016-09-28  19.88   20.98   20.86   19.71   95580.75    0.98    4.93    2.39
2015-04-07  16.54   17.98   17.54   16.50   122471.85   0.88    5.28    4.19

4.3 统计运算

4.3.1 常用统计运算符

min、 max、 mean、 median、 var、 std

describe 一次性获得所有类型

data.describe()open  high    close   low volume  price_change    p_change    turnover
count   643.000000  643.000000  643.000000  643.000000  643.000000  643.000000              643.000000  643.000000
mean    21.272706   21.900513   21.336267   20.771835   99905.519114    0.018802            0.190280    2.936190
std 3.930973    4.077578    3.942806    3.791968    73879.119354    0.898476                4.079698    2.079375
min 12.250000   12.670000   12.360000   12.200000   1158.120000 -3.520000                   -10.030000  0.040000
25% 19.000000   19.500000   19.045000   18.525000   48533.210000    -0.390000               -1.850000   1.360000
50% 21.440000   21.970000   21.450000   20.980000   83175.930000    0.050000                0.260000    2.500000
75% 23.400000   24.065000   23.415000   22.850000   127580.055000   0.455000                2.305000    3.915000
max 34.990000   36.350000   35.210000   34.010000   501915.410000   3.030000                10.030000   12.560000#按列获得最大值（默认）
data.max(0)
open                34.99
high                36.35
close               35.21
low                 34.01
volume          501915.41
price_change         3.03
p_change            10.03
turnover            12.56#按行获得最大值
data.max(1)
2018-02-27     95578.03
2018-02-26     60985.11
2018-02-23     52914.01
2018-02-22     36105.01
2018-02-14     23331.04

4.3.2 获取位置

idxmax( )、idxmin( )…

data.idxmax()
open            2015-06-15
high            2015-06-10
close           2015-06-12
low             2015-06-12
volume          2017-10-26
price_change    2015-06-09
p_change        2015-08-28
turnover        2017-10-26
dtype: object

4.3.3 累计统计函数

cumsum 计算前1/2/3/.n个数的和
cummax 计算前1/2/3.n个数的最大值
cummin 计算前1/2/3/.n个数的最小值
cumprod 计算前1/2/3/.n个数的积

data.p_change
2018-82-27  2.68
2018-92-26  3.82
2018-2-23  2.42
2018-02-22  1.64
2018-82-14  2.85data.p_change.cumsum()
2018-92-27  2.68
2018-2-26  5.7
2018-92-23  8.12
2018-92-22  9.76
2018-92-14  11.81

4.4 自定义运算

apply(func,axis=0)

func:自定义函数
axis:0默认为列，1为行

data.apply(lambda x: x.max() - x.min())open                22.74
high                23.68
close               22.85
low                 21.81
volume          500757.29
price_change         6.55
p_change            20.06
turnover            12.52
dtype: float64

五、画图

5.1 Dataframe.plot( )

二维数组直接用plot( )，会自动生成统计图

#指定volume为x轴，turnover为y轴，类型为散点图
data.plot(x="volume",y="turnover",kind="scatter")

5.2 Series.plot( )

同Dataframe

六、文件读取与存储

6.1 CSV文件

1.读取

pd.read_csv(path)

usecols= 获取想要的列
names= 添加列索引

pd.read_csv("stock_day.csv",usecols=["high","low","open","close"])open    high    close   low
2018-02-27  23.53   25.88   24.16   23.53
2018-02-26  22.80   23.78   23.53   22.80
2018-02-23  22.88   23.37   22.82   22.71
2018-02-22  22.25   22.76   22.28   22.02
2018-02-14  21.49   21.99   21.92   21.48#读取文件不含列索引，读取时添加列索引
data = pd.read_csv("stock_day2.csv",names=['open', 'high', 'close', 'low', 'volume', 'price_change', 'p_change','ma5', 'ma10', 'ma20', 'v_ma5', 'v_ma10', 'v_ma20', 'turnover'])open    high    close   low volume  price_change    p_change    ma5
2018-02-27  23.53   25.88   24.16   23.53   95578.03    0.63    2.68    22.942
2018-02-26  22.80   23.78   23.53   22.80   60985.11    0.69    3.02    22.406
2018-02-23  22.88   23.37   22.82   22.71   52914.01    0.54    2.42    21.938
2018-02-22  22.25   22.76   22.28   22.02   36105.01    0.36    1.64    21.446
2018-02-14  21.49   21.99   21.92   21.48   23331.04    0.44    2.05    21.366

2.写入

DataFrame.to_csv（path）

columns=[ ] 需要写入文件的列
index=False 不保存行索引
header=False 不保存列索引

#保存open列的数据
data[:10].to_csv("mytest.csv",columns=["open"])Unnamed: 0  open
0   2018-02-27  23.53
1   2018-02-26  22.80
2   2018-02-23  22.88
3   2018-02-22  22.25
4   2018-02-14  21.49
5   2018-02-13  21.40
6   2018-02-12  20.70
7   2018-02-09  21.20
8   2018-02-08  21.79
9   2018-02-07  22.69#设置index为False
data[:10].to_csv("mytest.csv",columns=["open"],index=False)open
0   23.53
1   22.80
2   22.88
3   22.25
4   21.49
5   21.40
6   20.70
7   21.20
8   21.79
9   22.69

6.2 HDF5文件

hdf5存储三维数据的文件

key1 dataframe1 二维数据

key2 dataframe2 二维数据

key相当于存进大文件的小文件的文件名，在读取时必须指定key

1.读取

pd.read_hdf（path，key=）

#读取h5文件中，key值为close的数组
pd.read_hdf("mytest.h5",key="close")
pd.read_hdf("mytest.h5",key="open")

2.写入

pd.to_hdf（path，key=）

#将day_close数组写入h5文件中，key指定为close
day_close.to_hdf("mytest.h5",key="close")
#将day_open数组写入h5文件中，key指定为open
day_open.to_hdf("mytest.h5",key="open")

6.3 JSON文件

1.读取

pd.read_json（path）

orient=“records”
lines=True 按行读取，默认为False

sa = pd.read_json("Sarcasm_Headlines_Dataset.json",orient="records",lines=True)article_link                                       headline    is_sarcastic
0   https://www.huffingtonpost.com/entry/versace-b...   former versace store    0
1   https://www.huffingtonpost.com/entry/roseanne-...   the 'roseanne' revival    0
2   https://local.theonion.com/mom-starting-to-fea...   mom starting to fear    1
3   https://politics.theonion.com/boehner-just-wan...   boehner just wants w    1
4   https://www.huffingtonpost.com/entry/jk-rowlin...   j.k. rowling wishes     0

2.写入

pd.to_json( path)

#将数据按行写入json文件中
sa.to_json("mytest.json",orient="records",lines=True)

机器学习-Pandas相关推荐

列注释_机器学习 Pandas 03：基础前16题 ( 带答案、注释 )
13. 字符串处理方法 s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat']) s.str.lower ...
列注释_机器学习 Pandas 08：进阶前09题 ( 带答案、注释 )
8. Indicator设置合并列名称 df1 = pd.DataFrame({'col1':[0,1],'col_left':['a','b']}) df2 = pd.DataFrame({'col ...
机器学习-Pandas 知识点汇总(吐血整理)
作者:HappyPuppy 出处:http://www.cnblogs.com/tangxiaobo199181/ 本文版权归作者HappyPuppy所有,欢迎转载,但未经作者同意必须保留此段声明,且 ...
机器学习——pandas基础
一.导入pandas库 import pandas as pd 二.读取数据文件 pf1 = pd.read_csv("testDataFile.csv") pf2 = pd.re ...
机器学习——pandas
一.pandas的数据结构 Pandas中一共有三种数据结构,分别为:Series.DataFrame和MultiIndex(老版本中叫Panel ). 其中Series是一维数据结构,DataFra ...
pandas 机器学习_机器学习的PANDAS
pandas 机器学习 Pandas is one of the tools in Machine Learning which is used for data cleaning and analy ...
Python机器学习、深度学习库总结（内含大量示例，建议收藏）
Python机器学习.深度学习库总结(内含大量示例,建议收藏) 前言 python常用机器学习及深度学习库介绍总结与分类 python 常用机器学习及深度学习库总结分类更多前言目前,随着人工 ...
Python机器学习、深度学习库总结
Python机器学习.深度学习库总结(内含大量示例,建议收藏) 前言 python常用机器学习及深度学习库介绍总结与分类 python 常用机器学习及深度学习库总结分类更多前言为了大家能够对 ...
盘点27个机器学习、深度学习库最频繁使用的 Python 工具包（内含大量示例，建议收藏）
目前,随着人工智能的大热,吸引了诸多行业对于人工智能的关注,同时也迎来了一波又一波的人工智能学习的热潮,虽然人工智能背后的原理并不能通过短短一文给予详细介绍,但是像所有学科一样,我们并不需要从头开始& ...

机器学习-Pandas

一、简介

二、三大数据结构

2.1 DataFrame

2.1.1 结构

1. 生成DataFrame

2. 添加行索引

3. 添加列索引

2.1.2 属性

1. 常用属性

2. 方法

2.1.3 索引的设置

1. 修改行列索引

2. 重设索引

3.设置新索引

2.1.4 MultiIndex和Panel

1.Multilndex

2.Panel

2.2 Series

2.2.1 创建

1.指定内容，索引为默认

2.指定索引

3.通过字典创建

2.2.2 属性

三、基本数据操作

3.1 索引操作

3.1.1 直接使用索引操作

3.1.2 按行列名进行索引

3.1.3 按数组位置进行索引

3.1.4 组合索引

3.2 赋值操作

3.3 排序

3.3.1 对内容进行排序

1.DataFrame

2. Series

3.3.2 对索引排序

1.DataFrame

2.Series

四、DataFrame运算

4.1 算术运算

1.add( )

2.sub( )

4.2 逻辑运算

4.2.1 逻辑运算符

4.2.2 逻辑运算函数

1.query( )

2.isin( )

4.3 统计运算

4.3.1 常用统计运算符

4.3.2 获取位置

4.3.3 累计统计函数

4.4 自定义运算

五、画图

5.1 Dataframe.plot( )

5.2 Series.plot( )

六、文件读取与存储

6.1 CSV文件

1.读取

2.写入

6.2 HDF5文件

1.读取

2.写入

6.3 JSON文件

1.读取

2.写入

机器学习-Pandas相关推荐

最新文章

热门文章