pandas Study Notes: Reading the Official Documentation

1. Initialization

(1) Create a simple series with pd.Series

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1, 3, 5, np.nan, 6, 8])
>>> s
0    1.0
1    3.0
2    5.0
3    NaN   # note the missing value
4    6.0
5    8.0
dtype: float64

(2) Create a date range with pd.date_range

>>> dates = pd.date_range('20130101', periods=6)
>>> dates
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

(3) DataFrame structure

>>> # index sets the row labels; columns sets the column names
>>> df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
>>> df
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988

>>> df2 = pd.DataFrame({'A': 1.,
...                     'B': pd.Timestamp('20130102'),
...                     'C': pd.Series(1, index=list(range(4)), dtype='float32'),
...                     'D': np.array([3] * 4, dtype='int32'),
...                     'E': pd.Categorical(["test", "train", "test", "train"]),
...                     'F': 'foo'})
>>> df2
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo
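
Each column of df2 is built from a different kind of value, so each column ends up with its own dtype. The following is a minimal sketch for checking this. It rebuilds df2 exactly as above; the variable name dtypes is only illustrative and is not part of the original notes.

```python
import numpy as np
import pandas as pd

# Rebuild df2 from the section above. Scalars such as 1. and 'foo'
# are broadcast to the length of the column-like values (4 rows here).
df2 = pd.DataFrame({
    'A': 1.,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
    'F': 'foo',
})

# dtypes (illustrative name) holds one dtype per column.
dtypes = df2.dtypes
print(dtypes)
```

The exact dtype names printed for the timestamp and string columns may vary between pandas versions.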

2. Viewing data

(1) First n rows (head) and last n rows (tail)

>>> df.head(2)
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
>>> df.tail(3)
                   A         B         C         D
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988

(2) Show the row index (index), the column labels (columns), and the values (values)

>>> df.index
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
>>> df.columns
Index(['A', 'B', 'C', 'D'], dtype='object')
>>> df.values
array([[ 0.4691, -0.2829, -1.5091, -1.1356],
       [ 1.2121, -0.1732,  0.1192, -1.0442],
       [-0.8618, -2.1046, -0.4949,  1.0718],
       [ 0.7216, -0.7068, -1.0396,  0.2719],
       [-0.425 ,  0.567 ,  0.2762, -1.0874],
       [-0.6737,  0.1136, -1.4784,  0.525 ]])

(3) Quick summary statistics with describe

>>> df.describe()
              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean   0.073711 -0.431125 -0.687758 -0.233103
std    0.843157  0.922818  0.779887  0.973118
min   -0.861849 -2.104569 -1.509059 -1.135632
25%   -0.611510 -0.600794 -1.368714 -1.076610
50%    0.022070 -0.228039 -0.767252 -0.386188
75%    0.658444  0.041933 -0.034326  0.461706
max    1.212112  0.567020  0.276232  1.071804

(4) Transpose with df.T
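
The notes give no example for the transpose, so here is a minimal sketch. The frame small and its labels are hypothetical and were chosen only to keep the output easy to read.

```python
import pandas as pd

# Hypothetical small frame (not from the original notes).
small = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                     index=['x', 'y', 'z'])

# .T swaps the row index and the column labels.
transposed = small.T
print(transposed)
```

After the transpose, the former column labels A and B become the row index, and x, y, z become the column labels.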

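(5) Sort by axis

Descending order: ascending=False

Ascending order: ascending=True

Horizontal axis (sort by column labels): df.sort_index(axis=1, ascending=False)

Vertical axis (sort by row index): df.sort_index(axis=0, ascending=False)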

>>> df.sort_index(axis=1, ascending=False)
                   D         C         B         A
2013-01-01 -1.135632 -1.509059 -0.282863  0.469112
2013-01-02 -1.044236  0.119209 -0.173215  1.212112
2013-01-03  1.071804 -0.494929 -2.104569 -0.861849
2013-01-04  0.271860 -1.039575 -0.706771  0.721555
2013-01-05 -1.087401  0.276232  0.567020 -0.424972
2013-01-06  0.524988 -1.478427  0.113648 -0.673690

(6) Sort by values

>>> df.sort_values(by='B')
                   A         B         C         D
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-06 -0.673690  0.113648 -1.478427  0.524988
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
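
The same ascending flag shown for sort_index also applies to sort_values. A minimal sketch with a hypothetical frame named scores (the data is not from the original notes):

```python
import pandas as pd

# Hypothetical data, small enough to check by eye.
scores = pd.DataFrame({'B': [0.5, -2.1, 0.1], 'C': [1, 2, 3]},
                      index=['r1', 'r2', 'r3'])

# Sort rows by column B, largest value first.
desc = scores.sort_values(by='B', ascending=False)
print(desc)
```

sort_values returns a new DataFrame, so scores itself keeps its original row order.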

3. Selection (similar to MATLAB)

Select a column (df.A is equivalent to df['A'])

Select a range of rows by slicing (df[0:3])

Select by label (df.loc[dates[0]])
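
The three selection forms above can be sketched as follows. This example assumes deterministic values from np.arange in place of np.random.randn so the results are reproducible; the variable names col_a, same, first3, and row0 are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
# Deterministic values (an assumption for this sketch) instead of random ones.
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

col_a = df['A']              # a single column, returned as a Series
same = df.A.equals(df['A'])  # attribute access gives the same column
first3 = df[0:3]             # slice of the first three rows (positions 0, 1, 2)
row0 = df.loc[dates[0]]      # the row labelled 2013-01-01, returned as a Series

print(first3)
print(row0)
```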

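4. Missing data

Missing values are represented by np.nan.

Drop rows that contain any missing value: df.dropna(how='any')

Fill in missing values: df.fillna(value=5)

Check which values are missing: pd.isna(df1), where df1 is a DataFrame that contains NaN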
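
The notes never define df1, so the sketch below builds a small hypothetical df1 with two NaN entries and applies the three calls listed above. The data and the result names (dropped, filled, mask) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN in each column.
df1 = pd.DataFrame({'A': [1.0, np.nan, 3.0],
                    'B': [4.0, 5.0, np.nan]})

dropped = df1.dropna(how='any')  # keep only rows without any NaN
filled = df1.fillna(value=5)     # replace every NaN with 5
mask = pd.isna(df1)              # boolean frame, True where a value is missing

print(dropped)
print(filled)
print(mask)
```

None of these calls modifies df1; each one returns a new object.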

5. Statistics

Compute the mean of each column: df.mean()
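
By default mean() works down each column. A minimal sketch, using a hypothetical frame named nums, that also shows the per-row variant via the axis parameter:

```python
import pandas as pd

# Hypothetical data (not from the original notes).
nums = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                     'B': [10.0, 20.0, 30.0]})

col_means = nums.mean()        # default axis=0: one mean per column
row_means = nums.mean(axis=1)  # axis=1: one mean per row

print(col_means)
print(row_means)
```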

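6. Applying functions

The call below computes the range (max minus min) of each column. The output lists columns D and F, which do not match the df built in section 1; it appears to come from a later point in the official documentation, where df had been modified.

>>> df.apply(lambda x: x.max() - x.min())
A    2.073961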
B    2.671590
C    1.785291
D    0.000000
F    4.000000
dtype: float64
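
To make the behaviour of apply easier to verify by hand, here is a minimal sketch on a hypothetical frame whose D column is constant, which is why its range comes out as zero. The names frame, col_range, and row_range are illustrative.

```python
import pandas as pd

# Hypothetical data; column D is constant on purpose.
frame = pd.DataFrame({'A': [1.0, 4.0, 2.0],
                      'D': [5.0, 5.0, 5.0]})

# The lambda receives one column at a time (default axis=0).
col_range = frame.apply(lambda x: x.max() - x.min())

# With axis=1 the lambda receives one row at a time instead.
row_range = frame.apply(lambda x: x.max() - x.min(), axis=1)

print(col_range)
print(row_range)
```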

Reposted from: https://www.cnblogs.com/syyy/p/7908075.html
