by Shubhi Asthana

Series and DataFrame in Python

A couple of months ago, I took the online course “Using Python for Research” offered by Harvard University on edX. During the course, I learned many concepts in Python, NumPy, Matplotlib, and PyPlot. I also had the opportunity to work on case studies and apply what I had learned to real datasets. For more information about this program, check it out here.

Two important concepts I learned in this course were Series and DataFrame. I want to introduce them to you through a short tutorial.

To start the tutorial, download the latest version of Python from the official website here.

Once Python is installed, you can work with it in IDLE, the graphical development environment included with the standard Python distribution.

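Next, let's import pandas into our workspace. pandas is a Python library that provides data structures and data analysis tools. It is not part of the standard library, so install it first if needed (for example, with pip install pandas); the examples below also use NumPy, which pandas depends on.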

Series

A Series is a one-dimensional labeled array that can hold any data type, such as integers, floats, and strings. Let's pass a list of items as the input argument and create a Series object from it.

>>> import pandas as pd
>>> x = pd.Series([6, 3, 4, 6])
>>> x
0    6
1    3
2    4
3    6
dtype: int64

The axis labels for the data are referred to as the index. The index must have the same length as the data. Since we did not pass an index in the code above, pandas created a default index with the values [0, 1, …, len(data) - 1].
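Both rules can be checked with a minimal sketch, assuming a recent pandas version: a Series built without an index gets the default integer labels, and an index whose length differs from the data is rejected with a ValueError.

```python
import pandas as pd

# Without an explicit index, pandas creates default integer labels 0..len(data)-1.
x = pd.Series([6, 3, 4, 6])
print(list(x.index))  # [0, 1, 2, 3]

# The index needs exactly one label per value; a shorter index raises ValueError.
try:
    pd.Series([6, 3, 4, 6], index=["a", "b"])
except ValueError as err:
    print("ValueError:", err)
```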

Let's go ahead and define an index for the data.

>>> x = pd.Series([6, 3, 4, 6], index=['a', 'b', 'c', 'd'])
>>> x
a    6
b    3
c    4
d    6
dtype: int64

The labels in the leftmost column now form the index for the values in the right column.

We can look up a value by referring to its index label:

>>> x["c"]

pandas returns the value stored under that label, which is 4 here.
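Square-bracket access on a labeled Series looks values up by label. The sketch below, assuming a recent pandas version, contrasts it with the explicit .loc (label-based) and .iloc (position-based) accessors, and with Series.get, which returns a default for a missing label instead of raising KeyError.

```python
import pandas as pd

x = pd.Series([6, 3, 4, 6], index=["a", "b", "c", "d"])

# Label-based lookup: x["c"] and x.loc["c"] both use the index label.
print(x["c"], x.loc["c"])  # 4 4

# Position-based lookup: iloc counts from 0, so position 2 is the third value.
print(x.iloc[2])  # 4

# A missing label raises KeyError with [] or .loc; Series.get returns a default instead.
print(x.get("z"))     # None
print(x.get("z", 0))  # 0
```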

A Series can also be created from a dictionary, such as the one defined below. The dictionary's keys become the index labels and its values become the data, so we can use the index to look up the value that corresponds to each label.

>>> data = {'abc': 1, 'def': 2, 'xyz': 3}
>>> pd.Series(data)
abc    1
def    2
xyz    3
dtype: int64
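To make the key-to-label mapping concrete, here is a small sketch, assuming a recent pandas version. When an explicit index is passed together with a dictionary, pandas picks values by label in the order of that index, and any label missing from the dictionary becomes NaN (which turns the dtype into float64). The label "zzz" is purely illustrative.

```python
import pandas as pd

data = {"abc": 1, "def": 2, "xyz": 3}

# Keys become index labels and values become the data.
s = pd.Series(data)
print(s["def"])  # 2

# An explicit index selects and orders labels from the dict;
# labels with no matching key ("zzz" here) are filled with NaN.
t = pd.Series(data, index=["xyz", "abc", "zzz"])
print(t)
```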

A Series can also be created from a scalar value. In that case, the value is repeated for every label in the index.

>>> x = pd.Series(3, index=['a', 'b', 'c', 'd'])
>>> x
a    3
b    3
c    3
d    3
dtype: int64
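The length of a scalar-built Series comes entirely from the index. A minimal sketch, assuming a recent pandas version: with four labels the value is repeated four times, while with no index at all pandas produces a single-element Series with the default label 0.

```python
import pandas as pd

labels = ["a", "b", "c", "d"]

# With an index, the scalar is repeated once per label.
x = pd.Series(3, index=labels)
print(x.tolist())  # [3, 3, 3, 3]

# Without an index there is nothing to repeat over, so the result has one element.
y = pd.Series(3)
print(y.tolist())  # [3]
```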

DataFrame

A DataFrame is a two-dimensional labeled data structure whose columns can have different types. It can be built from many kinds of input, including dictionaries, lists, Series, and even another DataFrame.

It is the most commonly used pandas object.
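A short sketch of three of the input kinds listed above, assuming a recent pandas version: a dictionary of lists (one column per key, with different types per column), a dictionary of Series (rows aligned on the Series' index labels), and an existing DataFrame. The column names and values are illustrative only.

```python
import pandas as pd

# A dict of equal-length lists: each key becomes a column, and columns may hold different types.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "count": [1, 2, 3],
    "ratio": [0.5, 1.5, 2.5],
})
print(df.dtypes)

# A dict of Series: rows are aligned on the union of the Series' index labels,
# and positions missing from a Series are filled with NaN.
s1 = pd.Series([1, 2], index=["x", "y"])
s2 = pd.Series([10, 30], index=["x", "z"])
aligned = pd.DataFrame({"s1": s1, "s2": s2})
print(aligned)

# Another DataFrame can be passed in directly as the data.
copy_df = pd.DataFrame(df)
print(copy_df.shape)  # (3, 3)
```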

Let's create a DataFrame by passing a NumPy array, using dates as the index and labeled columns:

>>> import numpy as np
>>> dates = pd.date_range('20170505', periods=8)
>>> dates
DatetimeIndex(['2017-05-05', '2017-05-06', '2017-05-07', '2017-05-08',
               '2017-05-09', '2017-05-10', '2017-05-11', '2017-05-12'],
              dtype='datetime64[ns]', freq='D')
>>> df = pd.DataFrame(np.random.randn(8, 3), index=dates, columns=list('ABC'))
>>> df
                   A         B         C
2017-05-05 -0.301877  1.508536 -2.065571
2017-05-06  0.613538 -0.052423 -1.206090
2017-05-07  0.772951  0.835798  0.345913
2017-05-08  1.339559  0.900384 -1.037658
2017-05-09 -0.695919  1.372793  0.539752
2017-05-10  0.275916 -0.420183  1.744796
2017-05-11 -0.206065  0.910706 -0.028646
2017-05-12  1.178219  0.783122  0.829979

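This creates a DataFrame indexed by eight consecutive days. Because the values come from np.random.randn, your numbers will differ from the ones shown here. We can view the first and last rows of the frame with df.head() and df.tail(), which return five rows each by default: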

>>> df.head()
                   A         B         C
2017-05-05 -0.301877  1.508536 -2.065571
2017-05-06  0.613538 -0.052423 -1.206090
2017-05-07  0.772951  0.835798  0.345913
2017-05-08  1.339559  0.900384 -1.037658
2017-05-09 -0.695919  1.372793  0.539752
>>> df.tail()
                   A         B         C
2017-05-08  1.339559  0.900384 -1.037658
2017-05-09 -0.695919  1.372793  0.539752
2017-05-10  0.275916 -0.420183  1.744796
2017-05-11 -0.206065  0.910706 -0.028646
2017-05-12  1.178219  0.783122  0.829979
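The default of five rows can be changed with the n argument. This minimal sketch assumes NumPy's Generator API (np.random.default_rng, available since NumPy 1.17) in place of np.random.randn so that the random values are reproducible between runs; the seed 0 is arbitrary.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random values reproducible between runs.
rng = np.random.default_rng(0)
dates = pd.date_range("20170505", periods=8)
df = pd.DataFrame(rng.standard_normal((8, 3)), index=dates, columns=list("ABC"))

# head() and tail() return 5 rows by default; pass n to choose how many.
print(df.head(3))  # first three days
print(df.tail(2))  # last two days
```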

We can also get a quick statistical summary of the data:

>>> df.describe()
              A         B         C
count  8.000000  8.000000  8.000000
mean   0.372040  0.729842 -0.109691
std    0.731262  0.657931  1.244801
min   -0.695919 -0.420183 -2.065571
25%   -0.230018  0.574236 -1.079766
50%    0.444727  0.868091  0.158633
75%    0.874268  1.026228  0.612309
max    1.339559  1.508536  1.744796
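Each row of this summary corresponds to a DataFrame method that can be called on its own. The sketch below, assuming a recent pandas version and seeded random data for illustration, checks every row of df.describe() against that method. Note that std is the sample standard deviation (ddof=1) and the percentiles use linear interpolation, which are the pandas defaults.

```python
import numpy as np
import pandas as pd

# Illustrative data; the seed is arbitrary and only makes the run reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 3)), columns=list("ABC"))

summary = df.describe()

# Each row of the summary matches a method that can be called directly.
checks = {
    "count": df.count(),
    "mean": df.mean(),
    "std": df.std(),           # sample standard deviation (ddof=1)
    "min": df.min(),
    "25%": df.quantile(0.25),  # linear interpolation between data points
    "50%": df.median(),
    "75%": df.quantile(0.75),
    "max": df.max(),
}
for row, expected in checks.items():
    print(row, np.allclose(summary.loc[row], expected))
```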

We can also apply functions such as a cumulative sum to the data, view histograms, and merge, concatenate, and reshape DataFrames. For example, here is the cumulative sum of each column:

>>> df.apply(np.cumsum)
                   A         B         C
2017-05-05 -0.301877  1.508536 -2.065571
2017-05-06  0.311661  1.456113 -3.271661
2017-05-07  1.084612  2.291911 -2.925748
2017-05-08  2.424171  3.192296 -3.963406
2017-05-09  1.728252  4.565088 -3.423654
2017-05-10  2.004169  4.144905 -1.678858
2017-05-11  1.798104  5.055611 -1.707504
2017-05-12  2.976322  5.838734 -0.877526
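A minimal sketch of a few of these operations, assuming a recent pandas version and seeded random data for illustration: it checks that df.apply(np.cumsum) matches the built-in df.cumsum(), reassembles the frame from two slices with pd.concat, and joins two small frames with pd.merge. The frames left and right and their columns are hypothetical; histograms and reshaping are left out to keep the example short.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("20170505", periods=8)
df = pd.DataFrame(rng.standard_normal((8, 3)), index=dates, columns=list("ABC"))

# apply() calls np.cumsum on each column; df.cumsum() is the built-in equivalent.
running = df.apply(np.cumsum)
print(np.allclose(running, df.cumsum()))  # True

# Concatenating: stack the first 3 rows and the last 5 rows back into one frame.
rebuilt = pd.concat([df.head(3), df.tail(5)])
print(rebuilt.shape)  # (8, 3)

# Merging: join two hypothetical frames on a shared "key" column (inner join by default).
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "c"], "y": [10, 30]})
merged = pd.merge(left, right, on="key")
print(merged)  # only keys present in both frames: a and c
```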

You can read more details about these data structures here.

Translated from: https://www.freecodecamp.org/news/series-and-dataframe-in-python-a800b098f68/
