What is the most efficient way to loop through dataframes with pandas? [duplicate]

Translated from: What is the most efficient way to loop through dataframes with pandas? [duplicate]

This question already has an answer here:

How to iterate over rows in a DataFrame in Pandas? (17 answers)

I want to perform my own complex operations on financial data in dataframes in a sequential manner.

For example, I am using the following MSFT CSV file taken from Yahoo Finance:

```
Date,Open,High,Low,Close,Volume,Adj Close
2011-10-19,27.37,27.47,27.01,27.13,42880000,27.13
2011-10-18,26.94,27.40,26.80,27.31,52487900,27.31
2011-10-17,27.11,27.42,26.85,26.98,39433400,26.98
2011-10-14,27.31,27.50,27.02,27.27,50947700,27.27
...
```

I then do the following:

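```python
#!/usr/bin/env python
import pandas as pd

# Use Date as the index so each row holds only the six numeric fields
df = pd.read_csv('table.csv', index_col='Date', parse_dates=True)

for i, row in enumerate(df.values):
    date = df.index[i]
    # open_ avoids shadowing the built-in open(); six values per row
    open_, high, low, close, volume, adjclose = row
    # now perform analysis on open/close based on date, etc.
```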

Is that the most efficient way? Given the focus on speed in pandas, I would assume there must be some special function to iterate through the values that also retrieves the index (possibly through a generator, to be memory efficient). `df.iteritems` unfortunately only iterates column by column.
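For readers on current pandas: `iteritems` was deprecated and then removed in pandas 2.0, and `DataFrame.items()` is its replacement with the same behaviour, yielding `(column name, Series)` pairs. The sketch below loads the sample rows from a string (an assumption made only so it runs without `table.csv`) and shows that this walks columns, not rows.

```python
import io

import pandas as pd

# The first rows of the MSFT sample, inlined so the example needs no file
CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2011-10-19,27.37,27.47,27.01,27.13,42880000,27.13
2011-10-18,26.94,27.40,26.80,27.31,52487900,27.31
2011-10-17,27.11,27.42,26.85,26.98,39433400,26.98
2011-10-14,27.31,27.50,27.02,27.27,50947700,27.27
"""

df = pd.read_csv(io.StringIO(CSV), index_col="Date", parse_dates=True)

# items() (formerly iteritems()) yields one (column name, Series) pair per column
column_names = []
for name, column in df.items():
    column_names.append(name)
    print(name, column.tolist())
```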

Reply #1

Reference: https://stackoom.com/question/Wsws/使用pandas循环数据帧的最有效方法是什么-重复

Reply #2

The newest versions of pandas now include a built-in function for iterating over rows, `DataFrame.iterrows()`:

```python
for index, row in df.iterrows():
    # do some logic here
    pass
```

Or, if you want it faster, use `itertuples()`.

But unutbu's suggestion, using NumPy functions to avoid iterating over rows at all, will produce the fastest code.
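A minimal sketch of the vectorized idea, assuming the per-row "analysis on open/close" is simple arithmetic: each step is written as a whole-column operation, so the loop runs inside NumPy rather than in Python. The derived columns `Range`, `Change` and `Direction` are hypothetical examples, not part of the original answer.

```python
import io

import numpy as np
import pandas as pd

# The first rows of the MSFT sample, inlined so the example needs no file
CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2011-10-19,27.37,27.47,27.01,27.13,42880000,27.13
2011-10-18,26.94,27.40,26.80,27.31,52487900,27.31
2011-10-17,27.11,27.42,26.85,26.98,39433400,26.98
2011-10-14,27.31,27.50,27.02,27.27,50947700,27.27
"""
df = pd.read_csv(io.StringIO(CSV), index_col="Date", parse_dates=True)

# Whole-column arithmetic: one operation per column instead of one per row
df["Range"] = df["High"] - df["Low"]
df["Change"] = df["Close"] - df["Open"]

# Conditional logic without a loop: np.where picks a value element by element
df["Direction"] = np.where(df["Change"] > 0, "up", "down")

print(df[["Range", "Change", "Direction"]])
```

If a step genuinely depends on the previous row's result, it may not vectorize this way, and one of the row iterators becomes the fallback.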

Reply #3

I checked out `iterrows` after noticing Nick Crawford's answer, but found that it yields (index, Series) tuples. Not sure which would work best for you, but I ended up using the `itertuples` method for my problem, which yields (index, row_value1, ...) tuples.
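A sketch of both shapes on the question's sample (inlined from a string, an assumption so it runs without a file). Besides the tuple shape there is a practical difference: because `iterrows()` packs each row into a single Series, mixed column dtypes are upcast to a common one (the integer `Volume` comes back as a float), while `itertuples()` keeps each column's own type. In `itertuples()` the first field, `Index`, holds the row label; `Adj Close` is not a valid Python identifier, so it is read by position here.

```python
import io

import pandas as pd

# The first rows of the MSFT sample, inlined so the example needs no file
CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2011-10-19,27.37,27.47,27.01,27.13,42880000,27.13
2011-10-18,26.94,27.40,26.80,27.31,52487900,27.31
2011-10-17,27.11,27.42,26.85,26.98,39433400,26.98
2011-10-14,27.31,27.50,27.02,27.27,50947700,27.27
"""
df = pd.read_csv(io.StringIO(CSV), index_col="Date", parse_dates=True)

# iterrows(): (index, Series); one Series has one dtype, so Volume is upcast
date, row = next(df.iterrows())
print(date, row.dtype, row["Volume"])

# itertuples(): namedtuples; Index is the date, Volume keeps its integer type,
# and first[6] is "Adj Close" (position 0 is Index)
first = next(df.itertuples())
print(first.Index, first.Volume, first[6])

# Typical loop: named attribute access per row
ranges = {}
for tup in df.itertuples():
    ranges[tup.Index] = round(tup.High - tup.Low, 2)
print(ranges)
```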

There's also `iterkv` (in older pandas versions), which iterates through (column, Series) tuples.

Reply #4

As a small addition, if you have a complex function to apply to a single column, you can also use `apply`:

http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.apply.html

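```python
# Series.apply calls the function once per element of column 'a' (value * 2 is a placeholder)
df['b'] = df['a'].apply(lambda value: value * 2)
```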
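The linked page documents `DataFrame.apply`; with `axis=1` the function receives each row as a Series, which is closer to the row-wise processing the question asks about. A small sketch contrasting it with column-level `Series.apply`, on a made-up two-row frame with placeholder lambdas. Note that `apply` is convenient but still calls a Python function per element or per row, so it is not a substitute for true vectorization.

```python
import pandas as pd

# Hypothetical two-row frame with the question's column names
df = pd.DataFrame({"Open": [27.37, 26.94], "Close": [27.13, 27.31]})

# Series.apply: the function is called once per element of the column
df["CloseRounded"] = df["Close"].apply(lambda value: round(value))

# DataFrame.apply with axis=1: the function is called once per row (a Series)
df["Change"] = df.apply(lambda row: row["Close"] - row["Open"], axis=1)

print(df)
```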

Reply #5

Another suggestion: if subsets of rows share characteristics that allow it, combine `groupby` with vectorized calculations.
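A minimal sketch of that idea, assuming the shared characteristic is the calendar month of each trading day (a grouping chosen only for illustration, on made-up prices). `groupby(...).transform("mean")` computes one statistic per group and broadcasts it back to every row, so the per-row comparison needs no explicit loop.

```python
import pandas as pd

# Hypothetical closing prices spanning two months
df = pd.DataFrame(
    {"Close": [26.0, 27.0, 28.0, 30.0]},
    index=pd.to_datetime(["2011-09-29", "2011-09-30", "2011-10-03", "2011-10-04"]),
)

# Group key: the calendar month of each row's date
month = df.index.to_period("M")

# transform("mean") returns one value per row: the mean of that row's month
df["MonthMean"] = df.groupby(month)["Close"].transform("mean")
df["VsMonthMean"] = df["Close"] - df["MonthMean"]

print(df)
```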

Reply #6

As mentioned before, pandas objects are most efficient when processing the whole array at once. However, for those who, like me, really need to loop through a pandas DataFrame to perform something, I found at least three ways to do it. I did a short test to see which of the three is the least time-consuming.

```python
import time
import pandas as pd

t = pd.DataFrame({'a': range(0, 10000), 'b': range(10000, 20000)})
B = []  # elapsed seconds for each method
C = []
A = time.time()
for i, r in t.iterrows():  # 1. iterrows()
    C.append((r['a'], r['b']))
B.append(time.time() - A)
C = []
A = time.time()
for ir in t.itertuples():  # 2. itertuples(); ir[0] is the index
    C.append((ir[1], ir[2]))
B.append(time.time() - A)
C = []
A = time.time()
for r in zip(t['a'], t['b']):  # 3. zip over the columns
    C.append((r[0], r[1]))
B.append(time.time() - A)
print(B)
```

Result:

[0.5639059543609619, 0.017839908599853516, 0.005645036697387695]

This is probably not the best way to measure time consumption, but it's quick for me.
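For a somewhat more careful measurement than a single `time.time()` difference, the standard-library `timeit` module can repeat each snippet and let you keep the best run. A sketch using the same frame and the same three approaches; the function names are made up for this example, and since absolute timings depend on the machine and pandas version, no numbers are claimed here.

```python
import timeit

import pandas as pd

t = pd.DataFrame({"a": range(0, 10000), "b": range(10000, 20000)})


def with_iterrows():
    return [(r["a"], r["b"]) for _, r in t.iterrows()]


def with_itertuples():
    return [(ir[1], ir[2]) for ir in t.itertuples()]


def with_zip():
    return list(zip(t["a"], t["b"]))


results = {}
for func in (with_iterrows, with_itertuples, with_zip):
    # 3 repetitions of 1 call each; min() keeps the least noisy run
    results[func.__name__] = min(timeit.repeat(func, number=1, repeat=3))

for name, seconds in results.items():
    print(f"{name}: {seconds:.4f} s")
```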

Here are some pros and cons, IMHO:

- `.iterrows()`: returns the index and the row items in separate variables, but is significantly slower
- `.itertuples()`: faster than `.iterrows()`, but returns the index together with the row items; `ir[0]` is the index
- `zip`: quickest, but gives no access to the row's index
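The missing index in the `zip` approach has a simple workaround: the index is itself a sequence, so it can be zipped in alongside the columns, keeping the speed of `zip` while giving each row its label. A sketch on a small hypothetical frame with a string index:

```python
import pandas as pd

# Hypothetical frame with a non-default index
t = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])

# Zip the index alongside the columns to get (label, a, b) for each row
pairs = []
for label, a, b in zip(t.index, t["a"], t["b"]):
    pairs.append((label, a + b))

print(pairs)
```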
