This article is translated from: Difference between map, applymap and apply methods in Pandas

Can you tell me when to use these vectorization methods, with basic examples?

I see that map is a Series method, whereas the rest are DataFrame methods. I got confused about the apply and applymap methods, though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples that illustrate the usage would be great!




Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommend this book):

Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame's apply method does exactly this:

In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [117]: frame
Out[117]:
               b         d         e
Utah   -0.029638  1.081563  1.280300
Ohio    0.647747  0.831136 -1.549481
Texas   0.513416 -0.884417  0.195343
Oregon -0.485454 -0.477388 -0.309548

In [118]: f = lambda x: x.max() - x.min()

In [119]: frame.apply(f)
b    1.133201
d    1.965980
e    2.829781
dtype: float64

Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
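
As a rough sketch of the point above, the snippet below compares apply with the equivalent built-in methods and also shows the row-wise form via axis=1. The random data and the helper name spread are illustrative assumptions, not taken from the book.

```python
import numpy as np
import pandas as pd

# Illustrative data: the values are random, so only the labels and shapes matter.
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.standard_normal((4, 3)),
                     columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])

def spread(x):
    # x is a whole column (or row) passed in as a Series
    return x.max() - x.min()

per_column = frame.apply(spread)        # axis=0 is the default: one value per column
per_row = frame.apply(spread, axis=1)   # axis=1: one value per row

# The same column-wise result from built-in DataFrame methods, without apply.
builtin = frame.max() - frame.min()

print(per_column)
print(per_row)
print(np.allclose(per_column, builtin))
```

When a built-in reduction exists, it is usually the simpler choice; apply is mainly useful for functions that have no built-in equivalent.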

Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:

In [120]: format = lambda x: '%.2f' % x

In [121]: frame.applymap(format)
Out[121]:
            b      d      e
Utah    -0.03   1.08   1.28
Ohio     0.65   0.83  -1.55
Texas    0.51  -0.88   0.20
Oregon  -0.49  -0.48  -0.31

The reason for the name applymap is that Series has a map method for applying an element-wise function:

In [122]: frame['e'].map(format)
Utah       1.28
Ohio      -1.55
Texas      0.20
Oregon    -0.31
Name: e, dtype: object

Summing up: apply works on a row/column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
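
The summary can be sketched side by side as follows. The small frame and the name fmt are made up for illustration. One caveat that is not in the answer above: newer pandas releases expose the element-wise DataFrame method as DataFrame.map and deprecate applymap, so the sketch picks whichever is available.

```python
import pandas as pd

frame = pd.DataFrame({"b": [1.0, -2.5], "d": [0.25, 4.0]}, index=["Utah", "Ohio"])

def fmt(x):
    # works on one scalar at a time
    return "%.2f" % x

# apply: the function receives a whole column (a Series).
col_range = frame.apply(lambda col: col.max() - col.min())

# Element-wise on a DataFrame: applymap, or DataFrame.map where pandas provides it.
elementwise = getattr(frame, "map", None) or frame.applymap
formatted = elementwise(fmt)

# map: element-wise on a single Series.
formatted_b = frame["b"].map(fmt)

print(col_range)
print(formatted)
print(formatted_b)
```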


@jeremiahbuddha mentioned that apply works on rows/columns, while applymap works element-wise. But it seems you can still use apply for element-wise computation:

    frame.apply(np.sqrt)
    Out[102]:
                   b         d         e
    Utah         NaN  1.435159       NaN
    Ohio    1.098164  0.510594  0.729748
    Texas        NaN  0.456436  0.697337
    Oregon  0.359079       NaN       NaN

    frame.applymap(np.sqrt)
    Out[103]:
                   b         d         e
    Utah         NaN  1.435159       NaN
    Ohio    1.098164  0.510594  0.729748
    Texas        NaN  0.456436  0.697337
    Oregon  0.359079       NaN       NaN
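
A likely explanation for the identical output is that np.sqrt is itself vectorized: apply hands it a whole column, and it computes the square root of every element of that column. The sketch below uses a hypothetical probe function to show what each method actually passes in; the small frame is made up, and the element-wise method is looked up as DataFrame.map where available, falling back to applymap.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"b": [1.0, 4.0], "d": [9.0, 16.0]}, index=["Utah", "Ohio"])

# Newer pandas names the element-wise DataFrame method `map`; older releases have `applymap`.
elementwise = getattr(frame, "map", None) or frame.applymap

got_series = []

def probe(x):
    # Record whether the argument is a whole Series or a single value.
    got_series.append(isinstance(x, pd.Series))
    return x

frame.apply(probe)
apply_calls = list(got_series)        # apply passed whole columns

got_series.clear()
elementwise(probe)
elementwise_calls = list(got_series)  # the element-wise method passed single values

# np.sqrt accepts both a Series and a scalar, so both routes give the same numbers.
same = np.allclose(frame.apply(np.sqrt).to_numpy(dtype=float),
                   elementwise(np.sqrt).to_numpy(dtype=float))

print(all(apply_calls), any(elementwise_calls), same)
```

A function that only understands scalars, such as the string formatting used earlier, would not work the same way with apply.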


Adding to the other answers: a Series also has both map and apply.

apply can make a DataFrame out of a Series; however, map will just put a Series in every cell of another Series, which is probably not what you want.

In [40]: p=pd.Series([1,2,3])
In [41]: p
0    1
1    2
2    3
dtype: int64

In [42]: p.apply(lambda x: pd.Series([x, x]))
Out[42]:
   0  1
0  1  1
1  2  2
2  3  3

In [43]: p.map(lambda x: pd.Series([x, x]))
0    0    1
1    1
dtype: int64
1    0    2
1    2
dtype: int64
2    0    3
1    3
dtype: int64
dtype: object

Also, if I had a function with side effects, such as "connect to a web server", I'd probably use apply just for the sake of clarity.


map can take not only a function, but also a dictionary or another Series. Let's say you want to manipulate permutations.

Take

1 2 3 4 5
2 1 4 5 3

The square of this permutation is

1 2 3 4 5
1 2 5 3 4

You can compute it using map. Not sure if self-application is documented, but it works in 0.15.1. Note that the example below uses the zero-based version of the same permutation.

In [39]: p=pd.Series([1,0,3,4,2])

In [40]: p.map(p)
0    0
1    1
2    4
3    2
4    3
dtype: int64
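
A minimal sketch of both non-function forms of map: another Series (here the permutation itself) and a plain dict. The labels example is made up for illustration.

```python
import pandas as pd

# Zero-based permutation: position i is sent to p[i].
p = pd.Series([1, 0, 3, 4, 2])

# Passing a Series looks each value up in that Series' index, giving p[p[i]].
squared = p.map(p)
print(squared.tolist())

# A dict works as a lookup table too; values with no matching key become NaN.
labels = pd.Series(["a", "b", "z"]).map({"a": "alpha", "b": "beta"})
print(labels.tolist())
```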


Just wanted to point this out, as I struggled with it for a bit:

def f(x):
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x

df.applymap(f)

This does not modify the DataFrame itself; the result has to be reassigned:

df = df.applymap(f)
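
A small sketch of that behaviour, using made-up data. It looks up the element-wise method as DataFrame.map where pandas provides it and falls back to applymap otherwise. For this particular clamping function, the built-in DataFrame.clip gives the same result without a Python-level loop.

```python
import pandas as pd

df = pd.DataFrame({"a": [-5, 50, 200000], "b": [10, -1, 100001]})

def f(x):
    # Clamp a single value into the range [0, 100000].
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x

elementwise = getattr(df, "map", None) or df.applymap

clipped = elementwise(f)   # returns a new DataFrame
print(df["a"].tolist())    # the original is unchanged
print(clipped)

# Built-in alternative for this specific case.
same = df.clip(lower=0, upper=100000)
print(same)
```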


Probably the simplest explanation of the difference between apply and applymap:

apply takes a whole column as its parameter and then assigns the result to that column.

applymap takes a single cell value as its parameter and assigns the result back to that cell.

NB: if the function passed to apply returns a single value, that value takes the place of the whole column, so you end up with just one row of values (a Series) instead of a matrix.
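
The note about the shape of the result can be sketched like this, with a made-up two-column frame: a function that returns a column produces a DataFrame, while a function that returns one value per column produces a Series.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# Column in, column out: the result keeps the DataFrame shape.
centered = df.apply(lambda col: col - col.mean())

# Column in, single value out: the result collapses to a Series indexed by column name.
totals = df.apply(lambda col: col.sum())

print(type(centered).__name__, centered.shape)
print(type(totals).__name__, totals.to_dict())
```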


