PyTorch matrix multiplication: the differences between torch.mul(), torch.mm() and torch.matmul()


I. Introduction

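torch.mul(a, b) multiplies a and b element-wise. a and b must have the same shape or, more generally, shapes that can be broadcast together. For example, if a is (1, 2) and b is (1, 2), the result is again a (1, 2) tensor.
torch.mm(a, b) is the matrix product of the 2D matrices a and b. For example, if a is (1, 2) and b is (2, 3), the result is a (1, 3) matrix. It does not broadcast.
torch.bmm() is a batched matrix product with strict requirements: both inputs must be 3D, with the same batch size and compatible matrix dimensions.
torch.matmul() imposes no such fixed requirements on the number of dimensions or sizes; it uses broadcasting to multiply tensors that have different numbers of dimensions.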
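As a quick sketch (assuming a reasonably recent PyTorch version; the shapes are arbitrary), the Python operators map onto these functions: `a * b` is element-wise like torch.mul(), and `a @ b` is torch.matmul(), which for 2D inputs gives the same result as torch.mm(). torch.mm() itself only accepts 2D tensors, so a 3D input is rejected.

```python
import torch

a = torch.rand(3, 4)
b = torch.rand(3, 4)
c = torch.rand(4, 5)

# `*` is element-wise multiplication, the same as torch.mul()
same_mul = torch.equal(a * b, torch.mul(a, b))

# `@` is torch.matmul(); for two 2D inputs it matches torch.mm()
same_mm = torch.allclose(a @ c, torch.mm(a, c))

# torch.mm() is strictly 2D: a 3D (batched) input raises a RuntimeError
try:
    torch.mm(torch.rand(2, 3, 4), c)
    mm_rejects_3d = False
except RuntimeError:
    mm_rejects_3d = True

print(same_mul, same_mm, mm_rejects_3d)  # True True True
```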

II. Usage

1. torch.mul(a, b) and torch.mm(a, b)

Example

import torch

a = torch.rand(3, 4)
b = torch.rand(3, 4)
c = torch.rand(4, 5)
print(torch.mul(a, b).size())  # element-wise: returns a 3x4 tensor
print(torch.mm(a, c).size())   # matrix product (3, 4) x (4, 5): returns a 3x5 tensor
print(torch.mul(a, c).size())  # error: a (3, 4) and c (4, 5) cannot be broadcast together

Output

torch.Size([3, 4])
torch.Size([3, 5])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
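<ipython-input-27-aea68cb5481f> in <module>
      7 print(torch.mul(a, b).size())  # element-wise: returns a 3x4 tensor
      8 print(torch.mm(a, c).size())   # matrix product (3, 4) x (4, 5): returns a 3x5 tensor
----> 9 print(torch.mul(a, c).size())  # error: a (3, 4) and c (4, 5) cannot be broadcast together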
RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 1
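The error above is a broadcasting failure, not simply a consequence of the shapes being different: torch.mul() broadcasts its arguments, so shapes that differ but are broadcastable work fine. A small sketch with illustrative shapes:

```python
import torch

a = torch.rand(3, 4)
row = torch.rand(4)      # shape (4,): broadcast across the 3 rows of a
col = torch.rand(3, 1)   # shape (3, 1): broadcast across the 4 columns of a

r_row = torch.mul(a, row)
r_col = torch.mul(a, col)
r_scalar = torch.mul(a, 2.0)  # a Python number is also accepted
print(r_row.size(), r_col.size(), r_scalar.size())

# (3, 4) vs (4, 5): trailing sizes 4 and 5 are neither equal nor 1, so broadcasting fails
try:
    torch.mul(a, torch.rand(4, 5))
    mul_failed = False
except RuntimeError:
    mul_failed = True
print(mul_failed)  # True
```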

2. torch.bmm()

Reference: https://pytorch.org/docs/stable/torch.html#torch.bmm

torch.bmm(input, mat2, out=None) → Tensor
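torch.bmm() performs a batched matrix multiplication: for every index i along the first (batch) dimension, it computes the ordinary matrix product of input[i] and mat2[i], analogous to A*B for matrices.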

Parameters:

input, mat2: the two tensors to multiply. Both must be 3D and have the same size in the first (batch) dimension.
out: (optional) the output tensor for the result

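The two tensors must also satisfy a shape requirement: input (p, m, n) * mat2 (p, n, a) -> output (p, m, a). This is the same rule as for ordinary matrix multiplication: the number of columns of the first matrix must equal the number of rows of the second. torch.bmm() does not broadcast.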
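The batched product can be checked against an explicit loop: slice i of the result should be the ordinary matrix product of slice i of each input. A minimal sketch, with p, m, n, a set to arbitrary illustrative sizes:

```python
import torch

p, m, n, a = 2, 4, 5, 7
x = torch.rand(p, m, n)
y = torch.rand(p, n, a)

out = torch.bmm(x, y)  # shape (p, m, a)

# Same computation, one 2D matrix product per batch index
looped = torch.stack([torch.mm(x[i], y[i]) for i in range(p)])

print(out.size())                   # torch.Size([2, 4, 7])
print(torch.allclose(out, looped))  # True
```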

Example

import torch
x = torch.rand(2,4,5)
y = torch.rand(2,5,7)
print(torch.bmm(x,y).size())

Output

torch.Size([2, 4, 7])

3. torch.matmul()

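torch.matmul() is also a matrix-multiplication-like product of tensors, but it supports broadcasting (with NumPy-style semantics), so it can multiply tensors that have different numbers of dimensions. This is where it differs from torch.bmm().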
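To see the difference concretely: torch.bmm() needs equal batch sizes, while torch.matmul() can broadcast a batch dimension of size 1. A sketch, assuming shapes (1, 4, 5) and (2, 5, 7) chosen only for illustration:

```python
import torch

x = torch.rand(1, 4, 5)   # batch size 1
y = torch.rand(2, 5, 7)   # batch size 2

# torch.matmul() broadcasts the batch dimension 1 -> 2
res = torch.matmul(x, y)
print(res.size())  # torch.Size([2, 4, 7])

# torch.bmm() does not broadcast, so mismatched batch sizes are an error
try:
    torch.bmm(x, y)
    bmm_failed = False
except RuntimeError:
    bmm_failed = True
print(bmm_failed)  # True
```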

Parameters:

input, other: the two tensors to multiply

out: (optional) the output tensor for the result

The behavior depends on the number of dimensions of the inputs:

(1) If both are 1D (vectors), the dot product of the two vectors is returned (as a 0-dimensional tensor).

(2) If both are 2D (matrices), the ordinary matrix product is returned as a 2D tensor.
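Rules (1) and (2) in a quick sketch (shapes are arbitrary): two 1D tensors give a 0-dimensional tensor equal to torch.dot(), and two 2D tensors give the same result as torch.mm().

```python
import torch

u = torch.rand(4)
v = torch.rand(4)
d = torch.matmul(u, v)
print(d.dim(), d.size())                   # 0 torch.Size([])
print(torch.allclose(d, torch.dot(u, v)))  # True

A = torch.rand(2, 4)
B = torch.rand(4, 3)
M = torch.matmul(A, B)
print(M.size())                            # torch.Size([2, 3])
print(torch.allclose(M, torch.mm(A, B)))   # True
```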

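(3) If input is 1D and other is 2D, a dimension of size 1 is prepended to input (turning it into a 1 x n row matrix), the matrix product is computed, and the prepended dimension is then removed, so the result is 1D like input. Even with this expansion, the length of input must still match the number of rows of other.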

Example

import torch
x = torch.rand(5)      # 1D, shape (5,)
x1 = x.view(1, -1)     # the same data as a 2D row vector, shape (1, 5)
y = torch.rand(5, 3)   # 2D, shape (5, 3)
print(x1.size())
print(x.size())
print(y.size())
print(torch.matmul(x, y))
print(torch.matmul(x, y).size())   # 1D result: the prepended dimension is removed again
print(torch.matmul(x1, y))
print(torch.matmul(x1, y).size())  # 2D result: (1, 5) x (5, 3) -> (1, 3)

Output

torch.Size([1, 5])
torch.Size([5])
torch.Size([5, 3])
tensor([1.5374, 1.3291, 1.8289])
torch.Size([3])
tensor([[1.5374, 1.3291, 1.8289]])
torch.Size([1, 3])
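Rule (3) can be spelled out by hand: prepend a dimension with unsqueeze(0), take an ordinary matrix product, then drop that dimension with squeeze(0). A sketch comparing this against torch.matmul() (sizes chosen to match the example above):

```python
import torch

x = torch.rand(5)      # 1D
y = torch.rand(5, 3)   # 2D

direct = torch.matmul(x, y)                      # shape (3,)
manual = torch.mm(x.unsqueeze(0), y).squeeze(0)  # (1, 5) x (5, 3) -> (1, 3) -> (3,)

print(direct.size(), manual.size())    # torch.Size([3]) torch.Size([3])
print(torch.allclose(direct, manual))  # True

# The length of x must still equal the number of rows of y
try:
    torch.matmul(torch.rand(4), y)
    mismatch_failed = False
except RuntimeError:
    mismatch_failed = True
print(mismatch_failed)  # True
```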

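(4) If input is 2D and other is 1D, the matrix-vector product is returned. (Personally, I find it helpful to read this as the mirror image of rule (3): a dimension of size 1 is appended to other, turning shape (3,) into (3, 1), the matrix product is computed, and that dimension is removed again. In rule (3) the extra dimension is prepended instead, turning (5,) into (1, 5).)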

Example

import torch
x = torch.rand(3)      # 1D, shape (3,)
x1 = x.view(-1, 1)     # the same data as a 2D column vector, shape (3, 1)
y = torch.rand(5, 3)   # 2D, shape (5, 3)
print(x1.size())
print(x.size())
print(y.size())
print(torch.matmul(y, x))
print(torch.matmul(y, x).size())   # 1D result: the appended dimension is removed again
print(torch.matmul(y, x1))
print(torch.matmul(y, x1).size())  # 2D result: (5, 3) x (3, 1) -> (5, 1)

Output

torch.Size([3, 1])
torch.Size([3])
torch.Size([5, 3])
tensor([0.6472, 0.7025, 0.2358, 0.2873, 0.5696])
torch.Size([5])
tensor([[0.6472],
        [0.7025],
        [0.2358],
        [0.2873],
        [0.5696]])
torch.Size([5, 1])
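The same hand-written check works for rule (4): append a dimension with unsqueeze(1), multiply, and squeeze it away again. PyTorch also provides torch.mv() for exactly this matrix-vector case. A sketch with the example's sizes:

```python
import torch

y = torch.rand(5, 3)   # 2D
x = torch.rand(3)      # 1D

direct = torch.matmul(y, x)                      # shape (5,)
manual = torch.mm(y, x.unsqueeze(1)).squeeze(1)  # (5, 3) x (3, 1) -> (5, 1) -> (5,)
mv = torch.mv(y, x)                              # dedicated matrix-vector product

print(direct.size())                                               # torch.Size([5])
print(torch.allclose(direct, manual), torch.allclose(direct, mv))  # True True
```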

(5) If both arguments are at least 1D and at least one of them has more than 2 dimensions, a batched matrix multiplication is returned.

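(a) If input is 1D and other has more than 2 dimensions, this works like rule (3): a dimension of size 1 is prepended to input and removed after the multiplication.
(b) If other is 1D and input has more than 2 dimensions, this works like rule (4): a dimension of size 1 is appended to other and removed after the multiplication.
(c) If input and other are both 3D with the same batch size, the result is the same as torch.bmm(); unlike torch.bmm(), torch.matmul() also broadcasts the batch dimension.
(d) If the batch dimensions (all but the last two) of the inputs can be broadcast, the multiplication still works. For example, input (j, 1, n, m) * other (k, m, p) = output (j, k, n, p).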
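Shape checks for cases (a) to (d), as a sketch; j=2, k=3, n=4, m=5, p=6 are arbitrary illustrative sizes:

```python
import torch

j, k, n, m, p = 2, 3, 4, 5, 6

# (a) 1D input, >2D other: a dim is prepended to input, then removed
a_out = torch.matmul(torch.rand(m), torch.rand(k, m, p))
print(a_out.size())  # torch.Size([3, 6])

# (b) >2D input, 1D other: a dim is appended to other, then removed
b_out = torch.matmul(torch.rand(k, n, m), torch.rand(m))
print(b_out.size())  # torch.Size([3, 4])

# (c) both 3D with equal batch size: same result as torch.bmm()
x3 = torch.rand(k, n, m)
y3 = torch.rand(k, m, p)
same_as_bmm = torch.allclose(torch.matmul(x3, y3), torch.bmm(x3, y3))
print(same_as_bmm)  # True

# (d) batch dims (j, 1) and (k,) broadcast to (j, k): (j,1,n,m) x (k,m,p) -> (j,k,n,p)
d_out = torch.matmul(torch.rand(j, 1, n, m), torch.rand(k, m, p))
print(d_out.size())  # torch.Size([2, 3, 4, 6])
```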

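In short: matmul() decides how to multiply based on the dimensions of its inputs, and the lower-dimensional operand is broadcast as needed to match the higher-dimensional one.
Reference: https://www.jianshu.com/p/e277f7fc67b3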