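Preface

I recently started relearning PyTorch. Early on I took the time to read through the PyTorch Chinese documentation, mainly to get familiar with the methods in several important modules, and took several thousand lines of markdown notes. But learning on paper alone stays superficial: while reading RE2RNN and RE-NET, I found I could no longer follow the logic of the authors' model code (my skills are lacking), which was quite frustrating.

Earlier, while browsing PyPI, I noticed that PyTorch stopped shipping Windows wheel files after 1.4.0, and that the GPU build of TensorFlow stopped supporting CUDA 9 setups once it reached 2.0.x. On top of that, my old laptop never had Anaconda installed and its Python stayed at 3.6.1, so the TensorFlow and PyTorch versions on it remained at 1.8.0 and 1.4.0. Later releases changed many APIs considerably, and the project code of many recent papers no longer runs there. I have not yet brought myself to install CUDA on the new laptop, but I recently found a repository that collects torch and torchvision wheel files, from the oldest to the newest, for every platform and Python version, which was a huge help.

This post therefore centers on writing a custom MRR loss function derived from the MRR metric (Section 2.1). Part 1 starts with how loss functions and optimizers are used in PyTorch. Part 2 walks through how the loss is written using only torch functions and also introduces a less commonly seen sorting approach based on Python built-ins. Part 3 closes with some scattered usage notes on torch and torchvision that I recorded recently; I may keep updating it or start a new post, though promises of continued updates tend to come to nothing. I hope this is helpful.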

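Preface
1 Loss functions and optimizers in PyTorch
- 1.1 Loss functions defined in torch.nn
- 1.2 Optimizers defined in torch.optim
- 1.3 Using loss functions and optimizers in model training
- 1.4 Writing a custom loss function
2 A custom MRR loss function
- 2.1 The MRR metric
- 2.2 Definition of the MRR loss
- 2.3 Implementation of the MRR loss
  - 2.4.1 An interesting attempt
  - 2.4.2 Implementation using only torch
3 Recent notes on torch and torchvision
- 3.1 torch notes
  - 3.1.1 Using torch.utils.data.DataLoader
  - 3.1.2 torch.permute and torch.transpose
  - 3.1.3 torch.sort
- 3.2 torchvision notes
  - 3.2.1 Using the torchvision.transforms module
Afterword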

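1 Loss functions and optimizers in PyTorch

1.1 Loss functions defined in torch.nn

Reference: PyTorch Chinese documentation
The documentation does not seem to list every loss function; for details, see the definitions in E:\Anaconda3\Lib\site-packages\torch\nn\modules\loss.py.

torch.nn.L1Loss(size_average=True): mean absolute error (MAE).
torch.nn.MSELoss(size_average=True): mean squared error (MSE).
torch.nn.CrossEntropyLoss(weight=None, size_average=True): cross-entropy loss (commonly used).

weight accepts a 1D tensor of n values, one weight per class; this is often useful when the label distribution of the samples is imbalanced.

torch.nn.NLLLoss(weight=None, size_average=True): negative log-likelihood loss.
torch.nn.NLLLoss2d(weight=None, size_average=True): 2D negative log-likelihood loss, usually applied to image-type inputs.

Usage example: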

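import torch
from torch import nn, autograd

# Example taken from the old documentation; later releases deprecate
# nn.NLLLoss2d in favor of nn.NLLLoss.
m = nn.Conv2d(16, 32, (3, 3)).float()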
loss = nn.NLLLoss2d()
# input is of size nBatch x nClasses x height x width
input = autograd.Variable(torch.randn(3, 16, 10, 10))
# each element in target has to have 0 <= value < nclasses
target = autograd.Variable(torch.LongTensor(3, 8, 8).random_(0, 4))
output = loss(m(input), target)
output.backward()

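torch.nn.KLDivLoss(weight=None, size_average=True): KL-divergence loss.

KL divergence, also called relative entropy, is similar to cross entropy and is usually used as an estimation loss between discrete probability distributions.
For example, with true distribution $P = [0.1, 0.4, 0.5]$ and predicted distribution $Q = [0.4, 0.2, 0.4]$, the KL divergence (natural logarithm) is computed as: $D_{KL}(P\|Q)=0.1\times\log\frac{0.1}{0.4}+0.4\times\log\frac{0.4}{0.2}+0.5\times\log\frac{0.5}{0.4}\approx 0.250\\D_{KL}(Q\|P)=0.4\times\log\frac{0.4}{0.1}+0.2\times\log\frac{0.2}{0.4}+0.4\times\log\frac{0.4}{0.5}\approx 0.327\tag{1}$
Equation (1) shows that KL divergence is not symmetric in its two arguments, which is one reason cross entropy is generally used instead of relative entropy.
Another reason is visible from the computation: if the distribution in the denominator assigns probability 0 to an outcome that has nonzero probability under the other distribution, the KL divergence goes to infinity.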
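The numbers in equation (1) can be reproduced with a few lines of plain Python. This is a minimal sketch: the helper name kl_divergence is made up for illustration, and it uses the natural logarithm and the usual convention that terms with zero probability in the first argument contribute nothing.

```python
import math


def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p || q) using the natural logarithm.

    Terms with p_i == 0 contribute 0 by convention; a q_i == 0 paired with
    p_i > 0 makes the divergence infinite.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


P = [0.1, 0.4, 0.5]
Q = [0.4, 0.2, 0.4]
print(round(kl_divergence(P, Q), 3))      # 0.25
print(round(kl_divergence(Q, P), 3))      # 0.327
print(kl_divergence(P, [0.5, 0.5, 0.0]))  # inf
```

The last call illustrates the second point above: a zero in the denominator distribution drives the divergence to infinity.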

torch.nn.BCELoss(weight=None, size_average=True): binary cross entropy.

The documentation describes it as being used to compute the reconstruction error of an auto-encoder.

torch.nn.MarginRankingLoss(margin=0, size_average=True)
torch.nn.HingeEmbeddingLoss(size_average=True)
torch.nn.MultiLabelMarginLoss(size_average=True)
torch.nn.SmoothL1Loss(size_average=True)
torch.nn.SoftMarginLoss(size_average=True)
torch.nn.MultiLabelSoftMarginLoss(weight=None, size_average=True)
torch.nn.CosineEmbeddingLoss(margin=0, size_average=True)
torch.nn.MultiMarginLoss(p=1, margin=1, weight=None, size_average=True)

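1.2 Optimizers defined in torch.optim

Reference: PyTorch Chinese documentation
The documentation does not seem to list every optimizer in detail; Adam is the most commonly used. To see all of them, browse the E:\Anaconda3\Lib\site-packages\torch\optim directory.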

torch.optim.Adadelta(params, lr=1.0, rho=0.9, eps=1e-6, weight_decay=0)
torch.optim.Adagrad(params, lr=1.0, lr_decay=0, weight_decay=0, initial_accumulator_value=0, eps=1e-10)
torch.optim.Adam(params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0, amsgrad=False)
torch.optim.Adamax(params, lr=2e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0)
torch.optim.AdamW(params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2, amsgrad=False)
torch.optim.ASGD(params, lr=1e-2, lambd=1e-4, alpha=0.75, t0=1e6, weight_decay=0)
torch.optim.LBFGS(params, lr=1, max_iter=20, max_eval=None, tolerance_grad=1e-7, tolerance_change=1e-9, history_size=100, line_search_fn=None)
torch.optim.RMSprop(params, lr=1e-2, alpha=0.99, eps=1e-8, weight_decay=0, momentum=0, centered=False)
torch.optim.Rprop(params, lr=1e-2, etas=(0.5, 1.2), step_sizes=(1e-6, 50))
torch.optim.SGD(params, lr=required, momentum=0, dampening=0, weight_decay=0, nesterov=False)
torch.optim.SparseAdam(params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8)

The params argument in all of the above is usually set to model.parameters().

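1.3 Using loss functions and optimizers in model training

The following code is reposted from the blog post 使用PyTorch实现CNN (Implementing a CNN with PyTorch) by CSDN@dongyangY. I find it very detailed and well suited for beginners who want to see the whole PyTorch workflow, from data handling through model training to final evaluation. I will remove it on request if it infringes.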

import torch
from torch.utils import data
from torch.autograd import Variable
import torchvision as tv
from torchvision.datasets import mnist
from matplotlib import pyplot as plt

data_path = r'D:\code\python\project\other\torch\data'
model_saving_path = r'D:\code\python\project\other\torch\model\cnnnet.model'

# Load the data
transformer = tv.transforms.Compose([
    tv.transforms.ToTensor(),
    tv.transforms.Normalize([.5], [.5]),
])
train_data = mnist.MNIST(data_path, train=True, transform=transformer, download=False)
test_data = mnist.MNIST(data_path, train=False, transform=transformer, download=False)
train_loader = data.DataLoader(train_data, batch_size=128, shuffle=True)
test_loader = data.DataLoader(test_data, batch_size=100, shuffle=True)


# Build the network
class CNNnet(torch.nn.Module):
    def __init__(self):
        super(CNNnet, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=2, padding=1),
            torch.nn.BatchNorm2d(16),
            torch.nn.ReLU(),
        )
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=2, padding=1),
            torch.nn.BatchNorm2d(32),
            torch.nn.ReLU(),
        )
        self.conv3 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=2, padding=1),
            torch.nn.BatchNorm2d(64),
            torch.nn.ReLU(),
        )
        self.conv4 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=2, stride=2, padding=0),
            torch.nn.BatchNorm2d(64),
            torch.nn.ReLU(),
        )
        self.mlp1 = torch.nn.Linear(in_features=2*2*64, out_features=100)
        self.mlp2 = torch.nn.Linear(in_features=100, out_features=10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.mlp1(x.view(x.size(0), -1))
        x = self.mlp2(x)
        return x


model = CNNnet()

# Define the loss function and the optimizer
loss_func = torch.nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=0.001)

# Train the network: call backward() on the loss, and zero the
# optimizer's gradients before taking each step
loss_count = []
for epoch in range(2):
    for i, (x, y) in enumerate(train_loader):
        batch_x = Variable(x)           # torch.Size([128, 1, 28, 28])
        batch_y = Variable(y)           # torch.Size([128])
        out = model(batch_x)            # final output: torch.Size([128, 10])
        loss = loss_func(out, batch_y)  # compute the loss
        opt.zero_grad()                 # clear gradients left over from the previous step
        loss.backward()                 # backpropagate the error and compute gradients
        opt.step()                      # apply the update to the model parameters
        if i % 20 == 0:
            loss_count.append(loss.item())  # store a plain float so it can be plotted
            print('{}:\t'.format(i), loss.item())
            torch.save(model, model_saving_path)
        if i % 100 == 0:
            for a, b in test_loader:
                test_x = Variable(a)
                test_y = Variable(b)
                out = model(test_x)
                # print('test_out:\t', torch.max(out, 1)[1])
                # print('test_y:\t', test_y)
                accuracy = torch.max(out, 1)[1].numpy() == test_y.numpy()
                print('accuracy:\t', accuracy.mean())
                break

# Plot the loss curve
plt.figure('PyTorch_CNN_Loss')
plt.plot(loss_count, label='Loss')
plt.xlabel('step')
plt.ylabel('loss value')
plt.legend()
plt.show()

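1.4 Writing a custom loss function

Approach: subclass torch.nn.Module and implement its forward method, which takes the predicted values and the true labels as inputs.
Take the source code of the cross-entropy loss as an example: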

from .. import functional as F


class CrossEntropyLoss(_WeightedLoss):
    __constants__ = ['ignore_index', 'reduction']
    ignore_index: int

    def __init__(self, weight: Optional[Tensor] = None, size_average=None, ignore_index: int = -100,
                 reduce=None, reduction: str = 'mean') -> None:
        super(CrossEntropyLoss, self).__init__(weight, size_average, reduce, reduction)
        self.ignore_index = ignore_index

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        return F.cross_entropy(input, target, weight=self.weight,
                               ignore_index=self.ignore_index, reduction=self.reduction)
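To make the pattern concrete outside the library source, here is a minimal sketch of a user-defined loss that follows the same approach. The class name MeanAbsoluteErrorLoss and the toy tensors are hypothetical; the point is only that forward is built from torch operations, so autograd can backpropagate through it without a hand-written backward.

```python
import torch


class MeanAbsoluteErrorLoss(torch.nn.Module):
    """Hypothetical custom loss: mean absolute error written with torch ops only."""

    def forward(self, pred, target):
        return torch.mean(torch.abs(pred - target))


pred = torch.tensor([1.0, 2.0, 4.0], requires_grad=True)
target = torch.tensor([1.0, 3.0, 2.0])

loss_func = MeanAbsoluteErrorLoss()
loss = loss_func(pred, target)  # (0 + 1 + 2) / 3
loss.backward()                 # no custom backward needed

print(loss.item())
print(pred.grad)
```

In a training loop this object would be used exactly like torch.nn.CrossEntropyLoss() in Section 1.3.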

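The following code is reposted from the blog post pytorch系列教程(四)-自定义损失函数 (PyTorch tutorial series, part 4: custom loss functions) by CSDN@我的辉. I will remove it on request if it infringes.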

# Note: calculate_iou is not defined in this excerpt; it is expected to take
# two boxes in (x1, y1, x2, y2) form and return their IoU.
class Loss_yolov1(nn.Module):
    def __init__(self):
        super(Loss_yolov1, self).__init__()

    def forward(self, pred, labels):
        """
        :param pred: network output of shape (batchsize, 30, 7, 7)
        :param labels: sample labels of shape (batchsize, 30, 7, 7)
        :return: mean loss over the samples in the current batch
        """
        num_gridx, num_gridy = labels.size()[-2:]  # number of grid cells
        num_b = 2  # number of bboxes per grid cell
        num_cls = 20  # number of classes
        noobj_confi_loss = 0.  # loss of cells without an object (confidence loss only)
        coor_loss = 0.  # coordinate loss of bboxes that contain an object
        obj_confi_loss = 0.  # confidence loss of bboxes that contain an object
        class_loss = 0.  # class loss of cells that contain an object
        n_batch = labels.size()[0]  # batch size

        # This could be vectorized with matrix operations for speed;
        # loops are used here for the sake of correctness
        for i in range(n_batch):  # loop over the batch
            for n in range(7):  # loop over grid cells in x
                for m in range(7):  # loop over grid cells in y
                    if labels[i,4,m,n]==1:  # the cell contains an object
                        # Convert (px, py, w, h) to (x1, y1, x2, y2)
                        # First convert px, py to cx, cy, i.e. from a position relative to the cell
                        # to the normalized bbox center (cx, cy)
                        # Then use (cx-w/2, cy-h/2, cx+w/2, cy+h/2) to get the xyxy form for the IoU
                        bbox1_pred_xyxy = ((pred[i,0,m,n]+m)/num_gridx - pred[i,2,m,n]/2,(pred[i,1,m,n]+n)/num_gridy - pred[i,3,m,n]/2,
                                           (pred[i,0,m,n]+m)/num_gridx + pred[i,2,m,n]/2,(pred[i,1,m,n]+n)/num_gridy + pred[i,3,m,n]/2)
                        bbox2_pred_xyxy = ((pred[i,5,m,n]+m)/num_gridx - pred[i,7,m,n]/2,(pred[i,6,m,n]+n)/num_gridy - pred[i,8,m,n]/2,
                                           (pred[i,5,m,n]+m)/num_gridx + pred[i,7,m,n]/2,(pred[i,6,m,n]+n)/num_gridy + pred[i,8,m,n]/2)
                        bbox_gt_xyxy = ((labels[i,0,m,n]+m)/num_gridx - labels[i,2,m,n]/2,(labels[i,1,m,n]+n)/num_gridy - labels[i,3,m,n]/2,
                                        (labels[i,0,m,n]+m)/num_gridx + labels[i,2,m,n]/2,(labels[i,1,m,n]+n)/num_gridy + labels[i,3,m,n]/2)
                        iou1 = calculate_iou(bbox1_pred_xyxy,bbox_gt_xyxy)
                        iou2 = calculate_iou(bbox2_pred_xyxy,bbox_gt_xyxy)
                        # The bbox with the larger IoU is responsible for the object
                        if iou1 >= iou2:
                            coor_loss = coor_loss + 5 * (torch.sum((pred[i,0:2,m,n] - labels[i,0:2,m,n])**2) \
                                        + torch.sum((pred[i,2:4,m,n].sqrt()-labels[i,2:4,m,n].sqrt())**2))
                            obj_confi_loss = obj_confi_loss + (pred[i,4,m,n] - iou1)**2
                            # The bbox with the smaller IoU is not responsible for the object, so its
                            # confidence loss counts as noobj; note that its label confidence is iou2
                            noobj_confi_loss = noobj_confi_loss + 0.5 * ((pred[i,9,m,n]-iou2)**2)
                        else:
                            coor_loss = coor_loss + 5 * (torch.sum((pred[i,5:7,m,n] - labels[i,5:7,m,n])**2) \
                                        + torch.sum((pred[i,7:9,m,n].sqrt()-labels[i,7:9,m,n].sqrt())**2))
                            obj_confi_loss = obj_confi_loss + (pred[i,9,m,n] - iou2)**2
                            # The bbox with the smaller IoU is not responsible for the object, so its
                            # confidence loss counts as noobj; note that its label confidence is iou1
                            noobj_confi_loss = noobj_confi_loss + 0.5 * ((pred[i, 4, m, n]-iou1) ** 2)
                        class_loss = class_loss + torch.sum((pred[i,10:,m,n] - labels[i,10:,m,n])**2)
                    else:  # the cell contains no object
                        noobj_confi_loss = noobj_confi_loss + 0.5 * torch.sum(pred[i,[4,9],m,n]**2)

        loss = coor_loss + obj_confi_loss + noobj_confi_loss + class_loss
        # Code could be added here to sanity-check the loss. Verifying it fully is tedious;
        # a simpler way is to set pred to an all-ones tensor and inspect the error, which is much more direct.
        return loss/n_batch

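2 A custom MRR loss function

This loss function may well be pointless; at least I have not yet seen anyone propose it. I simply remembered that the evaluation metric of last year's ACM RecSys 2019 recommender-system challenge was MRR, a ranking-related metric. The task itself is still a multi-class classification problem, so plain cross entropy works fine as the loss. Today it occurred to me to wonder whether an MRR loss exists. A Baidu search turned up nothing, and there is nothing similar in the torch loss source code either. That source (version 1.6.0) does, however, contain three unfinished loss functions: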


# TODO: L1HingeEmbeddingCriterion
# TODO: MSECriterion weight
# TODO: ClassSimplexCriterion

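These do not seem closely related either. To reinforce my own memory I decided to write an MRR loss function. I picked up a few small details along the way, and I offer the result as a starting point for discussion.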

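2.1 The MRR metric

MRR stands for Mean Reciprocal Rank. Imagine this scenario: after you finish a video on Bilibili, several recommended videos appear below it. If you click the second recommendation, the algorithm scores $\frac{1}{2}$, which is called hit@2; if you click the third, it scores $\frac{1}{3}$; and if you click the first, it scores $1$. The final metric is the mean of these scores over all samples.

In the ACM RecSys 2019 challenge, the baseline reported in earlier related papers was about 0.2, i.e. roughly hit@5 on average. The top ten on the leaderboard mostly reached 0.6 or above, and first place ended above 0.7, which is a very high MRR. The system had to rank roughly 30 options, and reaching 0.7 means that on average nearly half of the samples were hit@1, an impressive hit rate.

I did far worse: after a lot of effort I only tuned my model to 0.28, roughly between hit@3 and hit@4. Multi-item recommendation tasks like this are quite challenging, and accuracy alone is not a sufficient measure of a model.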
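The metric described in this section fits in a couple of lines of plain Python. This is only an illustrative sketch: the function name mean_reciprocal_rank and the sample click positions are made up, and positions are taken to be 1-based, so hit@k corresponds to k.

```python
def mean_reciprocal_rank(hit_positions):
    """Mean of 1/position over all samples; positions are 1-based (hit@k -> k)."""
    return sum(1.0 / k for k in hit_positions) / len(hit_positions)


# Three samples: the user clicked the 2nd, the 3rd and the 1st recommendation.
clicks = [2, 3, 1]
print(round(mean_reciprocal_rank(clicks), 4))  # 0.6111

# A model that always hits at position 5 scores 0.2, matching the baseline figure above.
print(mean_reciprocal_rank([5, 5, 5, 5]))      # 0.2
```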

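2.2 Definition of the MRR loss

At the time I used the cross-entropy loss. Could the evaluation metric MRR itself be turned into a loss function? I tentatively propose the following definition of an MRR loss:

Suppose $n$ classes are to be ranked for recommendation. Let the final probability distribution (e.g. from softmax) over these $n$ classes be the vector ${\rm input} = y^{pred} = (y_1^{pred}, y_2^{pred},..., y_n^{pred})$ , and let the true probability distribution be ${\rm target} = y^{true} = (y_1^{true}, y_2^{true},..., y_n^{true})$

Note that ${\rm target}$ is usually 1 at one position and 0 everywhere else, though this is not required. A true label that is itself a discrete probability distribution is also fine, except that the loss can then no longer be reduced to 0.

Define ${\rm rank_{input}} = (r_1, r_2,..., r_n)$ , where $r_i$ means that $y_i^{pred}$ is the $r_i$-th largest value in $y^{pred}$ ; clearly ${\rm rank_{input}}$ is a permutation of $(1, 2, 3, ..., n)$ .
The MRR loss is then defined as ${\rm MRRLoss}(y^{pred}, y^{true}) = 1 - \sum_{i=1}^{n}y_i^{true}\frac{1}{r_i}$

A quick check: if every ranking achieves hit@1, the loss is clearly 0.
As noted above, if the true label is itself a discrete probability distribution, the loss cannot be reduced to 0. This does not matter much, since cross entropy has the same issue.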
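The definition can be checked by hand on a small vector. The sketch below evaluates the formula for a single sample in plain Python; the function name mrr_loss and the example vectors are made up for illustration, ranks are 1-based as in the definition, and ties are broken by position, which is an assumption the definition itself does not address.

```python
def mrr_loss(y_pred, y_true):
    """Evaluate 1 - sum_i y_true[i] / r_i for one sample (1-based ranks)."""
    # Indices of y_pred ordered from the largest to the smallest value.
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i], reverse=True)
    rank = [0] * len(y_pred)
    for position, i in enumerate(order, start=1):
        rank[i] = position  # y_pred[i] is the position-th largest value
    return 1.0 - sum(t / r for t, r in zip(y_true, rank))


y_pred = [0.3, 0.2, 0.1, 0.4]             # ranks are (2, 3, 4, 1)
print(mrr_loss(y_pred, [0, 0, 0, 1]))      # 0.0   hit@1
print(mrr_loss(y_pred, [1, 0, 0, 0]))      # 0.5   hit@2
print(mrr_loss(y_pred, [0.5, 0, 0, 0.5]))  # 0.25  soft target, cannot reach 0
```

The last line shows the remark above: with a soft target the best achievable value stays above 0, because only one position can have rank 1.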

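2.3 Implementation of the MRR loss

With the MRR loss defined, it is time to implement it.

There is essentially only one problem here: writing a function get_rank_index that returns the rank index of an input tensor, which is discussed in detail below. Everything except the rank index can be written right away: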

# -*- coding: UTF-8 -*-
# @author: caoyang
# @email: caoyang@163.sufe.edu.cn

import numpy
import torch


def get_rank_index(tensor, dim=-1, descending=True):
    # TODO
    pass


class MRRLoss(torch.nn.Module):
    r"""MRR is the abbreviation of Mean Reciprocal Rank, which is a metric widely used in IR.
    It can be calculated as the average reciprocal of the order that prediction hits ground truth.
    For example:
    - The ground truth is 'cat', but got predicted sequence ('dog', 'mouse', 'cat', 'tiger').
    - It indicates that the hit is 3 and the metric rank is $\frac{1}{3}$.
    MRR loss is just slightly modified based on MRR as below:
    $$
    {\rm input} = y^{pred} = (y_1^{pred}, y_2^{pred},..., y_n^{pred}) \\
    {\rm target} = y^{true} = (y_1^{true}, y_2^{true},..., y_n^{true}) \\
    {\rm rank_{input}} = (r_1, r_2,..., r_n) \\
    {\rm MRRLoss}(y^{pred}, y^{true}) = 1 - \sum_{i=1}^{n}y_i^{true}\frac{1}{r_i}
    $$
    - Where $r_i$ indicates that $y_i^{pred}$ is the $r_i$-th largest value among $y^{pred}$.
    - Without loss of generality, we assume that $\sum_{i=1}^{n}y_i^{pred}=\sum_{i=1}^{n}y_i^{true}=1$.
    """

    def __init__(self, score_function=None):
        # score_function is kept from the original signature; it is not used below.
        super(MRRLoss, self).__init__()

    def forward(self, input, target):
        r"""Calculate MRR loss of input and target.
        Note that input or target whose ${\rm dim}>2$ will be reshaped to ${\rm dim}=2$ by flattening all the dims except ${\rm dim}_0$.
        :param input: **torch.FloatTensor**, predicted label whose shape is expected as (batch_size, n_candidates) or (n_candidates,).
        :param target: **torch.FloatTensor**, true label whose shape is expected as (batch_size, n_candidates) or (n_candidates,).
        :return: **torch.FloatTensor**, MRR loss as a scalar tensor.
        """
        assert input.shape == target.shape  # The shape of input must be the same as that of target.
        if len(input.shape) > 2:            # Just reduce to **len(input.shape) == 2**
            batch_size = input.shape[0]
            input = input.reshape((batch_size, -1))
            target = target.reshape((batch_size, -1))
        rank_index = get_rank_index(input, dim=-1, descending=True)  # 0-based rank of each element
        reciprocal_rank = 1. / (rank_index + 1.)
        mrr = target * reciprocal_rank      # weight each reciprocal rank by the true label
        return 1 - torch.mean(torch.sum(mrr, -1))

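2.4.1 An interesting attempt

Custom PyTorch modules, whether custom layers, custom optimizers, or the custom loss here, should as far as possible be computed with PyTorch's built-in operations. I cannot fully explain why, but it feels right: how else could PyTorch differentiate efficiently and backpropagate without any hand-written backward logic? Training always calls loss.backward(), so if a custom loss does not need its own backward, PyTorch must have its own backpropagation machinery. In any case, I find the custom loss example in Section 1.4 rather ugly; with that many loops its efficiency must be poor. Corrections are welcome if I am wrong.

Back to MRR: the awkward part is obtaining the sorting index, i.e. knowing where each of the model's predicted values ranks. The difficulty is not really in the implementation; rather, I do not fully understand how PyTorch differentiates and backpropagates through a loss that contains sorting logic. In my tests with this approach, loss.backward() did not raise an error, which honestly surprised me.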
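One way to probe the question above is to inspect what torch.sort hands back to autograd. The sketch below uses a made-up input vector; it shows that the index tensor is integer-typed and not part of the autograd graph, while the sorted values stay differentiable. My reading, which goes beyond what the post itself states, is that any gradient reaching the input has to come through values rather than through the ranks themselves.

```python
import torch

x = torch.tensor([0.3, 0.2, 0.1, 0.4], requires_grad=True)
sorted_values, index = torch.sort(x, descending=True)

# The index tensor holds integers and does not track gradients.
print(index.dtype, index.requires_grad)

# The sorted values are still connected to x in the autograd graph.
print(sorted_values.requires_grad)

# Backpropagating from the largest sorted value routes the gradient
# to the position in x that the value came from (index 3 here).
sorted_values[0].backward()
print(x.grad)
```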

I found an approach that uses only Python built-ins:

def sorted_index(array):
    # Pair every value with its original position, then sort the pairs by value.
    sorted_index_array = sorted(enumerate(array), key=lambda k: k[1])
    index = [x[0] for x in sorted_index_array]
    values = [x[1] for x in sorted_index_array]
    return index, values


if __name__ == '__main__':
    array = [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
    # Use names that do not shadow the function itself.
    index, values = sorted_index(array)
    print(index)
    print(values)

Output:

[0, 5, 1, 6, 2, 7, 3, 8, 4, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

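In my tests the function still gives the correct result when array is a 1D tensor, so sorted_index can serve as the basis for get_rank_index (keeping in mind that it returns original positions in ascending sorted order rather than the rank of each element, the same distinction discussed in Section 2.4.2). In my tests it also did not break differentiation or backpropagation.

The limitation is that sorted_index only handles a 1D tensor. If the input has batch_size as its first dimension, it has to be processed in a loop, which is very inefficient, although the approach is certainly intuitive.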

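2.4.2 Implementation using only torch

Later I discovered torch.sort. The good news is that its return value includes an index; the bad news is that this index is not the one I wanted.

torch.sort(input, dim=None, descending=False, out=None) -> (Tensor, LongTensor)

torch.sort sorts a tensor; by default it sorts along the last dimension.
It returns two values:

The first is the sorted tensor.
The second gives, for each element of the sorted tensor, its position in the original tensor.
- Note that often (for example when computing the MRR metric) what we want is the rank of each element of the original tensor, which is different from this second return value.

An example: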

import numpy
import torch

tensor = torch.FloatTensor(numpy.array([[.3, .2, .1, .4], [.3, .2, .1, .4]], dtype=numpy.float64))
sorted_tensor, index = torch.sort(tensor, descending=True)
print(sorted_tensor, type(tensor))
print(index, type(index))

# Output:
"""
tensor([[0.4000, 0.3000, 0.2000, 0.1000],
        [0.4000, 0.3000, 0.2000, 0.1000]]) <class 'torch.Tensor'>
tensor([[3, 0, 1, 2],
        [3, 0, 1, 2]]) <class 'torch.Tensor'>
"""

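For the vector $[0.3, 0.2, 0.1, 0.4]$ what we usually need is the rank of each element, $[1, 2, 3, 0]$ , whereas the returned index is $[3, 0, 1, 2]$ , so some further processing is required.

So how should this processing be done? How do we turn $[3, 0, 1, 2]$ into $[1, 2, 3, 0]$ ?

Honestly my brain has gone rusty lately. I thought about it for a long time and at first kept wondering whether a loop was the only way to express this permutation logic. Then I realized it is just a matrix-vector product.

First turn $[3, 0, 1, 2]$ into the one-hot matrix $[[0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]$ , then multiply it on the left by the row vector $[0, 1, 2, 3]$ to obtain $[1, 2, 3, 0]$ .

This neatly turns the rank computation into a matrix multiplication. The first step can be done with the one_hot function in torch.nn.functional, and the second step is a plain matrix product. It also works for nD tensor inputs (n>1). The full implementation is as follows: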

# -*- coding: UTF-8 -*-
# @author: caoyang
# @email: caoyang@163.sufe.edu.cn

import torch
import numpy

from torch.nn import functional as F


def get_rank_index(tensor, dim=-1, descending=True):
    """Sort tensor and then return the rank index of each element in the original tensor.
    For example:
    >>> tensor = torch.FloatTensor(numpy.array([[.3, .2, .1, .4], [.2, .3, .1, .4], [.1, .3, .2, .4]], dtype=numpy.float64))
    >>> get_rank_index(tensor)
    tensor([[1, 2, 3, 0],
            [2, 1, 3, 0],
            [3, 1, 2, 0]])
    :param tensor: **torch.FloatTensor**, tensor whose elements are to be ranked.
    :param dim: **int**, dim on which we sort, default -1 means sort on the last dim.
    :param descending: **bool**, whether or not sort in descending order, default yes.
    :return rank_index: **torch.LongTensor**, rank index of each element in the original tensor.
    """
    sorted_tensor, sorted_index = torch.sort(tensor, dim=dim, descending=descending)
    # One-hot encode the sorted positions, then multiply by (0, 1, ..., n-1) to invert the permutation.
    sorted_index_onehot = F.one_hot(sorted_index)
    ordered_sequence = torch.arange(0, sorted_index.shape[-1])
    rank_index = torch.matmul(ordered_sequence, sorted_index_onehot)
    return rank_index

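Note that torch.mm is mainly for multiplying 2D matrices, whereas torch.matmul also supports tensors with three or more dimensions, which takes some imagination to picture.

Put the finished get_rank_index in place of the # TODO stub in the Section 2.3 code.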
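As a cross-check on the one-hot approach, the same ranks can also be obtained by applying torch.argsort twice, since sorting the indices of a permutation yields its inverse. This is an alternative I am sketching here, not the method used in the post; the function name rank_by_double_argsort is made up, and the input reuses the example from the docstring above.

```python
import torch


def rank_by_double_argsort(tensor, dim=-1, descending=True):
    """Return the 0-based rank of every element along dim."""
    # order[k] is the original position of the k-th element after sorting.
    order = torch.argsort(tensor, dim=dim, descending=descending)
    # Argsort of a permutation is its inverse: the rank of each original position.
    return torch.argsort(order, dim=dim)


scores = torch.tensor([[.3, .2, .1, .4],
                       [.2, .3, .1, .4],
                       [.1, .3, .2, .4]])
print(rank_by_double_argsort(scores))
# tensor([[1, 2, 3, 0],
#         [2, 1, 3, 0],
#         [3, 1, 2, 0]])
```

The double argsort avoids building an n x n one-hot matrix per row, which may matter when the number of candidates is large.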

3 Recent notes on torch and torchvision

3.1 torch notes

3.1.1 Using torch.utils.data.DataLoader

Build a batch generator directly from a given dataset:

import numpy as np
from torch.utils.data import DataLoader

dataset = np.array([
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
])
loader = DataLoader(dataset, batch_size=2, shuffle=True)
for batch in loader:
    print(batch)

# Output:
"""
tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11],
         [12, 13, 14, 15]],
        [[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11],
         [12, 13, 14, 15]]], dtype=torch.int32)
tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11],
         [12, 13, 14, 15]],
        [[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11],
         [12, 13, 14, 15]]], dtype=torch.int32)
"""

A built-in dataset can be wrapped into a batch generator in the same way (the dataset returned by mnist.MNIST is not itself a tensor and carries many other attributes):

import torchvision as tv
from torch.utils import data
from torch.autograd import Variable
from torchvision.datasets import mnist

# Same transform as in Section 1.3
transformer = tv.transforms.Compose([
    tv.transforms.ToTensor(),
    tv.transforms.Normalize([.5], [.5]),
])

train_data = mnist.MNIST(r'data\mnist', train=True, transform=transformer, download=False)
test_data = mnist.MNIST(r'data\mnist', train=False, transform=transformer, download=False)

train_loader = data.DataLoader(train_data, batch_size=128, shuffle=True)
test_loader = data.DataLoader(test_data, batch_size=100, shuffle=True)

# Iterate over the data
for i, (x, y) in enumerate(train_loader):
    batch_x = Variable(x)  # torch.Size([128, 1, 28, 28])
    batch_y = Variable(y)  # torch.Size([128])

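3.1.2 torch.permute and torch.transpose

For a good number of simple functions f in torch, torch.f(tensor, *args) is equivalent to tensor.f(*args).
Apart from matrix transposition, permuting the dimensions of a high-dimensional tensor is not very intuitive; permute is fundamentally different from reshape.

torch.permute is commonly used to reorder dimensions. For example, torchvision.transforms.ToTensor moves the channel dimension of an image tensor from the last position to the first; to move it back you need torch.permute.

The source of torchvision.transforms.ToTensor shows that it applies img.permute((2, 0, 1)), so the inverse permutation is img.permute((1, 2, 0)).

torch.transpose can only swap two dimensions at a time. The translation in the PyTorch Chinese documentation is misleading and is easily read as saying that only 2D matrices can be transposed.

Transposing a high-dimensional tensor: img.permute((1, 2, 0)), for example, is equivalent to img.transpose(0,2).transpose(0,1).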
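The equivalence stated above, and the difference from reshape, can be verified on a small tensor. The sketch below uses an arbitrary (C, H, W) tensor built with torch.arange purely for illustration.

```python
import torch

# A small "image" in (C, H, W) layout with distinct values.
img = torch.arange(2 * 3 * 4).reshape(2, 3, 4)

hwc = img.permute(1, 2, 0)                            # (H, W, C)
via_transpose = img.transpose(0, 2).transpose(0, 1)   # two pairwise swaps

print(hwc.shape)                               # torch.Size([3, 4, 2])
print(torch.equal(hwc, via_transpose))         # True
print(torch.equal(hwc.permute(2, 0, 1), img))  # True: permute((2, 0, 1)) undoes it

# reshape to the same shape keeps the memory order, so the values land elsewhere.
print(torch.equal(img.reshape(3, 4, 2), hwc))  # False
```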

3.1.3 torch.sort

The signature of torch.sort, its two return values, and the difference between the returned index and the rank of each element are covered with an example in Section 2.4.2.

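3.2 torchvision notes

3.2.1 Using the torchvision.transforms module

Reference: PyTorch Chinese documentation

torchvision.transforms.ToTensor

Converts a PIL.Image with values in $[0, 255]$ , or a numpy.ndarray of shape $(H, W, C)$ , into a torch.FloatTensor of shape $(C, H, W)$ with values in $[0, 1.0]$ .
- Note that the channel dimension (for most images the third dimension, usually of size 3 or 4, i.e. RGB or RGBA) is moved to the front of the shape.
- Note that this is not a plain conversion to a tensor: for image-like tensors holding RGB values, the source shows that the values are divided by 255.
- Inputs that do not fit this image-like form (such as a 2D matrix) are converted to a torch tensor without any numeric processing.
torchvision.transforms.ToPILImage performs the inverse transform; the two functions are inverses of each other.
- It is an inverse only in the sense of producing a PIL.Image: it always multiplies by 255 and returns an image data type.

Code example: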

import cv2
import numpy as np
import torchvision as tv

# torchvision.transforms.ToTensor
f = tv.transforms.ToTensor()
numpy2tensor = f(np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]))
image = cv2.imread(r'D:\media\image\wallpaper\1.jpg')
image2tensor = f(image)
print(numpy2tensor)
print(image.shape)
print(type(image2tensor))
print(image2tensor.shape)

# torchvision.transforms.ToPILImage
f1 = tv.transforms.ToTensor()
f2 = tv.transforms.ToPILImage()
image = cv2.imread(r'D:\media\image\wallpaper\1.jpg')
image2tensor = f1(image)
tensor2image = f2(image2tensor)
print(type(tensor2image))
print(np.asarray(tensor2image))

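torchvision.transforms.Normalize

This function looked puzzling to me at first. Its two parameters are mean and std, and according to the torchvision documentation each channel is transformed as output = (input - mean) / std using that channel's mean and std.
It also only handles image-like tensors.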
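The per-channel formula output = (input - mean) / std from the torchvision documentation can be reproduced with plain torch broadcasting. The sketch below is a hand-written stand-in, not the library implementation: the helper name normalize_per_channel and the tiny one-channel image are made up for illustration.

```python
import torch


def normalize_per_channel(image, mean, std):
    """Apply (image - mean) / std per channel to a (C, H, W) tensor."""
    mean = torch.tensor(mean).view(-1, 1, 1)  # one value per channel, broadcast over H and W
    std = torch.tensor(std).view(-1, 1, 1)
    return (image - mean) / std


# One-channel 2x2 image with values in [0, 1], as produced by ToTensor.
image = torch.tensor([[[0.0, 0.5],
                       [1.0, 0.25]]])
print(normalize_per_channel(image, [0.5], [0.5]))
# tensor([[[-1.0000,  0.0000],
#          [ 1.0000, -0.5000]]])
```

With mean 0.5 and std 0.5, the setting used in Section 1.3, values in [0, 1] are mapped to [-1, 1].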