On the problem of multiple backward() calls in PyTorch: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

When the code performs two backward passes (backward), the following error may appear:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
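Before diving in, it helps to understand the "version" in this message: every tensor carries a version counter that each in-place operation increments, and autograd compares the version recorded when a tensor was saved for backward against its version when backward actually runs. A minimal sketch of this mechanism (independent of the code below; _version is an internal attribute, printed here only for illustration):

import torch

t = torch.ones(3, requires_grad=True)
y = torch.exp(t)    # ExpBackward0 saves its output y for the backward pass
print(y._version)   # 0
y.add_(1)           # the in-place add bumps y's version counter
print(y._version)   # 1
y.sum().backward()  # RuntimeError: ... is at version 1; expected version 0 instead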

Under what circumstances does this problem arise? Let's first construct a scenario.

import torch
from torch import nn as nn
from torch.nn import functional as F
from torch import optim

To simplify the problem, we build two identical neural networks:

class Net_1(nn.Module):
    def __init__(self):
        super(Net_1, self).__init__()
        self.linear_1 = nn.Linear(1, 10)
        self.linear_2 = nn.Linear(10, 1)

    def forward(self, x):
        x = self.linear_1(x)
        x = F.relu(x)
        x = self.linear_2(x)
        x = F.softmax(x, dim=1)
        return x


class Net_2(nn.Module):
    def __init__(self):
        super(Net_2, self).__init__()
        self.linear_1 = nn.Linear(1, 10)
        self.linear_2 = nn.Linear(10, 1)

    def forward(self, x):
        x = self.linear_1(x)
        x = F.relu(x)
        x = self.linear_2(x)
        x = F.softmax(x, dim=1)
        return x

Algorithm flow
Define the models Net_1 and Net_2, the corresponding optimizers optimizer_n1 and optimizer_n2, and the loss function criterion:

n_1 = Net_1()
n_2 = Net_2()
optimizer_n1 = optim.Adam(n_1.parameters(), lr=0.001)
optimizer_n2 = optim.Adam(n_2.parameters(), lr=0.001)
criterion = nn.MSELoss()

The training loop looks like this:

for i in range(10):
    x = torch.randn(10, 1).float()
    y = 2 * x

    pred_n1 = n_1(x)
    optimizer_n1.zero_grad()
    loss_n1 = criterion(y, pred_n1)
    loss_n1.backward()
    optimizer_n1.step()

    pred_n2 = n_2(pred_n1)
    optimizer_n2.zero_grad()
    loss_n2 = criterion(y, pred_n2)
    loss_n2.backward()
    optimizer_n2.step()

A point to note: the defining feature of this loop is that pred_n1, the output of the first network, also takes part in the backward pass of the second network.
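A quick way to see this sharing (a small sketch reusing the objects defined above): pred_n1 carries a grad_fn, so the graph that Net_1 built becomes part of loss_n2's graph as well.

pred_n1 = n_1(torch.randn(10, 1))
pred_n2 = n_2(pred_n1)
print(pred_n1.grad_fn)        # e.g. <SoftmaxBackward0 object ...>: an interior graph node
print(pred_n1.requires_grad)  # True, so loss_n2.backward() will also walk Net_1's graph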

As we know, once loss_n1.backward() has run, the graph's node objects still exist, but the intermediate buffers of the computation graph are freed. The second loss then fails during its backward pass, because it needs the part of the first graph (reached through pred_n1) that has already been released.
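This freeing behavior is easy to reproduce in isolation (a minimal sketch, unrelated to the networks above):

import torch

x = torch.ones(3, requires_grad=True)
loss = (x * x).sum()
loss.backward()  # frees the graph's intermediate buffers after use
loss.backward()  # RuntimeError: Trying to backward through the graph a second time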

So we pass retain_graph=True to backward() to keep the graph structure alive:
A small detail here: in principle both loss_n1.backward() and loss_n2.backward() could take retain_graph=True. We add it only to the first one, because if the second kept its graph too, the graphs built in each iteration of the for loop would pile up in memory and never be released, which is a real memory burden. As a rule, once the current iteration's graph is no longer needed, we let the last backward() free it.

for i in range(10):
    x = torch.randn(10, 1).float()
    y = 2 * x

    pred_n1 = n_1(x)
    optimizer_n1.zero_grad()
    loss_n1 = criterion(y, pred_n1)
    loss_n1.backward(retain_graph=True)
    optimizer_n1.step()

    pred_n2 = n_2(pred_n1)
    optimizer_n2.zero_grad()
    loss_n2 = criterion(y, pred_n2)
    loss_n2.backward()
    optimizer_n2.step()

After fixing this detail, we rerun the code.
It still fails.
This time the error is exactly the one in the title.
Following the error's own hint, we first enable torch.autograd.set_detect_anomaly(True):

import torch
from torch import nn as nn
from torch.nn import functional as F
from torch import optim

torch.autograd.set_detect_anomaly(True)

The error output now looks like this:

D:\software\anaconda3\envs\pytorch\lib\site-packages\torch\autograd\__init__.py:154: UserWarning: Error detected in AddmmBackward0. Traceback of forward call that caused the error:
  File "D:\code_work\reinforcement_learning\.pytest_cache\bark_test.py", line 49, in <module>
    pred_n1 = n_1(x)
  File "D:\software\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\code_work\reinforcement_learning\.pytest_cache\bark_test.py", line 18, in forward
    x = self.linear_2(x)
  File "D:\software\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\software\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "D:\software\anaconda3\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
 (Triggered internally at  ..\torch\csrc\autograd\python_anomaly_mode.cpp:104.)
  Variable._execution_engine.run_backward(
Traceback (most recent call last):
  File "D:\code_work\reinforcement_learning\.pytest_cache\bark_test.py", line 58, in <module>
    loss_n2.backward()
  File "D:\software\anaconda3\envs\pytorch\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "D:\software\anaconda3\envs\pytorch\lib\site-packages\torch\autograd\__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Both the earlier error (before we added retain_graph=True) and this one point at loss_n2.backward(). The report above lays out in detail the forward calls involved in loss_n2's backward pass, and only one line maps to our own code:

x = self.linear_2(x)

The root cause: because pred_n1 feeds into the second network, loss_n2.backward() propagates through pred_n1 back into Net_1's graph. But optimizer_n1.step() has already updated Net_1's parameters in place, so the tensors that graph saved during the forward pass are now at a newer version than autograd expects, which is exactly the conflict the error reports. A minimal sketch below reproduces it.
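The mechanism can be reproduced with a single network (a minimal sketch; the exact tensor named in the message may differ from the one in the post):

import torch
from torch import nn, optim

net = nn.Sequential(nn.Linear(1, 10), nn.ReLU(), nn.Linear(10, 1))
opt = optim.SGD(net.parameters(), lr=0.1)

out = net(torch.randn(4, 1))
loss = out.sum()

opt.zero_grad()
loss.backward(retain_graph=True)
opt.step()       # updates the weights in place, bumping their version counters

loss.backward()  # RuntimeError: ... modified by an inplace operation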
The fix is therefore to make sure loss_n2.backward() never walks back into Net_1's graph. We apply a detach() at the initial part of the second network's forward pass to cut the graph there:

x = self.linear_1(x).detach()

Why add it only at the initial position and nowhere else?
Because the point of the detach is to cut the link between the two networks' graphs: once the tensor is detached right where Net_2 receives Net_1's output, loss_n2.backward() stops at that point and never reaches Net_1's already-updated parameters. Detaching deeper inside Net_2 would leave the link to Net_1 intact.
One caveat worth noting: detaching after self.linear_1 also stops gradients to Net_2's own linear_1, so that layer will not be trained; a common variant that avoids this is to detach at the call site instead (see the hedged sketch after the complete code below).
With that, the error above is resolved. The complete modified code is below for comparison:

import torch
from torch import nn as nn
from torch.nn import functional as F
from torch import optim


class Net_1(nn.Module):
    def __init__(self):
        super(Net_1, self).__init__()
        self.linear_1 = nn.Linear(1, 10)
        self.linear_2 = nn.Linear(10, 1)

    def forward(self, x):
        x = self.linear_1(x)
        x = F.relu(x)
        x = self.linear_2(x)
        x = F.softmax(x, dim=1)
        return x


class Net_2(nn.Module):
    def __init__(self):
        super(Net_2, self).__init__()
        self.linear_1 = nn.Linear(1, 10)
        self.linear_2 = nn.Linear(10, 1)

    def forward(self, x):
        x = self.linear_1(x).detach()
        x = F.relu(x)
        x = self.linear_2(x)
        x = F.softmax(x, dim=1)
        return x


n_1 = Net_1()
n_2 = Net_2()
optimizer_n1 = optim.Adam(n_1.parameters(), lr=0.001)
optimizer_n2 = optim.Adam(n_2.parameters(), lr=0.001)
criterion = nn.MSELoss()

for i in range(10):
    x = torch.randn(10, 1).float()
    y = 2 * x

    pred_n1 = n_1(x)
    optimizer_n1.zero_grad()
    loss_n1 = criterion(y, pred_n1)
    loss_n1.backward(retain_graph=True)
    optimizer_n1.step()

    pred_n2 = n_2(pred_n1)
    optimizer_n2.zero_grad()
    loss_n2 = criterion(y, pred_n2)
    loss_n2.backward()
    optimizer_n2.step()
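As a hedged aside, here is the call-site variant mentioned above, assuming Net_2 is kept in its original form without the internal detach. Cutting the graph at the boundary keeps all of Net_2 trainable, and because the two graphs are now fully separate, retain_graph=True is no longer needed either:

for i in range(10):
    x = torch.randn(10, 1).float()
    y = 2 * x

    pred_n1 = n_1(x)
    optimizer_n1.zero_grad()
    loss_n1 = criterion(y, pred_n1)
    loss_n1.backward()               # the graph is cut below, so no retain_graph needed
    optimizer_n1.step()

    pred_n2 = n_2(pred_n1.detach())  # cut the graph between the two networks here
    optimizer_n2.zero_grad()
    loss_n2 = criterion(y, pred_n2)
    loss_n2.backward()
    optimizer_n2.step()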
