On the problem of multiple backward() calls in PyTorch: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

When the code performs two backward passes (backward), the following error may appear:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
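Before diving in, it helps to understand the "version" in this message: every tensor carries a version counter that each in-place operation increments, and autograd compares the version recorded when a tensor was saved for backward against its version when backward actually runs. A minimal sketch of this mechanism (independent of the code below; _version is an internal attribute, printed here only for illustration):

import torch

t = torch.ones(3, requires_grad=True)
y = torch.exp(t)    # ExpBackward0 saves its output y for the backward pass
print(y._version)   # 0
y.add_(1)           # the in-place add bumps y's version counter
print(y._version)   # 1
y.sum().backward()  # RuntimeError: ... is at version 1; expected version 0 instead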

Under what circumstances does this problem arise? Let's first construct a scenario.

import torch
from torch import nn as nn
from torch.nn import functional as F
from torch import optim

To simplify the problem, we build two identical neural networks:

class Net_1(nn.Module):
    def __init__(self):
        super(Net_1, self).__init__()
        self.linear_1 = nn.Linear(1, 10)
        self.linear_2 = nn.Linear(10, 1)

    def forward(self, x):
        x = self.linear_1(x)
        x = F.relu(x)
        x = self.linear_2(x)
        x = F.softmax(x, dim=1)
        return x


class Net_2(nn.Module):
    def __init__(self):
        super(Net_2, self).__init__()
        self.linear_1 = nn.Linear(1, 10)
        self.linear_2 = nn.Linear(10, 1)

    def forward(self, x):
        x = self.linear_1(x)
        x = F.relu(x)
        x = self.linear_2(x)
        x = F.softmax(x, dim=1)
        return x

Algorithm flow
Define the models Net_1 and Net_2, the corresponding optimizers optimizer_n1 and optimizer_n2, and the loss function criterion:

n_1 = Net_1()
n_2 = Net_2()
optimizer_n1 = optim.Adam(n_1.parameters(), lr=0.001)
optimizer_n2 = optim.Adam(n_2.parameters(), lr=0.001)
criterion = nn.MSELoss()

The training loop looks like this:

for i in range(10):
    x = torch.randn(10, 1).float()
    y = 2 * x

    pred_n1 = n_1(x)
    optimizer_n1.zero_grad()
    loss_n1 = criterion(y, pred_n1)
    loss_n1.backward()
    optimizer_n1.step()

    pred_n2 = n_2(pred_n1)
    optimizer_n2.zero_grad()
    loss_n2 = criterion(y, pred_n2)
    loss_n2.backward()
    optimizer_n2.step()

A point to note: the defining feature of this loop is that pred_n1, the output of the first network, also takes part in the backward pass of the second network.
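A quick way to see this sharing (a small sketch reusing the objects defined above): pred_n1 carries a grad_fn, so the graph that Net_1 built becomes part of loss_n2's graph as well.

pred_n1 = n_1(torch.randn(10, 1))
pred_n2 = n_2(pred_n1)
print(pred_n1.grad_fn)        # e.g. <SoftmaxBackward0 object ...>: an interior graph node
print(pred_n1.requires_grad)  # True, so loss_n2.backward() will also walk Net_1's graph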

As we know, once loss_n1.backward() has run, the graph's node objects still exist, but the intermediate buffers of the computation graph are freed. The second loss then fails during its backward pass, because it needs the part of the first graph (reached through pred_n1) that has already been released.
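This freeing behavior is easy to reproduce in isolation (a minimal sketch, unrelated to the networks above):

import torch

x = torch.ones(3, requires_grad=True)
loss = (x * x).sum()
loss.backward()  # frees the graph's intermediate buffers after use
loss.backward()  # RuntimeError: Trying to backward through the graph a second time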

So we pass retain_graph=True to backward() to keep the graph structure alive:
A small detail here: in principle both loss_n1.backward() and loss_n2.backward() could take retain_graph=True. We add it only to the first one, because if the second kept its graph too, the graphs built in each iteration of the for loop would pile up in memory and never be released, which is a real memory burden. As a rule, once the current iteration's graph is no longer needed, we let the last backward() free it.

for i in range(10):
    x = torch.randn(10, 1).float()
    y = 2 * x

    pred_n1 = n_1(x)
    optimizer_n1.zero_grad()
    loss_n1 = criterion(y, pred_n1)
    loss_n1.backward(retain_graph=True)
    optimizer_n1.step()

    pred_n2 = n_2(pred_n1)
    optimizer_n2.zero_grad()
    loss_n2 = criterion(y, pred_n2)
    loss_n2.backward()
    optimizer_n2.step()

After fixing this detail, we rerun the code.
It still fails.
This time the error is exactly the one in the title.
Following the error's own hint, we first enable torch.autograd.set_detect_anomaly(True):

import torch
from torch import nn as nn
from torch.nn import functional as F
from torch import optim

torch.autograd.set_detect_anomaly(True)

The error output now looks like this:

D:\software\anaconda3\envs\pytorch\lib\site-packages\torch\autograd\__init__.py:154: UserWarning: Error detected in AddmmBackward0. Traceback of forward call that caused the error:
  File "D:\code_work\reinforcement_learning\.pytest_cache\bark_test.py", line 49, in <module>
    pred_n1 = n_1(x)
  File "D:\software\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\code_work\reinforcement_learning\.pytest_cache\bark_test.py", line 18, in forward
    x = self.linear_2(x)
  File "D:\software\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\software\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "D:\software\anaconda3\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
 (Triggered internally at  ..\torch\csrc\autograd\python_anomaly_mode.cpp:104.)
  Variable._execution_engine.run_backward(
Traceback (most recent call last):
  File "D:\code_work\reinforcement_learning\.pytest_cache\bark_test.py", line 58, in <module>
    loss_n2.backward()
  File "D:\software\anaconda3\envs\pytorch\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "D:\software\anaconda3\envs\pytorch\lib\site-packages\torch\autograd\__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Both the earlier error (before we added retain_graph=True) and this one point at loss_n2.backward(). The report above lays out in detail the forward calls involved in loss_n2's backward pass, and only one line maps to our own code:

x = self.linear_2(x)

The root cause: because pred_n1 feeds into the second network, loss_n2.backward() propagates through pred_n1 back into Net_1's graph. But optimizer_n1.step() has already updated Net_1's parameters in place, so the tensors that graph saved during the forward pass are now at a newer version than autograd expects, which is exactly the conflict the error reports. A minimal sketch below reproduces it.
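The mechanism can be reproduced with a single network (a minimal sketch; the exact tensor named in the message may differ from the one in the post):

import torch
from torch import nn, optim

net = nn.Sequential(nn.Linear(1, 10), nn.ReLU(), nn.Linear(10, 1))
opt = optim.SGD(net.parameters(), lr=0.1)

out = net(torch.randn(4, 1))
loss = out.sum()

opt.zero_grad()
loss.backward(retain_graph=True)
opt.step()       # updates the weights in place, bumping their version counters

loss.backward()  # RuntimeError: ... modified by an inplace operation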
The fix is therefore to make sure loss_n2.backward() never walks back into Net_1's graph. We apply a detach() at the initial part of the second network's forward pass to cut the graph there:

x = self.linear_1(x).detach()

Why add it only at the initial position and nowhere else?
Because the point of the detach is to cut the link between the two networks' graphs: once the tensor is detached right where Net_2 receives Net_1's output, loss_n2.backward() stops at that point and never reaches Net_1's already-updated parameters. Detaching deeper inside Net_2 would leave the link to Net_1 intact.
One caveat worth noting: detaching after self.linear_1 also stops gradients to Net_2's own linear_1, so that layer will not be trained; a common variant that avoids this is to detach at the call site instead (see the hedged sketch after the complete code below).
With that, the error above is resolved. The complete modified code is below for comparison:

import torch
from torch import nn as nn
from torch.nn import functional as F
from torch import optim


class Net_1(nn.Module):
    def __init__(self):
        super(Net_1, self).__init__()
        self.linear_1 = nn.Linear(1, 10)
        self.linear_2 = nn.Linear(10, 1)

    def forward(self, x):
        x = self.linear_1(x)
        x = F.relu(x)
        x = self.linear_2(x)
        x = F.softmax(x, dim=1)
        return x


class Net_2(nn.Module):
    def __init__(self):
        super(Net_2, self).__init__()
        self.linear_1 = nn.Linear(1, 10)
        self.linear_2 = nn.Linear(10, 1)

    def forward(self, x):
        x = self.linear_1(x).detach()
        x = F.relu(x)
        x = self.linear_2(x)
        x = F.softmax(x, dim=1)
        return x


n_1 = Net_1()
n_2 = Net_2()
optimizer_n1 = optim.Adam(n_1.parameters(), lr=0.001)
optimizer_n2 = optim.Adam(n_2.parameters(), lr=0.001)
criterion = nn.MSELoss()

for i in range(10):
    x = torch.randn(10, 1).float()
    y = 2 * x

    pred_n1 = n_1(x)
    optimizer_n1.zero_grad()
    loss_n1 = criterion(y, pred_n1)
    loss_n1.backward(retain_graph=True)
    optimizer_n1.step()

    pred_n2 = n_2(pred_n1)
    optimizer_n2.zero_grad()
    loss_n2 = criterion(y, pred_n2)
    loss_n2.backward()
    optimizer_n2.step()
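As a hedged aside, here is the call-site variant mentioned above, assuming Net_2 is kept in its original form without the internal detach. Cutting the graph at the boundary keeps all of Net_2 trainable, and because the two graphs are now fully separate, retain_graph=True is no longer needed either:

for i in range(10):
    x = torch.randn(10, 1).float()
    y = 2 * x

    pred_n1 = n_1(x)
    optimizer_n1.zero_grad()
    loss_n1 = criterion(y, pred_n1)
    loss_n1.backward()               # the graph is cut below, so no retain_graph needed
    optimizer_n1.step()

    pred_n2 = n_2(pred_n1.detach())  # cut the graph between the two networks here
    optimizer_n2.zero_grad()
    loss_n2 = criterion(y, pred_n2)
    loss_n2.backward()
    optimizer_n2.step()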
