【Bugs】RuntimeError: CUDA out of memory.

报错如下:

Traceback (most recent call last):File "xxx.py", line 110, in <module>loss.backward()File "/nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 185, in backwardtorch.autograd.backward(self, gradient, retain_graph, create_graph)File "/nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 127, in backwardallow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 132.00 MiB (GPU 0; 15.78 GiB total capacity; 13.69 GiB already allocated; 91.50 MiB free; 14.53 GiB reserved in total by PyTorch)
Exception raised from malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:272 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x14c9ce19a1e2 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1e64b (0x14c9ce3f064b in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x1f464 (0x14c9ce3f1464 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x1faa1 (0x14c9ce3f1aa1 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #4: at::native::empty_cuda(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) + 0x11e (0x14c9d10fc90e in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: <unknown function> + 0xf33949 (0x14c9cf536949 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xf4d777 (0x14c9cf550777 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x10e9c7d (0x14ca0a2ecc7d in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x10e9f97 (0x14ca0a2ecf97 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: at::empty(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) + 0xfa (0x14ca0a3f7a1a in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::native::mm_cuda(at::Tensor const&, at::Tensor const&) + 0x6c (0x14c9d05ebffc in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #11: <unknown function> + 0xf22a20 (0x14c9cf525a20 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #12: <unknown function> + 0xa56530 (0x14ca09c59530 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, at::Tensor const&, at::Tensor const&) const + 0xbc (0x14ca0a44181c in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #14: at::mm(at::Tensor const&, at::Tensor const&) + 0x4b (0x14ca0a3926ab in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #15: <unknown function> + 0x2ed0a2f (0x14ca0c0d3a2f in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #16: <unknown function> + 0xa56530 (0x14ca09c59530 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #17: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, at::Tensor const&, at::Tensor const&) const + 0xbc (0x14ca0a44181c in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #18: at::Tensor::mm(at::Tensor const&) const + 0x4b (0x14ca0a527cab in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #19: <unknown function> + 0x2d11c34 (0x14ca0bf14c34 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #20: torch::autograd::generated::MmBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x294 (0x14ca0bf30814 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #21: <unknown function> + 0x3375bb7 (0x14ca0c578bb7 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #22: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x1400 (0x14ca0c574400 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #23: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x451 (0x14ca0c574fa1 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #24: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x89 (0x14ca0c56d119 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #25: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x4a (0x14ca19d0ddea in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #26: <unknown function> + 0xbd6df (0x14ca5616b6df in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #27: <unknown function> + 0x76db (0x14ca5a6356db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #28: clone + 0x3f (0x14ca5a35ea3f in /lib/x86_64-linux-gnu/libc.so.6)

解决方案:

CUDA_LAUNCH_BLOCKING=1 python xx.py失败

调小batch size解决。

参考资料

CUDA error: an illegal instruction was encountered

CUDA error: device-side assert triggered(insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:569) #27297

【Bugs】RuntimeError CUDA out of memory相关推荐

  1. 【PyTorch】RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm()

    完整报错信息. RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa ...

  2. 【已解决】探究CUDA out of memory背后原因,如何释放GPU显存?

    目录 1 问题背景 2 问题探索 2.1 CUDA固有显存 2.2 显存激活与失活 2.3 释放GPU显存 3 问题总结 4 告别Bug 1 问题背景 研究过深度学习的同学,一定对类似下面这个CUDA ...

  3. RuntimeError: CUDA out of memory

    报错内容: RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 2.00 GiB total capacity; ...

  4. python pytorch爆显存,内存溢出问题解决方法(总结)RuntimeError: CUDA out of memory.

    问题描述 在运行python程序时,随运行时间增长,内存疯狂增加,直至运行内存爆满,出现以下错误: RuntimeError: CUDA out of memory. 解决方法: 1.在模型验证和测试 ...

  5. RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB解决方案

    [问题描述] 在跑深度学习训练模型时报此错误. [解决方法] 1.尝试调小训练使用的batch size 2.尝试调小img size 3.修改电脑配置中页面文件的大小(我的python.anacon ...

  6. pytorch遇见RuntimeError: CUDA out of memory的解决

    RuntimeError: CUDA out of memory 1.查看是否其他程序占用显存 遇到此类错误后,对于py格式的文件来说,程序会进行终止,也就是当前程序占用的显存将会被释放.此时可用 w ...

  7. 【转】iphone - ios app maximum memory budget

    [转]iphone - ios app maximum memory budget https://stackoverflow.com/questions/5887248/ios-app-maximu ...

  8. RuntimeError: CUDA out of memory. Tried to allocate 132.00 MiB (GPU 2; 3.95 GiB total capacity; 3.41

    pytorch报错:RuntimeError: CUDA out of memory. Tried to allocate 132.00 MiB (GPU 2; 3.95 GiB total capa ...

  9. pytorch出现RuntimeError: CUDA out of memory.

    无论batch-size设置多小也是会出现这个问题的,我的原因是我将pytorch升级到了1.0.1,然后出现了这个问题 RuntimeError: CUDA out of memory. Tried ...

最新文章

  1. [codevs 1913] 数字梯形问题
  2. 贝塞尔结合CAShapeLayer绘制路线,CABasicAnimation实现的小动画
  3. loaction.reload(false)和location.reload(true) js发起请求
  4. Python_summary
  5. 【算法】图(一)拓扑排序的实现 图的邻接表算法 判断是否图G中存在环
  6. C++动态链接库的制作
  7. spring基础整理
  8. vue获取table一列数据_vue中比较重要的小知识点
  9. AudioBufferSourceNode
  10. Android与iOS/WP8跨平台整合设计与开发_专栏
  11. Linux部署安装JDK和Tomcat
  12. 1011. World Cup Betting (20)——PAT (Advanced Level) Practise
  13. c++ 读文件_C语言文件操作大全
  14. IKM-Java SE 8评估测试题挑战,测测你的基础水平
  15. matlab aic sic,请教ADF检验时AIC准则和SIC准则不一致时怎么办?
  16. Vue3快速学习、vue3视频学习、vue3实例上手教程
  17. 【VS2015】Win7 X64上面安装VS2015
  18. 中国车牌归属地数据库
  19. 从键盘输入一个三位整数n,分别求出n的个位数字、十位数字和百位数字
  20. 关于ntko从后台传输文档时发生文件存取错误,暨关于response使用的注意点

热门文章

  1. 大话设计模式之爱你一万年:第十四章 行为模式:命令模式:烧烤天天吃:2.命令模式概念
  2. 【贪心 位运算】JZOJ_3518 进化序列(evolve)
  3. usmile即将上市,小米、海尔争相发力,国产电动牙刷春天来了?
  4. 如何轻松找到竞品独立站?竞品独立站搜罗神器曝光!
  5. 21 Babylonjs入门进阶 自定义相机输入事件
  6. MySQL数据库基本语法(1)
  7. 为什么你总抓不住机会
  8. Java8新特性Stream的常见用法
  9. vue.runtime.esm.js?2b0e:619 [Vue warn]: Invalid prop: type check failed for prop “data“.问题的简单解决 通俗易懂
  10. IPO影子股掘金路线图