[Bugs] RuntimeError: CUDA out of memory
The error is as follows:
Traceback (most recent call last):
  File "xxx.py", line 110, in <module>
    loss.backward()
  File "/nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 132.00 MiB (GPU 0; 15.78 GiB total capacity; 13.69 GiB already allocated; 91.50 MiB free; 14.53 GiB reserved in total by PyTorch)
Exception raised from malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:272 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x14c9ce19a1e2 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1e64b (0x14c9ce3f064b in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x1f464 (0x14c9ce3f1464 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x1faa1 (0x14c9ce3f1aa1 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #4: at::native::empty_cuda(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) + 0x11e (0x14c9d10fc90e in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: <unknown function> + 0xf33949 (0x14c9cf536949 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xf4d777 (0x14c9cf550777 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x10e9c7d (0x14ca0a2ecc7d in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x10e9f97 (0x14ca0a2ecf97 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: at::empty(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) + 0xfa (0x14ca0a3f7a1a in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::native::mm_cuda(at::Tensor const&, at::Tensor const&) + 0x6c (0x14c9d05ebffc in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #11: <unknown function> + 0xf22a20 (0x14c9cf525a20 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #12: <unknown function> + 0xa56530 (0x14ca09c59530 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, at::Tensor const&, at::Tensor const&) const + 0xbc (0x14ca0a44181c in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #14: at::mm(at::Tensor const&, at::Tensor const&) + 0x4b (0x14ca0a3926ab in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #15: <unknown function> + 0x2ed0a2f (0x14ca0c0d3a2f in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #16: <unknown function> + 0xa56530 (0x14ca09c59530 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #17: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, at::Tensor const&, at::Tensor const&) const + 0xbc (0x14ca0a44181c in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #18: at::Tensor::mm(at::Tensor const&) const + 0x4b (0x14ca0a527cab in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #19: <unknown function> + 0x2d11c34 (0x14ca0bf14c34 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #20: torch::autograd::generated::MmBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x294 (0x14ca0bf30814 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #21: <unknown function> + 0x3375bb7 (0x14ca0c578bb7 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #22: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x1400 (0x14ca0c574400 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #23: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x451 (0x14ca0c574fa1 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #24: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x89 (0x14ca0c56d119 in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #25: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x4a (0x14ca19d0ddea in /nfsshare/apps/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #26: <unknown function> + 0xbd6df (0x14ca5616b6df in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #27: <unknown function> + 0x76db (0x14ca5a6356db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #28: clone + 0x3f (0x14ca5a35ea3f in /lib/x86_64-linux-gnu/libc.so.6)
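The sizes in the RuntimeError line of the log above can be read off programmatically. The following is a minimal sketch, not part of the original script: `parse_oom_message` and `UNIT_TO_MIB` are hypothetical names introduced here, and the regular expressions assume the message wording shown in this log. It shows that the requested 132.00 MiB is larger than the 91.50 MiB reported as free.

```python
import re

# The RuntimeError line copied from the log above.
MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 132.00 MiB "
    "(GPU 0; 15.78 GiB total capacity; 13.69 GiB already allocated; "
    "91.50 MiB free; 14.53 GiB reserved in total by PyTorch)"
)

# Conversion factors to MiB for the units that appear in the message.
UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Return the sizes named in the message, converted to MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (\wiB)",
        "total": r"([\d.]+) (\wiB) total capacity",
        "allocated": r"([\d.]+) (\wiB) already allocated",
        "free": r"([\d.]+) (\wiB) free",
        "reserved": r"([\d.]+) (\wiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes


sizes = parse_oom_message(MESSAGE)
shortfall = sizes["requested"] - sizes["free"]
print(f"requested {sizes['requested']:.2f} MiB, free {sizes['free']:.2f} MiB")
print(f"shortfall {shortfall:.2f} MiB")
```

With the message from this log, the request exceeds the reported free memory by 40.50 MiB.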
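Solution:
First attempt: CUDA_LAUNCH_BLOCKING=1 python xx.py
This failed. CUDA_LAUNCH_BLOCKING=1 only makes CUDA kernel launches synchronous, which helps locate an error but does not reduce memory usage.
Reducing the batch size resolved the error.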
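The fix here was a manual reduction of the batch size. The same idea could be automated roughly as follows. This is only a sketch under assumptions: `run_with_smaller_batches` and `train_step` are hypothetical names, not part of the original script, and `fake_train_step` stands in for a real training step so that the example runs without a GPU.

```python
def run_with_smaller_batches(train_step, batch_size, min_batch_size=1):
    """Call train_step(batch_size), halving batch_size after each CUDA OOM error.

    train_step is a hypothetical callable that runs one training step
    (forward pass, loss.backward(), optimizer step) at the given batch size.
    """
    while batch_size >= min_batch_size:
        try:
            train_step(batch_size)
            return batch_size  # this batch size fits
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise  # unrelated error: do not hide it
            batch_size //= 2
    raise RuntimeError("out of memory even at the minimum batch size")


# Stand-in for a real training step: it pretends that any batch
# larger than 16 samples does not fit in GPU memory.
def fake_train_step(batch_size):
    if batch_size > 16:
        raise RuntimeError("CUDA out of memory. (simulated)")


chosen = run_with_smaller_batches(fake_train_step, batch_size=64)
print(chosen)
```

In a real PyTorch training loop it may also help to call `torch.cuda.empty_cache()` before each retry so that cached blocks are returned to the device. Whether that is needed depends on the workload.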
References
CUDA error: an illegal instruction was encountered
CUDA error: device-side assert triggered(insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:569) #27297