Training error: RuntimeError: CUDA error: device-side assert triggered
While training a resnet18 classification network, the following error occurred: RuntimeError: CUDA error: device-side assert triggered
Original code:
for batch_index, (images, labels) in enumerate(train_dataloader):
    if epoch <= args.warm:
        warmup_scheduler.step()
    images = images.cuda()
    labels = labels.cuda()
    optimizer.zero_grad()
    predicts = net(images)
    loss = 0
    for i, pi in enumerate(predicts):
        t = torch.full_like(pi, 0, device=device)
        if labels[i].long() > 0:
            print('labels:', labels[i])
            t[labels[i].long() - 1] = 1
            print('t', t)
        loss_curr = criterion(pi, t)  # BCE
        loss += loss_curr
        train_total_loss += loss_curr.item()
    loss.backward()
    optimizer.step()
After the error occurred, I printed the values at the failing location:
labels: tensor([7], device='cuda:0', dtype=torch.int32)
t tensor([0., 0., 0., 0., 0., 0., 1., 0., 0.], device='cuda:0')
labels: tensor([4], device='cuda:0', dtype=torch.int32)
t tensor([0., 0., 0., 1., 0., 0., 0., 0., 0.], device='cuda:0')
labels: tensor([8], device='cuda:0', dtype=torch.int32)
t tensor([0., 0., 0., 0., 0., 0., 0., 1., 0.], device='cuda:0')
labels: tensor([8], device='cuda:0', dtype=torch.int32)
t tensor([0., 0., 0., 0., 0., 0., 0., 1., 0.], device='cuda:0')
labels: tensor([1], device='cuda:0', dtype=torch.int32)
t tensor([1., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
labels: tensor([1], device='cuda:0', dtype=torch.int32)
t tensor([1., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
labels: tensor([1], device='cuda:0', dtype=torch.int32)
t tensor([1., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
labels: tensor([6], device='cuda:0', dtype=torch.int32)
t tensor([0., 0., 0., 0., 0., 1., 0., 0., 0.], device='cuda:0')
labels: tensor([8], device='cuda:0', dtype=torch.int32)
t tensor([0., 0., 0., 0., 0., 0., 0., 1., 0.], device='cuda:0')
labels: tensor([10], device='cuda:0', dtype=torch.int32)
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
t Traceback (most recent call last):
  File "train_infrared_01.py", line 194, in <module>
    print('t', t)
  File "/opt/conda/lib/python3.8/site-packages/torch/tensor.py", line 179, in __repr__
    return torch._tensor_str._str(self)
  File "/opt/conda/lib/python3.8/site-packages/torch/_tensor_str.py", line 372, in _str
    return _str_intern(self)
  File "/opt/conda/lib/python3.8/site-packages/torch/_tensor_str.py", line 352, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/opt/conda/lib/python3.8/site-packages/torch/_tensor_str.py", line 241, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/opt/conda/lib/python3.8/site-packages/torch/_tensor_str.py", line 89, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: device-side assert triggered
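Note that the traceback points at print('t', t) rather than at the indexing line that actually went out of bounds. CUDA kernels are launched asynchronously, so a device-side assert is often reported at a later call that synchronizes with the GPU. A minimal sketch of one common way to make the error surface closer to its source is to set the CUDA_LAUNCH_BLOCKING environment variable. This assumes the line runs before the first CUDA call (ideally before torch is imported); it is a debugging aid only and slows training down.

```python
import os

# Assumption: this is executed before CUDA is initialized,
# e.g. at the very top of the training script.
# It forces kernel launches to run synchronously so that the Python
# traceback points at the operation that triggered the assert.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

Another option is to run a single batch on the CPU, where an out-of-range index raises an ordinary IndexError at the offending line.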
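The root cause turned out to be an out-of-range label: the fc layer was configured for only 9 classes, but there are 10 classes in total, which triggered the error. Because the labels start from 0, I had undercounted by one class. In the log above, label 10 maps to index 9, which is out of bounds for a target tensor of length 9.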
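A range check before the indexing step would have turned the opaque device-side assert into a readable message. The sketch below mirrors the target-building logic from the training loop in plain Python so it can be run without a GPU. The function name one_hot_target and the parameter num_classes are hypothetical, and treating label 0 as "all zeros" is an interpretation of the `if labels[i].long() > 0` branch in the original code.

```python
def one_hot_target(label, num_classes):
    """Build a one-hot list for a 1-based label; label 0 yields all zeros.

    Hypothetical helper: it mirrors t[label - 1] = 1 from the training loop,
    but validates the label first instead of indexing blindly.
    """
    if not 0 <= label <= num_classes:
        raise ValueError(
            f"label {label} is outside the valid range 0..{num_classes}"
        )
    target = [0.0] * num_classes
    if label > 0:
        target[label - 1] = 1.0
    return target


# With 9 outputs (the misconfigured fc layer), label 10 is rejected clearly.
try:
    one_hot_target(10, num_classes=9)
except ValueError as err:
    message = str(err)

# With 10 outputs (the corrected fc layer), label 10 maps to index 9.
fixed = one_hot_target(10, num_classes=10)
```

In the real training loop, the same check could be applied to the integer value of each label (for example via labels.flatten().tolist()) before writing into t.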