Training error: RuntimeError: CUDA error: device-side assert triggered

The following error occurred while training a ResNet-18 classification network: RuntimeError: CUDA error: device-side assert triggered

Original code:

for batch_index, (images, labels) in enumerate(train_dataloader):
    if epoch <= args.warm:
        warmup_scheduler.step()
    images = images.cuda()
    labels = labels.cuda()
    optimizer.zero_grad()
    predicts = net(images)
    loss = 0
    for i, pi in enumerate(predicts):
        t = torch.full_like(pi, 0, device=device)
        if labels[i].long() > 0:
            print('labels:', labels[i])
            t[labels[i].long() - 1] = 1
            print('t', t)
        loss_curr = criterion(pi, t)  # BCE
        loss += loss_curr
        train_total_loss += loss_curr.item()
    loss.backward()
    optimizer.step()

After the error occurred, I printed the values at the failing location:

labels: tensor([7], device='cuda:0', dtype=torch.int32)
t tensor([0., 0., 0., 0., 0., 0., 1., 0., 0.], device='cuda:0')
labels: tensor([4], device='cuda:0', dtype=torch.int32)
t tensor([0., 0., 0., 1., 0., 0., 0., 0., 0.], device='cuda:0')
labels: tensor([8], device='cuda:0', dtype=torch.int32)
t tensor([0., 0., 0., 0., 0., 0., 0., 1., 0.], device='cuda:0')
labels: tensor([8], device='cuda:0', dtype=torch.int32)
t tensor([0., 0., 0., 0., 0., 0., 0., 1., 0.], device='cuda:0')
labels: tensor([1], device='cuda:0', dtype=torch.int32)
t tensor([1., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
labels: tensor([1], device='cuda:0', dtype=torch.int32)
t tensor([1., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
labels: tensor([1], device='cuda:0', dtype=torch.int32)
t tensor([1., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
labels: tensor([6], device='cuda:0', dtype=torch.int32)
t tensor([0., 0., 0., 0., 0., 1., 0., 0., 0.], device='cuda:0')
labels: tensor([8], device='cuda:0', dtype=torch.int32)
t tensor([0., 0., 0., 0., 0., 0., 0., 1., 0.], device='cuda:0')
labels: tensor([10], device='cuda:0', dtype=torch.int32)
/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
t Traceback (most recent call last):
  File "train_infrared_01.py", line 194, in <module>
    print('t', t)
  File "/opt/conda/lib/python3.8/site-packages/torch/tensor.py", line 179, in __repr__
    return torch._tensor_str._str(self)
  File "/opt/conda/lib/python3.8/site-packages/torch/_tensor_str.py", line 372, in _str
    return _str_intern(self)
  File "/opt/conda/lib/python3.8/site-packages/torch/_tensor_str.py", line 352, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/opt/conda/lib/python3.8/site-packages/torch/_tensor_str.py", line 241, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/opt/conda/lib/python3.8/site-packages/torch/_tensor_str.py", line 89, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: device-side assert triggered
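
Note that the traceback ends at print('t', t), not at the indexing statement t[labels[i].long() - 1] = 1 that actually went out of bounds. CUDA kernels are launched asynchronously, so a device-side assert is usually reported at a later point where the host has to synchronize with the GPU, which here is printing the tensor. One common way to get a traceback that points at the failing operation is to set the CUDA_LAUNCH_BLOCKING environment variable to 1 before CUDA is initialized. A minimal sketch, assuming it is placed at the very top of the training script:

```python
import os

# Force synchronous CUDA kernel launches so that a device-side assert is
# raised at the operation that caused it. This must be set before CUDA is
# initialized, so put it before "import torch" in the training script.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

The same effect can be had from the shell with CUDA_LAUNCH_BLOCKING=1 python train_infrared_01.py. Synchronous launches slow training down, so this is best kept for debugging only.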

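The root cause turned out to be an out-of-range label. The fc layer was configured with only 9 output classes, but the dataset has 10 classes in total. The log shows this directly: label 10 is mapped to index 9 (labels[i] - 1), which is out of bounds for a 9-element target vector whose valid indices are 0 to 8. Because labels are counted from 0, the class count ended up one short. The fix is to set the fc layer's output to 10 classes.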
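
An explicit range check on the host side turns this kind of bug into a readable Python error instead of a device-side assert. The sketch below is written in plain Python so it can run without a GPU; the function name one_hot_target is hypothetical, and it assumes the same convention as the loop above (labels are 1-based, and 0 means no positive class). In the training loop it could be applied to labels.tolist() before the tensors are moved to the GPU.

```python
def one_hot_target(label, num_classes):
    """Build a one-hot list for a 1-based label; label 0 yields all zeros."""
    if not 0 <= label <= num_classes:
        raise ValueError(
            f"label {label} is outside the valid range 0..{num_classes}"
        )
    target = [0.0] * num_classes
    if label > 0:
        # Same mapping as t[labels[i].long() - 1] = 1 in the original loop.
        target[label - 1] = 1.0
    return target


# With 9 classes, label 7 is fine, but label 10 is rejected with a clear message.
print(one_hot_target(7, 9))
try:
    one_hot_target(10, 9)
except ValueError as err:
    print("caught:", err)

# With 10 classes, label 10 maps to the last index.
print(one_hot_target(10, 10))
```

Running the check once over the whole dataset before training is cheap and catches a mismatch between the label range and the fc layer's output size early.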
