Problem

While training a Transformer, PyTorch raised the following error: RuntimeError: cuda runtime error (59) : device-side assert triggered at C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src\THC/THCReduceAll.cuh:327

The full error output is as follows:

C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [70,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [71,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [72,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [76,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [77,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src/THC/THCTensorIndex.cu:361: block: [80,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
THCudaCheck FAIL file=C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src\THC/THCReduceAll.cuh line=327 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "C:\Users\AppData\Local\conda\conda\envs\yuanbo_pytorch\lib\site-packages\torch\nn\functional.py", line 3105, in multi_head_attention_forward
    qkv_same = torch.equal(query, key) and torch.equal(key, value)
RuntimeError: cuda runtime error (59) : device-side assert triggered at C:/w/1/s/tmp_conda_3.6_155139/conda/conda-bld/pytorch_1565366019852/work/aten/src\THC/THCReduceAll.cuh:327

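Solution

I debugged for a long time without finding the cause. It turned out that on the GPU the traceback does not point to where the exception actually occurs: CUDA kernels run asynchronously, so the assert surfaces at a later, unrelated call (here, torch.equal inside multi_head_attention_forward). Only after switching the device to CPU did the real error appear: RuntimeError: index out of range: Tried to access index 103 out of table with 99 rows. at C:\w\1\s\tmp_conda_3.6_155139\conda\conda-bld\pytorch_1565366019852\work\aten\src\TH/generic/THTensorEvenMoreMath.cpp:237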
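To illustrate why the CPU run is more informative, here is a minimal sketch that reproduces the same kind of failure on the CPU. The sizes (a 99-row table and the index 103) are taken from the error message above; the variable names and the embedding dimension are assumptions for illustration, not the original model code.

```python
import torch
import torch.nn as nn

# Hypothetical table with 99 rows, mirroring "table with 99 rows" in the error.
embedding = nn.Embedding(num_embeddings=99, embedding_dim=8)

# 103 is outside the valid range [0, 98].
bad_ids = torch.tensor([[1, 5, 103]])

try:
    # On the CPU the lookup fails right here, at the offending call.
    embedding(bad_ids)
    error_message = None
except (IndexError, RuntimeError) as exc:
    # Older PyTorch versions raise RuntimeError, newer ones IndexError.
    error_message = str(exc)

print(error_message)
```

Because the CPU path is synchronous, the exception is raised at the embedding lookup itself rather than at a later call.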

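So the cause was an out-of-range index. On inspection, the failure happened when the Transformer decoder computed its position embedding: incorrect indices in the vocabulary (index 103 against a table with only 99 rows) produced "RuntimeError: cuda runtime error (59) : device-side assert triggered". Rebuilding the vocabulary fixed the problem.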
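A check like the one below can catch this class of bug before the indices ever reach an embedding layer. It is a minimal sketch in plain Python; the helper name `find_out_of_range` and the sample ids are hypothetical. With PyTorch, one could pass `batch.view(-1).tolist()` as the ids and `embedding.num_embeddings` as the row count.

```python
def find_out_of_range(token_ids, num_rows):
    """Return (position, id) pairs whose id cannot index a table with num_rows rows."""
    return [
        (pos, idx)
        for pos, idx in enumerate(token_ids)
        if not 0 <= idx < num_rows
    ]


# Hypothetical flattened batch of ids checked against a 99-row table.
flat_ids = [1, 5, 103, 42, 99]
bad = find_out_of_range(flat_ids, num_rows=99)
print(bad)  # [(2, 103), (4, 99)]
```

Note that an id equal to the table size (99 here) is also invalid, since valid indices run from 0 to 98.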
