While running the model I hit the error below (it is very long, so only the key parts are kept). From searching online I learned that this error usually means there is not enough GPU memory, which can be caused by an overly large batch_size or by the card being occupied by other services. I then looked through the source code and happened to notice that the default value of n_gpu used in the code was 4; after changing it to 1 and rerunning, the code executed successfully.
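As a side note on why the n_gpu setting matters: the traceback further down shows op names like model_2/h2/mlp/Pow placed on /gpu:0, i.e. a third copy of the model graph ended up on the single physical Tesla P4, which is consistent with the script building one model replica per configured GPU. Below is a minimal sketch of that kind of command-line option and how I would override it at launch time instead of editing the source; the --n_gpu flag name and its default of 4 follow what I saw in train.py, but the surrounding code is only an illustration, not the repo's actual implementation.

```python
import argparse

parser = argparse.ArgumentParser()
# A default of 4 replicas is what caused the OOM on my single Tesla P4;
# passing --n_gpu 1 (or lowering this default) matches the fix described above.
parser.add_argument('--n_gpu', type=int, default=4)
args = parser.parse_args()

print('building %d model replica(s)' % args.n_gpu)
```

With an option like this, the fix becomes a launch-time change, e.g. `python train.py --n_gpu 1`, rather than a source edit.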

Combining the resources I found online with this experiment, the causes of this problem can be summarized as follows (a small mitigation sketch follows the list):

  1. the batch_size is too large;
  2. other models are already occupying GPU resources;
  3. the configured number of GPUs does not match the actual hardware (it is set too high).
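Here is a minimal sketch of the first two mitigations for a TensorFlow 1.x session (matching the 1.x-style log below). The device index and batch size are placeholders I picked for illustration, not values from the original training script.

```python
import os
import tensorflow as tf  # TensorFlow 1.x style API, as in the log below

# Cause 2: expose only one idle card to this process;
# check occupancy first with `nvidia-smi`.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

# Cause 1: shrink the batch size (placeholder value).
batch_size = 8

# Optionally let the allocator grow GPU memory on demand instead of
# reserving almost all of the 7.43GiB up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    pass  # build the graph and run training with the smaller batch_size here
```

Cause 3 is what actually happened in my case, and fixing the n_gpu value alone was enough. The trimmed error log follows.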
2019-03-16 18:59:38.881528: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2019-03-16 18:59:38.881535: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2019-03-16 18:59:38.881540: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2019-03-16 18:59:38.881545: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX512F instructions, but these are available on your machine and could speed up CPU computations.
2019-03-16 18:59:38.881550: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2019-03-16 18:59:39.005554: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-16 18:59:39.005820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: Tesla P4
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:00:07.0
Total memory: 7.43GiB
Free memory: 7.32GiB
2019-03-16 18:59:39.005851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2019-03-16 18:59:39.005858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y
2019-03-16 18:59:39.005868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P4, pci bus id: 0000:00:07.0)
0%|                                                    | 0/46 [00:00<?, ?it/s]
2019-03-16 19:00:05.441385: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 14.44MiB.  Current allocation summary follows.
2019-03-16 19:00:05.441859: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 14.44MiB was 8.00MiB, Chunk State:
2019-03-16 19:00:05.462553: I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size:
2019-03-16 19:00:05.462905: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 6.95GiB
2019-03-16 19:00:05.462917: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit:                  7463944192
InUse:                  7462481920
MaxInUse:               7462915328
NumAllocs:                    3978
MaxAllocSize:            197274112
2019-03-16 19:00:05.463019: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2019-03-16 19:00:05.463075: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[16,77,3072]
2019-03-16 19:00:05.463170: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 14.44MiB.  Current allocation summary follows.
2019-03-16 19:00:05.463596: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 14.44MiB was 8.00MiB, Chunk State:
2019-03-16 19:00:05.484133: I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size:
2019-03-16 19:00:05.484464: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 6.95GiB
2019-03-16 19:00:05.484475: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit:                  7463944192
InUse:                  7462481920
MaxInUse:               7462915328
NumAllocs:                    3978
MaxAllocSize:            197274112
2019-03-16 19:00:05.484576: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2019-03-16 19:00:05.484592: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[16,77,3072]
2019-03-16 19:00:05.530899: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.61MiB.  Current allocation summary follows.
2019-03-16 19:00:05.531407: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 3.61MiB was 2.00MiB, Chunk State:
2019-03-16 19:00:05.553057: I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size:
2019-03-16 19:00:05.553394: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 6.95GiB
2019-03-16 19:00:05.553404: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit:                  7463944192
InUse:                  7462481920
MaxInUse:               7462915328
NumAllocs:                    3978
MaxAllocSize:            197274112
2019-03-16 19:00:05.553505: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2019-03-16 19:00:05.553531: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[16,77,768]
2019-03-16 19:00:05.553668: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.61MiB.  Current allocation summary follows.
2019-03-16 19:00:05.554103: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 3.61MiB was 2.00MiB, Chunk State:
2019-03-16 19:00:05.574314: I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size:
2019-03-16 19:00:05.574638: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 6.95GiB
2019-03-16 19:00:05.574666: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit:                  7463944192
InUse:                  7462481920
MaxInUse:               7462915328
NumAllocs:                    3978
MaxAllocSize:            197274112
2019-03-16 19:00:05.574770: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2019-03-16 19:00:05.574786: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[1232,768]
2019-03-16 19:00:15.484765: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 14.44MiB.  Current allocation summary follows.
2019-03-16 19:00:15.485248: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 14.44MiB was 8.00MiB, Chunk State:
2019-03-16 19:00:15.506609: I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size:
2019-03-16 19:00:15.506956: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 6.95GiB
2019-03-16 19:00:15.506968: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit:                  7463944192
InUse:                  7462422528
MaxInUse:               7462915328
NumAllocs:                    3978
MaxAllocSize:            197274112
2019-03-16 19:00:15.507082: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2019-03-16 19:00:15.507112: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[16,77,3072]
2019-03-16 19:00:25.507333: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 14.44MiB.  Current allocation summary follows.
2019-03-16 19:00:25.507912: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 14.44MiB was 8.00MiB, Chunk State:
2019-03-16 19:00:25.527807: I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size:
2019-03-16 19:00:25.528034: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 6.95GiB
2019-03-16 19:00:25.528044: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit:                  7463944192
InUse:                  7462422528
MaxInUse:               7462915328
NumAllocs:                    3978
MaxAllocSize:            197274112
2019-03-16 19:00:25.528124: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2019-03-16 19:00:25.528148: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[16,77,3072]
Traceback (most recent call last):
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1306, in _run_fn
    status, run_metadata)
  File "/anaconda3/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,77,3072]
	 [[Node: model_2/h2/mlp/Pow = Pow[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](model_2/h2/mlp/c_fc/Reshape_2, model_2/h2/mlp/Pow/y)]]
	 [[Node: Mean_8/_2963 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_62814_Mean_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 433, in <module>
    cost, _ = sess.run([clf_loss, train], {X_train:xmb, M_train:mmb, Y_train:ymb})
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,77,3072]
	 [[Node: model_2/h2/mlp/Pow = Pow[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](model_2/h2/mlp/c_fc/Reshape_2, model_2/h2/mlp/Pow/y)]]
	 [[Node: Mean_8/_2963 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_62814_Mean_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'model_2/h2/mlp/Pow', defined at:
  File "train.py", line 397, in <module>
    train, logits, clf_losses, lm_losses = mgpu_train(X_train, M_train, Y_train)
  File "train.py", line 203, in mgpu_train
    clf_logits, clf_losses, lm_losses = model(*xs, train=True, reuse=do_reuse)
  File "train.py", line 172, in model
    h = block(h, 'h%d'%layer, train=train, scale=True)
  File "train.py", line 145, in block
    m = mlp(n, 'mlp', nx*4, train=train)
  File "train.py", line 135, in mlp
    h = act(conv1d(x, 'c_fc', n_state, 1, train=train))
  File "train.py", line 23, in gelu
    return 0.5*x*(1+tf.tanh(math.sqrt(2/math.pi)*(x+0.044715*tf.pow(x, 3))))
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 544, in pow
    return gen_math_ops._pow(x, y, name=name)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1533, in _pow
    result = _op_def_lib.apply_op("Pow", x=x, y=y, name=name)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[16,77,3072]
	 [[Node: model_2/h2/mlp/Pow = Pow[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](model_2/h2/mlp/c_fc/Reshape_2, model_2/h2/mlp/Pow/y)]]
	 [[Node: Mean_8/_2963 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_62814_Mean_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
