ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[16,77,3072]
While running a model I hit the error below (the full log is very long, so only the useful key parts are kept). Searching online suggested that this error means the GPU ran out of memory, which can happen when batch_size is too large or when the card is already occupied by another service. I then looked through the source code and happened to notice that the default value of n_gpu was 4; after changing it to 1 and re-running, the code completed successfully.
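For the record, the fix was a one-line change to an argument default in train.py. The snippet below is a hypothetical reconstruction of that line (the exact flag definition in your copy of the code may differ):

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical reconstruction of the relevant train.py line: the multi-GPU
# replica count defaulted to 4, but a single Tesla P4 means n_gpu must be 1.
parser.add_argument('--n_gpu', type=int, default=1)  # was: default=4
args = parser.parse_args([])
print(args.n_gpu)  # -> 1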
Combining what I found online with this experiment, the likely causes of this error are (a mitigation sketch follows the list):
- batch_size is too large;
- another model or process is already occupying GPU memory;
- the configured number of GPUs does not match the actual hardware (it is set too high).
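For the first two causes, it is worth running nvidia-smi before training to see which processes already hold GPU memory, and telling TensorFlow not to reserve the whole card up front. Below is a minimal sketch for TensorFlow 1.x (the version in the log); the choice of device index '0' is an assumption about which card is free:

```python
import os
import tensorflow as tf  # assumes TensorFlow 1.x, matching the log below

# Pin this process to a single physical GPU (assumption: device 0 is free);
# this also prevents replicating the model onto GPUs that do not exist.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

config = tf.ConfigProto()
# Allocate GPU memory on demand instead of grabbing the whole card at startup,
# so the process can coexist with others on the same device.
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    print(sess.run(tf.constant(42)))  # trivial op to confirm the session works
```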
2019-03-16 18:59:38.881528: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2019-03-16 18:59:38.881535: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2019-03-16 18:59:38.881540: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2019-03-16 18:59:38.881545: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX512F instructions, but these are available on your machine and could speed up CPU computations.
2019-03-16 18:59:38.881550: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2019-03-16 18:59:39.005554: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-16 18:59:39.005820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: Tesla P4
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:00:07.0
Total memory: 7.43GiB
Free memory: 7.32GiB
2019-03-16 18:59:39.005851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2019-03-16 18:59:39.005858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2019-03-16 18:59:39.005868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P4, pci bus id: 0000:00:07.0)
0%| | 0/46 [00:00<?, ?it/s]
2019-03-16 19:00:05.441385: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 14.44MiB. Current allocation summary follows.
2019-03-16 19:00:05.441859: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 14.44MiB was 8.00MiB, Chunk State:
2019-03-16 19:00:05.462553: I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
2019-03-16 19:00:05.462905: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 6.95GiB
2019-03-16 19:00:05.462917: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 7463944192
InUse: 7462481920
MaxInUse: 7462915328
NumAllocs: 3978
MaxAllocSize: 197274112
2019-03-16 19:00:05.463019: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2019-03-16 19:00:05.463075: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[16,77,3072]
2019-03-16 19:00:05.463170: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 14.44MiB. Current allocation summary follows.
2019-03-16 19:00:05.463596: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 14.44MiB was 8.00MiB, Chunk State:
2019-03-16 19:00:05.484133: I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
2019-03-16 19:00:05.484464: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 6.95GiB
2019-03-16 19:00:05.484475: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 7463944192
InUse: 7462481920
MaxInUse: 7462915328
NumAllocs: 3978
MaxAllocSize: 197274112
2019-03-16 19:00:05.484576: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2019-03-16 19:00:05.484592: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[16,77,3072]
2019-03-16 19:00:05.530899: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.61MiB. Current allocation summary follows.
2019-03-16 19:00:05.531407: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 3.61MiB was 2.00MiB, Chunk State:
2019-03-16 19:00:05.553057: I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
2019-03-16 19:00:05.553394: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 6.95GiB
2019-03-16 19:00:05.553404: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 7463944192
InUse: 7462481920
MaxInUse: 7462915328
NumAllocs: 3978
MaxAllocSize: 197274112
2019-03-16 19:00:05.553505: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2019-03-16 19:00:05.553531: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[16,77,768]
2019-03-16 19:00:05.553668: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.61MiB. Current allocation summary follows.
2019-03-16 19:00:05.554103: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 3.61MiB was 2.00MiB, Chunk State:
2019-03-16 19:00:05.574314: I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
2019-03-16 19:00:05.574638: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 6.95GiB
2019-03-16 19:00:05.574666: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 7463944192
InUse: 7462481920
MaxInUse: 7462915328
NumAllocs: 3978
MaxAllocSize: 197274112
2019-03-16 19:00:05.574770: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2019-03-16 19:00:05.574786: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[1232,768]
2019-03-16 19:00:15.484765: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 14.44MiB. Current allocation summary follows.
2019-03-16 19:00:15.485248: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 14.44MiB was 8.00MiB, Chunk State:
2019-03-16 19:00:15.506609: I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
2019-03-16 19:00:15.506956: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 6.95GiB
2019-03-16 19:00:15.506968: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 7463944192
InUse: 7462422528
MaxInUse: 7462915328
NumAllocs: 3978
MaxAllocSize: 197274112
2019-03-16 19:00:15.507082: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2019-03-16 19:00:15.507112: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[16,77,3072]
2019-03-16 19:00:25.507333: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 14.44MiB. Current allocation summary follows.
2019-03-16 19:00:25.507912: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 14.44MiB was 8.00MiB, Chunk State:
2019-03-16 19:00:25.527807: I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
2019-03-16 19:00:25.528034: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 6.95GiB
2019-03-16 19:00:25.528044: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 7463944192
InUse: 7462422528
MaxInUse: 7462915328
NumAllocs: 3978
MaxAllocSize: 197274112
2019-03-16 19:00:25.528124: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2019-03-16 19:00:25.528148: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[16,77,3072]
Traceback (most recent call last):
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1306, in _run_fn
    status, run_metadata)
  File "/anaconda3/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,77,3072]
	 [[Node: model_2/h2/mlp/Pow = Pow[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](model_2/h2/mlp/c_fc/Reshape_2, model_2/h2/mlp/Pow/y)]]
	 [[Node: Mean_8/_2963 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_62814_Mean_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 433, in <module>
    cost, _ = sess.run([clf_loss, train], {X_train:xmb, M_train:mmb, Y_train:ymb})
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,77,3072]
	 [[Node: model_2/h2/mlp/Pow = Pow[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](model_2/h2/mlp/c_fc/Reshape_2, model_2/h2/mlp/Pow/y)]]
	 [[Node: Mean_8/_2963 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_62814_Mean_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'model_2/h2/mlp/Pow', defined at:
  File "train.py", line 397, in <module>
    train, logits, clf_losses, lm_losses = mgpu_train(X_train, M_train, Y_train)
  File "train.py", line 203, in mgpu_train
    clf_logits, clf_losses, lm_losses = model(*xs, train=True, reuse=do_reuse)
  File "train.py", line 172, in model
    h = block(h, 'h%d'%layer, train=train, scale=True)
  File "train.py", line 145, in block
    m = mlp(n, 'mlp', nx*4, train=train)
  File "train.py", line 135, in mlp
    h = act(conv1d(x, 'c_fc', n_state, 1, train=train))
  File "train.py", line 23, in gelu
    return 0.5*x*(1+tf.tanh(math.sqrt(2/math.pi)*(x+0.044715*tf.pow(x, 3))))
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 544, in pow
    return gen_math_ops._pow(x, y, name=name)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1533, in _pow
    result = _op_def_lib.apply_op("Pow", x=x, y=y, name=name)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[16,77,3072]
	 [[Node: model_2/h2/mlp/Pow = Pow[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](model_2/h2/mlp/c_fc/Reshape_2, model_2/h2/mlp/Pow/y)]]
	 [[Node: Mean_8/_2963 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_62814_Mean_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
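As a sanity check, the sizes the allocator reports line up exactly with the tensor shapes in the messages: a float32 tensor of shape [16,77,3072] needs 16 × 77 × 3072 × 4 bytes ≈ 14.44 MiB, and [16,77,768] (or its 2-D reshape [1232,768]) needs ≈ 3.61 MiB, which are precisely the allocations that failed above. A quick check:

```python
# MiB occupied by a float32 tensor of a given shape (4 bytes per element).
def tensor_mib(shape, bytes_per_elem=4):
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem / 2**20

print(tensor_mib([16, 77, 3072]))  # 14.4375  -> the "14.44MiB" the allocator wanted
print(tensor_mib([16, 77, 768]))   # 3.609375 -> the "3.61MiB" requests
print(tensor_mib([1232, 768]))     # 3.609375 -> same tensor, flattened to 2-D
```

Per the "Caused by op" stack, the failing allocation comes from the tf.pow term inside the gelu activation in train.py, so each transformer block materializes intermediates proportional to batch size times sequence length; halving batch_size halves every one of them.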