Fixing an out-of-GPU-memory problem when running an algorithm on an RTX 2060

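While testing the speed of an image segmentation algorithm on a server, I ran into what appeared to be an out-of-GPU-memory problem. The full error output was: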

WARNING:tensorflow:From /home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2021-11-18 10:40:20.034439: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-11-18 10:40:20.153434: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-18 10:40:20.153808: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55aa738506d0 executing computations on platform CUDA. Devices:
2021-11-18 10:40:20.153822: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce RTX 2060, Compute Capability 7.5
2021-11-18 10:40:20.172688: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2021-11-18 10:40:20.173430: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55aa7394ad20 executing computations on platform Host. Devices:
2021-11-18 10:40:20.173477: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2021-11-18 10:40:20.173696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.71
pciBusID: 0000:04:00.0
totalMemory: 5.79GiB freeMemory: 5.63GiB
2021-11-18 10:40:20.173741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2021-11-18 10:40:20.174744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-11-18 10:40:20.174780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2021-11-18 10:40:20.174794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2021-11-18 10:40:20.174928: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5466 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:04:00.0, compute capability: 7.5)
/home/zhonghui/unet/logs/1110-v1.h5 model loaded.
./raw/J036SAG14.jpg
2021-11-18 10:40:21.113571: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-11-18 10:40:21.116589: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node block1_conv1/convolution}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "predict.py", line 91, in <module>
    predict('./raw/', './result/jiangxi_mask/')
  File "predict.py", line 64, in predict
    r_image = unet.detect_image(image)
  File "/home/zhonghui/unet/unet.py", line 100, in detect_image
    pr = self.model.predict(img)[0]
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/keras/engine/training.py", line 1835, in predict
    verbose=verbose, steps=steps)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/keras/engine/training.py", line 1330, in _predict_loop
    batch_outs = f(ins_batch)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2478, in __call__
    **self.session_kwargs)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node block1_conv1/convolution (defined at /home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3335) ]]

Caused by op 'block1_conv1/convolution', defined at:
  File "predict.py", line 27, in <module>
    unet = Unet()
  File "/home/zhonghui/unet/unet.py", line 36, in __init__
    self.generate()
  File "/home/zhonghui/unet/unet.py", line 45, in generate
    self.model = unet(self.model_image_size, self.num_classes)
  File "/home/zhonghui/unet/nets/unet.py", line 18, in Unet
    feat1, feat2, feat3, feat4, feat5 = VGG16(inputs)
  File "/home/zhonghui/unet/nets/vgg16.py", line 11, in VGG16
    name='block1_conv1')(img_input)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/keras/engine/topology.py", line 619, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/keras/layers/convolutional.py", line 168, in call
    dilation_rate=self.dilation_rate)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 3335, in conv2d
    data_format=tf_data_format)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 851, in convolution
    return op(input, filter)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 966, in __call__
    return self.conv_op(inp, filter)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 591, in __call__
    return self.call(inp, filter)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 208, in __call__
    name=self.name)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node block1_conv1/convolution (defined at /home/zhonghui/anaconda3/envs/py36/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3335) ]]

At the moment the error occurred, I checked GPU memory usage with nvidia-smi.

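GPU memory was almost completely full, so the likely cause is that the RTX 2060's 6 GB of memory is not enough. The same code runs without problems on an RTX 2080.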
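The same check can be scripted instead of read off the screen. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_memory` and `gpu_memory` are made up for illustration and are not part of the original project. It uses nvidia-smi's CSV query mode to report used and total memory per GPU in MiB.

```python
import subprocess

# nvidia-smi's query mode prints one "used, total" line per GPU (values in MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text):
    """Parse 'used, total' lines into a list of (used_mib, total_mib) tuples."""
    usage = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        usage.append((used, total))
    return usage


def gpu_memory():
    """Run nvidia-smi and return (used_mib, total_mib) for each visible GPU."""
    output = subprocess.check_output(QUERY).decode("utf-8")
    return parse_memory(output)
```

Calling `gpu_memory()` right before and right after the failing `predict` call would show whether memory is already exhausted when cuDNN tries to initialize.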

The fix is as follows.
Add the following code near the top of the script, before the model is created:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving nearly all of it at startup.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

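This makes TensorFlow allocate GPU memory dynamically. By default, TensorFlow 1.x reserves nearly all of the card's memory at startup (the log above shows 5466 MB out of 5.79 GiB), which appears to leave too little room for cuDNN to initialize. Alternatively, you can cap usage at a fixed percentage of GPU memory, for example 30% or 70%, which lets one machine run two or three AI projects at the same time.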
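The fixed-percentage option can be sketched as follows. This assumes TensorFlow 1.x and standalone Keras on the TensorFlow backend, as in the traceback above; `per_process_fraction` and `make_session` are hypothetical helper names, and the 10% headroom default is an arbitrary choice for illustration. The cap itself is set through `per_process_gpu_memory_fraction`.

```python
def per_process_fraction(num_processes, headroom=0.1):
    """Split GPU memory evenly among processes, leaving `headroom` unallocated."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    return (1.0 - headroom) / num_processes


def make_session(fraction):
    """Create a TF 1.x session limited to `fraction` of the GPU's memory."""
    import tensorflow as tf  # imported lazily; assumes TensorFlow 1.x

    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = fraction
    return tf.Session(config=config)


# Usage (not run here): register the capped session with Keras before
# building the model, so Keras does not create its own default session.
#   from keras import backend as K
#   K.set_session(make_session(per_process_fraction(3)))
```

With three processes and the default headroom, each process is capped at roughly 30% of GPU memory. A fixed cap trades flexibility for predictability: a process that needs more than its share will fail rather than crowd out its neighbors.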

I then ran the code again and checked GPU memory usage with nvidia-smi.
