Is an RTX 3060 slower than a GTX 1060 for deep learning?
Our office recently got a machine with an RTX 3060. I ran my existing project code on it and the speed was shockingly bad!
For example: a video object detection model used to run inference at 60 FPS, but on this card it only manages 12 FPS (TensorFlow 1.x).
I then switched frameworks (TensorRT + PyCUDA) and fiddled with it for a while, only to find that the RTX 3060 is about 4x slower than the GTX 1060 in my laptop!
This opened up a whole new world for me, so I wrote a small TensorFlow demo:
import time

import numpy as np
import tensorflow as tf

a = np.random.rand(100, 100)
b = np.random.rand(100, 100)
c = tf.matmul(a, b)

with tf.Session() as sess:
    for i in range(10):
        t0 = time.time()
        sess.run(c)
        print('time cost:{:.4f}'.format((time.time() - t0) * 1000))
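One caveat before comparing the numbers below: the demo times each sess.run() call individually, so the first sample also pays one-time costs (session setup, cuDNN autotuning, possibly PTX JIT compilation), and single samples are noisy. A fairer comparison warms up first and reports a median. Here is a minimal, framework-agnostic sketch (the time_op helper and the NumPy workload are my own illustration, not part of the original test):

```python
import statistics
import time

import numpy as np


def time_op(fn, warmup=3, iters=10):
    """Time fn() in milliseconds, discarding warm-up runs and reporting the median."""
    for _ in range(warmup):  # absorb one-time costs (JIT compilation, caches, allocator)
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)


a = np.random.rand(100, 100)
b = np.random.rand(100, 100)
ms = time_op(lambda: a @ b)
print('median time cost: {:.4f} ms'.format(ms))
```

Applied to the sess.run(c) call, this would separate the one-off startup cost from the steady-state latency, which is the number that actually matters for a frames-per-second comparison.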
Results on the RTX 3060 machine:
(AI) root@face-ai:~$ nvidia-smi
Thu Jul 15 10:48:43 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3060 Off | 00000000:02:00.0 Off | N/A |
| 42% 49C P2 43W / 170W | 849MiB / 12051MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1139 G /usr/bin/gnome-shell 4MiB |
| 0 N/A N/A 6905 C python3 841MiB |
+-----------------------------------------------------------------------------+
(AI) root@face-ai:~$ python3 test.py
2021-07-15 10:48:50.362846: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From test.py:9: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2021-07-15 10:48:58.212358: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-07-15 10:48:58.249094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties:
name: GeForce RTX 3060 major: 8 minor: 6 memoryClockRate(GHz): 1.837
pciBusID: 0000:02:00.0
2021-07-15 10:48:58.249440: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:48:58.282163: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-07-15 10:48:58.288839: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-15 10:48:58.290773: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-15 10:48:58.319544: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-07-15 10:48:58.323162: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-07-15 10:48:58.326224: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-15 10:48:58.331603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0
2021-07-15 10:48:58.421741: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3499825000 Hz
2021-07-15 10:48:58.423567: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c8c5fdcc20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-15 10:48:58.423802: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-07-15 10:48:58.919241: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c8c606faf0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-07-15 10:48:58.919997: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 3060, Compute Capability 8.6
2021-07-15 10:48:58.923105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties:
name: GeForce RTX 3060 major: 8 minor: 6 memoryClockRate(GHz): 1.837
pciBusID: 0000:02:00.0
2021-07-15 10:48:58.934999: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:48:58.935367: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-07-15 10:48:58.935458: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-15 10:48:58.935535: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-15 10:48:58.935604: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-07-15 10:48:58.935679: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-07-15 10:48:58.935753: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-15 10:48:58.937903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0
2021-07-15 10:48:58.938317: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:49:01.153241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-15 10:49:01.154207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] 0
2021-07-15 10:49:01.154511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0: N
2021-07-15 10:49:01.162712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9454 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3060, pci bus id: 0000:02:00.0, compute capability: 8.6)
time cost:600.3177
time cost:17.2832
time cost:3.6066
time cost:2.5594
time cost:1.3814
time cost:1.4493
time cost:1.7078
time cost:2.7463
time cost:16.8326
time cost:3.1228
Results on the GTX 1060 laptop:
a@a-G3-3579:/media/a$ nvidia-smi
Thu Jul 15 10:50:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04 Driver Version: 450.102.04 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... Off | 00000000:01:00.0 On | N/A |
| N/A 59C P0 24W / N/A | 494MiB / 6078MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 4574 G /usr/lib/xorg/Xorg 224MiB |
| 0 N/A N/A 4777 G /usr/bin/gnome-shell 212MiB |
| 0 N/A N/A 5165 G fcitx-qimpanel 40MiB |
| 0 N/A N/A 6374 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 6445 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 6488 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 7201 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 13756 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 13799 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 13944 G /usr/lib/firefox/firefox 1MiB |
+-----------------------------------------------------------------------------+
a@a-G3-3579:/media/a$ python3 test.py
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2021-07-15 10:50:56.135547: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-07-15 10:50:56.229574: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-15 10:50:56.230025: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2063ff0 executing computations on platform CUDA. Devices:
2021-07-15 10:50:56.230041: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1060 with Max-Q Design, Compute Capability 6.1
2021-07-15 10:50:56.231739: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-07-15 10:50:56.232615: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x27288f0 executing computations on platform Host. Devices:
2021-07-15 10:50:56.232631: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2021-07-15 10:50:56.232716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1060 with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.3415
pciBusID: 0000:01:00.0
totalMemory: 5.94GiB freeMemory: 5.39GiB
2021-07-15 10:50:56.232747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2021-07-15 10:50:56.233196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-15 10:50:56.233207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2021-07-15 10:50:56.233234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2021-07-15 10:50:56.233302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5220 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
time cost:58.0266
time cost:0.4869
time cost:0.3860
time cost:0.3378
time cost:0.3417
time cost:0.3548
time cost:0.2599
time cost:0.2871
time cost:0.2599
time cost:0.2649
This speed is absurd!
Maybe something is misconfigured on my end; if anyone knows how to fix or optimize this, advice is very welcome.
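One likely suspect, assuming nothing else is misconfigured: the RTX 3060 is an Ampere GPU with compute capability 8.6 (sm_86). CUDA toolkits older than 11.1 ship no native kernels for sm_86, so the driver must JIT-compile PTX at run time, and stock TensorFlow 1.x wheels were built against even older CUDA versions. A small helper to sanity-check the GPU/toolkit pairing (the version table reflects my reading of NVIDIA's release notes and is an assumption, not official API output):

```python
# Minimum CUDA toolkit version that ships native (SASS) kernels for each
# compute capability, per my reading of NVIDIA's CUDA release notes.
MIN_CUDA_FOR_CC = {
    (6, 1): (8, 0),   # Pascal (GTX 10xx, incl. GTX 1060)
    (7, 5): (10, 0),  # Turing (RTX 20xx)
    (8, 0): (11, 0),  # Ampere GA100 (A100)
    (8, 6): (11, 1),  # Ampere GA10x (RTX 30xx, incl. RTX 3060)
}


def natively_supported(compute_cap, cuda_version):
    """True if this CUDA toolkit has native kernels for the GPU; False means
    the driver must JIT-compile PTX, which is slow on first use."""
    required = MIN_CUDA_FOR_CC.get(compute_cap)
    if required is None:
        return False  # unknown pairing: conservatively assume PTX JIT
    return cuda_version >= required


# The RTX 3060 (cc 8.6) with the CUDA 11.0 runtime seen in the log above:
print(natively_supported((8, 6), (11, 0)))  # False -> PTX JIT suspected
print(natively_supported((6, 1), (11, 0)))  # True  -> 1060 runs native kernels
```

If this returns False for your pairing, the usual fix is a framework build compiled against CUDA 11.1 or newer (for example TF 2.5+, or NVIDIA's maintained NGC TensorFlow containers for the TF1 API).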