My workplace recently got hold of a machine with an RTX 3060 card. I ran my existing project code on it and the speed turned out to be terrible!

An example: video object detection inference used to run at 60 FPS, but on this card it only manages 12 FPS! (TensorFlow 1)

I then switched frameworks (TensorRT + PyCUDA) and did a round of tuning, only to find that the RTX 3060 is about 4x slower than the GTX 1060 in my laptop!

This felt like a whole new world to me, so I wrote a small demo in TensorFlow:

import numpy as np
import time
import tensorflow as tf

a = np.random.rand(100, 100)
b = np.random.rand(100, 100)
c = tf.matmul(a, b)

with tf.Session() as sess:
    for i in range(10):
        t0 = time.time()
        sess.run(c)
        print('time cost:{:.4f}'.format((time.time() - t0) * 1000))
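For reference, here is a slightly fairer variant of this micro-benchmark (my own sketch, not what was actually run below): a 100x100 matmul mostly measures Python and kernel-launch overhead, and the first sess.run also pays for graph setup, memory allocation and kernel selection. Using larger float32 matrices and an untimed warm-up run makes the per-iteration numbers more meaningful:

import time

import numpy as np
import tensorflow as tf

# Larger matrices so the GPU does real work; float32 is the usual GPU path.
a = np.random.rand(2000, 2000).astype(np.float32)
b = np.random.rand(2000, 2000).astype(np.float32)
c = tf.matmul(a, b)

with tf.Session() as sess:
    sess.run(c)  # warm-up: graph setup and first kernel launch, not timed
    for i in range(10):
        t0 = time.time()
        sess.run(c)  # blocks until the result is copied back to the host
        print('time cost:{:.4f} ms'.format((time.time() - t0) * 1000))

The numbers reported below come from the original script above, not from this variant.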

Results on the RTX 3060 machine:

(AI) root@face-ai:~$ nvidia-smi
Thu Jul 15 10:48:43 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3060    Off  | 00000000:02:00.0 Off |                  N/A |
| 42%   49C    P2    43W / 170W |    849MiB / 12051MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1139      G   /usr/bin/gnome-shell                4MiB |
|    0   N/A  N/A      6905      C   python3                           841MiB |
+-----------------------------------------------------------------------------+
(AI) root@face-ai:~$ python3 test.py
2021-07-15 10:48:50.362846: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From test.py:9: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2021-07-15 10:48:58.212358: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-07-15 10:48:58.249094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties:
name: GeForce RTX 3060 major: 8 minor: 6 memoryClockRate(GHz): 1.837
pciBusID: 0000:02:00.0
2021-07-15 10:48:58.249440: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:48:58.282163: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-07-15 10:48:58.288839: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-15 10:48:58.290773: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-15 10:48:58.319544: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-07-15 10:48:58.323162: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-07-15 10:48:58.326224: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-15 10:48:58.331603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0
2021-07-15 10:48:58.421741: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3499825000 Hz
2021-07-15 10:48:58.423567: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c8c5fdcc20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-15 10:48:58.423802: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-07-15 10:48:58.919241: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c8c606faf0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-07-15 10:48:58.919997: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 3060, Compute Capability 8.6
2021-07-15 10:48:58.923105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties:
name: GeForce RTX 3060 major: 8 minor: 6 memoryClockRate(GHz): 1.837
pciBusID: 0000:02:00.0
2021-07-15 10:48:58.934999: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:48:58.935367: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-07-15 10:48:58.935458: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-15 10:48:58.935535: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-15 10:48:58.935604: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-07-15 10:48:58.935679: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-07-15 10:48:58.935753: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-15 10:48:58.937903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0
2021-07-15 10:48:58.938317: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:49:01.153241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-15 10:49:01.154207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212]      0
2021-07-15 10:49:01.154511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0:   N
2021-07-15 10:49:01.162712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9454 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3060, pci bus id: 0000:02:00.0, compute capability: 8.6)
time cost:600.3177
time cost:17.2832
time cost:3.6066
time cost:2.5594
time cost:1.3814
time cost:1.4493
time cost:1.7078
time cost:2.7463
time cost:16.8326
time cost:3.1228

Results on the GTX 1060 laptop:

a@a-G3-3579:/media/a$ nvidia-smi
Thu Jul 15 10:50:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   59C    P0    24W /  N/A |    494MiB /  6078MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4574      G   /usr/lib/xorg/Xorg                224MiB |
|    0   N/A  N/A      4777      G   /usr/bin/gnome-shell              212MiB |
|    0   N/A  N/A      5165      G   fcitx-qimpanel                     40MiB |
|    0   N/A  N/A      6374      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      6445      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      6488      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      7201      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A     13756      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A     13799      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A     13944      G   /usr/lib/firefox/firefox            1MiB |
+-----------------------------------------------------------------------------+
a@a-G3-3579:/media/a$ python3 test.py
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2021-07-15 10:50:56.135547: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-07-15 10:50:56.229574: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-15 10:50:56.230025: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2063ff0 executing computations on platform CUDA. Devices:
2021-07-15 10:50:56.230041: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 1060 with Max-Q Design, Compute Capability 6.1
2021-07-15 10:50:56.231739: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-07-15 10:50:56.232615: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x27288f0 executing computations on platform Host. Devices:
2021-07-15 10:50:56.232631: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2021-07-15 10:50:56.232716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1060 with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.3415
pciBusID: 0000:01:00.0
totalMemory: 5.94GiB freeMemory: 5.39GiB
2021-07-15 10:50:56.232747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2021-07-15 10:50:56.233196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-15 10:50:56.233207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2021-07-15 10:50:56.233234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2021-07-15 10:50:56.233302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5220 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
time cost:58.0266
time cost:0.4869
time cost:0.3860
time cost:0.3378
time cost:0.3417
time cost:0.3548
time cost:0.2599
time cost:0.2871
time cost:0.2599
time cost:0.2649

This speed is downright absurd!
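To compare the two cards like-for-like, it helps to drop the first (cold) iteration on each machine, since it includes one-time graph setup and kernel initialization. A minimal sketch of that arithmetic, with the timing lists copied from the output above:

# Warm iterations only (first printed value dropped on both machines).
rtx3060_ms = [17.2832, 3.6066, 2.5594, 1.3814, 1.4493, 1.7078, 2.7463, 16.8326, 3.1228]
gtx1060_ms = [0.4869, 0.3860, 0.3378, 0.3417, 0.3548, 0.2599, 0.2871, 0.2599, 0.2649]

print('RTX 3060 warm mean: {:.4f} ms'.format(sum(rtx3060_ms) / len(rtx3060_ms)))  # ~5.6 ms
print('GTX 1060 warm mean: {:.4f} ms'.format(sum(gtx1060_ms) / len(gtx1060_ms)))  # ~0.33 ms

Even with the 600 ms cold start excluded, the 3060 still averages well over ten times slower than the 1060 on this tiny matmul.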

Maybe I've misconfigured something somewhere. If anyone knows how to optimize this, I'd appreciate pointers.
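As a first sanity check, here is a sketch of my own (using PyCUDA, since it is already part of the TensorRT pipeline mentioned above) that confirms what the card itself reports: name, compute capability (the logs above show 8.6 for the RTX 3060) and driver version. It only verifies the device is recognized correctly; it does not by itself explain the slowdown.

import pycuda.driver as cuda
import pycuda.autoinit  # initializes CUDA and creates a context on device 0

dev = pycuda.autoinit.device
print('device:', dev.name())
print('compute capability:', dev.compute_capability())   # expected (8, 6) for an RTX 3060
print('driver version:', cuda.get_driver_version())      # e.g. 11020 for a CUDA 11.2 driver
print('total memory (MiB):', dev.total_memory() // (1024 * 1024))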
