Is an RTX 3060 slower than a GTX 1060 for deep learning?
Our office recently got a machine with an RTX 3060. I ran my existing project code on it and the speed was shockingly bad!
For example: a video object detection model used to run inference at 60 FPS, but on this card it only manages 12 FPS (TensorFlow 1.x).
I then switched frameworks (TensorRT + PyCUDA) and fiddled with it for a while, only to find that the RTX 3060 is about 4x slower than the GTX 1060 in my laptop!
This opened up a whole new world for me, so I wrote a small TensorFlow demo:
import time

import numpy as np
import tensorflow as tf

a = np.random.rand(100, 100)
b = np.random.rand(100, 100)
c = tf.matmul(a, b)

with tf.Session() as sess:
    for i in range(10):
        t0 = time.time()
        sess.run(c)
        print('time cost:{:.4f}'.format((time.time() - t0) * 1000))
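One caveat before comparing the numbers below: the demo times each sess.run() call individually, so the first sample also pays one-time costs (session setup, cuDNN autotuning, possibly PTX JIT compilation), and single samples are noisy. A fairer comparison warms up first and reports a median. Here is a minimal, framework-agnostic sketch (the time_op helper and the NumPy workload are my own illustration, not part of the original test):

```python
import statistics
import time

import numpy as np


def time_op(fn, warmup=3, iters=10):
    """Time fn() in milliseconds, discarding warm-up runs and reporting the median."""
    for _ in range(warmup):  # absorb one-time costs (JIT compilation, caches, allocator)
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)


a = np.random.rand(100, 100)
b = np.random.rand(100, 100)
ms = time_op(lambda: a @ b)
print('median time cost: {:.4f} ms'.format(ms))
```

Applied to the sess.run(c) call, this would separate the one-off startup cost from the steady-state latency, which is the number that actually matters for a frames-per-second comparison.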
Results on the RTX 3060 machine:
(AI) root@face-ai:~$ nvidia-smi
Thu Jul 15 10:48:43 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3060 Off | 00000000:02:00.0 Off | N/A |
| 42% 49C P2 43W / 170W | 849MiB / 12051MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1139 G /usr/bin/gnome-shell 4MiB |
| 0 N/A N/A 6905 C python3 841MiB |
+-----------------------------------------------------------------------------+
(AI) root@face-ai:~$ python3 test.py
2021-07-15 10:48:50.362846: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From test.py:9: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2021-07-15 10:48:58.212358: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-07-15 10:48:58.249094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties:
name: GeForce RTX 3060 major: 8 minor: 6 memoryClockRate(GHz): 1.837
pciBusID: 0000:02:00.0
2021-07-15 10:48:58.249440: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:48:58.282163: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-07-15 10:48:58.288839: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-15 10:48:58.290773: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-15 10:48:58.319544: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-07-15 10:48:58.323162: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-07-15 10:48:58.326224: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-15 10:48:58.331603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0
2021-07-15 10:48:58.421741: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3499825000 Hz
2021-07-15 10:48:58.423567: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c8c5fdcc20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-15 10:48:58.423802: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-07-15 10:48:58.919241: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c8c606faf0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-07-15 10:48:58.919997: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 3060, Compute Capability 8.6
2021-07-15 10:48:58.923105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties:
name: GeForce RTX 3060 major: 8 minor: 6 memoryClockRate(GHz): 1.837
pciBusID: 0000:02:00.0
2021-07-15 10:48:58.934999: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:48:58.935367: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-07-15 10:48:58.935458: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-15 10:48:58.935535: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-15 10:48:58.935604: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-07-15 10:48:58.935679: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-07-15 10:48:58.935753: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-15 10:48:58.937903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0
2021-07-15 10:48:58.938317: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:49:01.153241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-15 10:49:01.154207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] 0
2021-07-15 10:49:01.154511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0: N
2021-07-15 10:49:01.162712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9454 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3060, pci bus id: 0000:02:00.0, compute capability: 8.6)
time cost:600.3177
time cost:17.2832
time cost:3.6066
time cost:2.5594
time cost:1.3814
time cost:1.4493
time cost:1.7078
time cost:2.7463
time cost:16.8326
time cost:3.1228
Results on the GTX 1060 laptop:
a@a-G3-3579:/media/a$ nvidia-smi
Thu Jul 15 10:50:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04 Driver Version: 450.102.04 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... Off | 00000000:01:00.0 On | N/A |
| N/A 59C P0 24W / N/A | 494MiB / 6078MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 4574 G /usr/lib/xorg/Xorg 224MiB |
| 0 N/A N/A 4777 G /usr/bin/gnome-shell 212MiB |
| 0 N/A N/A 5165 G fcitx-qimpanel 40MiB |
| 0 N/A N/A 6374 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 6445 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 6488 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 7201 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 13756 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 13799 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 13944 G /usr/lib/firefox/firefox 1MiB |
+-----------------------------------------------------------------------------+
a@a-G3-3579:/media/a$ python3 test.py
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2021-07-15 10:50:56.135547: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-07-15 10:50:56.229574: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-15 10:50:56.230025: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2063ff0 executing computations on platform CUDA. Devices:
2021-07-15 10:50:56.230041: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1060 with Max-Q Design, Compute Capability 6.1
2021-07-15 10:50:56.231739: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-07-15 10:50:56.232615: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x27288f0 executing computations on platform Host. Devices:
2021-07-15 10:50:56.232631: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2021-07-15 10:50:56.232716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1060 with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.3415
pciBusID: 0000:01:00.0
totalMemory: 5.94GiB freeMemory: 5.39GiB
2021-07-15 10:50:56.232747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2021-07-15 10:50:56.233196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-15 10:50:56.233207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2021-07-15 10:50:56.233234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2021-07-15 10:50:56.233302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5220 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
time cost:58.0266
time cost:0.4869
time cost:0.3860
time cost:0.3378
time cost:0.3417
time cost:0.3548
time cost:0.2599
time cost:0.2871
time cost:0.2599
time cost:0.2649
This speed is absurd!
Maybe something is misconfigured on my end; if anyone knows how to fix or optimize this, advice is very welcome.
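One likely suspect, assuming nothing else is misconfigured: the RTX 3060 is an Ampere GPU with compute capability 8.6 (sm_86). CUDA toolkits older than 11.1 ship no native kernels for sm_86, so the driver must JIT-compile PTX at run time, and stock TensorFlow 1.x wheels were built against even older CUDA versions. A small helper to sanity-check the GPU/toolkit pairing (the version table reflects my reading of NVIDIA's release notes and is an assumption, not official API output):

```python
# Minimum CUDA toolkit version that ships native (SASS) kernels for each
# compute capability, per my reading of NVIDIA's CUDA release notes.
MIN_CUDA_FOR_CC = {
    (6, 1): (8, 0),   # Pascal (GTX 10xx, incl. GTX 1060)
    (7, 5): (10, 0),  # Turing (RTX 20xx)
    (8, 0): (11, 0),  # Ampere GA100 (A100)
    (8, 6): (11, 1),  # Ampere GA10x (RTX 30xx, incl. RTX 3060)
}


def natively_supported(compute_cap, cuda_version):
    """True if this CUDA toolkit has native kernels for the GPU; False means
    the driver must JIT-compile PTX, which is slow on first use."""
    required = MIN_CUDA_FOR_CC.get(compute_cap)
    if required is None:
        return False  # unknown pairing: conservatively assume PTX JIT
    return cuda_version >= required


# The RTX 3060 (cc 8.6) with the CUDA 11.0 runtime seen in the log above:
print(natively_supported((8, 6), (11, 0)))  # False -> PTX JIT suspected
print(natively_supported((6, 1), (11, 0)))  # True  -> 1060 runs native kernels
```

If this returns False for your pairing, the usual fix is a framework build compiled against CUDA 11.1 or newer (for example TF 2.5+, or NVIDIA's maintained NGC TensorFlow containers for the TF1 API).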