NVIDIA GPU information with nvidia-smi (Persistence-M: persistence mode, Volatile Uncorr. ECC: uncorrectable ECC error count, GPU-Util: GPU utilization, Compute M.: compute mode)

On Windows (NVSMI v441.08), -persistenced is not an nvidia-smi option, so the first two commands below are rejected. Running nvidia-smi -h prints the supported options. The examples in parentheses are annotations, not part of the tool's output.

C:\Users\SIQI>cd C:\Program Files\NVIDIA Corporation\NVSMI

C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi -persistenced --user foo
Invalid combination of input arguments. Please run 'nvidia-smi -h' for help.

C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi -persistenced
Invalid combination of input arguments. Please run 'nvidia-smi -h' for help.

C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi -h
NVIDIA System Management Interface -- v441.08

NVSMI provides monitoring information for Tesla and select Quadro devices.
The data is presented in either a plain text or an XML format, via stdout or a file.
NVSMI also provides several management operations for changing the device state.

Note that the functionality of NVSMI is exposed through the NVML C-based
library. See the NVIDIA developer website for more information about NVML.
Python wrappers to NVML are also available.  The output of NVSMI is
not guaranteed to be backwards compatible; NVML and the bindings are backwards
compatible.

http://developer.nvidia.com/nvidia-management-library-nvml/
http://pypi.python.org/pypi/nvidia-ml-py/

Supported products:
- Full Support
    - All Tesla products, starting with the Kepler architecture
    - All Quadro products, starting with the Kepler architecture
    - All GRID products, starting with the Kepler architecture
    - GeForce Titan products, starting with the Kepler architecture
- Limited Support
    - All Geforce products, starting with the Kepler architecture

nvidia-smi [OPTION1 [ARG1]] [OPTION2 [ARG2]] ...

    -h,   --help                Print usage information and exit. (example: nvidia-smi -h)

  LIST OPTIONS:

    -L,   --list-gpus           Display a list of GPUs connected to the system. (example: nvidia-smi -L)
    -B,   --list-blacklist-gpus Display a list of blacklisted GPUs in the system. (example: nvidia-smi -B)

  SUMMARY OPTIONS:

    <no arguments>              Show a summary of GPUs connected to the system. (example: nvidia-smi)

    [plus any of]

    -i,   --id=                 Target a specific GPU.
    -f,   --filename=           Log to a specified file, rather than to stdout.
    -l,   --loop=               Probe until Ctrl+C at specified second interval.

  QUERY OPTIONS:

    -q,   --query               Display GPU or Unit info. (example: nvidia-smi -q)

    [plus any of]

    -u,   --unit                Show unit, rather than GPU, attributes. (example: nvidia-smi -q -u)
    -i,   --id=                 Target a specific GPU or Unit. (example: nvidia-smi -q -i 1)
    -f,   --filename=           Log to a specified file, rather than to stdout.
    -x,   --xml-format          Produce XML output.
          --dtd                 When showing xml output, embed DTD.
    -d,   --display=            Display only selected information: MEMORY,
                                    UTILIZATION, ECC, TEMPERATURE, POWER, CLOCK,
                                    COMPUTE, PIDS, PERFORMANCE, SUPPORTED_CLOCKS,
                                    PAGE_RETIREMENT, ACCOUNTING, ENCODER_STATS, FBC_STATS
                                Flags can be combined with comma e.g. ECC,POWER.
                                Sampling data with max/min/avg is also returned
                                for POWER, UTILIZATION and CLOCK display types.
                                Doesn't work with -u or -x flags.
                                (examples: nvidia-smi -q -d MEMORY
                                           nvidia-smi -q -d MEMORY,UTILIZATION)
    -l,   --loop=               Probe until Ctrl+C at specified second interval.
    -lms, --loop-ms=            Probe until Ctrl+C at specified millisecond interval.

  SELECTIVE QUERY OPTIONS:

    Allows the caller to pass an explicit list of properties to query.

    [one of]

    --query-gpu=                Information about GPU.
                                Call --help-query-gpu for more info.
    --query-supported-clocks=   List of supported clocks.
                                Call --help-query-supported-clocks for more info.
    --query-compute-apps=       List of currently active compute processes.
                                Call --help-query-compute-apps for more info.
    --query-accounted-apps=     List of accounted compute processes.
                                Call --help-query-accounted-apps for more info.
    --query-retired-pages=      List of device memory pages that have been retired.
                                Call --help-query-retired-pages for more info.

    [mandatory]

    --format=                   Comma separated list of format options:
                                  csv - comma separated values (MANDATORY)
                                  noheader - skip the first line with column headers
                                  nounits - don't print units for numerical
                                             values

    [plus any of]

    -i,   --id=                 Target a specific GPU or Unit.
    -f,   --filename=           Log to a specified file, rather than to stdout.
    -l,   --loop=               Probe until Ctrl+C at specified second interval.
    -lms, --loop-ms=            Probe until Ctrl+C at specified millisecond interval.

  DEVICE MODIFICATION OPTIONS:

    [any one of]

    -e,   --ecc-config=         Toggle ECC support: 0/DISABLED, 1/ENABLED
    -p,   --reset-ecc-errors=   Reset ECC error counts: 0/VOLATILE, 1/AGGREGATE
    -c,   --compute-mode=       Set MODE for compute applications:
                                0/DEFAULT, 1/EXCLUSIVE_PROCESS,
                                2/PROHIBITED
    -dm,  --driver-model=       Enable or disable TCC mode: 0/WDDM, 1/TCC
    -fdm, --force-driver-model= Enable or disable TCC mode: 0/WDDM, 1/TCC
                                Ignores the error that display is connected.
          --gom=                Set GPU Operation Mode:
                                    0/ALL_ON, 1/COMPUTE, 2/LOW_DP
    -lgc  --lock-gpu-clocks=    Specifies <minGpuClock,maxGpuClock> clocks as a
                                    pair (e.g. 1500,1500) that defines the range
                                    of desired locked GPU clock speed in MHz.
                                    Setting this will supercede application clocks
                                    and take effect regardless if an app is running.
                                    Input can also be a singular desired clock value
                                    (e.g. <GpuClockValue>).
    -rgc  --reset-gpu-clocks
                                Resets the Gpu clocks to the default values.
    -ac   --applications-clocks= Specifies <memory,graphics> clocks as a
                                    pair (e.g. 2000,800) that defines GPU's
                                    speed in MHz while running applications on a GPU.
    -rac  --reset-applications-clocks
                                Resets the applications clocks to the default values.
    -acp  --applications-clocks-permission=
                                Toggles permission requirements for -ac and -rac commands:
                                0/UNRESTRICTED, 1/RESTRICTED
    -pl   --power-limit=        Specifies maximum power management limit in watts.
    -cc   --cuda-clocks=        Overrides or restores default CUDA clocks.
                                In override mode, GPU clocks higher frequencies when running CUDA applications.
                                Only on supported devices starting from the Volta series.
                                Requires administrator privileges.
                                0/RESTORE_DEFAULT, 1/OVERRIDE
    -am   --accounting-mode=    Enable or disable Accounting Mode: 0/DISABLED, 1/ENABLED
    -caa  --clear-accounted-apps
                                Clears all the accounted PIDs in the buffer.
          --auto-boost-default= Set the default auto boost policy to 0/DISABLED
                                or 1/ENABLED, enforcing the change only after the
                                last boost client has exited.
          --auto-boost-permission=
                                Allow non-admin/root control over auto boost mode:
                                0/UNRESTRICTED, 1/RESTRICTED

    [plus optional]

    -i,   --id=                 Target a specific GPU.
    -eow, --error-on-warning    Return a non-zero error for warnings.

  UNIT MODIFICATION OPTIONS:

    -t,   --toggle-led=         Set Unit LED state: 0/GREEN, 1/AMBER

    [plus optional]

    -i,   --id=                 Target a specific Unit.

  SHOW DTD OPTIONS:

          --dtd                 Print device DTD and exit.

    [plus optional]

    -f,   --filename=           Log to a specified file, rather than to stdout.
    -u,   --unit                Show unit, rather than device, DTD.

    --debug=                    Log encrypted debug information to a specified file.

  Device Monitoring:
    dmon                        Displays device stats in scrolling format.
                                "nvidia-smi dmon -h" for more information.

    daemon                      Runs in background and monitor devices as a daemon process.
                                This is an experimental feature. Not supported on Windows baremetal
                                "nvidia-smi daemon -h" for more information.

    replay                      Used to replay/extract the persistent stats generated by daemon.
                                This is an experimental feature.
                                "nvidia-smi replay -h" for more information.

  Process Monitoring:
    pmon                        Displays process stats in scrolling format.
                                "nvidia-smi pmon -h" for more information.

  NVLINK:
    nvlink                      Displays device nvlink information. "nvidia-smi nvlink -h" for more information.

  CLOCKS:
    clocks                      Control and query clock information. "nvidia-smi clocks -h" for more information.

  ENCODER SESSIONS:
    encodersessions             Displays device encoder sessions information. "nvidia-smi encodersessions -h" for more information.

  FBC SESSIONS:
    fbcsessions                 Displays device FBC sessions information. "nvidia-smi fbcsessions -h" for more information.

Please see the nvidia-smi documentation for more detailed information.

C:\Program Files\NVIDIA Corporation\NVSMI>
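The selective query options listed in the help text (--query-gpu together with --format=csv,noheader,nounits) are usually easier to consume from a script than the default table. The following is a minimal Python sketch, not an official recipe. It assumes nvidia-smi is on PATH and that the field names index, name, temperature.gpu, utilization.gpu, memory.used and memory.total are accepted by the installed driver (nvidia-smi --help-query-gpu lists the valid names). The helper names build_query_command, parse_gpu_csv and query_gpus are hypothetical.

```python
import csv
import io
import subprocess

# Fields to request. Assumed to be valid on the installed driver;
# check `nvidia-smi --help-query-gpu` for the authoritative list.
FIELDS = ["index", "name", "temperature.gpu", "utilization.gpu",
          "memory.used", "memory.total"]


def build_query_command(fields=FIELDS):
    """Build the nvidia-smi command line for a selective CSV query."""
    return ["nvidia-smi",
            "--query-gpu=" + ",".join(fields),
            "--format=csv,noheader,nounits"]


def parse_gpu_csv(text, fields=FIELDS):
    """Turn CSV text (one line per GPU) into a list of dicts keyed by field."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        rows.append(dict(zip(fields, (value.strip() for value in record))))
    return rows


def query_gpus():
    """Run nvidia-smi and parse its output (needs an NVIDIA driver installed)."""
    result = subprocess.run(build_query_command(), capture_output=True,
                            text=True, check=True)
    return parse_gpu_csv(result.stdout)


# Hand-written sample in the expected CSV shape (not captured tool output),
# so the parser can be tried on a machine without a GPU.
SAMPLE = "0, Tesla T4, 62, 0, 5268, 15109\n1, Tesla T4, 54, 0, 0, 15109\n"
for gpu in parse_gpu_csv(SAMPLE):
    print(gpu["index"], gpu["name"], gpu["memory.used"], "/", gpu["memory.total"], "MiB")
```

On a machine with a driver installed, query_gpus() returns the same structure from live data. Values are kept as strings because some fields can come back as non-numeric markers on devices that do not support them.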

nvidia-smi output on Ubuntu 20.04

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.118.02   Driver Version: 440.118.02   CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:35:00.0 Off |                    0 |
| N/A   62C    P0    29W /  70W |   5268MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:36:00.0 Off |                    0 |
| N/A   54C    P0    28W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            Off  | 00000000:9C:00.0 Off |                    0 |
| N/A   51C    P0    26W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            Off  | 00000000:9D:00.0 Off |                    0 |
| N/A   52C    P0    27W /  70W |      0MiB / 15109MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      7311      C   /opt/tensorrtserver/bin/trtserver           5225MiB |
+-----------------------------------------------------------------------------+

Field descriptions:

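Driver Version: version of the installed NVIDIA driver.
CUDA Version: the highest CUDA version the installed driver supports. This is not necessarily the version of the CUDA toolkit installed on the machine.
GPU Name: product name of the GPU, preceded by its index (0 to 3 here).
Persistence-M: persistence mode (On/Off). When it is On, the NVIDIA driver stays loaded even when no client such as a CUDA application is using the GPU, which shortens start-up time for the next client. It is a driver setting available on Linux, not a type of memory.
Bus-Id: PCI bus ID of the GPU in domain:bus:device.function form (for example 00000000:35:00.0). It identifies the slot the card occupies and is unique per GPU within the system.
Disp.A: Display Active (On/Off). Shows whether a display is initialized on the GPU, that is, whether device memory has been allocated for display output. A display can be active even when no monitor is physically attached.
Volatile Uncorr. ECC: number of uncorrectable ECC errors detected since the driver was last loaded (the volatile counter, as opposed to the aggregate lifetime counter). 0 means no such errors were detected. N/A means ECC is not supported or not enabled on the GPU.
Fan: fan speed as a percentage of maximum. N/A means the GPU has no fan of its own, as with the passively cooled Tesla T4.
Temp: GPU temperature in °C.
Perf: performance state, from P0 (maximum performance) to P12 (minimum performance).
Pwr:Usage/Cap: current power draw / power limit, in W.
Memory-Usage: GPU memory in use / total GPU memory, in MiB.
GPU-Util: GPU utilization, the percentage of time over the last sample period during which at least one kernel was executing on the GPU. It does not indicate how much of the GPU those kernels occupied, and it is unrelated to Memory-Usage: GPU 0 above holds 5268MiB while sitting at 0% utilization.
Compute M.: compute mode, which controls how compute applications may share the GPU. Default allows multiple contexts on the device at the same time; Exclusive_Process allows only one context per device; Prohibited allows no compute contexts. It is set with nvidia-smi -c (0/DEFAULT, 1/EXCLUSIVE_PROCESS, 2/PROHIBITED).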
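To make the mapping between the table and the fields concrete, the sketch below pulls the values out of the two rows that describe one GPU. It is an illustration only: it assumes the exact column layout of the 440.x table shown above, and the help text warns that nvidia-smi output is not guaranteed to be backwards compatible, so scripts should normally use --query-gpu or NVML instead. The names ROW1, ROW2 and parse_gpu_rows are hypothetical.

```python
import re

# First row: index, name, Persistence-M | Bus-Id, Disp.A | Volatile Uncorr. ECC
ROW1 = re.compile(
    r"^\|\s+(?P<index>\d+)\s+(?P<name>.+?)\s+(?P<persistence>On|Off)\s+\|"
    r"\s+(?P<bus_id>[0-9A-Fa-f:.]+)\s+(?P<disp_a>On|Off)\s+\|"
    r"\s+(?P<ecc>\S+)\s+\|$")

# Second row: Fan, Temp, Perf, Pwr:Usage/Cap | Memory-Usage | GPU-Util, Compute M.
ROW2 = re.compile(
    r"^\|\s+(?P<fan>\S+)\s+(?P<temp>\d+)C\s+(?P<perf>P\d+)\s+"
    r"(?P<pwr_usage>\S+)\s+/\s+(?P<pwr_cap>\S+)\s+\|"
    r"\s+(?P<mem_used>\d+)MiB\s+/\s+(?P<mem_total>\d+)MiB\s+\|"
    r"\s+(?P<gpu_util>\d+)%\s+(?P<compute_mode>\S+)\s+\|$")


def parse_gpu_rows(row1, row2):
    """Return a dict of the fields found in the two table rows of one GPU."""
    m1 = ROW1.match(row1.strip())
    m2 = ROW2.match(row2.strip())
    if not (m1 and m2):
        raise ValueError("rows do not match the assumed table layout")
    info = {**m1.groupdict(), **m2.groupdict()}
    # Derived value: share of GPU memory currently in use (0.0 to 1.0).
    info["mem_fraction"] = int(info["mem_used"]) / int(info["mem_total"])
    return info


# The two rows for GPU 0, copied from the table above.
gpu0 = parse_gpu_rows(
    "|   0  Tesla T4            Off  | 00000000:35:00.0 Off |                    0 |",
    "| N/A   62C    P0    29W /  70W |   5268MiB / 15109MiB |      0%      Default |",
)
for key, value in gpu0.items():
    print(f"{key}: {value}")
```

Raising ValueError on a mismatch is deliberate: a layout change in a newer driver should fail loudly rather than yield silently wrong fields.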

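The output also ends with a process list showing which processes are using GPU memory: the GPU index, the PID, the type (C for compute, G for graphics), the process name, and the amount of GPU memory the process uses. Here, trtserver (PID 7311) holds 5225MiB on GPU 0.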
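The same process information can be requested in CSV form with --query-compute-apps, which the nvidia-smi help lists among its selective query options. The sketch below is a rough illustration under some assumptions: nvidia-smi is on PATH, the fields pid, process_name and used_memory are accepted by the installed driver (nvidia-smi --help-query-compute-apps lists the valid names), and process names contain no commas. The helper names are hypothetical.

```python
import csv
import io
import subprocess

# Assumed field names; verify with `nvidia-smi --help-query-compute-apps`.
APP_FIELDS = ["pid", "process_name", "used_memory"]


def parse_compute_apps(text):
    """Parse CSV lines of 'pid, process_name, used_memory' into dicts."""
    apps = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(record) != len(APP_FIELDS):
            continue  # skip blank or unexpected lines
        pid, name, used = (value.strip() for value in record)
        apps.append({
            "pid": int(pid),
            "process_name": name,
            # Memory may be reported as a non-numeric marker on some
            # setups; keep None in that case instead of failing.
            "used_memory_mib": int(used) if used.isdigit() else None,
        })
    return apps


def query_compute_apps():
    """Run nvidia-smi and parse the result (needs an NVIDIA driver installed)."""
    cmd = ["nvidia-smi",
           "--query-compute-apps=" + ",".join(APP_FIELDS),
           "--format=csv,noheader,nounits"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)


# Hand-written sample mirroring the process list above (not captured output).
SAMPLE_APPS = "7311, /opt/tensorrtserver/bin/trtserver, 5225\n"
for app in parse_compute_apps(SAMPLE_APPS):
    print(app["pid"], app["process_name"], app["used_memory_mib"], "MiB")
```

This query covers compute processes only, so graphics-only processes shown in the summary table would not appear in its result.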
