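A while ago, when I was working on NVIDIA hardware decoding, the GPU kept dying for no apparent reason. It turned out that it was dropping out because it overheated. A few days ago I found that CUDA ships with NVML (NVIDIA Management Library), which can query GPU information; nvidia-smi is built on the same library.

The CUDA version used here is CUDA 8.0.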

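1. Adding NVML to a program

After installing CUDA you can find the following:

Figure 1. The NVML example

This folder contains an NVML example. My system is 64-bit; the NVML lib and header files are located as follows:

Figure 2. The NVML lib file

Figure 3. The NVML header file

Next, include NVML in the project. I created a new CUDA 8.0 Runtime project: NVML ships with CUDA, so a CUDA 8.0 Runtime project saves the CUDA configuration work. For how to create the project, see the post "VS2013 VC++的.cpp文件调用CUDA的.cu文件中的函数" (calling functions in a CUDA .cu file from a VS2013 VC++ .cpp file). CUDA 8.0 was installed with default settings, on 64-bit Windows 10.

In the program, simply include the NVML header and link the lib: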

#include "nvml.h"
#pragma comment(lib, "nvml.lib")

Note that on a 64-bit system you should create an x64 project, because the installed CUDA does not include a Win32 nvml.lib.

2. Querying GPU information with NVML

Commonly used functions:

·nvmlInit() initializes NVML;

·nvmlDeviceGetCount(unsigned int *deviceCount) returns the number of GPUs;

·nvmlDeviceGetHandleByIndex(unsigned int index, nvmlDevice_t *device) gets a device handle;

·nvmlDeviceGetName(nvmlDevice_t device, char *name, unsigned int length) queries the device name;

·nvmlDeviceGetPciInfo(nvmlDevice_t device, nvmlPciInfo_t *pci) gets the PCI information. On why this function matters, the example says:

// pci.busId is very useful to know which device physically you're talking to
// Using PCI identifier you can also match nvmlDevice handle to CUDA device.
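
To make the matching idea concrete, here is a minimal sketch of comparing two PCI bus id strings. It assumes both strings follow the "domain:bus:device.function" hex layout but may differ in the width of the domain field or in letter case (for example a bus id from NVML versus one obtained on the CUDA side). The names PciAddress, parseBusId and sameBusId are hypothetical helpers of mine, not part of NVML or CUDA.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper type: the numeric parts of a PCI bus id string.
struct PciAddress {
    unsigned int domain, bus, device, function;
};

// Reads one hex field starting at p and requires `sep` right after it
// (sep == '\0' means the field must end the string). Advances p past the separator.
static bool readHexField(const char *&p, char sep, unsigned int &value)
{
    char *end = nullptr;
    unsigned long v = std::strtoul(p, &end, 16);
    if (end == p || *end != sep) return false;
    value = static_cast<unsigned int>(v);
    p = (sep == '\0') ? end : end + 1;
    return true;
}

// Parses "domain:bus:device.function"; returns false on any other layout.
bool parseBusId(const std::string &busId, PciAddress &out)
{
    const char *p = busId.c_str();
    return readHexField(p, ':', out.domain)
        && readHexField(p, ':', out.bus)
        && readHexField(p, '.', out.device)
        && readHexField(p, '\0', out.function);
}

// Two bus ids refer to the same device when all four numeric fields agree.
bool sameBusId(const std::string &a, const std::string &b)
{
    PciAddress pa, pb;
    if (!parseBusId(a, pa) || !parseBusId(b, pb)) return false;
    return pa.domain == pb.domain && pa.bus == pb.bus
        && pa.device == pb.device && pa.function == pb.function;
}
```

Comparing the parsed numbers rather than the raw strings avoids false mismatches such as "00000000:01:00.0" versus "0000:01:00.0".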

·nvmlDeviceGetComputeMode(nvmlDevice_t device, nvmlComputeMode_t *mode) gets the compute mode the GPU is currently in. The modes are:

typedef enum nvmlComputeMode_enum
{
    NVML_COMPUTEMODE_DEFAULT           = 0,  //!< Default compute mode -- multiple contexts per device
    NVML_COMPUTEMODE_EXCLUSIVE_THREAD  = 1,  //!< Support Removed
    NVML_COMPUTEMODE_PROHIBITED        = 2,  //!< Compute-prohibited mode -- no contexts per device
    NVML_COMPUTEMODE_EXCLUSIVE_PROCESS = 3,  //!< Compute-exclusive-process mode -- only one context per device, usable from multiple threads at a time
   
    // Keep this last
    NVML_COMPUTEMODE_COUNT
} nvmlComputeMode_t;

·nvmlDeviceSetComputeMode(nvmlDevice_t device, nvmlComputeMode_t mode) changes the GPU's compute mode;

·nvmlDeviceGetTemperatureThreshold(nvmlDevice_t device, nvmlTemperatureThresholds_t thresholdType, unsigned int *temp) queries a temperature threshold. There are two kinds:

typedef enum nvmlTemperatureThresholds_enum
{
    NVML_TEMPERATURE_THRESHOLD_SHUTDOWN = 0,    // Temperature at which the GPU will shut down for HW protection
    NVML_TEMPERATURE_THRESHOLD_SLOWDOWN = 1,    // Temperature at which the GPU will begin slowdown
    // Keep this last
    NVML_TEMPERATURE_THRESHOLD_COUNT
} nvmlTemperatureThresholds_t;

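When the temperature reaches the value returned for NVML_TEMPERATURE_THRESHOLD_SHUTDOWN, the GPU shuts down automatically to protect the hardware; when it reaches the value returned for NVML_TEMPERATURE_THRESHOLD_SLOWDOWN, the GPU starts to slow down and its performance drops.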

·nvmlDeviceGetTemperature(nvmlDevice_t device, nvmlTemperatureSensors_t sensorType, unsigned int *temp) gets the GPU's current temperature;

·nvmlDeviceGetUtilizationRates(nvmlDevice_t device, nvmlUtilization_t *utilization) gets the device's utilization (original comment: "Retrieves the current utilization rates for the device's major subsystems"; I am not sure I have understood it correctly). The utilization consists of:

typedef struct nvmlUtilization_st
{
    unsigned int gpu;                //!< Percent of time over the past sample period during which one or more kernels was executing on the GPU
    unsigned int memory;             //!< Percent of time over the past sample period during which global (device) memory was being read or written
} nvmlUtilization_t;

·nvmlDeviceGetMemoryInfo(nvmlDevice_t device, nvmlMemory_t *memory) retrieves the amount of used, free and total memory available on the device, in bytes.

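·nvmlDeviceGetBAR1MemoryInfo(nvmlDevice_t device, nvmlBAR1Memory_t *bar1Memory) gets the total, available and used size of BAR1 memory. (I do not yet know exactly how this differs from the previous function and need to study it further. The nvidia-smi documentation describes BAR1 as the region used to map the FB (device memory) so that it can be directly accessed by the CPU or by third-party devices over PCIe.)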
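
Both memory queries report raw byte counts, which are hard to read at a glance. Below is a small sketch that turns such a count into a readable string and computes a used percentage. formatBytes and usedPercent are hypothetical helper names; the inputs are assumed to be the total/used fields filled in by the two functions above.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats a byte count with binary units, e.g. 2147483648 -> "2.00 GiB".
std::string formatBytes(unsigned long long bytes)
{
    static const char *units[] = { "B", "KiB", "MiB", "GiB", "TiB" };
    double value = static_cast<double>(bytes);
    int unit = 0;
    while (value >= 1024.0 && unit < 4) {
        value /= 1024.0;
        ++unit;
    }
    std::ostringstream os;
    // Plain bytes are printed without decimals, larger units with two.
    os << std::fixed << std::setprecision(unit == 0 ? 0 : 2) << value << ' ' << units[unit];
    return os.str();
}

// Percentage of memory in use; returns 0 when total is 0 to avoid dividing by zero.
double usedPercent(unsigned long long used, unsigned long long total)
{
    if (total == 0) return 0.0;
    return 100.0 * static_cast<double>(used) / static_cast<double>(total);
}
```

Note that this percentage is about how much memory is allocated, which is a different quantity from the memory field of nvmlUtilization_t (the share of time the memory was being read or written).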

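·nvmlDeviceGetComputeRunningProcesses(nvmlDevice_t device, unsigned int *infoCount, nvmlProcessInfo_t *infos) gets information about processes with a compute context on a device. As I understand it, this returns information about the programs currently using the GPU. Note that infoCount is both input and output: on entry it must hold the capacity of the infos array, and on return it holds the number of entries written.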
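
Because infoCount works in both directions, a common way to avoid a fixed oversized array is to ask for the required size first and then query again. The sketch below shows that two-call pattern in isolation. To keep it compilable without nvml.h it uses hypothetical stand-ins (ProcessInfo, QueryStatus, listProcesses); in a real program the query callable would wrap nvmlDeviceGetComputeRunningProcesses with the device bound, relying on my reading of the NVML documentation that a too-small buffer yields NVML_ERROR_INSUFFICIENT_SIZE with infoCount set to the required size.

```cpp
#include <vector>

// Hypothetical stand-ins for nvmlProcessInfo_t and nvmlReturn_t.
struct ProcessInfo {
    unsigned int pid;
    unsigned long long usedGpuMemory;
};
enum QueryStatus { QUERY_SUCCESS, QUERY_INSUFFICIENT_SIZE, QUERY_ERROR };

// query(&count, buffer): on entry *count is the buffer capacity; on return it is
// the number of entries written, or the required capacity when the status is
// QUERY_INSUFFICIENT_SIZE.
template <typename Query>
bool listProcesses(Query query, std::vector<ProcessInfo> &out)
{
    unsigned int count = 0;
    // First call with capacity 0 only to learn how many entries are needed.
    QueryStatus status = query(&count, static_cast<ProcessInfo *>(nullptr));

    // Processes may start between calls, so retry a few times if still too small.
    for (int attempt = 0; attempt < 4 && status == QUERY_INSUFFICIENT_SIZE; ++attempt) {
        out.resize(count);
        status = query(&count, out.data());
    }

    if (status != QUERY_SUCCESS) {
        out.clear();
        return false;
    }
    out.resize(count);  // keep only the entries actually written
    return true;
}
```

The retry loop is a design choice of this sketch rather than something the library requires; a single second call is usually enough.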

·nvmlDeviceGetMaxClockInfo(nvmlDevice_t device, nvmlClockType_t type, unsigned int *clock) retrieves the maximum clock speeds for the device. The clock types are:

typedef enum nvmlClockType_enum
{
    NVML_CLOCK_GRAPHICS  = 0,        //!< Graphics clock domain
    NVML_CLOCK_SM        = 1,        //!< SM clock domain
    NVML_CLOCK_MEM       = 2,        //!< Memory clock domain
    NVML_CLOCK_VIDEO     = 3,        //!< Video encoder/decoder clock domain
   
    // Keep this last
    NVML_CLOCK_COUNT //<! Count of clock types
} nvmlClockType_t;

·nvmlDeviceGetClockInfo(nvmlDevice_t device, nvmlClockType_t type, unsigned int *clock) retrieves the current clock speeds for the device. The previous function returns the maximum; this one returns the current value.

Code example:

#include "cuda_kernels.h"   // this project's own header; declares cuAdd()
#include "nvml.h"
#include <stdio.h>
#include <stdlib.h>          // system()
#include <windows.h>
#include <winbase.h>
#include <psapi.h>           // GetModuleFileNameExA

#pragma comment(lib, "kernel32.lib")
#pragma comment(lib, "advapi32.lib")
#pragma comment(lib, "psapi.lib")
#pragma comment(lib, "nvml.lib")

const char * convertToComputeModeString(nvmlComputeMode_t mode)
{
    switch (mode)
    {
    case NVML_COMPUTEMODE_DEFAULT:
        return "Default";
    case NVML_COMPUTEMODE_EXCLUSIVE_THREAD:
        return "Exclusive_Thread";
    case NVML_COMPUTEMODE_PROHIBITED:
        return "Prohibited";
    case NVML_COMPUTEMODE_EXCLUSIVE_PROCESS:
        return "Exclusive Process";
    default:
        return "Unknown";
    }
}

int main()
{
    cuAdd();

    nvmlReturn_t result;
    unsigned int device_count, i;

    // First initialize NVML library
    result = nvmlInit();
    if (NVML_SUCCESS != result)
    {
        printf("Failed to initialize NVML: %s\n", nvmlErrorString(result));
        printf("Press ENTER to continue...\n");
        getchar();
        return 1;
    }

    result = nvmlDeviceGetCount(&device_count);
    if (NVML_SUCCESS != result)
    {
        printf("Failed to query device count: %s\n", nvmlErrorString(result));
        goto Error;
    }
    printf("Found %u device%s\n\n", device_count, device_count != 1 ? "s" : "");

    printf("Listing devices:\n");
    // Poll every device once per second; the loop is only left through "goto Error"
    while (true)
    {
        for (i = 0; i < device_count; i++)
        {
            nvmlDevice_t device;
            char name[NVML_DEVICE_NAME_BUFFER_SIZE];
            nvmlPciInfo_t pci;
            nvmlComputeMode_t compute_mode;

            // Query for device handle to perform operations on a device
            // You can also query device handle by other features like:
            // nvmlDeviceGetHandleBySerial
            // nvmlDeviceGetHandleByPciBusId
            result = nvmlDeviceGetHandleByIndex(i, &device);
            if (NVML_SUCCESS != result)
            {
                printf("Failed to get handle for device %u: %s\n", i, nvmlErrorString(result));
                goto Error;
            }

            result = nvmlDeviceGetName(device, name, NVML_DEVICE_NAME_BUFFER_SIZE);
            if (NVML_SUCCESS != result)
            {
                printf("Failed to get name of device %u: %s\n", i, nvmlErrorString(result));
                goto Error;
            }

            // pci.busId is very useful to know which device physically you're talking to
            // Using PCI identifier you can also match nvmlDevice handle to CUDA device.
            result = nvmlDeviceGetPciInfo(device, &pci);
            if (NVML_SUCCESS != result)
            {
                printf("Failed to get pci info for device %u: %s\n", i, nvmlErrorString(result));
                goto Error;
            }

            printf("%u. %s [%s]\n", i, name, pci.busId);

            // This is a simple example on how you can modify GPU's state
            result = nvmlDeviceGetComputeMode(device, &compute_mode);
            if (NVML_ERROR_NOT_SUPPORTED == result)
                printf("\t This is not CUDA capable device\n");
            else if (NVML_SUCCESS != result)
            {
                printf("Failed to get compute mode for device %u: %s\n", i, nvmlErrorString(result));
                goto Error;
            }
            else
            {
                // try to change compute mode
                printf("\t Changing device's compute mode from '%s' to '%s'\n",
                    convertToComputeModeString(compute_mode),
                    convertToComputeModeString(NVML_COMPUTEMODE_PROHIBITED));

                result = nvmlDeviceSetComputeMode(device, NVML_COMPUTEMODE_PROHIBITED);
                if (NVML_ERROR_NO_PERMISSION == result)
                    printf("\t\t Need root privileges to do that: %s\n", nvmlErrorString(result));
                else if (NVML_ERROR_NOT_SUPPORTED == result)
                    printf("\t\t Compute mode prohibited not supported. You might be running on\n"
                        "\t\t windows in WDDM driver model or on non-CUDA capable GPU.\n");
                else if (NVML_SUCCESS != result)
                {
                    printf("\t\t Failed to set compute mode for device %u: %s\n", i, nvmlErrorString(result));
                    goto Error;
                }
                else
                {
                    printf("\t Restoring device's compute mode back to '%s'\n",
                        convertToComputeModeString(compute_mode));
                    result = nvmlDeviceSetComputeMode(device, compute_mode);
                    if (NVML_SUCCESS != result)
                    {
                        printf("\t\t Failed to restore compute mode for device %u: %s\n", i, nvmlErrorString(result));
                        goto Error;
                    }
                }
            }

            // Temperature
            printf("\n");
            printf("----- Temperature ----- \n");
            unsigned int temperature_threshold = 100;
            result = nvmlDeviceGetTemperatureThreshold(device, NVML_TEMPERATURE_THRESHOLD_SHUTDOWN, &temperature_threshold);
            if (NVML_SUCCESS != result)
            {
                printf("device %u Failed to get NVML_TEMPERATURE_THRESHOLD_SHUTDOWN: %s\n", i, nvmlErrorString(result));
            }
            else
                printf("Shutdown temperature: %u C  (Temperature at which the GPU will shut down for HW protection)\n", temperature_threshold);

            result = nvmlDeviceGetTemperatureThreshold(device, NVML_TEMPERATURE_THRESHOLD_SLOWDOWN, &temperature_threshold);
            if (NVML_SUCCESS != result)
            {
                printf("device %u Failed to get NVML_TEMPERATURE_THRESHOLD_SLOWDOWN: %s\n", i, nvmlErrorString(result));
            }
            else
                printf("Slowdown temperature: %u C  (Temperature at which the GPU will begin slowdown)\n", temperature_threshold);

            unsigned int temperature = 0;
            result = nvmlDeviceGetTemperature(device, NVML_TEMPERATURE_GPU, &temperature);
            if (NVML_SUCCESS != result)
            {
                printf("device %u NVML_TEMPERATURE_GPU Failed: %s\n", i, nvmlErrorString(result));
            }
            else
                printf("Current temperature: %u C \n", temperature);

            // Utilization (both fields are unsigned int percentages)
            printf("\n");
            nvmlUtilization_t utilization;
            result = nvmlDeviceGetUtilizationRates(device, &utilization);
            if (NVML_SUCCESS != result)
            {
                printf(" device %u nvmlDeviceGetUtilizationRates Failed : %s\n", i, nvmlErrorString(result));
            }
            else
            {
                printf("----- Utilization ----- \n");
                printf("GPU utilization: %u %% \n", utilization.gpu);
                printf("Memory utilization: %u %% \n", utilization.memory);
            }

            // FB memory (fields are unsigned long long byte counts)
            printf("\n");
            nvmlMemory_t memory;
            result = nvmlDeviceGetMemoryInfo(device, &memory);
            if (NVML_SUCCESS != result)
            {
                printf("device %u nvmlDeviceGetMemoryInfo Failed : %s\n", i, nvmlErrorString(result));
            }
            else
            {
                printf("------ FB memory ------- \n");
                printf("Total installed FB memory: %llu bytes \n", memory.total);
                printf("Unallocated FB memory: %llu bytes \n", memory.free);
                printf("Allocated FB memory: %llu bytes \n", memory.used);
            }

            // BAR1 memory
            printf("\n");
            nvmlBAR1Memory_t bar1Memory;
            result = nvmlDeviceGetBAR1MemoryInfo(device, &bar1Memory);
            if (NVML_SUCCESS != result)
            {
                printf("device %u  nvmlDeviceGetBAR1MemoryInfo Failed : %s\n", i, nvmlErrorString(result));
            }
            else
            {
                printf("------ BAR1 memory ------- \n");
                printf("Total BAR1 memory: %llu bytes \n", bar1Memory.bar1Total);
                printf("Unallocated BAR1 memory: %llu bytes \n", bar1Memory.bar1Free);
                printf("Allocated BAR1 memory: %llu bytes \n", bar1Memory.bar1Used);
            }

            // Information about running compute processes on the GPU
            printf("\n");
            nvmlProcessInfo_t infos[999];
            // infoCount must hold the capacity of infos on entry;
            // on return it holds the number of entries written
            unsigned int infoCount = sizeof(infos) / sizeof(infos[0]);
            result = nvmlDeviceGetComputeRunningProcesses(device, &infoCount, infos);
            if (NVML_SUCCESS != result)
            {
                printf("Failed to get ComputeRunningProcesses for device %u: %s\n", i, nvmlErrorString(result));
            }
            else
            {
                printf("------ Information about running compute processes on the GPU ------- \n");
                for (unsigned int j = 0; j < infoCount; j++)
                {
                    printf("PID: %u  GPU memory used: %llu bytes   ", infos[j].pid, infos[j].usedGpuMemory);

                    // The PID reported by NVML can be opened directly,
                    // so walking a process snapshot to find it is not needed
                    HANDLE hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, infos[j].pid);
                    if (hProcess == NULL)
                    {
                        printf("\nOpenProcess failed: %lu\n", GetLastError());
                        continue;
                    }

                    // Look up the full path of the process's executable
                    char strFilePath[MAX_PATH];
                    if (GetModuleFileNameExA(hProcess, NULL, strFilePath, MAX_PATH) != 0)
                        printf(" %s\n", strFilePath);
                    else
                        printf("\nGetModuleFileNameEx failed: %lu\n", GetLastError());
                    CloseHandle(hProcess);
                }
            }

            // Clocks: current and maximum value for each clock domain
            printf("\n");
            printf("------ Clocks ------- \n");
            unsigned int max_clock = 0;
            unsigned int clock = 0;

            result = nvmlDeviceGetMaxClockInfo(device, NVML_CLOCK_GRAPHICS, &max_clock);
            if (NVML_SUCCESS != result)
            {
                printf("Failed to get max NVML_CLOCK_GRAPHICS for device %u: %s\n", i, nvmlErrorString(result));
            }
            result = nvmlDeviceGetClockInfo(device, NVML_CLOCK_GRAPHICS, &clock);
            if (NVML_SUCCESS != result)
            {
                printf("Failed to get current NVML_CLOCK_GRAPHICS for device %u: %s\n", i, nvmlErrorString(result));
            }
            else
            {
                printf("GRAPHICS: %6u Mhz   max clock :%u  \n", clock, max_clock);
            }

            result = nvmlDeviceGetMaxClockInfo(device, NVML_CLOCK_SM, &max_clock);
            if (NVML_SUCCESS != result)
            {
                printf("Failed to get max NVML_CLOCK_SM for device %u: %s\n", i, nvmlErrorString(result));
            }
            result = nvmlDeviceGetClockInfo(device, NVML_CLOCK_SM, &clock);
            if (NVML_SUCCESS != result)
            {
                printf("Failed to get current NVML_CLOCK_SM for device %u: %s\n", i, nvmlErrorString(result));
            }
            else
            {
                printf("      SM: %6u Mhz   max clock :%u   \n", clock, max_clock);
            }

            result = nvmlDeviceGetMaxClockInfo(device, NVML_CLOCK_MEM, &max_clock);
            if (NVML_SUCCESS != result)
            {
                printf("Failed to get max NVML_CLOCK_MEM for device %u: %s\n", i, nvmlErrorString(result));
            }
            result = nvmlDeviceGetClockInfo(device, NVML_CLOCK_MEM, &clock);
            if (NVML_SUCCESS != result)
            {
                printf("Failed to get current NVML_CLOCK_MEM for device %u: %s\n", i, nvmlErrorString(result));
            }
            else
            {
                printf("     MEM: %6u Mhz   max clock :%u   \n", clock, max_clock);
            }

            result = nvmlDeviceGetMaxClockInfo(device, NVML_CLOCK_VIDEO, &max_clock);
            if (NVML_SUCCESS != result)
            {
                printf("Failed to get max NVML_CLOCK_VIDEO for device %u: %s\n", i, nvmlErrorString(result));
            }
            result = nvmlDeviceGetClockInfo(device, NVML_CLOCK_VIDEO, &clock);
            if (NVML_SUCCESS != result)
            {
                printf("Failed to get current NVML_CLOCK_VIDEO for device %u: %s\n", i, nvmlErrorString(result));
            }
            else
            {
                printf("   VIDEO: %6u Mhz   max clock :%u   \n", clock, max_clock);
            }
        }

        printf("-------------------------------------------------------------------- \n");
        Sleep(1000);
    }

Error:
    result = nvmlShutdown();
    if (NVML_SUCCESS != result)
        printf("Failed to shutdown NVML: %s\n", nvmlErrorString(result));

    system("pause");
    return 0;
}

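I had already copied nvml.dll into the run directory, so the program should run normally. Even so, I also set up the environment for nvidia-smi, following the post "NVIDIA 显卡信息(CUDA信息的查看)" (NVIDIA GPU information: viewing CUDA information), whose content I copy below:

1. Viewing GPU information with nvidia-smi

nvidia-smi stands for NVIDIA System Management Interface.

After the NVIDIA driver has been installed, the cmd command line on Windows still does not recognize the nvidia-smi command; its directory has to be added to the environment variables. If the NVIDIA driver is installed in the default location, the directory containing nvidia-smi should be:

C:\Program Files\NVIDIA Corporation\NVSMI

That is, add the path above to the Path system environment variable.

2. Viewing CUDA information

  • CUDA version:

    • On the command line: nvcc -V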

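3. Results

Figure 4. Query results for GeForce 940M

Figure 5. Query results for Tesla P4

NVML support for the GeForce 940M is not very good; support for the Tesla P4 is considerably better.

Project source: http://download.csdn.net/download/qq_33892166/9841800

Reposted from: https://www.cnblogs.com/betterwgo/p/6858806.html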
