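Random Notes (Installing the CUDA Environment on Windows)
[Statement: All rights reserved. Reposting is welcome, but please do not use this for commercial purposes. Contact email: feixiaoxing @163.com]
CUDA is NVIDIA's programming language and platform for GPU development. It is derived from C and extends it. CUDA is now widely used in high-performance computing, deep learning training, embedded devices, and many other scenarios. However, CUDA is easy to start with and hard to master. Studying the basic concepts alone does not give a deep understanding, so it is best to develop alongside concrete code in order to really put the GPU to work.
With that in mind, I checked my personal laptop. It reports an NVIDIA MX150, a fairly low-end mobile GPU, but it can still be used for CUDA development. So I spent this afternoon installing the CUDA environment and learned a lot along the way.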
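1. Install Visual Studio 2015
At the time of writing, CUDA supports VS2012, VS2013, VS2015, VS2017, and VS2019.
2. Download the CUDA installer
Pick the CUDA package that suits your system. I downloaded cuda_10.2.89_441.22_win10.exe.
3. Install CUDA
Basically, just keep clicking Next. The installer first extracts itself and then performs the installation.
4. Confirm that CUDA installed successfully
In cmd, run nvcc --help. If it prints its usage information, the compiler is installed and on the PATH.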
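Note that nvcc --help only shows that the compiler can be found. A slightly stronger check is a small program that asks the CUDA runtime which GPUs it can see. The following is a minimal sketch, not part of the original notes: the file name device_query.cu and the helpers versionMajor and versionMinor are hypothetical, and it assumes nvcc is run from a Visual Studio developer command prompt so that it can find the host compiler.

```cuda
// device_query.cu (hypothetical file name)
// Build, assuming a Visual Studio developer command prompt:
//   nvcc device_query.cu -o device_query
#include <cstdio>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000 * major + 10 * minor (10020 means 10.2).
// Both helpers are illustrative names, not CUDA API functions.
static int versionMajor(int v) { return v / 1000; }
static int versionMinor(int v) { return (v % 100) / 10; }

int main(void)
{
    int runtimeVersion = 0;
    int driverVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);
    cudaDriverGetVersion(&driverVersion);
    printf("CUDA runtime %d.%d, driver supports up to CUDA %d.%d\n",
           versionMajor(runtimeVersion), versionMinor(runtimeVersion),
           versionMajor(driverVersion), versionMinor(driverVersion));

    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess)
    {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    if (deviceCount == 0)
    {
        fprintf(stderr, "No CUDA-capable device found\n");
        return 1;
    }

    // Print a few basic properties of every visible device.
    for (int dev = 0; dev < deviceCount; ++dev)
    {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess)
        {
            fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("Device %d: %s\n", dev, prop.name);
        printf("  compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  multiprocessors:    %d\n", prop.multiProcessorCount);
        printf("  global memory:      %.0f MiB\n",
               prop.totalGlobalMem / (1024.0 * 1024.0));
    }
    return 0;
}
```

If the program lists the laptop GPU, the driver, the runtime, and the compiler are all working together. The output depends on the hardware, so none is shown here.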
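5. Find the sample code
If the installation went well, there will be a directory under C:\ProgramData\NVIDIA Corporation\CUDA Samples that contains a large number of samples.
6. Build the sample code
The samples are grouped into 0_Simple, 1_Utilities, 2_Graphics, 3_Imaging, 4_Finance, 5_Simulations, 6_Advanced, and 7_CUDALibraries. At first it is enough to build only part of them, for example 0_Simple. If those build and run without problems, the installation is working.
7. A simple sample: vectorAdd.cu
The code just adds two vectors. It is short, but it is enough to understand how CUDA acceleration works.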
/**
 * Copyright 1993-2015 NVIDIA Corporation. All rights reserved.
 *
 * Please refer to the NVIDIA end user license agreement (EULA) associated
 * with this source code for terms and conditions that govern your use of
 * this software. Any use, reproduction, disclosure, or distribution of
 * this software and related documentation outside the terms of the EULA
 * is strictly prohibited.
 *
 */

/**
 * Vector addition: C = A + B.
 *
 * This sample is a very basic sample that implements element by element
 * vector addition. It is the same as the sample illustrating Chapter 2
 * of the programming guide with some additions like error checking.
 */

#include <stdio.h>

// For the CUDA runtime routines (prefixed with "cuda_")
#include <cuda_runtime.h>

#include <helper_cuda.h>

/**
 * CUDA Kernel Device code
 *
 * Computes the vector addition of A and B into C. The 3 vectors have the same
 * number of elements numElements.
 */
__global__ void
vectorAdd(const float *A, const float *B, float *C, int numElements)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    if (i < numElements)
    {
        C[i] = A[i] + B[i];
    }
}

/**
 * Host main routine
 */
int
main(void)
{
    // Error code to check return values for CUDA calls
    cudaError_t err = cudaSuccess;

    // Print the vector length to be used, and compute its size
    int numElements = 50000;
    size_t size = numElements * sizeof(float);
    printf("[Vector addition of %d elements]\n", numElements);

    // Allocate the host input vector A
    float *h_A = (float *)malloc(size);

    // Allocate the host input vector B
    float *h_B = (float *)malloc(size);

    // Allocate the host output vector C
    float *h_C = (float *)malloc(size);

    // Verify that allocations succeeded
    if (h_A == NULL || h_B == NULL || h_C == NULL)
    {
        fprintf(stderr, "Failed to allocate host vectors!\n");
        exit(EXIT_FAILURE);
    }

    // Initialize the host input vectors
    for (int i = 0; i < numElements; ++i)
    {
        h_A[i] = rand()/(float)RAND_MAX;
        h_B[i] = rand()/(float)RAND_MAX;
    }

    // Allocate the device input vector A
    float *d_A = NULL;
    err = cudaMalloc((void **)&d_A, size);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector A (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Allocate the device input vector B
    float *d_B = NULL;
    err = cudaMalloc((void **)&d_B, size);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector B (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Allocate the device output vector C
    float *d_C = NULL;
    err = cudaMalloc((void **)&d_C, size);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector C (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Copy the host input vectors A and B in host memory to the device input vectors in
    // device memory
    printf("Copy input data from the host memory to the CUDA device\n");
    err = cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to copy vector A from host to device (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to copy vector B from host to device (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Launch the Vector Add CUDA Kernel
    int threadsPerBlock = 256;
    int blocksPerGrid =(numElements + threadsPerBlock - 1) / threadsPerBlock;
    printf("CUDA kernel launch with %d blocks of %d threads\n", blocksPerGrid, threadsPerBlock);
    vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements);
    err = cudaGetLastError();

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to launch vectorAdd kernel (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Copy the device result vector in device memory to the host result vector
    // in host memory.
    printf("Copy output data from the CUDA device to the host memory\n");
    err = cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to copy vector C from device to host (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Verify that the result vector is correct
    for (int i = 0; i < numElements; ++i)
    {
        if (fabs(h_A[i] + h_B[i] - h_C[i]) > 1e-5)
        {
            fprintf(stderr, "Result verification failed at element %d!\n", i);
            exit(EXIT_FAILURE);
        }
    }

    printf("Test PASSED\n");

    // Free device global memory
    err = cudaFree(d_A);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector A (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaFree(d_B);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector B (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaFree(d_C);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector C (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Free host memory
    free(h_A);
    free(h_B);
    free(h_C);

    printf("Done\n");
    return 0;
}
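The launch configuration in the listing above rounds the block count up so that every element gets a thread, which is also why the kernel needs its if (i < numElements) guard. The arithmetic can be reproduced on the host alone. This is a small sketch for illustration: blocksFor and the file name launch_config.cu are hypothetical and not part of the NVIDIA sample.

```cuda
// launch_config.cu (hypothetical file name); host-only code, no GPU required.
#include <cstdio>

// Smallest number of blocks whose threads cover n elements (ceiling division).
// This mirrors the blocksPerGrid expression used in vectorAdd.cu.
static int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main(void)
{
    const int numElements = 50000;
    const int threadsPerBlock = 256;

    const int blocks = blocksFor(numElements, threadsPerBlock);
    const int launched = blocks * threadsPerBlock;

    // Threads with an index >= numElements must do nothing, hence the guard
    // in the kernel.
    printf("blocks = %d, threads launched = %d, idle threads = %d\n",
           blocks, launched, launched - numElements);
    // prints "blocks = 196, threads launched = 50176, idle threads = 176"
    return 0;
}
```

With 50000 elements and 256 threads per block, 195 blocks would cover only 49920 elements, so 196 blocks are launched and the last 176 threads fall outside the array.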
8. Verify that an NVIDIA project can be created from Visual Studio
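One way to confirm that a newly created project really builds and runs is to put a tiny smoke test into its .cu source file. The sketch below is an assumption about how one might do that, not the code that Visual Studio generates: scaleByTwo, allDoubled, and the CUDA_CHECK macro are hypothetical names introduced here. The macro also shows a more compact form of the repeated error checks in vectorAdd.cu.

```cuda
// kernel.cu: a minimal smoke test (hypothetical content for a new CUDA project)
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// CUDA_CHECK is a hypothetical helper macro, not part of the CUDA toolkit.
// It aborts with a readable message when a runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s failed at %s:%d: %s\n", #call, __FILE__,  \
                    __LINE__, cudaGetErrorString(err_));                  \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Each thread doubles one element; extra threads in the last block do nothing.
__global__ void scaleByTwo(int *data, int n)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n)
    {
        data[i] *= 2;
    }
}

// Host-side check: returns true when data[i] == 2 * i for every i.
static bool allDoubled(const int *data, int n)
{
    for (int i = 0; i < n; ++i)
    {
        if (data[i] != 2 * i) return false;
    }
    return true;
}

int main(void)
{
    const int n = 1000;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    int *dev = NULL;
    CUDA_CHECK(cudaMalloc((void **)&dev, n * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice));

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scaleByTwo<<<blocks, threadsPerBlock>>>(dev, n);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));

    if (!allDoubled(host, n))
    {
        fprintf(stderr, "Smoke test FAILED\n");
        return 1;
    }
    printf("Smoke test PASSED\n");
    return 0;
}
```

If the project builds and the program reports that the smoke test passed, the Visual Studio integration, the compiler, and the GPU are all usable from the IDE.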
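9. From Windows to an Ubuntu development environment
The Visual Studio environment on Windows makes debugging convenient and is easy to use overall. For those developing for embedded devices, it is best to optimize the code on Windows before porting it to the Ubuntu environment on NVIDIA Jetson, which can save a good deal of time.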