Random Notes (Installing a CUDA Environment on Windows)

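CUDA is NVIDIA's platform and language for GPU programming. The language is derived from C and extends it. Today CUDA is widely used in high-performance computing, deep learning training, embedded devices, and many other scenarios. CUDA is easy to start with but hard to master: studying the basic concepts alone does not give a deep understanding, so it is best to learn it by writing real code. Only then can the GPU be put to full use.

So I checked my own laptop. It reports an NVIDIA MX150, a fairly low-end mobile GPU, but that is still enough for CUDA development. I used an afternoon to install the CUDA environment and learned a lot.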

1. Install Visual Studio 2015

At the time of writing, CUDA supports Visual Studio 2012, 2013, 2015, 2017, and 2019.

2. Download the CUDA installer

Pick the CUDA package that suits your system. I downloaded cuda_10.2.89_441.22_win10.exe.

3. Install CUDA

Mostly it is a matter of clicking Next repeatedly. The installer first self-extracts and then performs the installation.

4. Confirm that CUDA installed successfully

Run nvcc --help in cmd. If it prints its help text, everything is OK.

5. Find the sample code

If the installation went well, there will be a directory under C:\ProgramData\NVIDIA Corporation\CUDA Samples that contains many sample programs.

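6. Build the sample code

There are many samples, grouped into 0_Simple, 1_Utilities, 2_Graphics, 3_Imaging, 4_Finance, 5_Simulations, 6_Advanced, and 7_CUDALibraries. At first you can build just part of them, such as 0_Simple. If they build and run without major problems, the installation was successful.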

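7. A simple sample: vectorAdd.cu

The code is just an element-by-element vector addition. It is fairly short, but it is enough to understand how CUDA acceleration works.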

/**
 * Copyright 1993-2015 NVIDIA Corporation.  All rights reserved.
 *
 * Please refer to the NVIDIA end user license agreement (EULA) associated
 * with this source code for terms and conditions that govern your use of
 * this software. Any use, reproduction, disclosure, or distribution of
 * this software and related documentation outside the terms of the EULA
 * is strictly prohibited.
 *
 */

/**
 * Vector addition: C = A + B.
 *
 * This sample is a very basic sample that implements element by element
 * vector addition. It is the same as the sample illustrating Chapter 2
 * of the programming guide with some additions like error checking.
 */

#include <stdio.h>

// For the CUDA runtime routines (prefixed with "cuda_")
#include <cuda_runtime.h>

#include <helper_cuda.h>

/**
 * CUDA Kernel Device code
 *
 * Computes the vector addition of A and B into C. The 3 vectors have the same
 * number of elements numElements.
 */
__global__ void
vectorAdd(const float *A, const float *B, float *C, int numElements)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    if (i < numElements)
    {
        C[i] = A[i] + B[i];
    }
}

/**
 * Host main routine
 */
int
main(void)
{
    // Error code to check return values for CUDA calls
    cudaError_t err = cudaSuccess;

    // Print the vector length to be used, and compute its size
    int numElements = 50000;
    size_t size = numElements * sizeof(float);
    printf("[Vector addition of %d elements]\n", numElements);

    // Allocate the host input vector A
    float *h_A = (float *)malloc(size);

    // Allocate the host input vector B
    float *h_B = (float *)malloc(size);

    // Allocate the host output vector C
    float *h_C = (float *)malloc(size);

    // Verify that allocations succeeded
    if (h_A == NULL || h_B == NULL || h_C == NULL)
    {
        fprintf(stderr, "Failed to allocate host vectors!\n");
        exit(EXIT_FAILURE);
    }

    // Initialize the host input vectors
    for (int i = 0; i < numElements; ++i)
    {
        h_A[i] = rand()/(float)RAND_MAX;
        h_B[i] = rand()/(float)RAND_MAX;
    }

    // Allocate the device input vector A
    float *d_A = NULL;
    err = cudaMalloc((void **)&d_A, size);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector A (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Allocate the device input vector B
    float *d_B = NULL;
    err = cudaMalloc((void **)&d_B, size);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector B (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Allocate the device output vector C
    float *d_C = NULL;
    err = cudaMalloc((void **)&d_C, size);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector C (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Copy the host input vectors A and B in host memory to the device input vectors in
    // device memory
    printf("Copy input data from the host memory to the CUDA device\n");
    err = cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to copy vector A from host to device (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to copy vector B from host to device (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Launch the Vector Add CUDA Kernel
    int threadsPerBlock = 256;
    int blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;
    printf("CUDA kernel launch with %d blocks of %d threads\n", blocksPerGrid, threadsPerBlock);
    vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements);
    err = cudaGetLastError();

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to launch vectorAdd kernel (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Copy the device result vector in device memory to the host result vector
    // in host memory.
    printf("Copy output data from the CUDA device to the host memory\n");
    err = cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to copy vector C from device to host (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Verify that the result vector is correct
    for (int i = 0; i < numElements; ++i)
    {
        if (fabs(h_A[i] + h_B[i] - h_C[i]) > 1e-5)
        {
            fprintf(stderr, "Result verification failed at element %d!\n", i);
            exit(EXIT_FAILURE);
        }
    }

    printf("Test PASSED\n");

    // Free device global memory
    err = cudaFree(d_A);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector A (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaFree(d_B);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector B (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaFree(d_C);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector C (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Free host memory
    free(h_A);
    free(h_B);
    free(h_C);

    printf("Done\n");
    return 0;
}

8. Verify that an NVIDIA project can be created from Visual Studio

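9. From Windows to an Ubuntu development environment

The VS environment on Windows is convenient for debugging and easy to use overall. For those developing for embedded devices, it is best to optimize the code on Windows before porting it to the NVIDIA Jetson Ubuntu environment, which can save a lot of time.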
