Preface

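It seems that the NVIDIA GPU driver cannot be installed on a Linux system running inside a virtual machine, so TensorFlow on an Ubuntu guest can only use the CPU. It looks like I will have to set up dual boot on a physical machine or use a server instead.
The reason is that installing the NVIDIA driver fails. Running

sudo sh cuda_11.2.2_460.32.03_linux.run

produces the error: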

Installation failed. See log at /var/log/cuda-installer.log for details.

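The log shows that the NVIDIA driver installation failed.
I then tried installing the NVIDIA driver on its own, downloaded from the official site, https://www.nvidia.cn/Download/index.aspx?lang=cn: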

su root
sh NVIDIA-Linux-x86_64-515.65.01.run

This gives the warning:
WARNING: You do not appear to have an NVIDIA GPU supported by the 515.65.01
NVIDIA Linux graphics driver installed in this system. For further
details, please see the appendix SUPPORTED NVIDIA GRAPHICS CHIPS in
the README available on the Linux driver download page at
www.nvidia.com.
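After looking into the cause, I found that a Linux system inside a virtual machine apparently cannot install the NVIDIA GPU driver. Running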

ubuntu-drivers devices

also lists only the VMware driver.
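The same diagnosis can be sketched as a quick check of whether the guest sees an NVIDIA device on its PCI bus at all. This is only an illustration: the helper name has_nvidia_gpu is made up here, and it assumes lspci (from pciutils) and systemd-detect-virt are installed, skipping each check when they are not.

```shell
#!/bin/sh
# Return 0 if the lspci-style text on stdin mentions an NVIDIA device.
# (has_nvidia_gpu is a hypothetical helper name, not part of any tool.)
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Report the hypervisor, if systemd-detect-virt is available.
if command -v systemd-detect-virt >/dev/null 2>&1; then
    echo "virtualization: $(systemd-detect-virt || true)"
fi

# Only query the PCI bus when lspci exists on this machine.
if command -v lspci >/dev/null 2>&1; then
    if lspci | has_nvidia_gpu; then
        echo "NVIDIA device visible on the PCI bus"
    else
        echo "no NVIDIA device visible: the driver installer has nothing to bind to"
    fi
fi
```

In a VMware guest without GPU passthrough, the display adapter normally shows up as a VMware device, which matches what ubuntu-drivers devices reported above.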

Still, the following record of a failed installation on VMware Ubuntu 18.04 can serve as my notes for future reference.
Here is my installation process:

I. Update the package sources (this sometimes helps with downloads and sometimes does not; skip it or do it up front)

For convenience, install vim:

sudo apt-get install vim

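If that reports an error, run

sudo apt-get update
sudo apt-get install vim

If it still fails, remove the dpkg lock file (only when no other apt or dpkg process is running) and try again:

sudo rm /var/lib/dpkg/lock
sudo apt-get install vim

Next: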

sudo vim /etc/apt/sources.list

After sources.list opens, move the cursor to the end, press i to enter insert mode, and add the Tsinghua and Aliyun mirrors:
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse

Press Esc, then type :wq and press Enter to save and quit. Then refresh the package lists:
sudo apt-get update
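As an alternative to typing the entries by hand, here is a minimal sketch that generates them. The function name mirror_lines is hypothetical, and it only covers the main, -updates and -security suites; it prints to the terminal and does not touch /etc/apt.

```shell
#!/bin/sh
# Print deb and deb-src entries for one mirror and one Ubuntu codename.
# $1: mirror base URL, $2: codename (bionic for Ubuntu 18.04)
# (mirror_lines is a hypothetical helper name.)
mirror_lines() {
    for suite in "$2" "$2-updates" "$2-security"; do
        for kind in deb deb-src; do
            echo "$kind $1 $suite main restricted universe multiverse"
        done
    done
}

# Preview the entries; nothing is written to /etc/apt here.
mirror_lines http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic

# To apply, back up the file first, then append (both need root):
#   sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
#   mirror_lines http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic | sudo tee -a /etc/apt/sources.list
#   sudo apt-get update
```

Keeping a .bak copy makes it easy to go back if a mirror turns out to be unreachable.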

II. Download and install CUDA and cuDNN

Before starting, check which versions go together: https://tensorflow.google.cn/install/source
I installed tensorflow-gpu 2.6.0, CUDA 11.2 (which seems to need NVIDIA driver ≥ 460.32.03), cuDNN 8.1, and GCC 7.3.1
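A version requirement such as "driver ≥ 460.32.03" can be checked mechanically. The sketch below is one way to do it, assuming GNU sort with the -V (version sort) option, which Ubuntu ships in coreutils; version_ge is a hypothetical helper name, and the two version strings are simply the ones mentioned in this post.

```shell
#!/bin/sh
# Return 0 when version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort); version_ge is a hypothetical name.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=460.32.03   # minimum driver quoted above for CUDA 11.2
installed=515.65.01  # the driver version tried in this post

if version_ge "$installed" "$required"; then
    echo "driver $installed satisfies >= $required"
else
    echo "driver $installed is too old for CUDA 11.2"
fi
```

The same function works for comparing gcc versions, for example the system 7.5.0 against the listed 7.3.1.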

1. Download CUDA:

https://developer.nvidia.cn/cuda-toolkit-archive
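Find the matching version.

Copy the link into Xunlei (Thunder) to download it; that is very fast. Once the download finishes, drag the file into the VM's home directory (you can create a new folder for it).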

2. Download cuDNN

https://developer.nvidia.cn/rdp/cudnn-archive

Drag it into the VM.

3. Install CUDA

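References: "Installing CUDA + cuDNN on Linux"
"Configuring multiple CUDA versions on Ubuntu (10.0, 10.1)"
Here is my installation process:
(1) Install CUDA:
First check whether GCC is installed, because the next step may otherwise fail (see the error below):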

gcc -v

If it is not installed, install gcc, keeping the version pairing in mind:

sudo apt install gcc

gcc -v

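This shows the system default, version 7.5.0. The GCC version that the official table lists for TensorFlow 2.6.0 is 7.3.1, which I could not find as a package, so first try whether the next command passes the gcc version check:

sudo sh cuda_11.2.2_460.32.03_linux.run

A possible error: Failed to verify gcc version. See log at /var/log/cuda-installer.log for details.
If there is no error, type accept.
If Driver is ticked and the installation fails, start over, press Enter on Driver to untick it, and install the NVIDIA driver yourself (I could not install it in the VM). Then move the cursor to Install and press Enter.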

At this point

nvidia-smi

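still fails (because the VM has no NVIDIA driver installed). For installation on a physical machine, see the Preface.
If nvidia-smi reports the following after installation: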

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
then run

sudo apt-get install dkms
sudo dkms install -m nvidia -v 515.65.01

(2) Add environment variables

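vim ~/.bashrc

Move the cursor to the end, press i to enter insert mode, and add:

export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"

Press Esc to leave insert mode, then type :wq to save the file and quit.
Run the following command to load the updated environment variables:

source ~/.bashrc

Note that the paths above use /cuda rather than /cuda-11.2, because a symbolic link is set up next so that several CUDA versions can coexist. The commands below create the link; replace /cuda-11.2 with the name of your own CUDA installation directory.

sudo rm -rf /usr/local/cuda  # remove the previously created symbolic link
sudo ln -s /usr/local/cuda-11.2 /usr/local/cuda  # create the new symbolic link

If several CUDA versions are installed, the same two commands also switch between them.
Finally, once nvcc -V prints the CUDA version, the installation is complete.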
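The link switch can also be wrapped in a small function. This is a sketch only: switch_cuda is a hypothetical name, it assumes the toolkits live side by side as cuda-<version> directories under one root, and it uses ln -sfn to replace the link in one step instead of rm followed by ln -s.

```shell
#!/bin/sh
# Point <root>/cuda at <root>/cuda-<version>.
# $1: install root (normally /usr/local), $2: version such as 11.2
# (switch_cuda is a hypothetical helper name.)
switch_cuda() {
    target="$1/cuda-$2"
    if [ ! -d "$target" ]; then
        echo "no such toolkit directory: $target" >&2
        return 1
    fi
    # Refuse to continue if <root>/cuda is a real directory, not a link.
    if [ -d "$1/cuda" ] && [ ! -L "$1/cuda" ]; then
        echo "$1/cuda is a real directory, not a symbolic link" >&2
        return 1
    fi
    # -s symbolic, -f replace an existing link, -n do not follow the old link
    ln -sfn "$target" "$1/cuda"
}

# On a real system this needs root, for example from a root shell:
#   switch_cuda /usr/local 11.2
#   nvcc -V
```

Checking that the target directory exists first avoids leaving /usr/local/cuda pointing at nothing after a typo in the version.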

At this point:

@ubuntu:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
@ubuntu:~$ ls /usr/src | grep nvidia
nvidia-515.65.01

nvidia-smi should now display its output successfully.

(3) Install cuDNN

Run the following from the folder that contains cuda_11.2.2_460.32.03_linux.run (/home/qmj/CUDAfiles here):

tar -xzvf /home/qmj/cudnnfiles/cudnn-11.2-linux-x64-v8.1.1.33.tgz

Extraction creates a folder named cuda in that same folder. Copy the headers and libraries into the CUDA installation:

sudo cp /home/qmj/CUDAfiles/cuda/include/cudnn*.h /usr/local/cuda/include/
sudo cp /home/qmj/CUDAfiles/cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

Check the cuDNN version:

cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

Done
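The grep above prints the raw #define lines. As a small convenience, the sketch below joins them into a single version string. The function name cudnn_version is hypothetical, and it assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL on separate lines, as cuDNN 8 headers do.

```shell
#!/bin/sh
# Print MAJOR.MINOR.PATCHLEVEL from a cudnn_version.h file.
# $1: path to the header (cudnn_version is a hypothetical helper name)
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$1"
}

header=/usr/local/cuda/include/cudnn_version.h
if [ -r "$header" ]; then
    echo "cuDNN $(cudnn_version "$header")"
else
    echo "cuDNN header not found at $header"
fi
```

For the archive used in this post, the expected string would be 8.1.1, which should match the cuDNN 8.1 requirement listed for TensorFlow 2.6.0.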

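III. Install pip

To save effort I did not download and install Python, and simply used the Python 3.6 that ships with the system.
Install pip and its dependencies, then upgrade pip: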

sudo apt-get install python3-pip python3-dev
sudo pip3 install --upgrade pip

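IV. Install PyCharm

Download it and drag it into the Ubuntu home directory: https://www.jetbrains.com/pycharm/download/#section=linux

Extract:
tar -xzvf pycharm-community-2022.2.tar.gz
Launch it from the bin folder of the extracted directory (there is no separate install step):
cd pycharm-community-2022.2/bin
sh pycharm.sh

From then on, in the folder that contains pycharm.sh, you can run

sh pycharm.sh &

to open PyCharm.

Reference: "Installing PyCharm"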

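V. Install tensorflow-gpu

pip3 install tensorflow-gpu==2.6.0 -i https://pypi.tuna.tsinghua.edu.cn/simple/
If that is too slow, switch to the Aliyun mirror; otherwise skip this command:
pip3 install tensorflow-gpu==2.6.0 -i https://mirrors.aliyun.com/pypi/simple/

When creating a project in PyCharm, select python3.6 as the interpreter and tick the "Inherit global site-packages" option so that all installed packages are available.

My tensorflow-gpu does not run as fast as I would like. I will look into it later.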
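One likely reason for the slow speed is that TensorFlow sees no GPU and falls back to the CPU, which is what the Preface describes for a VM. A minimal way to check, assuming tensorflow is importable by the chosen interpreter; check_tf_gpu is a hypothetical wrapper name, while tf.config.list_physical_devices is the TensorFlow 2 call it relies on.

```shell
#!/bin/sh
# Ask TensorFlow which GPUs it can see.
# $1: Python interpreter to use (python3 in this post)
# Returns 0 if at least one GPU is listed, 1 if none,
# 2 if TensorFlow cannot be imported with that interpreter.
# (check_tf_gpu is a hypothetical helper name.)
check_tf_gpu() {
    "$1" -c 'import tensorflow' >/dev/null 2>&1 || return 2
    "$1" -c '
import sys
import tensorflow as tf
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", gpus)
sys.exit(0 if gpus else 1)
'
}

# Usage:
#   check_tf_gpu python3; echo "exit status: $?"
```

An empty list means every operation runs on the CPU, no matter which tensorflow-gpu package is installed.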

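VI. Install other packages

sudo apt-get install python3-pandas

Change the package name at the end as needed. For packages installed with pip3, if the download is too slow, append a mirror with -i followed by the mirror URL (the -i option belongs to pip3, not to apt-get).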

Appendix

Installing gcc 7.3.0
https://support.huaweicloud.com/instg-9000-A800_9000_9010/atlastrain_03_0062.html
A C/C++ compiler has to be installed first:
sudo apt install gcc g++
Then run the following steps as the root user:

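(1) sudo passwd root
Set a password (skip this if one is already set)
su root
Switch to the root user (to leave, type exit and press Enter)

(2) Download gcc-7.3.0.tar.gz from https://mirrors.tuna.tsinghua.edu.cn/gnu/gcc/gcc-7.3.0/gcc-7.3.0.tar.gz.
Installing gcc uses a lot of temporary space, so first clear the /tmp directory with:
sudo rm -rf /tmp/*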

Install the dependencies.
(1) On CentOS/BCLinux, run:

yum install bzip2

(2) On Ubuntu/Debian, run:

apt-get install bzip2

Build and install gcc.
Go to the directory that contains the gcc-7.3.0.tar.gz source archive and extract it:
tar -zxvf gcc-7.3.0.tar.gz

Enter the extracted folder and run the following to download the gcc dependency packages:
cd gcc-7.3.0
./contrib/download_prerequisites

If the command above fails, download the dependency packages manually inside the "gcc-7.3.0/" folder:

wget http://gcc.gnu.org/pub/gcc/infrastructure/gmp-6.1.0.tar.bz2
wget http://gcc.gnu.org/pub/gcc/infrastructure/mpfr-3.1.4.tar.bz2
wget http://gcc.gnu.org/pub/gcc/infrastructure/mpc-1.0.3.tar.gz
wget http://gcc.gnu.org/pub/gcc/infrastructure/isl-0.16.1.tar.bz2

Once these dependency packages are downloaded, run the following command again:

./contrib/download_prerequisites

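If the command above fails checksum verification, make sure each dependency package was downloaded successfully in one go, with no duplicate downloads.

Run the configure, build, and install commands: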
./configure --enable-languages=c,c++ --disable-multilib --with-system-zlib --prefix=/usr/local/gcc7.3.0

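make -j15 # Use grep -w processor /proc/cpuinfo|wc -l to find the number of CPUs; 15 is only an example, so set the value for your own machine. (make -j4 took 1 hour for me; possible errors and their fixes are described below.)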

make install

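Note:
The "--prefix" option sets the gcc 7.3.0 installation path. You can choose it freely, but do not use "/usr/local" or "/usr": that would conflict with the gcc installed by default from the system package sources and break the system's original gcc build environment. This example uses "/usr/local/gcc7.3.0".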
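Rather than hard-coding the job count for make, it can be derived from the machine. A minimal sketch, assuming nproc from coreutils is available, with /proc/cpuinfo as a fallback; make_jobs is a hypothetical helper name.

```shell
#!/bin/sh
# Print the number of parallel make jobs to use: one per logical CPU,
# falling back to 1 if the CPU count cannot be determined.
# (make_jobs is a hypothetical helper name.)
make_jobs() {
    n=$(nproc 2>/dev/null || grep -c '^processor' /proc/cpuinfo 2>/dev/null)
    case "$n" in
        ''|*[!0-9]*|0) n=1 ;;
    esac
    echo "$n"
}

echo "suggested command: make -j$(make_jobs)"
```

In a VM, the count reflects the cores assigned to the guest, so giving the VM more cores before the build is one way to shorten the hour-long compile.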

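(3) Configure the environment variables.
Training jobs need the upgraded gcc build environment, so set the environment variable in the training script with the following command:

export LD_LIBRARY_PATH=${install_path}/lib64:${LD_LIBRARY_PATH}

Here ${install_path} is the gcc 7.3.0 installation path passed to --prefix in the configure step above, "/usr/local/gcc7.3.0/" in this example.

Note:
Set this environment variable only when the upgraded gcc build environment is actually needed.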
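A sketch of how that export might look in a script, written so that an unset LD_LIBRARY_PATH does not leave a trailing colon (an empty entry is treated as the current directory). The prepend_path helper is hypothetical, and putting the bin folder on PATH is my own assumption, added so that gcc resolves to the new compiler; the source only sets LD_LIBRARY_PATH.

```shell
#!/bin/sh
# Print $1 prepended to the colon-separated list $2, without a trailing
# colon when $2 is empty. (prepend_path is a hypothetical helper name.)
prepend_path() {
    if [ -n "$2" ]; then
        echo "$1:$2"
    else
        echo "$1"
    fi
}

install_path=/usr/local/gcc7.3.0   # the --prefix used in this post

# Assumption: bin is added to PATH so that "gcc" finds the new compiler.
PATH=$(prepend_path "$install_path/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_path "$install_path/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

Because these exports live in the script rather than in ~/.bashrc, the system gcc stays the default everywhere else.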

The following errors came up while running make -j4:

1.
root@ubuntu:/home/qmj/gcc-7.3.0# make -j4

Command 'make' not found, but can be installed with:

apt install make
apt install make-guile

Installing make fixes this.

2.
make -j4
make[3]: Leaving directory '/home/qmj/gcc-7.3.0/build-x86_64-pc-linux-gnu/libiberty'
make[2]: Leaving directory '/home/qmj/gcc-7.3.0'
Makefile:25224: recipe for target 'stage1-bubble' failed
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory '/home/qmj/gcc-7.3.0'
Makefile:941: recipe for target 'all' failed
make: *** [all] Error 2
or
make[2]: Leaving directory '/home/qmj/gcc-7.3.0'
Makefile:25224: recipe for target 'stage1-bubble' failed
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory '/home/qmj/gcc-7.3.0'
Makefile:941: recipe for target 'all' failed
make: *** [all] Error 2
or
configure: error: C++ compiler missing or inoperational
Makefile:11605: recipe for target 'configure-stage1-libcpp' failed
make[2]: *** [configure-stage1-libcpp] Error 1
make[2]: Leaving directory '/home/qmj/gcc-7.3.0'
Makefile:25224: recipe for target 'stage1-bubble' failed
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory '/home/qmj/gcc-7.3.0'
Makefile:941: recipe for target 'all' failed
make: *** [all] Error 2

Fix:
exit
Press Enter to leave the root shell
sudo apt-get install g++
then
su root
sudo rm -rf /tmp/*
cd gcc-7.3.0
./configure --enable-languages=c,c++ --disable-multilib --with-system-zlib --prefix=/usr/local/gcc7.3.0
make -j4

3.
../.././gcc/lto-compress.c:34:10: fatal error: zlib.h: No such file or directory
#include <zlib.h>
^~~~~~~~
compilation terminated.
Makefile:1099: recipe for target 'lto-compress.o' failed
make[3]: *** [lto-compress.o] Error 1
make[3]: *** Waiting for unfinished jobs....
rm gcc.pod
make[3]: Leaving directory '/home/qmj/gcc-7.3.0/host-x86_64-pc-linux-gnu/gcc'
Makefile:4555: recipe for target 'all-stage1-gcc' failed
make[2]: *** [all-stage1-gcc] Error 2
make[2]: Leaving directory '/home/qmj/gcc-7.3.0'
Makefile:25224: recipe for target 'stage1-bubble' failed
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory '/home/qmj/gcc-7.3.0'
Makefile:941: recipe for target 'all' failed
make: *** [all] Error 2

Fix:
exit
Press Enter to leave the root shell
sudo apt-get install zlib1g-dev
then
su root
sudo rm -rf /tmp/*
cd gcc-7.3.0
./configure --enable-languages=c,c++ --disable-multilib --with-system-zlib --prefix=/usr/local/gcc7.3.0
make -j4
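The failures so far (no make, no C++ compiler, no zlib.h) all come from missing prerequisites, so a pre-flight check before the hour-long build could catch them early. A minimal sketch: missing_cmds is a hypothetical helper, the command list is only what this post ran into, and /usr/include/zlib.h is where zlib1g-dev places the header on Ubuntu.

```shell
#!/bin/sh
# Print every command from the argument list that is not on PATH.
# (missing_cmds is a hypothetical helper name.)
missing_cmds() {
    for c in "$@"; do
        command -v "$c" >/dev/null 2>&1 || echo "$c"
    done
}

missing=$(missing_cmds make gcc g++ bzip2 tar)
if [ -n "$missing" ]; then
    echo "install these first:" $missing
fi

# zlib.h is needed because configure is run with --with-system-zlib.
if [ ! -f /usr/include/zlib.h ]; then
    echo "zlib.h not found in /usr/include: sudo apt-get install zlib1g-dev"
fi
```

Running this before ./configure costs a second and can save a failed build partway through stage 1.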

4.
libtool: link: ranlib .libs/libtsan.a
libtool: link: rm -fr .libs/libtsan.lax
libtool: link: ( cd ".libs" && rm -f "libtsan.la" && ln -s "../libtsan.la" "libtsan.la" )
make[4]: Leaving directory '/home/qmj/gcc-7.3.0/x86_64-pc-linux-gnu/libsanitizer/tsan'
make[4]: Entering directory '/home/qmj/gcc-7.3.0/x86_64-pc-linux-gnu/libsanitizer'
true “AR_FLAGS=rc” “CC_FOR_BUILD=gcc” “CFLAGS=-g -O2” “CXXFLAGS=-g -O2 -D_GNU_SOURCE” “CFLAGS_FOR_BUILD=-g -O2” “CFLAGS_FOR_TARGET=-g -O2” “INSTALL=/usr/bin/install -c” “INSTALL_DATA=/usr/bin/install -c -m 644” “INSTALL_PROGRAM=/usr/bin/install -c” “INSTALL_SCRIPT=/usr/bin/install -c” “JC1FLAGS=” “LDFLAGS=” “LIBCFLAGS=-g -O2” “LIBCFLAGS_FOR_TARGET=-g -O2” “MAKE=make” "MAKEINFO=/home/qmj/gcc-7.3.0/missing makeinfo --split-size=5000000 --split-size=5000000 " “PICFLAG=” “PICFLAG_FOR_TARGET=” “SHELL=/bin/bash” “RUNTESTFLAGS=” “exec_prefix=/usr/local/gcc7.3.0” “infodir=/usr/local/gcc7.3.0/share/info” “libdir=/usr/local/gcc7.3.0/lib” “prefix=/usr/local/gcc7.3.0” “includedir=/usr/local/gcc7.3.0/include” “AR=ar” “AS=/home/qmj/gcc-7.3.0/host-x86_64-pc-linux-gnu/gcc/as” “LD=/home/qmj/gcc-7.3.0/host-x86_64-pc-linux-gnu/gcc/collect-ld” “LIBCFLAGS=-g -O2” “NM=/home/qmj/gcc-7.3.0/host-x86_64-pc-linux-gnu/gcc/nm” “PICFLAG=” “RANLIB=ranlib” “DESTDIR=” DO=all multi-do # make
make[4]: Leaving directory '/home/qmj/gcc-7.3.0/x86_64-pc-linux-gnu/libsanitizer'
make[3]: Leaving directory '/home/qmj/gcc-7.3.0/x86_64-pc-linux-gnu/libsanitizer'
make[2]: Leaving directory '/home/qmj/gcc-7.3.0/x86_64-pc-linux-gnu/libsanitizer'
make[1]: Leaving directory '/home/qmj/gcc-7.3.0'

Finished?
Still as root, continue with
make install

Output:
Libraries have been installed in:
/usr/local/gcc7.3.0/lib/../lib64

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:

- add LIBDIR to the `LD_LIBRARY_PATH' environment variable
  during execution
- add LIBDIR to the `LD_RUN_PATH' environment variable
  during linking
- use the `-Wl,-rpath -Wl,LIBDIR' linker flag
- have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.

make[4]: Nothing to be done for 'install-data-am'.
make[4]: Leaving directory '/home/qmj/gcc-7.3.0/x86_64-pc-linux-gnu/libatomic'
make[3]: Leaving directory '/home/qmj/gcc-7.3.0/x86_64-pc-linux-gnu/libatomic'
make[2]: Leaving directory '/home/qmj/gcc-7.3.0/x86_64-pc-linux-gnu/libatomic'
make[1]: Leaving directory '/home/qmj/gcc-7.3.0'

Done!
