Environments:
- Ubuntu 16.04 x64
- NVIDIA driver 380 -> 410
- CUDA 9.0 -> 10.1
- cuDNN -> 7.6.1
- NCCL -> 2.4.7
Step 1: Remove the old version of the NVIDIA driver.
$ sudo apt-get purge 'nvidia*'
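Before purging, it can help to see which NVIDIA packages are actually installed. The following is a minimal sketch, not part of the original steps; the function name installed_nvidia_packages is hypothetical, and it only filters `dpkg -l` style output without removing anything.

```shell
# Print the names of installed packages whose name starts with "nvidia".
# Reads `dpkg -l` style output on stdin; a leading "ii" marks an installed package.
installed_nvidia_packages() {
  awk '$1 == "ii" && $2 ~ /^nvidia/ { print $2 }'
}

# Typical use on the target machine (read-only, removes nothing):
#   dpkg -l | installed_nvidia_packages
```

Packages in the "rc" state (removed, configuration files left behind) are skipped by this filter on purpose.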
Step 2: Add the graphics-drivers PPA to the source list.
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt-get update
Step 3: Install NVIDIA driver 410.
# option 1
$ sudo apt-get install nvidia-410
# option 2: if the package cannot be found, try the alternative package name.
$ sudo apt-get install nvidia-driver-410
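The two options above can be folded into one check. This is a rough sketch under the assumption that `apt-cache show` exits non-zero for an unknown package name; pkg_available and pick_driver_pkg are hypothetical helper names, not part of any NVIDIA tooling.

```shell
# Return success if apt knows a package with this exact name.
pkg_available() {
  apt-cache show "$1" >/dev/null 2>&1
}

# Print the first driver package name that apt knows for the given version.
pick_driver_pkg() {
  local version=$1 name
  for name in "nvidia-$version" "nvidia-driver-$version"; do
    if pkg_available "$name"; then
      echo "$name"
      return 0
    fi
  done
  echo "no driver package found for version $version" >&2
  return 1
}

# Typical use on the target machine:
#   sudo apt-get install "$(pick_driver_pkg 410)"
```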
Step 4: Install CUDA 10.1.
- Choose the specific version of the CUDA repo deb from the NVIDIA repository (the directory part of the URL below).
$ wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_10.1.168-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1604_10.1.168-1_amd64.deb
…
(Reading database ... 268266 files and directories currently installed.)
Preparing to unpack cuda-repo-ubuntu1604_10.1.168-1_amd64.deb ...
Unpacking cuda-repo-ubuntu1604 (10.1.168-1) ...
dpkg: error processing archive cuda-repo-ubuntu1604_10.1.168-1_amd64.deb (--install):
trying to overwrite '/etc/apt/sources.list.d/cuda.list', which is also in package cuda-repo-ubuntu1704 9.0.176-1
Errors were encountered while processing:
cuda-repo-ubuntu1604_10.1.168-1_amd64.deb
To fix this problem, add the --force-overwrite argument to the command.
$ sudo dpkg -i --force-overwrite cuda-repo-ubuntu1604_10.1.168-1_amd64.deb
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install cuda
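The dpkg error above names the package that already owns /etc/apt/sources.list.d/cuda.list. To look up the owner directly, `dpkg -S` can be used. The sketch below only parses its output; owning_packages is a hypothetical helper name, and whether to purge the old repo package instead of using --force-overwrite is a judgment call this guide does not make.

```shell
# Extract the owning package name(s) from `dpkg -S <path>` output,
# whose lines look like "package-name: /path/to/file".
owning_packages() {
  sed -n 's|: /.*$||p'
}

# Typical use on the target machine (read-only):
#   dpkg -S /etc/apt/sources.list.d/cuda.list | owning_packages
```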
Step 5: Install cuDNN 7.6 and NCCL 2.4.
◎Installing from a Tar File
- In order to download cuDNN, ensure you are registered for the NVIDIA Developer Program.
- Go to: NVIDIA cuDNN home page.
- Click Download.
- Complete the short survey and click Submit.
- Accept the Terms and Conditions. A list of available download versions of cuDNN displays.
- Select the cuDNN version you want to install. A list of available resources displays.
- Install Link Redirect Trace Tool in Chrome.
- Select ‘cuDNN for Linux’, which is a Tar file.
- Get redirected link from the tool.
Download and extract the cuDNN package.
$ wget [url-of-cudnn-tar-file]
$ tar -zxvf cudnn-10.1-linux-x64-v7.6.1.34.tgz
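Soft-link the CUDA directory path. Stay in the directory where the archive was extracted, because the copy commands below use relative paths.
$ sudo ln -sfn /usr/local/cuda-10.1 /usr/local/cuda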
Copy the following files into the CUDA Toolkit directory, and change the file permissions.
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
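To confirm which cuDNN version ended up in the toolkit directory, the version macros in cudnn.h can be read back. This is a small sketch assuming the cuDNN 7 header layout, where CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL are defined in cudnn.h itself; the function name cudnn_version_from_header is hypothetical.

```shell
# Print the cuDNN version as MAJOR.MINOR.PATCH, read from the #define lines
# of the header file given as the first argument.
cudnn_version_from_header() {
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { if (major != "") print major "." minor "." patch }
  ' "$1"
}

# Typical use on the target machine:
#   cudnn_version_from_header /usr/local/cuda/include/cudnn.h
```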
◎Installing from a Debian File
- Choose the specific versions of the cuDNN and NCCL debs from the NVIDIA machine-learning repository (the directory part of the URLs below).
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn7_7.6.1.34-1+cuda10.1_amd64.deb
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn7-dev_7.6.1.34-1+cuda10.1_amd64.deb
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libnccl2_2.4.7-1+cuda10.1_amd64.deb
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libnccl-dev_2.4.7-1+cuda10.1_amd64.deb
$ sudo dpkg -i libcudnn7_7.6.1.34-1+cuda10.1_amd64.deb
$ sudo dpkg -i libcudnn7-dev_7.6.1.34-1+cuda10.1_amd64.deb
$ sudo dpkg -i libnccl2_2.4.7-1+cuda10.1_amd64.deb
$ sudo dpkg -i libnccl-dev_2.4.7-1+cuda10.1_amd64.deb
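After the four dpkg commands, a quick check can confirm that each package registered as installed. This is a sketch only; missing_packages is a hypothetical helper that prints every name dpkg does not report as "install ok installed".

```shell
# Print each given package name that is not fully installed according to dpkg.
missing_packages() {
  local pkg
  for pkg in "$@"; do
    if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q "install ok installed"; then
      echo "$pkg"
    fi
  done
}

# Typical use on the target machine (prints nothing when all four are present):
#   missing_packages libcudnn7 libcudnn7-dev libnccl2 libnccl-dev
```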
Step 6: Set up environment variables.
Follow the official NVIDIA instructions to set up the variables.
$ vi ~/.bashrc
Append these lines to the file for a 64-bit operating system.
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
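The `${VAR:+:${VAR}}` form in these exports appends ":<old value>" only when the variable is already non-empty, which avoids a stray colon (an empty path entry is treated as the current directory). A minimal sketch of the same expansion follows; prepend_path is a hypothetical helper used only for illustration.

```shell
# Prepend a directory to a colon-separated search path.
# ${existing:+:${existing}} expands to ":<existing>" only when existing is
# non-empty, so an empty path never produces a trailing colon.
prepend_path() {
  local dir=$1 existing=$2
  echo "${dir}${existing:+:${existing}}"
}

prepend_path /usr/local/cuda/bin ""          # prints /usr/local/cuda/bin
prepend_path /usr/local/cuda/bin "/usr/bin"  # prints /usr/local/cuda/bin:/usr/bin
```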
Step 7: Reboot system and enable persistence mode.
$ sudo reboot now
…
$ sudo nvidia-smi -pm 1
$ nvidia-smi
Fri Jul  5 07:06:21 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    24W / 300W |    136MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2336      G   /usr/lib/xorg/Xorg                            63MiB |
|    0      2819      G   gnome-shell                                   72MiB |
+-----------------------------------------------------------------------------+
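The banner line of this output carries the two values worth checking after the upgrade: the driver version and the highest CUDA version that driver supports. Note that the driver reported can be newer than the one installed in Step 3 if a later package pulled in its own driver. The following sketch extracts both values from the banner; smi_versions is a hypothetical helper name, and it assumes the banner format shown above.

```shell
# Read nvidia-smi output on stdin and print the driver and CUDA versions
# found on the banner line, in the form "driver=<x> cuda=<y>".
smi_versions() {
  sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda=\2/p'
}

# Typical use on the target machine:
#   nvidia-smi | smi_versions
```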