Install/Upgrade to the Newest Versions of the NVIDIA Driver, CUDA, and cuDNN Libraries

Environments:

  • Ubuntu 16.04 x64
  • NVIDIA driver 380 -> 410
  • CUDA 9.0 -> 10.1
  • cuDNN -> 7.6.1
  • NCCL -> 2.4.7

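Step 1: Remove the old version of the NVIDIA driver.

# Quote the pattern so the shell passes it to apt-get instead of expanding it against local file names.
$ sudo apt-get purge 'nvidia*'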
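
Before purging, it can help to see which NVIDIA packages dpkg actually knows about. The following is a minimal, read-only sketch; the function name list_nvidia_packages is made up for this example, and it assumes a Debian-based system with dpkg-query available.

```shell
# List packages known to dpkg whose names start with "nvidia".
# The first column is dpkg's status abbreviation ("ii" means installed).
list_nvidia_packages() {
    # The pattern is quoted so the shell does not expand it against
    # files in the current directory.
    dpkg-query -W -f='${db:Status-Abbrev} ${Package} ${Version}\n' 'nvidia*' 2>/dev/null || true
}

if command -v dpkg-query >/dev/null 2>&1; then
    list_nvidia_packages
else
    echo "dpkg-query not found; this check only applies to Debian-based systems"
fi
```

Running it again after the purge should show no lines starting with "ii".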

Step 2: Add the graphics-drivers PPA to the APT source list.

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt-get update

Step 3: Install NVIDIA driver 410.

# Option 1
$ sudo apt-get install nvidia-410
# Option 2: if that package cannot be found, try the alternative package name.
$ sudo apt-get install nvidia-driver-410
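
Rather than trying each install command in turn, you can ask the APT cache which of the two names exists. This is only a sketch: find_driver_package is a hypothetical helper, and it assumes `apt-get update` has already been run after adding the PPA.

```shell
# Print the first candidate package name that the APT cache knows about.
# Returns non-zero if none of the candidates is found.
find_driver_package() {
    for name in "$@"; do
        # apt-cache show exits non-zero when the package is unknown.
        if apt-cache show "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# The two candidate names are the ones used in this step.
find_driver_package nvidia-410 nvidia-driver-410 \
    || echo "no matching driver package found in the APT cache"
```

The printed name is the one to pass to `sudo apt-get install`.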

Step 4: Install CUDA 10.1.

  • Choose the specific version of the CUDA repo deb from http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/
$ wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_10.1.168-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1604_10.1.168-1_amd64.deb
…
(Reading database ... 268266 files and directories currently installed.)
Preparing to unpack cuda-repo-ubuntu1604_10.1.168-1_amd64.deb ...
Unpacking cuda-repo-ubuntu1604 (10.1.168-1) ...
dpkg: error processing archive cuda-repo-ubuntu1604_10.1.168-1_amd64.deb (--install):
 trying to overwrite '/etc/apt/sources.list.d/cuda.list', which is also in package cuda-repo-ubuntu1704 9.0.176-1
Errors were encountered while processing:
 cuda-repo-ubuntu1604_10.1.168-1_amd64.deb
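This error occurs because /etc/apt/sources.list.d/cuda.list is still owned by the previously installed cuda-repo-ubuntu1704 package. To work around it, add the --force-overwrite argument to the dpkg command.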
$ sudo dpkg -i --force-overwrite cuda-repo-ubuntu1604_10.1.168-1_amd64.deb
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install cuda
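
The overwrite error shown in this step comes from two packages claiming the same file. A small read-only sketch for checking which installed package currently owns cuda.list follows; owner_of is a hypothetical helper name, and the path is the one from the error message.

```shell
# Report which installed package owns a given file.
owner_of() {
    # dpkg -S prints "package: /path"; keep only the package name.
    dpkg -S "$1" 2>/dev/null | cut -d: -f1
}

owner=$(owner_of /etc/apt/sources.list.d/cuda.list)
echo "cuda.list is owned by: ${owner:-<no package>}"
```

If the owner turns out to be an old repo package (cuda-repo-ubuntu1704 in the log above), one alternative to --force-overwrite is to purge that package first with `sudo dpkg -P cuda-repo-ubuntu1704` and then install the new deb normally.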

Step 5: Install cuDNN 7.6 and NCCL 2.4.

◎Installing from a Tar File
  1. In order to download cuDNN, ensure you are registered for the NVIDIA Developer Program.
  2. Go to: NVIDIA cuDNN home page.
  3. Click Download.
  4. Complete the short survey and click Submit.
  5. Accept the Terms and Conditions. A list of available download versions of cuDNN displays.
  6. Select the cuDNN version you want to install. A list of available resources displays.
  7. Install the Link Redirect Trace extension in Chrome.
  8. Select ‘cuDNN for Linux’, which is a Tar file.
  9. Get the redirected download link from the extension.
Download and extract the cuDNN package.
$ wget [url-of-cudnn-tar-file]
$ tar -zxvf cudnn-10.1-linux-x64-v7.6.1.34.tgz

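Make sure /usr/local/cuda points to the CUDA 10.1 directory. Use absolute paths and stay in the directory where the tarball was extracted, because the copy commands below read from the extracted cuda folder by relative path.
$ sudo ln -sfn /usr/local/cuda-10.1 /usr/local/cuda

Copy the following files from the extracted cuda folder into the CUDA Toolkit directory, and change the file permissions.
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*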
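
To confirm that the copied header is the version you expect, you can read the version macros out of cudnn.h. The sketch below assumes the cuDNN 7.x layout, where CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL are defined directly in cudnn.h; cudnn_version is a hypothetical helper name.

```shell
# Print "MAJOR.MINOR.PATCHLEVEL" parsed from a cudnn.h header.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
    cudnn_version "$header"
else
    echo "cudnn.h not found at $header"
fi
```

For the tarball used in this step, the expected result is 7.6.1.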

◎Installing from a Debian File
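  • Choose the specific version of the cuDNN and NCCL debs from http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/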
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn7_7.6.1.34-1+cuda10.1_amd64.deb
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn7-dev_7.6.1.34-1+cuda10.1_amd64.deb
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libnccl2_2.4.7-1+cuda10.1_amd64.deb
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libnccl-dev_2.4.7-1+cuda10.1_amd64.deb
$ sudo dpkg -i libcudnn7_7.6.1.34-1+cuda10.1_amd64.deb
$ sudo dpkg -i libcudnn7-dev_7.6.1.34-1+cuda10.1_amd64.deb
$ sudo dpkg -i libnccl2_2.4.7-1+cuda10.1_amd64.deb
$ sudo dpkg -i libnccl-dev_2.4.7-1+cuda10.1_amd64.deb
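
After the packages are installed, a quick way to check that the dynamic linker can see the libraries is to filter the linker cache. This is a sketch under the assumption that ldconfig is on your PATH; filter_dl_libs is a hypothetical helper name.

```shell
# Keep only the cuDNN and NCCL entries from `ldconfig -p` style output.
filter_dl_libs() {
    # "|| true" keeps the exit status at zero when nothing matches.
    grep -E 'libcudnn|libnccl' || true
}

if command -v ldconfig >/dev/null 2>&1; then
    ldconfig -p | filter_dl_libs
fi
```

An empty result suggests the libraries are not in the linker cache yet; running `sudo ldconfig` refreshes it.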

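Step 6: Set up environment variables.

Follow NVIDIA's official instructions to set up the variables.
$ vi ~/.bashrc

Append these lines to the file (for a 64-bit operating system). Note that there is no colon directly after bin: the ${PATH:+:${PATH}} expansion adds the separator itself, and an extra colon would create an empty PATH entry, which the shell treats as the current directory.
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}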
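
The `${VAR:+:${VAR}}` expansion used in these exports emits a colon followed by the old value only when the variable is already set and non-empty, so it supplies the separator itself. The sketch below isolates that behavior; prepend_path is a hypothetical helper written only for illustration.

```shell
# Build "dir[:existing]" with the same parameter expansion as the exports.
prepend_path() {
    # $1 = directory to prepend, $2 = current value (may be empty)
    printf '%s\n' "$1${2:+:$2}"
}

prepend_path /usr/local/cuda/bin ""          # prints /usr/local/cuda/bin
prepend_path /usr/local/cuda/bin "/usr/bin"  # prints /usr/local/cuda/bin:/usr/bin
```

After opening a new shell, `command -v nvcc` should resolve to /usr/local/cuda/bin/nvcc if the PATH line took effect.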

Step 7: Reboot the system and enable persistence mode.

$ sudo reboot now
…
# Changing persistence mode requires root privileges.
$ sudo nvidia-smi -pm 1
$ nvidia-smi
Fri Jul  5 07:06:21 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    24W / 300W |    136MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2336      G   /usr/lib/xorg/Xorg                            63MiB |
|    0      2819      G   gnome-shell                                   72MiB |
+-----------------------------------------------------------------------------+
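
The Persistence-M column in the table can also be checked from a script by using nvidia-smi's CSV query output. This is a rough sketch: all_persistent is a hypothetical helper, and it assumes the persistence_mode field is reported as "Enabled" or "Disabled".

```shell
# Succeed only if there is at least one input line and every line's
# last comma-separated field is "Enabled".
all_persistent() {
    awk -F', ' '$NF != "Enabled" { bad = 1 } END { exit (bad || NR == 0) }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=name,driver_version,persistence_mode \
                  --format=csv,noheader | all_persistent; then
        echo "persistence mode is enabled on all GPUs"
    else
        echo "persistence mode is disabled on at least one GPU, or no GPU was reported"
    fi
else
    echo "nvidia-smi not found; is the driver installed?"
fi
```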
