Install Multiple Versions of CUDA Libraries (CUDA 9.x and 10.x) on the Same Machine

On my computer, I installed CUDA 10 for my current project, but some older projects still need to run with CUDA 9.


Environment:

  • Ubuntu 16.04
  • Anaconda (assumed to be installed already)
  • CUDA 9.x and 10.x 
  • PyTorch

Step 1: Install CUDA 9.0

Follow the official TensorFlow guide to install CUDA libraries.
# Add NVIDIA package repository
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
sudo apt install ./cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
sudo apt update

# Install the NVIDIA driver
# Issue with driver install requires creating /usr/lib/nvidia
sudo mkdir /usr/lib/nvidia
sudo apt-get install --no-install-recommends nvidia-410
# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install CUDA and tools. Include optional NCCL 2.x
sudo apt install cuda9.0 cuda-cublas-9-0 cuda-cufft-9-0 cuda-curand-9-0 \
    cuda-cusolver-9-0 cuda-cusparse-9-0 libcudnn7=7.2.1.38-1+cuda9.0 \
    libnccl2=2.2.13-1+cuda9.0 cuda-command-line-tools-9-0

Step 2: Install CUDA 10.0

Follow the official TensorFlow guide to install CUDA libraries.
# Add NVIDIA package repositories
# Add HTTPS support for apt-key
sudo apt-get install gnupg-curl
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
sudo apt-get update

# Install NVIDIA driver (skip if it was already installed in Step 1)
# Issue with driver install requires creating /usr/lib/nvidia
sudo mkdir -p /usr/lib/nvidia
sudo apt-get install --no-install-recommends nvidia-410
# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-10-0 \
    libcudnn7=7.4.1.5-1+cuda10.0  \
    libcudnn7-dev=7.4.1.5-1+cuda10.0
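
Once both toolkits are installed, it can help to confirm that they sit side by side and to see which one the cuda symlink selects. The following is only a minimal sketch: the function name list_cuda_toolkits and its optional root argument are hypothetical helpers, and it assumes the packages installed into the usual /usr/local/cuda-X.Y directories.

```shell
#!/bin/sh
# list_cuda_toolkits [ROOT]: print each versioned toolkit directory under
# ROOT (default /usr/local), then the stored target of the ROOT/cuda symlink.
# Hypothetical helper, not part of CUDA or conda.
list_cuda_toolkits() {
    root="${1:-/usr/local}"
    for dir in "$root"/cuda-*; do
        # The glob stays literal when nothing matches, so test before printing.
        [ -d "$dir" ] && echo "toolkit: $dir"
    done
    if [ -L "$root/cuda" ]; then
        echo "default: $(readlink "$root/cuda")"
    else
        echo "default: (no cuda symlink found)"
    fi
}

list_cuda_toolkits
```

If the "default" line does not point at the CUDA 10 directory, the assumption made in Step 3 does not hold on that machine.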

Step 3: Link CUDA To User Environment

Add the CUDA library directories to the LD_LIBRARY_PATH environment variable in ~/.bashrc. This assumes that /usr/local/cuda/ is symlinked to the CUDA 10 installation by default.
$ vim ~/.bashrc
...
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
# Then save
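
If LD_LIBRARY_PATH is empty when ~/.bashrc runs, the two exports above leave an empty entry in the list (the "::" that shows up when the variable is printed in Step 7). The dynamic linker treats an empty entry as the current directory, which is usually not intended. A minimal sketch of one way to avoid it follows; join_path is a hypothetical helper name and /opt/lib is only a placeholder directory.

```shell
#!/bin/sh
# join_path NEW CURRENT: print NEW followed by CURRENT, adding the ":"
# separator only when CURRENT is non-empty, so no empty entry is created.
# Hypothetical helper for illustration.
join_path() {
    new="$1"
    current="$2"
    echo "${new}${current:+:$current}"
}

join_path /usr/local/cuda/lib64 ""          # prints /usr/local/cuda/lib64
join_path /usr/local/cuda/lib64 /opt/lib    # prints /usr/local/cuda/lib64:/opt/lib
```

In ~/.bashrc the first export could then be written as export LD_LIBRARY_PATH="$(join_path /usr/local/cuda/lib64 "$LD_LIBRARY_PATH")".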

Step 4: Create Anaconda environment

Let's create a new environment named torch-cuda9. Conda creates it under the Anaconda installation directory, in ~/anaconda3/envs/.
$ conda create -n torch-cuda9 python=3.6
# After the environment is created ...
$ conda activate torch-cuda9
$ conda install pytorch torchvision cudatoolkit=9.0 -c pytorch

Step 5: Link CUDA To Anaconda Environment

Conda runs the shell scripts in an environment's activate.d directory each time that environment is activated. The directory is located relative to the environment's root:
/etc/conda/activate.d/
In our case the full path is:
/home/chunming/anaconda3/envs/torch-cuda9/etc/conda/activate.d/
Create an executable script there so that it runs on every activation:
$ mkdir -p ~/anaconda3/envs/torch-cuda9/etc/conda/activate.d
$ touch ~/anaconda3/envs/torch-cuda9/etc/conda/activate.d/activate.sh
$ vim ~/anaconda3/envs/torch-cuda9/etc/conda/activate.d/activate.sh
$ chmod +x ~/anaconda3/envs/torch-cuda9/etc/conda/activate.d/activate.sh
Then add these commands to the script.
#!/bin/sh
ORIGINAL_LD_LIBRARY_PATH=$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/extras/CUPTI/lib64:/lib/nccl/cuda-9:$LD_LIBRARY_PATH
We are doing two things here:
  • saving the original value of LD_LIBRARY_PATH (we will need it later)
  • updating LD_LIBRARY_PATH so that the CUDA 9.0 directories come first

Let's re-activate the environment to run the script.
$ conda deactivate
$ conda activate torch-cuda9
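
The same hook can also be written without opening an editor, which is handy when setting up several environments. This is a sketch under the assumptions of this post (same paths, same script contents); write_activate_hook is a hypothetical helper name.

```shell
#!/bin/sh
# write_activate_hook ENV_PREFIX: create
# ENV_PREFIX/etc/conda/activate.d/activate.sh with the same two commands
# as the hand-edited script from this step. Hypothetical helper.
write_activate_hook() {
    hook_dir="$1/etc/conda/activate.d"
    mkdir -p "$hook_dir"
    # The quoted 'EOF' keeps $LD_LIBRARY_PATH literal, so it is expanded
    # when the hook runs, not when this file is written.
    cat > "$hook_dir/activate.sh" <<'EOF'
#!/bin/sh
ORIGINAL_LD_LIBRARY_PATH=$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/extras/CUPTI/lib64:/lib/nccl/cuda-9:$LD_LIBRARY_PATH
EOF
    chmod +x "$hook_dir/activate.sh"
}

# Example call (not executed here):
#   write_activate_hook "$HOME/anaconda3/envs/torch-cuda9"
```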


Step 6: Check CUDA version

$ echo $LD_LIBRARY_PATH
/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/extras/CUPTI/lib64:/lib/nccl/cuda-9

$ python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.version.cuda
'9.0.176'
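
The linker searches LD_LIBRARY_PATH from left to right, so the first CUDA directory in the list is the one that takes effect. A small sketch for picking it out of a long value is shown below; first_cuda_entry is a hypothetical helper name, and matching on the substring "cuda" is a simplifying assumption.

```shell
#!/bin/sh
# first_cuda_entry PATHLIST: print the first colon-separated entry of
# PATHLIST that contains "cuda"; print nothing if there is none.
# Hypothetical helper for illustration.
first_cuda_entry() {
    old_ifs=$IFS
    IFS=:
    for entry in $1; do
        case "$entry" in
            *cuda*)
                IFS=$old_ifs
                echo "$entry"
                return 0
                ;;
        esac
    done
    IFS=$old_ifs
    return 0
}

first_cuda_entry "${LD_LIBRARY_PATH:-}"
```

With the torch-cuda9 environment active, the printed entry should be the /usr/local/cuda-9.0/lib64 directory set by the activation script.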

Step 7: Recover CUDA Link to User Environment

$ mkdir -p ~/anaconda3/envs/torch-cuda9/etc/conda/deactivate.d
$ touch ~/anaconda3/envs/torch-cuda9/etc/conda/deactivate.d/deactivate.sh
$ vim ~/anaconda3/envs/torch-cuda9/etc/conda/deactivate.d/deactivate.sh
$ chmod +x ~/anaconda3/envs/torch-cuda9/etc/conda/deactivate.d/deactivate.sh
Add these lines to the script. They restore the value that the activation script saved:
#!/bin/sh

export LD_LIBRARY_PATH=$ORIGINAL_LD_LIBRARY_PATH
unset ORIGINAL_LD_LIBRARY_PATH
Check LD_LIBRARY_PATH again.
$ conda deactivate
...
$ echo $LD_LIBRARY_PATH
/usr/local/cuda/lib64::/usr/local/cuda/extras/CUPTI/lib64
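
The save-and-restore logic of the two hooks can be tried out in plain shell, without conda. This is a sketch only: env_activate and env_deactivate are hypothetical stand-ins for the activate.sh and deactivate.sh scripts from Steps 5 and 7, and the starting value is a made-up example.

```shell
#!/bin/sh
# Stand-in for activate.sh: save the current value, then put CUDA 9.0 first.
# ${LD_LIBRARY_PATH:-} is used so the sketch also works when the variable is unset.
env_activate() {
    ORIGINAL_LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}
    LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/extras/CUPTI/lib64:/lib/nccl/cuda-9:$ORIGINAL_LD_LIBRARY_PATH
}

# Stand-in for deactivate.sh: restore the saved value and drop the backup.
env_deactivate() {
    LD_LIBRARY_PATH=$ORIGINAL_LD_LIBRARY_PATH
    unset ORIGINAL_LD_LIBRARY_PATH
}

# Run the round trip in a subshell so the calling shell is left untouched.
(
    LD_LIBRARY_PATH=/usr/local/cuda/lib64
    before=$LD_LIBRARY_PATH
    env_activate
    echo "active:   $LD_LIBRARY_PATH"
    env_deactivate
    echo "restored: $LD_LIBRARY_PATH"
    [ "$LD_LIBRARY_PATH" = "$before" ] && echo "round trip OK"
)
```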
Now it's all done.
