TensorFlow + Jupyter + NVIDIA GPU + Docker + Anaconda + Google Cloud Platform


Credit to Allen Day on Medium.

Contents

  1. Sign up on Google Cloud Platform Free Tier
  2. Create ports in firewall
  3. Create virtual machine instance
  4. SSH in browser and terminal
  5. Install NVIDIA GPU driver and toolkit
  6. Install Docker and NVIDIA-docker with TensorFlow container
  7. Install Anaconda
  8. Install additional packages
  9. Error messages
  10. Test

1. Sign up on Google Cloud Platform Free Tier

Click here to sign up for 12 months and US$300 of free credit to get you started, plus Always Free products to keep you going.

2. Create ports in firewall


There are two options to configure the firewall for Jupyter Notebook and TensorBoard.
  • Option 1: VPC Network → Firewall rules


  • Option 2: Create ports using command-line
# jupyter
gcloud compute firewall-rules create jupyter --allow tcp:8888-8889 --source-ranges 0.0.0.0/0

# tensorboard
gcloud compute firewall-rules create tensorboard --allow tcp:6006 --source-ranges 0.0.0.0/0
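
Either way, you can confirm afterwards that the rules exist. This sketch uses gcloud's filter syntax (the "~" operator is a regex match) with the rule names jupyter and tensorboard created above:

```shell
# List only the firewall rules we just created
gcloud compute firewall-rules list --filter="name~'jupyter|tensorboard'"
```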

3. Create virtual machine instance

Follow @howkhang's or @Allen Day's instructions to upgrade to a paid account and create a virtual machine instance. The specification I used is listed below.
  • Request a quota increase for GPUs
    • IAM & Admin → Quotas:
      • Region: choose a zone with NVIDIA K80 GPUs and Intel Broadwell CPUs.
      • Select NVIDIA K80 GPUs (without “preemptible”) → Edit Quotas → change the limit to “1” → Submit Request.
  • Receive email approval of quota increase
  • Create your virtual machine instance
    • Compute Engine → VM instances → Create
      • Cores: 4 vCPU  
      • Memory: 26 GB  
      • CPU platform: Intel Broadwell or later 
      • GPUs: 1 with NVIDIA Tesla K80 
      • Boot disk: Ubuntu 16.04 LTS, 250 GB standard persistent disk (SSD costs extra)
      • Firewall: Check 'Allow HTTP/HTTPS traffic'
      • Networking → Network tags: jupyter, tensorboard
    • (Optional) Convert the IP address to static
      • VPC network → External IP addresses: Convert IP address to “Static” and give it a name. (Static IP charges US$0.01/hour at time of writing).
Estimated cost
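
For reference, the specification above can also be expressed as a single gcloud command. This is a sketch: the instance name tf-gpu and zone us-east1-c are placeholders, so adjust them to the region where your GPU quota was granted.

```shell
# Sketch: create a 4-vCPU / 26 GB custom VM with one K80, matching the spec above
gcloud compute instances create tf-gpu \
    --zone=us-east1-c \
    --machine-type=custom-4-26624 \
    --min-cpu-platform="Intel Broadwell" \
    --accelerator=type=nvidia-tesla-k80,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=ubuntu-1604-lts \
    --image-project=ubuntu-os-cloud \
    --boot-disk-size=250GB \
    --tags=jupyter,tensorboard
```

Note that GPU instances require --maintenance-policy=TERMINATE because they cannot live-migrate.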

4. SSH in browser and terminal

There are a few methods to connect to your instance. See the official Google documentation here.
  • Option 1: Connect in browser window
  • Option 2: Connect in terminal
    1. Download and install Google Cloud SDK.
    2. Run “gcloud init” to initialize and link to your account.
    3. Run “gcloud compute ssh <your instance name>”. <your instance name> is the same as the one shown in Option 1.
    4. (Optional) To switch user, run “gcloud compute ssh <your user name>@<your instance name>”.
    5. The SSH connection should succeed:

5. Install NVIDIA GPU driver and toolkit

I have written another tutorial on installing the NVIDIA driver, CUDA toolkit and cuDNN library:

→ ‘Upgrade to The Newest Version of NVIDIA driver, CUDA and CuDNN libraries’

Follow the steps there to install the NVIDIA driver, CUDA toolkit and cuDNN library.

If the CUDA installation succeeded, running this command will display a table describing the available Tesla K80 GPU.
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P0    75W / 149W |      0MiB / 11441MiB |     89%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+


◎Set environment variables

Add these environment variables to the .bashrc file in your home directory.
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
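
If you prefer to do this non-interactively, the two lines can be appended and applied in one go. This is a sketch assuming the default /usr/local/cuda symlink from the driver tutorial:

```shell
# Persist the CUDA paths for future logins
cat >> ~/.bashrc <<'EOF'
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
EOF

# Apply them to the current shell as well
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```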

6. Install Docker and NVIDIA-docker with TensorFlow container


◎Install Docker and NVIDIA-docker

sudo apt-get -y install \
apt-transport-https ca-certificates curl software-properties-common

#### Install Docker
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update && sudo apt-get install -y docker-ce

#### Install Nvidia Docker
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

◎Make sure the Docker container can see the GPU

sudo nvidia-docker run --rm nvidia/cuda nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    71W / 149W |      0MiB / 11441MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

◎Launch TensorFlow environment with Jupyter and Tensorboard

You can replace tensorflow/tensorflow:latest-gpu with a specific tagged image.
#### [1] Start the container automatically (restarts unless stopped)
nvidia-docker run -dit --restart unless-stopped -p 8888:8888 -p 6006:6006 --name tensorflow tensorflow/tensorflow:latest-gpu jupyter notebook --allow-root

#### [2] Start the container manually
nvidia-docker run -it -p 8888:8888 -p 6006:6006 --name tensorflow tensorflow/tensorflow:latest-gpu jupyter notebook --allow-root
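
The notebook prints a login URL containing a token. If you started the container detached with [1], you can recover it from the container logs (tensorflow is the --name we gave above), then open http://<your external IP>:8888/?token=... in your browser:

```shell
# Show the Jupyter startup log, which includes the ?token=... URL
sudo docker logs tensorflow 2>&1 | grep token
```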

◎More containers

See all available tags for additional containers, such as release candidates or nightly builds.

◎List containers

$ nvidia-docker ps -a
CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                                            NAMES
cea9902468a5        tensorflow_gpu_jupyter   "/run_jupyter.sh --a…"   8 seconds ago       Up 5 seconds        0.0.0.0:6006->6006/tcp, 0.0.0.0:8888->8888/tcp   tensorflow-py3

◎Launch a container

$ nvidia-docker start -ai <CONTAINER ID|NAME>

◎Stop a container

$ nvidia-docker stop <CONTAINER ID|NAME>

◎Delete a container

$ nvidia-docker rm <CONTAINER ID|NAME>

To delete an image instead:
$ nvidia-docker rmi <IMAGE NAME>

◎More Docker commands

See all base commands for Docker

7. Install Anaconda

See official guide to install TensorFlow using Anaconda.
CONDA_INSTALL="Anaconda3-5.1.0-Linux-x86_64.sh"
wget https://repo.anaconda.com/archive/${CONDA_INSTALL} 
chmod +x ${CONDA_INSTALL} 
./${CONDA_INSTALL} 
Note that we answer yes here so the installer adds the install location to .bashrc:
Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /home/chunming/.bashrc ? [yes|no]
[no] >>> yes
...

### Remember to run this.
$ source ~/.bashrc

◎Create an environment

conda create -n tensorflow-py3 pip python=3.6
source activate tensorflow-py3

◎Install TensorFlow

Option 1: Recommended
Choose a TensorFlow python package at here. Replace the TF_URL with the URL you picked.
TF_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.9.0-cp36-cp36m-linux_x86_64.whl
pip install --ignore-installed --upgrade $TF_URL

Option 2: Use Conda
conda install -c conda-forge tensorflow-gpu

Option 3: Use pip
pip install tensorflow-gpu
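
Whichever option you choose, you can check that TensorFlow sees the GPU. This uses tf.test.is_gpu_available() from the TF 1.x API used throughout this post; on a working K80 setup it should print True.

```shell
# Should print True if TensorFlow was built with CUDA and the K80 is visible
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
```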

8. Install additional packages

Since you've activated the environment, you can install additional packages with pip:
pip install matplotlib opencv-python scikit-image Pillow scikit-learn keras

9. Error messages

You may encounter an error like “ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory” or “ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory” during installation.
Follow Changjiang's instructions to fix the problem.

◎Uninstall old version of CUDA Toolkit

Assume we have an older version of CUDA and cuDNN 6 installed.
sudo apt-get purge cuda
sudo apt-get purge libcudnn6
sudo apt-get purge libcudnn6-dev

After uninstallation, repeat the steps of CUDA and cuDNN installation.

◎Add environment variables

Set up the development environment by modifying the PATH and LD_LIBRARY_PATH variables, and add these lines to the end of the .bashrc file:
export PATH=/usr/local/cuda-9.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH

◎Reboot the system to load the NVIDIA drivers.

I encountered “ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory” after upgrading to CUDA 10 and cuDNN 7.5. Follow fabricatedmath's instructions to fix the problem:
conda install cudatoolkit
conda install cudnn

10. Test

See how to use the GPU in the official TensorFlow guide:
import tensorflow as tf
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

**IMPORTANT: Remember to shutdown your VM instance when you're done or you will incur charges.**

