# DGX-1 Box Introduction

I have a new toy: a DGX-1 box, and this is what I am going to play with today.

It has no connection to the Internet, so I will have to work around a few challenges to run what I want on it.

I also have no admin access, so I am limited to Docker containers or native Linux programs.

Let's start by identifying the driver version and the GPU cards in this box.

After running:

nvidia-smi

I got this output:

Wed Dec 11 13:14:55 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.145                Driver Version: 384.145                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   38C    P0    44W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   39C    P0    43W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
| N/A   39C    P0    42W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
| N/A   38C    P0    43W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
| N/A   40C    P0    40W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   40C    P0    45W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   43C    P0    45W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   40C    P0    47W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

It looks like the installed NVIDIA driver version is 384.145, which corresponds to CUDA 9.0 according to this table of minimum driver versions:

| CUDA toolkit | Minimum driver version |
|---|---|
| CUDA 10.2 (10.2.89) | >= 440.33 |
| CUDA 10.1 (10.1.105) | >= 418.39 |
| CUDA 10.0 (10.0.130) | >= 410.48 |
| CUDA 9.2 (9.2.88) | >= 396.26 |
| CUDA 9.1 (9.1.85) | >= 390.46 |
| CUDA 9.0 (9.0.76) | >= 384.81 |
| CUDA 8.0 (8.0.61 GA2) | >= 375.26 |
| CUDA 8.0 (8.0.44) | >= 367.48 |
| CUDA 7.5 (7.5.16) | >= 352.31 |
| CUDA 7.0 (7.0.28) | >= 346.46 |
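The lookup rule is simply "the newest CUDA release whose minimum driver is not above the installed one." A small helper of my own (a sketch, not an NVIDIA tool) encodes the table above:

```python
# Sketch: map an installed driver version to the newest CUDA toolkit it
# supports, using the minimum-driver table above. My own helper, not an
# NVIDIA API.

DRIVER_REQUIREMENTS = [
    ("10.2", (440, 33)),
    ("10.1", (418, 39)),
    ("10.0", (410, 48)),
    ("9.2",  (396, 26)),
    ("9.1",  (390, 46)),
    ("9.0",  (384, 81)),
    ("8.0",  (375, 26)),  # 8.0.61 GA2
    ("8.0",  (367, 48)),  # 8.0.44
    ("7.5",  (352, 31)),
    ("7.0",  (346, 46)),
]

def max_cuda_for_driver(driver):
    """Return the newest CUDA version the driver can run, or None."""
    installed = tuple(int(part) for part in driver.split("."))
    for cuda, minimum in DRIVER_REQUIREMENTS:
        if installed >= minimum:  # tuple comparison: major first, then minor
            return cuda
    return None

print(max_cuda_for_driver("384.145"))  # 9.0
```

For driver 384.145, the first row that matches is CUDA 9.0 (>= 384.81), which is why the base image below is `nvidia/cuda:9.0-base`.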

Good: now that I know the version, I can try to run this Docker container:

docker run --runtime nvidia nvidia/cuda:9.0-base nvidia-smi

I am getting an error:

Unable to find image 'nvidia/cuda:9.0-base' locally
docker: Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp XX.XX.XX.XX:443: getsockopt: connection refused.
See 'docker run --help'.

This box is not connected to the Internet, so I need to download the image on another machine, save it, copy it over, and load it on the DGX-1.

After running the same command on a machine with Internet access, the image was downloaded and stored:

docker image ls
REPOSITORY                                 TAG                      IMAGE ID            CREATED             SIZE
nvidia/cuda                                9.0-base                 3c57055e68a2        2 weeks ago         140MB

Now I am ready to dump the image and compress it on the fly:

docker save nvidia/cuda:9.0-base | gzip > nvidia_cuda_9_0_base.tar.gz

Here it is:

50269763 Dec 11 14:00 nvidia_cuda_9_0_base.tar.gz
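A quick sanity check on the arithmetic (note that the 140MB figure from `docker image ls` is decimal megabytes): the compressed archive is about 48 MiB, roughly a third of the uncompressed image.

```python
# Convert the ls byte count of the archive into MiB (rough arithmetic only).
size_bytes = 50_269_763        # size of nvidia_cuda_9_0_base.tar.gz from ls
size_mib = size_bytes / 2**20  # bytes -> MiB
print(f"{size_mib:.1f} MiB")   # 47.9 MiB
```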

This archive is ready to be uploaded to the DGX-1 machine, and that is what I did.

After copying the archive, let's decompress it and load it on the box (recent Docker releases can also read the gzipped archive directly with docker load, but decompressing first is the conservative route on an older installation):

gzip -d nvidia_cuda_9_0_base.tar.gz
docker load -i nvidia_cuda_9_0_base.tar

Let's run the Docker container and see if all GPUs are available:

docker run --runtime nvidia nvidia/cuda:9.0-base nvidia-smi

Bingo!

Wed Dec 11 22:19:24 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.145                Driver Version: 384.145                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   38C    P0    43W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   38C    P0    43W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
| N/A   39C    P0    42W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
| N/A   37C    P0    43W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
| N/A   39C    P0    40W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   39C    P0    45W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   42C    P0    45W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   39C    P0    47W / 300W |     10MiB / 32502MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
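All eight GPUs show up inside the container. As a quick scripted check, the GPU rows in captured `nvidia-smi` text can be counted with a few lines of Python (a sketch over the plain-text table; `nvidia-smi -L` gives a cleaner per-GPU listing when you just need the names):

```python
import re

# Sketch: count GPU header rows in captured `nvidia-smi` output. Each GPU's
# first line looks like "|   0  Tesla V100-SXM2...  On   | ...". The two
# sample rows below stand in for the full output captured above.
smi_output = """\
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   38C    P0    43W / 300W |     10MiB / 32502MiB |      0%      Default |
|   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   39C    P0    47W / 300W |     10MiB / 32502MiB |      0%      Default |
"""

def count_gpus(text):
    """Count rows of the form '|   <index>  <name> ...' (GPU header lines)."""
    return len(re.findall(r"^\|\s+\d+\s+\S", text, flags=re.M))

print(count_gpus(smi_output))  # 2 for this excerpt; 8 on the full DGX-1 output
```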

On to the next step: in the next article I will try to create a Docker image with TensorFlow, Keras, Theano, and other libraries, ready for AI training.