# DGX-1 Box Introduction
I have a new toy: a DGX-1 box that I am going to use today.
It has no connection to the Internet, so I have to overcome some challenges to run what I want on it.
I have no admin access, so I am limited to Docker containers or native Linux programs.
Let's start by detecting the version of the DGX-1 and the GPU cards in this box.
After running:
nvidia-smi
I got this output:
Wed Dec 11 13:14:55 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.145 Driver Version: 384.145 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |
| N/A 38C P0 44W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:07:00.0 Off | 0 |
| N/A 39C P0 43W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:0A:00.0 Off | 0 |
| N/A 39C P0 42W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:0B:00.0 Off | 0 |
| N/A 38C P0 43W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... On | 00000000:85:00.0 Off | 0 |
| N/A 40C P0 40W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... On | 00000000:86:00.0 Off | 0 |
| N/A 40C P0 45W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... On | 00000000:89:00.0 Off | 0 |
| N/A 43C P0 45W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 40C P0 47W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
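The table is easy to read by eye, but for scripting it can help to ask `nvidia-smi` for specific fields. A minimal sketch, assuming the installed `nvidia-smi` supports the `--query-gpu` and `--format=csv` options; the function name `gpu_summary` is only an illustration:

```shell
# Print one line per distinct GPU model/memory/driver combination,
# prefixed by how many GPUs share it.
# Assumes nvidia-smi is on PATH and supports --query-gpu.
gpu_summary() {
  nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader \
    | sort | uniq -c
}
```

On this box I would expect `gpu_summary` to print a single line with a count of 8, since all eight cards report the same model, memory size and driver.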
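The box has eight Tesla V100-SXM2 GPUs with about 32 GB of memory each, and the installed NVIDIA driver version is 384.145. According to the table of minimum driver versions below, the newest CUDA release this driver supports is 9.0 (384.145 is newer than 384.81, because the part after the dot is compared as a whole number, not as a decimal fraction):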
CUDA 10.2 (10.2.89) >= 440.33
CUDA 10.1 (10.1.105) >= 418.39
CUDA 10.0 (10.0.130) >= 410.48
CUDA 9.2 (9.2.88) >= 396.26
CUDA 9.1 (9.1.85) >= 390.46
CUDA 9.0 (9.0.76) >= 384.81
CUDA 8.0 (8.0.61 GA2) >= 375.26
CUDA 8.0 (8.0.44) >= 367.48
CUDA 7.5 (7.5.16) >= 352.31
CUDA 7.0 (7.0.28) >= 346.46
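The lookup in this table can be sketched as a small shell function. This is only an illustration: the names `version_ge` and `max_cuda_for_driver` are my own, the data is copied from the table above, and the comparison relies on `sort -V` (available in GNU coreutils) so that 384.145 sorts after 384.81.

```shell
# Succeeds when dotted version $1 is greater than or equal to $2.
# Relies on "sort -V" for component-wise (not decimal) comparison.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Print the newest CUDA release from the table whose minimum driver
# version is satisfied by the driver version given as $1.
# Rows are "CUDA-release minimum-driver", newest first.
max_cuda_for_driver() {
  while read -r cuda min_driver; do
    if version_ge "$1" "$min_driver"; then
      echo "$cuda"
      return 0
    fi
  done <<'EOF'
10.2.89 440.33
10.1.105 418.39
10.0.130 410.48
9.2.88 396.26
9.1.85 390.46
9.0.76 384.81
8.0.61 375.26
8.0.44 367.48
7.5.16 352.31
7.0.28 346.46
EOF
  # No row matched: the driver is older than anything in the table.
  return 1
}

max_cuda_for_driver 384.145   # prints 9.0.76
```

Comparing the versions as plain decimal numbers would wrongly treat 384.145 as smaller than 384.81, which is why the sketch uses a version-aware sort.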
That is good: now that I know the version, I can try to run the matching Docker container:
docker run --runtime nvidia nvidia/cuda:9.0-base nvidia-smi
I get an error:
Unable to find image 'nvidia/cuda:9.0-base' locally
docker: Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp XX.XX.XX.XX:443: getsockopt: connection refused.
See 'docker run --help'.
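On an offline box it can be useful to know in advance whether `docker run` would need to pull anything. A minimal sketch, assuming the installed Docker CLI has the `docker image inspect` subcommand; the helper names `image_is_local` and `run_if_local` are hypothetical:

```shell
# Succeeds when the image is already in the local Docker image store,
# so "docker run" would not need to contact a registry.
image_is_local() {
  docker image inspect "$1" >/dev/null 2>&1
}

# Hypothetical helper: run a command in the image only when no pull is needed.
# Usage: run_if_local IMAGE [COMMAND...]
run_if_local() {
  image="$1"
  shift
  if image_is_local "$image"; then
    docker run --rm --runtime nvidia "$image" "$@"
  else
    echo "image $image is not available locally; load it from an archive first" >&2
    return 1
  fi
}
```

For example, `run_if_local nvidia/cuda:9.0-base nvidia-smi` would fail fast with a clear message instead of waiting for the registry connection to be refused.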
This box is not connected to the Internet, so I need to download the image on another machine, save it to a file, copy the file over and load it on the DGX-1.
After I ran the same command on another machine, the image was downloaded and stored locally:
docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
nvidia/cuda 9.0-base 3c57055e68a2 2 weeks ago 140MB
Now I am ready to dump the image and compress it on the fly:
docker save nvidia/cuda:9.0-base | gzip > nvidia_cuda_9_0_base.tar.gz
Here is the resulting file, about 50 MB:
50269763 Dec 11 14:00 nvidia_cuda_9_0_base.tar.gz
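Since the archive has to travel between machines, it may be worth recording a checksum next to it. A sketch under the assumption that `sha256sum` (GNU coreutils) is available on both machines; the function name `save_image_with_checksum` is my own:

```shell
# Save an image as a gzip-compressed archive and write a SHA-256 checksum
# file next to it. Note: only gzip's exit status is checked here, so a
# failing "docker save" is not detected by this sketch.
save_image_with_checksum() {
  image="$1"
  archive="$2"
  docker save "$image" | gzip > "$archive" || return 1
  # Run from the archive's directory so the checksum file stores a
  # relative file name and still verifies after being copied elsewhere.
  (
    cd "$(dirname "$archive")" &&
      sha256sum "$(basename "$archive")" > "$(basename "$archive").sha256"
  )
}
```

Usage would look like `save_image_with_checksum nvidia/cuda:9.0-base nvidia_cuda_9_0_base.tar.gz`. After copying both files to the target machine, `sha256sum -c nvidia_cuda_9_0_base.tar.gz.sha256` confirms that the archive arrived intact.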
The image is now ready to upload to the DGX-1 machine, and that is what I did.
After copying the archive, let's decompress it and load the image on the box:
gzip -d nvidia_cuda_9_0_base.tar.gz
docker load -i nvidia_cuda_9_0_base.tar
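`docker load` also reads the archive from standard input, so the two steps can be combined without leaving a decompressed tar file on disk. A minimal sketch; the function name `load_image_archive` is hypothetical:

```shell
# Check that the gzip archive is intact, then stream the decompressed
# tar into "docker load" without writing it to disk.
load_image_archive() {
  archive="$1"
  gzip -t "$archive" || { echo "corrupt archive: $archive" >&2; return 1; }
  gzip -dc "$archive" | docker load
}
```

With this helper the step becomes `load_image_archive nvidia_cuda_9_0_base.tar.gz`, and the compressed file stays in place in case the load has to be repeated.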
Let's run the Docker container and see if all GPUs are available inside it:
docker run --runtime nvidia nvidia/cuda:9.0-base nvidia-smi
Bingo!
Wed Dec 11 22:19:24 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.145 Driver Version: 384.145 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |
| N/A 38C P0 43W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:07:00.0 Off | 0 |
| N/A 38C P0 43W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:0A:00.0 Off | 0 |
| N/A 39C P0 42W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:0B:00.0 Off | 0 |
| N/A 37C P0 43W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... On | 00000000:85:00.0 Off | 0 |
| N/A 39C P0 40W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... On | 00000000:86:00.0 Off | 0 |
| N/A 39C P0 45W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... On | 00000000:89:00.0 Off | 0 |
| N/A 42C P0 45W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 39C P0 47W / 300W | 10MiB / 32502MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
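Instead of reading the table, the same check can be scripted. The sketch below assumes `nvidia-smi -L` prints one line per GPU, each starting with `GPU <index>:`; the helper names `count_container_gpus` and `expect_gpus` are hypothetical:

```shell
# Count the GPUs that nvidia-smi can list from inside a container
# started from the image given as $1.
count_container_gpus() {
  docker run --rm --runtime nvidia "$1" nvidia-smi -L | grep -c '^GPU '
}

# Hypothetical check: fail unless exactly $2 GPUs are visible in image $1.
expect_gpus() {
  found="$(count_container_gpus "$1")"
  if [ "$found" -ne "$2" ]; then
    echo "expected $2 GPUs, found $found" >&2
    return 1
  fi
}
```

On this box, `expect_gpus nvidia/cuda:9.0-base 8` should succeed, matching the eight cards in the output above.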
On to the next step: in the next article I will try to create a Docker image with TensorFlow, Keras, Theano and other libraries, ready for AI training.