新睿云

> 知识库 > gpu云服务器使用Docker部署深度学习环境(中)

gpu云服务器使用Docker部署深度学习环境(中)

作者/来源:新睿云小编 发布时间:2020-02-21

、安装nvidia-docker:

本文是系列文章

上篇请查看《gpu云服务器使用Docker部署深度学习环境(上)

中篇请查看《gpu云服务器使用Docker部署深度学习环境(中)

下篇请查看《gpu云服务器使用Docker部署深度学习环境(下)

单独安装Docker之后还无法使用带GPU的深度学习机器,需要再安装一下英伟达出品的Nvidia-docker。

1.安装:

# Add the package repositories

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

$ sudo systemctl restart docker

2.测试:

现在可以测试了,以下是在一台4卡1080TI机器上的测试结果,宿主机CUDA版本为9.2:

docker run --gpus all nvidia/cuda:9.0-base nvidia-smi

第一次运行的时候结果大致如下,需要从官方镜像拉取:

 测试

测试

完整用法可以参考nvidia-docker官方github:

#### Test nvidia-smi with the latest official CUDA image

$ docker run --gpus all nvidia/cuda:9.0-base nvidia-smi

 

# Start a GPU enabled container on two GPUs

$ docker run --gpus 2 nvidia/cuda:9.0-base nvidia-smi

 

# Starting a GPU enabled container on specific GPUs

$ docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi

$ docker run --gpus '"device=UUID-ABCDEF,1'" nvidia/cuda:9.0-base nvidia-smi

 

# Specifying a capability (graphics, compute, ...) for my container

# Note this is rarely if ever used this way

$ docker run --gpus all,capabilities=utility nvidia/cuda:9.0-base nvidia-smi

二、测试Tensorflow Docker镜像

几乎所有的深度学习框架都提供了Docker镜像,这里以Tensorflow Docker镜像为例,来玩一下 Tensorflow,Tensorflow Docker的版本是通过Docker Tag区分的:

Tensorflow Docker 

Tensorflow Docker

1.默认拉取的Tensorflow镜像是CPU版本:

docker pull tensorflow/tensorflow

运行bash:

docker run -it tensorflow/tensorflow bash

 Tensorflow

Tensorflow

现在可以在Tensorflow Docker下执行python,可以看得出这是CPU版本的Tensorflow:

root@ed70090804e5:/# python

Python 2.7.15+ (default, Nov 27 2018, 23:36:35) [GCC 7.3.0] on linux2

Type "help", "copyright", "credits" or "license" for more information.>>> import tensorflow as tf>>> tf.__version__'1.14.0'>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))2019-08-15 09:32:38.522061: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA2019-08-15 09:32:38.549054: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3598085000 Hz2019-08-15 09:32:38.550255: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x556fa8e5dd60 executing computations on platform Host. Devices:2019-08-15 09:32:38.550282: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>

Device mapping:

/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device2019-08-15 09:32:38.551424: I tensorflow/core/common_runtime/direct_session.cc:296] Device mapping:

/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device

2.获取GPU版本的Tensorflow-devel镜像:

docker pull tensorflow/tensorflow:devel-gpu

运行:

docker run --gpus all -it tensorflow/tensorflow:devel-gpu

这里有报错:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused /"process_linux.go:413: running prestart hook 0 caused ///"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=10.0 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=410,driver<411 --pid=9821 /home/dlfour/dockerdata/docker/overlay2/814a65fbfc919d68014a6af2c3d532216a5a622e5540a77967719fd0c695a0cb/merged]////nnvidia-container-cli: requirement error: unsatisfied condition: brand = tesla////n///"/"": unknown.

仔细看了一下,最新版的Tensorflow GPU Docker 容器需要的是CUDA>=10.0,这台机器是9.2,并不符合,两种解决方案,一种是升级CUDA到10.x版本,但是我暂时不想升级,google了一下,发现这个tag版本可用cuda9.x:1.12.0-gpu ,所以重新拉取Tensorflow相应版本的镜像:

docker pull tensorflow/tensorflow:1.12.0-gpu

运行:

docker run --gpus all -it tensorflow/tensorflow:1.12.0-gpu bash

没有什么问题:

root@c5c2a21f1466:/notebooks# ls1_hello_tensorflow.ipynb  2_getting_started.ipynb  3_mnist_from_scratch.ipynb  BUILD  LICENSE

root@c5c2a21f1466:/notebooks# python

Python 2.7.12 (default, Dec  4 2017, 14:50:18) [GCC 5.4.0 20160609] on linux2

Type "help", "copyright", "credits" or "license" for more information.>>> import tensorflow as tf>>> tf.__version__'1.12.0'>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))2019-08-15 09:52:29.179406: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA2019-08-15 09:52:29.376334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:

name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582

pciBusID: 0000:05:00.0

totalMemory: 10.92GiB freeMemory: 6.20GiB2019-08-15 09:52:29.527131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:

name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582

pciBusID: 0000:06:00.0

totalMemory: 10.92GiB freeMemory: 6.68GiB2019-08-15 09:52:29.702166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 2 with properties:

name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582

pciBusID: 0000:09:00.0

totalMemory: 10.92GiB freeMemory: 6.67GiB2019-08-15 09:52:29.846647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 3 with properties:

name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582

pciBusID: 0000:0a:00.0

totalMemory: 10.92GiB freeMemory: 6.68GiB2019-08-15 09:52:29.855818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 32019-08-15 09:52:30.815240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:2019-08-15 09:52:30.815280: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3

感兴趣的同学还可以尝试Jupyte Notebook版本的镜像,通过端口映射运行,然后就可以通过浏览器测试和学习了。

new year
在线客服   
{{item.description}}

—您的烦恼我们已经收到—

我们会将处理结果发送至您的手机

请耐心等待