How to Use GPU Resources in Docker Containers

Problem Description

On a host where the NVIDIA driver and Docker are already installed, starting a container with GPU access fails with the following error:

docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
Analysis

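The error means that the Docker daemon has no device driver able to provide the gpu capability requested by --gpus. Install either nvidia-docker2 or nvidia-container-runtime: these NVIDIA packages provide the components Docker needs to expose the host's NVIDIA driver to containers.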
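
To see which situation a host is in before installing anything, the sketch below lists the NVIDIA container components visible on PATH. It assumes, based on our reading of Docker's source, that docker run --gpus only works when the Docker daemon finds the nvidia-container-runtime-hook binary at startup; the function name check_nvidia_container_tools is hypothetical, and which package ships which binary may vary between toolkit versions.

```shell
# Hypothetical helper: list NVIDIA container components visible on PATH.
# Assumption: docker run --gpus needs nvidia-container-runtime-hook to be
# on the Docker daemon's PATH when dockerd starts.
check_nvidia_container_tools() {
    for bin in nvidia-container-runtime-hook nvidia-container-cli nvidia-container-runtime; do
        if command -v "$bin" >/dev/null 2>&1; then
            echo "found: $bin"
        else
            echo "missing: $bin"
        fi
    done
    # Exit status: 0 only if the hook (the piece --gpus relies on) is present.
    command -v nvidia-container-runtime-hook >/dev/null 2>&1
}
```

Run it in a root shell on the GPU host. A missing hook is consistent with the "could not select device driver" error above; after installing either package, restart Docker so the daemon picks it up.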

Solution

Method 1: Install nvidia-docker2
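
The repository step (step 1 of each method) derives the repository path from /etc/os-release. On CentOS 7, for example, ID="centos" and VERSION_ID="7" yield centos7. The sketch below reproduces that logic as two hypothetical helper functions so the resulting URL can be inspected before anything is written to /etc/yum.repos.d/; the URL pattern is copied from the curl command in the steps.

```shell
# Hypothetical helper: reproduce $ID$VERSION_ID from an os-release file.
# Sourcing happens in a subshell so ID/VERSION_ID do not leak into the caller.
nvidia_distribution() {
    ( . "${1:-/etc/os-release}"; printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Hypothetical helper: build the same repo URL the curl command downloads.
nvidia_repo_url() {
    printf 'https://nvidia.github.io/nvidia-docker/%s/nvidia-docker.repo\n' \
        "$(nvidia_distribution "$1")"
}
```

Called with no argument, nvidia_repo_url prints the URL for the current host.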

  1. Set up the repository and GPG key
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
  2. Refresh the yum cache
yum clean expire-cache
  3. Install nvidia-docker2
yum install -y nvidia-docker2
  4. Check the daemon.json file

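Installing the package automatically creates /etc/docker/daemon.json, which registers nvidia-container-runtime as a runtime named nvidia. An existing daemon.json is overwritten, so back up any custom settings before installing and merge them back afterwards.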

cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
  5. Restart Docker
systemctl restart docker
  6. Verify
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Fri Dec 10 02:06:20 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:04:01.0 Off |                    0 |
| N/A   36C    P0    15W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
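
Because installing nvidia-docker2 replaces an existing /etc/docker/daemon.json (step 4), it is worth saving the old file first and confirming afterwards that the nvidia runtime entry is present. A minimal sketch, assuming a root shell; backup_daemon_json, has_nvidia_runtime and the .pre-nvidia suffix are hypothetical, and the check is a plain text match rather than a JSON parser.

```shell
# Hypothetical helper: copy an existing daemon.json aside (keeping mode and
# timestamps) before running yum install nvidia-docker2.
backup_daemon_json() {
    f=${1:-/etc/docker/daemon.json}
    [ -f "$f" ] || return 0   # nothing to preserve
    cp -p "$f" "$f.pre-nvidia"
}

# Hypothetical helper: rough text check that the file registers a runtime
# key named "nvidia", as in the daemon.json shown in step 4.
has_nvidia_runtime() {
    f=${1:-/etc/docker/daemon.json}
    grep -q '"nvidia"[[:space:]]*:' "$f" 2>/dev/null
}
```

Call backup_daemon_json before step 3 and has_nvidia_runtime after it. Any settings from the saved copy (for example log or registry options) then have to be merged back into the new file by hand before restarting Docker.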

Method 2: Install nvidia-container-runtime

  1. Set up the repository and GPG key
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
  2. Refresh the yum cache
yum clean expire-cache
  3. Install nvidia-container-runtime
yum install nvidia-container-runtime
  4. Restart Docker
systemctl restart docker
  5. Verify
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Fri Dec 10 02:29:48 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:04:01.0 Off |                    0 |
| N/A   51C    P0    18W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
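
The driver inside the container comes from the host, so the in-container nvidia-smi should report the host's driver version (470.57.02 in both outputs above); the "CUDA Version: 11.4" in the same header is the highest CUDA version that driver supports, not the 11.0 toolkit in the image. The sketch below scripts the verification step under that assumption. Both function names are hypothetical, and the parser relies on the header layout shown above, which other nvidia-smi versions may format differently.

```shell
# Hypothetical helper: print the "Driver Version" value from nvidia-smi's
# default table header, read from stdin.
nvidia_smi_driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical check: compare the host driver version with the one reported
# inside a GPU container. Requires a host where Docker and the GPU work.
verify_gpu_container() {
    image=${1:-nvidia/cuda:11.0-base}
    host_ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    ctr_ver=$(docker run --rm --gpus all "$image" nvidia-smi | nvidia_smi_driver_version)
    if [ -n "$ctr_ver" ] && [ "$host_ver" = "$ctr_ver" ]; then
        echo "OK: container sees driver $ctr_ver"
    else
        echo "FAIL: host=${host_ver:-none} container=${ctr_ver:-none}" >&2
        return 1
    fi
}
```

Run verify_gpu_container on the GPU host after either method; it returns non-zero if the container cannot see the driver. The default image is the one used in this article; if that tag is no longer available on Docker Hub, pass a current nvidia/cuda base tag instead.
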
References

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-centos-7-8
