[Step-by-Step Tutorial] How to Set Up a GPU Environment on Win11


CUDA and cuDNN Installation

CUDA Installation

Download the matching CUDA Toolkit

Download link: https://developer.nvidia.com/cuda-downloads. The installer used here is `cuda_12.6.1_560.94_windows.exe` (CUDA 12.6.1, bundled with driver 560.94).

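cuDNN Installation

Open the cuDNN download page and download the cuDNN archive for your CUDA version. Then copy its folders over the CUDA installation directory from the previous step. For example, my CUDA installation directory is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6: copy the `bin`, `include`, and `lib` folders from the archive into the `bin`, `include`, and `lib` directories there.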
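If you prefer to script the copy instead of dragging folders in Explorer, here is a minimal Python sketch. It assumes the cuDNN archive has already been extracted and that its top level contains `bin`, `include`, and `lib` folders; `merge_cudnn` is a hypothetical helper, not part of any NVIDIA tool. Only files are copied, so anything in the CUDA directory that cuDNN does not ship is left alone.

```python
import shutil
from pathlib import Path

# Folders in the extracted cuDNN archive that mirror the CUDA directory layout.
CUDNN_SUBDIRS = ("bin", "include", "lib")


def merge_cudnn(cudnn_root, cuda_root):
    """Copy every file under cudnn_root/{bin,include,lib} into cuda_root.

    Files with the same relative path are overwritten; other files in the
    CUDA directory are left untouched. Returns the destination paths.
    """
    cudnn_root, cuda_root = Path(cudnn_root), Path(cuda_root)
    copied = []
    for sub in CUDNN_SUBDIRS:
        src_dir = cudnn_root / sub
        if not src_dir.is_dir():
            continue  # the archive does not ship this folder
        for src in src_dir.rglob("*"):
            if src.is_file():
                dest = cuda_root / sub / src.relative_to(src_dir)
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dest)  # copy2 also preserves timestamps
                copied.append(dest)
    return copied
```

For example, `merge_cudnn(r"C:\Downloads\cudnn", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6")`, where the first path is a hypothetical extraction folder. Run it from an administrator prompt, since writing under `Program Files` requires elevation.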

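Environment Variable Configuration

Next, add the environment variables: right-click This PC => Properties => Advanced system settings => Environment Variables, and set the `CUDA_PATH` variable to the CUDA installation directory (the CUDA installer normally creates this variable already, so check it before adding it).

Then add the following paths to `PATH`: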


            
              
```
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\libnvvp
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\lib\x64
```

          

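To confirm the variables took effect, open a new terminal (terminals that were already open keep the old environment); `nvcc --version` is a quick first check. The Python sketch below additionally reports which of the four `PATH` entries listed above are missing. `missing_cuda_paths` is a hypothetical helper; it compares case-insensitively and ignores trailing backslashes, because Windows treats those spellings as the same directory.

```python
import ntpath
import os

# Sub-directories of the CUDA install added to PATH ("" = the root itself).
CUDA_PATH_SUBDIRS = ("bin", "libnvvp", "", r"lib\x64")


def _norm(path):
    # Windows paths are case-insensitive; normpath also drops trailing backslashes.
    return ntpath.normcase(ntpath.normpath(path.strip()))


def missing_cuda_paths(cuda_root, path_value):
    """Return the expected CUDA entries that do not appear in a PATH string."""
    present = {_norm(p) for p in path_value.split(";") if p.strip()}
    expected = [ntpath.join(cuda_root, sub) if sub else cuda_root
                for sub in CUDA_PATH_SUBDIRS]
    return [p for p in expected if _norm(p) not in present]


if __name__ == "__main__":
    root = os.environ.get("CUDA_PATH")
    if not root:
        print("CUDA_PATH is not set")
    else:
        missing = missing_cuda_paths(root, os.environ.get("PATH", ""))
        print("missing from PATH:", missing or "none")
```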

WSL and Ubuntu Subsystem Installation

WSL Installation

Download and install the WSL2 Linux kernel update package:

https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_x64.msi

Then run `wsl --set-default-version 2` to make WSL 2 the default version.

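Install a Linux Distribution

By default, the Linux distribution installed is Ubuntu. You can change this with the `-d` flag.

To change the distribution being installed, run `wsl --install -d <Distribution Name>`, replacing `<Distribution Name>` with the name of the distribution you want. To list the Linux distributions available from the online store, run `wsl --list --online` or `wsl -l -o`. To install additional Linux distributions after the initial installation, use the same command: `wsl --install -d <Distribution Name>`.

On this machine I installed: `wsl --install -d Ubuntu-20.04`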


On first launch, you are asked to create a username (and password) for the Ubuntu system.
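Afterwards, `wsl --list --verbose` (short form `wsl -l -v`) shows each installed distribution with its state and WSL version, which is an easy way to confirm that Ubuntu-20.04 runs under WSL 2. The Python sketch below parses that table. `parse_wsl_list` and `wsl_distributions` are hypothetical helpers, and the decoding assumes wsl.exe writes UTF-16LE when its output is captured, as it has historically done (adjust it if your WSL version prints UTF-8).

```python
import subprocess


def parse_wsl_list(text):
    """Parse `wsl --list --verbose` output.

    Returns {name: (state, version, is_default)}; the default distribution
    is the row marked with '*'.
    """
    result = {}
    rows = [line for line in text.splitlines() if line.strip()]
    for row in rows[1:]:  # rows[0] is the NAME / STATE / VERSION header
        is_default = row.lstrip().startswith("*")
        name, state, version = row.replace("*", " ", 1).split()[:3]
        result[name] = (state, int(version), is_default)
    return result


def wsl_distributions():
    """Run wsl.exe (Windows only) and parse its output; assumes UTF-16LE."""
    raw = subprocess.run(["wsl", "--list", "--verbose"],
                         capture_output=True, check=True).stdout
    return parse_wsl_list(raw.decode("utf-16-le"))
```

If the version column shows 1 for a distribution, `wsl --set-version Ubuntu-20.04 2` converts it to WSL 2.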

Move the Distribution off the System Drive

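  • Export the distribution to a tar archive

```
wsl --export Ubuntu-20.04 Ubuntu-20.04.tar
```

  • Unregister the distribution (this deletes the original copy on the system drive, so make sure the export above succeeded first)

```
wsl --unregister Ubuntu-20.04
```

  • Import it into the new location (here `H:\Ubuntu_WSL`)

```
wsl --import Ubuntu-20.04 H:\Ubuntu_WSL Ubuntu-20.04.tar
```

  • Set the default user (an imported distribution logs in as root by default; replace `yanqiang` with the username you created earlier)

```
Ubuntu2004 config --default-user yanqiang
```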
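Because `--unregister` is destructive, it helps to verify the export before running it. The Python sketch below wraps the three `wsl` commands above and stops if the archive is missing or empty. `migrate_wsl` is a hypothetical helper; by default it only returns the commands (a dry run) instead of executing them.

```python
import subprocess
from pathlib import Path


def migrate_wsl(distro, tar_path, new_dir, dry_run=True):
    """Move a WSL distribution to new_dir via export -> unregister -> import.

    With dry_run=True the commands are only returned, not executed.
    """
    steps = [
        ["wsl", "--export", distro, str(tar_path)],
        ["wsl", "--unregister", distro],
        ["wsl", "--import", distro, str(new_dir), str(tar_path)],
    ]
    if dry_run:
        return steps
    subprocess.run(steps[0], check=True)
    # --unregister deletes the distribution's virtual disk, so only continue
    # if the export actually produced a non-empty archive.
    tar = Path(tar_path)
    if not tar.is_file() or tar.stat().st_size == 0:
        raise RuntimeError(f"export failed, refusing to unregister {distro}")
    for cmd in steps[1:]:
        subprocess.run(cmd, check=True)
    return steps
```

The default-user step is left out because it goes through the distribution's launcher rather than `wsl`; run it separately after the import.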

          

Docker and nvidia-docker2 Installation

Docker Installation

Run `wsl` to enter the Ubuntu system, then configure the apt sources. Open the sources file:

```bash
sudo vi /etc/apt/sources.list
```

and paste in the following content (Aliyun mirrors for Ubuntu 20.04, codename `focal`):

```
deb http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
```

The Docker CE repository is added later, in the Install Docker step, after its GPG key has been imported; listing it here before the key exists makes `apt-get update` report a NO_PUBKEY error for that repository.
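The Ubuntu lines above follow a fixed pattern: one `deb` and one `deb-src` line per pocket (the release itself, `-security`, `-updates`, `-proposed`, `-backports`). On a different Ubuntu release only the codename changes (for example `jammy` for 22.04), which you can read from `VERSION_CODENAME` in `/etc/os-release` or from `lsb_release -cs`. A minimal Python sketch that regenerates the list, assuming the same Aliyun mirror; `ubuntu_sources` is a hypothetical helper.

```python
# Pockets used in the list above; drop "-proposed" if you do not want
# pre-release updates.
POCKETS = ("", "-security", "-updates", "-proposed", "-backports")
COMPONENTS = "main restricted universe multiverse"


def ubuntu_sources(codename="focal", mirror="http://mirrors.aliyun.com/ubuntu/",
                   pockets=POCKETS, with_src=True):
    """Build sources.list text: a deb (and optional deb-src) line per pocket."""
    lines = []
    for pocket in pockets:
        suite = codename + pocket
        lines.append(f"deb {mirror} {suite} {COMPONENTS}")
        if with_src:
            lines.append(f"deb-src {mirror} {suite} {COMPONENTS}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print for review; copying it into /etc/apt/sources.list needs sudo.
    print(ubuntu_sources(), end="")
```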

          

Update apt

```bash
sudo apt-get update
sudo apt-get upgrade
```

          

Install dependencies

```bash
sudo apt install apt-transport-https ca-certificates curl software-properties-common
```

          

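Install Docker

The repository line below uses the same Aliyun mirror as the GPG key and `$(lsb_release -cs)`, which expands to `focal` on Ubuntu 20.04, so the packages match the installed release.

```bash
# Import Docker's GPG key (served by the Aliyun mirror)
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
# Add the Docker CE repository for this Ubuntu release
sudo add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
sudo apt update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin
```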

          

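Add the user to the docker group (the change applies to new login sessions, so reopen the WSL shell or run `newgrp docker`)

```bash
sudo usermod -aG docker $USER
```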

          

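Restart Docker

```bash
# Without systemd (WSL does not run systemd unless it is enabled in /etc/wsl.conf):
sudo service docker restart
# With systemd enabled: restart now and start Docker automatically at boot
sudo systemctl restart docker && sudo systemctl enable docker
```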

          

Install nvidia-docker2

Configure the repository

```bash
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
# e.g. "ubuntu20.04", built from ID and VERSION_ID in /etc/os-release
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
```
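The `distribution` value is `ID` followed by `VERSION_ID` from `/etc/os-release`. To see exactly which repository list the commands above fetch, here is a Python sketch of the same computation; `os_release` and `nvidia_docker_list_url` are hypothetical helpers, and the URL pattern is copied from the command above.

```python
import shlex
from pathlib import Path


def os_release(text):
    """Parse /etc/os-release style KEY=value lines; values may be quoted."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        words = shlex.split(value)  # removes the shell-style quotes
        info[key] = words[0] if words else ""
    return info


def nvidia_docker_list_url(text):
    """Same as $ID$VERSION_ID in the shell snippet, e.g. ubuntu20.04."""
    info = os_release(text)
    distribution = info.get("ID", "") + info.get("VERSION_ID", "")
    return f"https://nvidia.github.io/nvidia-docker/{distribution}/nvidia-docker.list"


if __name__ == "__main__":
    release = Path("/etc/os-release")
    if release.exists():
        print(nvidia_docker_list_url(release.read_text()))
```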

          

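Install the nvidia-docker2 package

```bash
sudo apt update
sudo apt install -y nvidia-docker2
# SIGHUP makes dockerd reload its configuration, including the nvidia runtime
sudo pkill -SIGHUP dockerd
```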
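To check that Docker picked up the new runtime, `docker info --format '{{json .Runtimes}}'` prints the registered runtimes as a JSON object, and `nvidia` should be one of its keys. A small Python sketch of that check; `has_nvidia_runtime` and `docker_runtimes_json` are hypothetical helpers.

```python
import json
import subprocess


def has_nvidia_runtime(runtimes_json):
    """True if the JSON map printed by `docker info` has an 'nvidia' runtime."""
    return "nvidia" in json.loads(runtimes_json)


def docker_runtimes_json():
    """Ask the local Docker daemon for its registered runtimes."""
    out = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout
```

Usage inside WSL: `has_nvidia_runtime(docker_runtimes_json())`. If it returns `False`, restart Docker and check again.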

          

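Pulling the PyTorch Image

Configure the Docker registry mirrors in `/etc/docker/daemon.json` (the same file registers the nvidia runtime), then restart Docker so the changes take effect: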


            
              
```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "registry-mirrors": [
        "https://docker.registry.cyou",
        "https://docker-cf.registry.cyou",
        "https://dockercf.jsdelivr.fyi",
        "https://docker.jsdelivr.fyi",
        "https://dockertest.jsdelivr.fyi",
        "https://mirror.aliyuncs.com",
        "https://dockerproxy.com",
        "https://mirror.baidubce.com",
        "https://docker.m.daocloud.io",
        "https://docker.nju.edu.cn",
        "https://docker.mirrors.sjtug.sjtu.edu.cn",
        "https://docker.mirrors.ustc.edu.cn",
        "https://mirror.iscas.ac.cn",
        "https://docker.rainbond.cc"
    ],
    "builder": {
        "gc": {
            "defaultKeepStorage": "20GB",
            "enabled": true
        }
    },
    "experimental": false
}
```
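Installing nvidia-docker2 typically already writes the `runtimes` entry into `/etc/docker/daemon.json`, so instead of overwriting the file you can merge the mirrors into it. A minimal Python sketch; `update_daemon_config` is a hypothetical helper, it needs root privileges to write under `/etc/docker`, and invalid JSON in the existing file raises an error rather than being silently replaced.

```python
import json
from pathlib import Path

NVIDIA_RUNTIME = {"path": "nvidia-container-runtime", "runtimeArgs": []}


def update_daemon_config(path, mirrors):
    """Merge the nvidia runtime and registry mirrors into a daemon.json file.

    Keys already in the file are kept; mirrors are appended without duplicates.
    Returns the resulting configuration dict.
    """
    path = Path(path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}  # invalid JSON raises
    config.setdefault("runtimes", {})["nvidia"] = dict(NVIDIA_RUNTIME)
    merged = list(config.get("registry-mirrors", []))
    for mirror in mirrors:
        if mirror not in merged:
            merged.append(mirror)
    config["registry-mirrors"] = merged
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config
```

After running it against `/etc/docker/daemon.json`, restart Docker (for example `sudo service docker restart`) and the mirrors are used for subsequent pulls.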
  

          

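Pull the Docker image

Pull an image, choosing the tag that suits you. The CUDA version in the tag must be supported by the Windows host driver; driver 560.94 installed above supports CUDA 12.6 and older, so a `cuda11.8` image works:

```bash
docker pull pytorch/pytorch:2.4.0-cuda11.8-cudnn9-runtime
```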

          


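Start the container

```bash
# --gpus all exposes every GPU; --ipc=host shares the host IPC namespace (PyTorch DataLoader workers rely on shared memory)
docker run -it --ipc=host --gpus all --name test pytorch/pytorch:2.4.0-cuda11.8-cudnn9-runtime
```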

          

If the command above reports an error, run:

```bash
# Check whether CUDA is available via an NVML-based check, which leaves CUDA uninitialized
CUDA_DEVICE_ORDER="PCI_BUS_ID" PYTORCH_NVML_BASED_CUDA_CHECK=1 CUDA_VISIBLE_DEVICES=0,1,2,3 python -c "import torch;print(torch.cuda.is_available());"
```

          

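The same device ordering can be set from Python with `os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"`, which numbers GPU devices from 0 in PCI bus ID order. Set it before CUDA is initialized, for example before importing torch.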
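Putting these pieces together, a small check script can be run inside the container (for example saved as `check_gpu.py`, a hypothetical file name). This is a minimal sketch: it applies the two environment variables from the command above before torch is imported, then uses `torch.cuda.device_count()` and `torch.cuda.get_device_name()` to list the visible GPUs. `describe_gpus` is a hypothetical helper, and values already set in the shell take precedence because of `setdefault`.

```python
import os

# Must be set before torch initializes CUDA, otherwise they have no effect.
os.environ.setdefault("CUDA_DEVICE_ORDER", "PCI_BUS_ID")
os.environ.setdefault("PYTORCH_NVML_BASED_CUDA_CHECK", "1")


def describe_gpus():
    """Return ['index: name', ...] for visible GPUs, or [] if CUDA is unavailable."""
    try:
        import torch  # imported here so the env vars above are already in place
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [f"{i}: {torch.cuda.get_device_name(i)}"
            for i in range(torch.cuda.device_count())]


if __name__ == "__main__":
    gpus = describe_gpus()
    print("CUDA available:", bool(gpus))
    for gpu in gpus:
        print(gpu)
```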
