Deploying Qwen3 with vLLM


Reference: https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html#pre-built-wheels

Environment

CUDA: 12.2

GPU memory: 40 GB

Python package manager: conda

LLM: Qwen3-8B

Install vLLM

1) Create a conda environment

# Create a conda virtual environment named vllm, using Python 3.10
conda create -n vllm python=3.10

2) Activate the vllm environment

conda activate vllm

3) Install vLLM

pip install -U vllm \
    --pre \
    --extra-index-url https://wheels.vllm.ai/nightly
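
Before moving on, it can help to confirm that the nightly wheel installed cleanly and that PyTorch (pulled in as a vLLM dependency) can see the GPU. A minimal sanity check, run inside the activated vllm environment:

import torch
import vllm

# The nightly wheel reports a pre-release version string.
print("vLLM version:", vllm.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # Qwen3-8B in bf16 needs roughly 16 GB for the weights alone, so the
    # 40 GB card listed above leaves room for the KV cache.
    print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GiB")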

Start the API server

Reference: https://qwen.readthedocs.io/zh-cn/latest/deployment/vllm.html#

vllm serve Qwen/Qwen3-8B
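
The server listens on http://localhost:8000 by default and exposes an OpenAI-compatible API. Before sending chat requests, a quick way to confirm the model has finished loading is to list the served models; a small sketch using the openai package (the same client the Python example below relies on):

from openai import OpenAI

# vLLM does not check the API key unless one is configured, so any
# placeholder string works here.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
for model in client.models.list():
    print(model.id)  # expected output includes "Qwen/Qwen3-8B"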

Chat

curl

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "现在你的身份是刘备,而我是关羽,请在这个背景下完成对话。大哥,我等何日光复大汉"}
  ],
  "temperature": 0.6,
  "top_p": 0.95,
  "top_k": 20,
  "max_tokens": 32768
}'

Python

from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[
        {"role": "user", "content": "现在你的身份是刘备,而我是关羽,请在这个背景下完成对话。大哥,我等何日光复大汉"},
    ],
    max_tokens=32768,
    temperature=0.6,
    top_p=0.95,
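    # top_k is not part of the standard OpenAI API, so it is passed
    # through to vLLM via extra_body.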
    extra_body={
        "top_k": 20,
    },
)
print("Chat response:", chat_response)
