EasyLLM: Simplify Working with Language Models and Switch Seamlessly Between OpenAI and Hugging Face Clients


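Preface

In less than a year, a wide range of large language models (LLMs) has appeared in China and abroad, and both the open-source and the closed-source camps have produced some excellent models. When you build applications on top of LLMs, however, you quickly find that each model differs, to a greater or lesser degree, in deployment, training, fine-tuning, API design, and prompt format. A product that needs to integrate several LLMs, or to switch models quickly, therefore becomes more complex to build, less convenient to use, and harder to maintain.

First, using and deploying LLMs is relatively complex. Providers and frameworks differ from one another, which forces users into tedious configuration and adaptation work. For example, OpenAI's ChatCompletion, Completion, and Embedding APIs are not directly interchangeable with the corresponding Hugging Face functionality, so code has to be modified by hand to target a different model.

Second, prompt formats are a problem. Different LLMs may expect different prompt formats, so switching between models requires converting prompts, which adds extra work and a learning curve.

Third, response time matters. In some scenarios, especially real-time interaction, waiting for the LLM to finish generating the entire result introduces noticeable latency.

EasyLLM was created to address exactly these problems.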


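1. Introduction to EasyLLM

EasyLLM is an open-source project that aims to simplify and improve working with LLMs. It provides compatible clients that let you switch between different LLMs by changing just one line of code. It also offers prompt helpers that convert prompts between the formats expected by different LLMs. Finally, EasyLLM supports streaming, so you can receive partial results immediately instead of waiting for the entire response.

The first release of EasyLLM implements clients compatible with OpenAI's API. This means you can replace openai.ChatCompletion, openai.Completion, and openai.Embedding with huggingface.ChatCompletion, huggingface.Completion, and huggingface.Embedding by changing a single line of code.

With EasyLLM, we can use different LLMs more conveniently, which improves productivity and flexibility. Next, let's take a closer look at EasyLLM's main features and how it improves the experience of working with LLMs.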


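2. EasyLLM Features

The current EasyLLM feature list:

Compatible clients

Clients compatible with OpenAI's ChatCompletion, Completion, and Embedding APIs. Switch between LLMs by changing a single line of code.

Prompt helpers

Utilities that convert prompts between the formats of different LLMs, for example from OpenAI's messages format to a prompt for models such as LLaMA.

Streaming support

Stream completions from your LLM instead of waiting for the entire response. Ideal for chat interfaces.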
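To make the streaming idea concrete, here is a minimal sketch of how a caller consumes a stream. It assumes the chunks follow OpenAI's chat.completion.chunk shape, where choices[0]["delta"] carries a fragment of the reply, and it replaces the real model with a hypothetical fake_stream generator so it runs offline. collect_stream is likewise an illustrative helper, not part of EasyLLM.

```python
from typing import Dict, Iterable, Iterator


def fake_stream(text: str) -> Iterator[Dict]:
    """Hypothetical stand-in for a streamed chat response.

    Yields dicts shaped like OpenAI chat.completion.chunk objects, one word per chunk.
    """
    # First chunk: only announces the role, no content yet.
    yield {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]}
    for word in text.split(" "):
        yield {"choices": [{"index": 0, "delta": {"content": word + " "}, "finish_reason": None}]}
    # Last chunk: empty delta plus a finish reason.
    yield {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}


def collect_stream(chunks: Iterable[Dict], show: bool = True) -> str:
    """Print each content fragment as soon as it arrives; return the whole reply."""
    parts = []
    for chunk in chunks:
        piece = chunk["choices"][0]["delta"].get("content", "")
        if piece:
            parts.append(piece)
            if show:
                print(piece, end="", flush=True)
    if show:
        print()
    return "".join(parts).strip()


reply = collect_stream(fake_stream("Arrr, the sun be a big ball o' fire!"))
```

With a real client, fake_stream would presumably be replaced by the iterator returned from huggingface.ChatCompletion.create(..., stream=True). Check the streaming examples listed in the use-case table for the exact chunk shape.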

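The roadmap at the time of writing:

evol_instruct (in progress): a method that uses an LLM to create instructions, evolving simple instructions into more complex ones.
prompt_utils: helper methods for easily converting between prompt formats, such as OpenAI Messages and prompts for open-source models like Llama 2.
sagemaker: a client for easily interacting with LLMs deployed on Amazon SageMaker.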


3. Getting Started with EasyLLM

Install EasyLLM with pip:


        
pip install easyllm

Then import a client and start using it:


        
from easyllm.clients import huggingface

# Choose the prompt builder to use
huggingface.prompt_builder = "llama2"
# huggingface.api_key = "hf_xxx"  # change the API key if needed

response = huggingface.ChatCompletion.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=[
        {"role": "system", "content": "\nYou are a helpful assistant speaking like a pirate. argh!"},
        {"role": "user", "content": "What is the sun?"},
    ],
    temperature=0.9,
    top_p=0.6,
    max_tokens=256,
)

print(response)

Output:


        
{
  "id": "hf-lVC2iTMkFJ",
  "object": "chat.completion",
  "created": 1690661144,
  "model": "meta-llama/Llama-2-70b-chat-hf",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": " Arrrr, the sun be a big ol' ball o' fire in the sky, me hearty! It be the source o' light and warmth for our fair planet, and it be a mighty powerful force, savvy? Without the sun, we'd be sailin' through the darkness, lost and cold, so let's give a hearty \"Yarrr!\" for the sun, me hearties! Arrrr!"
      },
      "finish_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 111,
    "completion_tokens": 299,
    "total_tokens": 410
  }
}
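The response mirrors OpenAI's chat.completion object, so code that reads OpenAI responses can read this one the same way. The minimal sketch below assumes the result can be indexed like the plain dict printed above, with the content shortened here. read_reply is a hypothetical helper name. The Llama 2 reply begins with a space, so the sketch strips the text.

```python
from typing import Any, Dict, Tuple

# A trimmed copy of the response printed above (content shortened).
response: Dict[str, Any] = {
    "id": "hf-lVC2iTMkFJ",
    "object": "chat.completion",
    "created": 1690661144,
    "model": "meta-llama/Llama-2-70b-chat-hf",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": " Arrrr, the sun be a big ol' ball o' fire in the sky, me hearty!",
            },
            "finish_reason": None,
        }
    ],
    "usage": {"prompt_tokens": 111, "completion_tokens": 299, "total_tokens": 410},
}


def read_reply(resp: Dict[str, Any]) -> Tuple[str, int]:
    """Return the first choice's text (surrounding whitespace stripped) and the total token count."""
    text = resp["choices"][0]["message"]["content"].strip()
    usage = resp["usage"]
    # total_tokens should equal prompt_tokens + completion_tokens.
    assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
    return text, usage["total_tokens"]


text, total = read_reply(response)
```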

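4. EasyLLM Clients

In EasyLLM, a client is the code that interacts with a specific LLM API, such as OpenAI's. The currently supported clients are:

ChatCompletion

Used to interact with LLMs that are compatible with the OpenAI ChatCompletion API.

Completion

Used to interact with LLMs that are compatible with the OpenAI Completion API.

Embedding

Used to interact with LLMs that are compatible with the OpenAI Embedding API.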

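5. Hugging Face-Compatible Clients

EasyLLM provides a client for connecting to Hugging Face models. It works with the Hugging Face Inference API, Hugging Face Inference Endpoints, and any web service that runs Text Generation Inference or exposes a compatible API endpoint.

huggingface.ChatCompletion

A client for interacting with Hugging Face models that is compatible with the OpenAI ChatCompletion API.

huggingface.Completion

A client for interacting with Hugging Face models that is compatible with the OpenAI Completion API.

huggingface.Embedding

A client for interacting with Hugging Face models that is compatible with the OpenAI Embedding API.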

5.1 huggingface.ChatCompletion


The huggingface.ChatCompletion client is used to interact with Hugging Face models running on Text Generation Inference that are compatible with the OpenAI ChatCompletion API.


        
from easyllm.clients import huggingface

# The huggingface module automatically loads the Hugging Face API key from the
# HUGGINGFACE_TOKEN environment variable or the Hugging Face CLI config file.
# huggingface.api_key = "hf_xxx"
huggingface.prompt_builder = "llama2"

response = huggingface.ChatCompletion.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=[
        {"role": "system", "content": "\nYou are a helpful, respectful and honest assistant."},
        {"role": "user", "content": "Knock knock."},
    ],
    temperature=0.9,
    top_p=0.6,
    max_tokens=1024,
)

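Supported parameters:

model

The model used to generate the completion. If not provided, the base URL is used.

messages

List[ChatMessage]: the list of chat messages used to generate the completion.

temperature

The temperature used to generate the completion. Defaults to 0.9.

top_p

The top_p value used to generate the completion. Defaults to 0.6.

top_k

The top_k value used to generate the completion. Defaults to 10.

n

The number of completions to generate. Defaults to 1.

max_tokens

The maximum number of tokens to generate. Defaults to 1024.

stop

The stop sequences used when generating the completion. Defaults to None.

stream

Whether to stream the completion. Defaults to False.

frequency_penalty

The frequency penalty used to generate the completion. Defaults to 1.0.

debug

Whether to enable debug logging. Defaults to False.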
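The parameter names follow OpenAI, but Text Generation Inference (TGI) names its generation parameters differently. The sketch below shows one plausible translation that uses the defaults listed above. It is not EasyLLM's actual code. The helper name, the use of TGI's /generate field names (max_new_tokens, stop, repetition_penalty), the mapping of frequency_penalty to repetition_penalty, and the validation rules are all assumptions. If that mapping holds, the default frequency_penalty of 1.0 simply means no repetition penalty.

```python
from typing import Any, Dict, List, Optional


def to_tgi_parameters(
    temperature: float = 0.9,
    top_p: float = 0.6,
    top_k: int = 10,
    max_tokens: int = 1024,
    stop: Optional[List[str]] = None,
    frequency_penalty: float = 1.0,
) -> Dict[str, Any]:
    """Hypothetical translation of OpenAI-style arguments into TGI /generate parameters.

    Defaults mirror the ones documented above for huggingface.ChatCompletion.
    """
    # Assumed TGI validation rules: temperature > 0 and 0 < top_p < 1.
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    if not 0 < top_p < 1:
        raise ValueError("top_p must be in the open interval (0, 1)")
    return {
        "do_sample": True,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "max_new_tokens": max_tokens,  # OpenAI name: max_tokens
        "stop": stop or [],  # OpenAI name: stop
        # Assumption: passed through as repetition_penalty, where 1.0 means no penalty.
        "repetition_penalty": frequency_penalty,
    }


params = to_tgi_parameters(max_tokens=256)
```

The n, stream, and debug parameters are not covered by this sketch.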

5.2 huggingface.Completion


The huggingface.Completion client is used to interact with Hugging Face models running on Text Generation Inference that are compatible with the OpenAI Completion API.


        
from easyllm.clients import huggingface

# The huggingface module automatically loads the Hugging Face API key from the
# HUGGINGFACE_TOKEN environment variable or the Hugging Face CLI config file.
# huggingface.api_key = "hf_xxx"
huggingface.prompt_builder = "llama2"

response = huggingface.Completion.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    prompt="What is the meaning of life?",
    temperature=0.9,
    top_p=0.6,
    max_tokens=1024,
)

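Supported parameters:

model

The model used to generate the completion. If not provided, the base URL is used.

prompt

The text to complete. If prompt_builder is set, the prompt is formatted with the prompt_builder.

temperature

The temperature used to generate the completion. Defaults to 0.9.

top_p

The top_p value used to generate the completion. Defaults to 0.6.

top_k

The top_k value used to generate the completion. Defaults to 10.

n

The number of completions to generate. Defaults to 1.

max_tokens

The maximum number of tokens to generate. Defaults to 1024.

stop

The stop sequences used when generating the completion. Defaults to None.

stream

Whether to stream the completion. Defaults to False.

frequency_penalty

The frequency penalty used to generate the completion. Defaults to 1.0.

debug

Whether to enable debug logging. Defaults to False.

echo

Whether to echo the prompt in the result. Defaults to False.

logprobs

Whether to return log probabilities. Defaults to None.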

5.3 huggingface.Embedding


The huggingface.Embedding client is used to interact with Hugging Face models served as an API that are compatible with the OpenAI Embedding API.


        
from easyllm.clients import huggingface

# The huggingface module automatically loads the Hugging Face API key from the
# HUGGINGFACE_TOKEN environment variable or the Hugging Face CLI config file.
# huggingface.api_key = "hf_xxx"

embedding = huggingface.Embedding.create(
    model="sentence-transformers/all-MiniLM-L6-v2",
    input="What is the meaning of life?",
)

# Length of the returned vector (384 dimensions for all-MiniLM-L6-v2)
print(len(embedding["data"][0]["embedding"]))

Supported parameters:

model

The model used to create the embeddings. If not provided, the base URL is used.

input

Union[str, List[str]]: the document or documents to embed.
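Because input also accepts a list, you can embed several documents in one call. An OpenAI-style response holds one vector per input under data[i]["embedding"]. The sketch below compares such vectors with cosine similarity. The response is made up, and its 3-dimensional vectors are toy values. Real all-MiniLM-L6-v2 embeddings have 384 dimensions.

```python
import math
from typing import Dict, List

# Made-up response in the OpenAI embedding shape (real vectors are much longer).
embedding_response: Dict = {
    "data": [
        {"index": 0, "embedding": [0.1, 0.3, 0.5]},
        {"index": 1, "embedding": [0.2, 0.6, 1.0]},
        {"index": 2, "embedding": [-0.5, 0.1, 0.0]},
    ]
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Order vectors by their index field so they line up with the inputs.
vectors = [item["embedding"] for item in sorted(embedding_response["data"], key=lambda d: d["index"])]
sim_01 = cosine(vectors[0], vectors[1])  # same direction, so close to 1.0
sim_02 = cosine(vectors[0], vectors[2])  # mostly unrelated direction
```

With the real client, the request would presumably look like huggingface.Embedding.create(model="sentence-transformers/all-MiniLM-L6-v2", input=["first document", "second document"]).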

5.4 Environment Configuration


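You can configure the client by setting Hugging Face environment variables or by overriding the defaults. The following shows how to adjust the HF token, the URL, and the prompt builder.

5.4.1 Setting the HF token

By default, the huggingface client tries to read the HUGGINGFACE_TOKEN environment variable. If that variable is not set, it tries to read the token from the ~/.huggingface folder. If no token is found there either, no token is used.

Alternatively, you can set the token manually via huggingface.api_key.

Setting the API key manually: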


        
from easyllm.clients import huggingface

huggingface.api_key = "hf_xxx"

res = huggingface.ChatCompletion.create(...)

Using an environment variable:


        
import os
os.environ["HUGGINGFACE_TOKEN"] = "hf_xxx"

from easyllm.clients import huggingface
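The lookup order described above can be sketched as follows: an explicit huggingface.api_key, then HUGGINGFACE_TOKEN, then a token saved by the Hugging Face CLI, and otherwise no token. The function name resolve_hf_token and the token file path ~/.huggingface/token are assumptions for illustration. Newer huggingface_hub versions store the token elsewhere, for example ~/.cache/huggingface/token.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_hf_token(api_key: Optional[str] = None, token_file: Optional[Path] = None) -> Optional[str]:
    """Hypothetical sketch of the lookup order: explicit key > env var > CLI token file > None."""
    if api_key:
        return api_key
    env_token = os.environ.get("HUGGINGFACE_TOKEN")
    if env_token:
        return env_token
    # Assumed location of the token written by the Hugging Face CLI.
    path = token_file if token_file is not None else Path.home() / ".huggingface" / "token"
    if path.is_file():
        token = path.read_text().strip()
        return token or None
    return None  # no token: requests are sent unauthenticated
```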

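5.4.2 Changing the URL

By default, the Hugging Face client tries to read the HUGGINGFACE_API_BASE environment variable. If it is not set, the client uses the default URL:

https://api-inference.huggingface.co/models

Overriding the base URL is useful when you want to use a different URL, such as https://zj5lt7pmzqzbp0d1.us-east-1.aws.endpoints.huggingface.cloud, a local URL such as http://localhost:8000, or a Hugging Face Inference Endpoint.

Alternatively, you can set the URL manually via huggingface.api_base. If you set a custom URL, you must leave the model parameter empty.

Setting the API base manually: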


        
from easyllm.clients import huggingface  
  
huggingface.api_base="https://my-url"  
  
  
res = huggingface.ChatCompletion.create(...)

Using an environment variable:


        
import os
os.environ["HUGGINGFACE_API_BASE"] = "https://my-url"

from easyllm.clients import huggingface
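One way to understand why a custom URL requires an empty model parameter is shown below. With the default Inference API base, the model id is appended to the URL. A dedicated endpoint or local server already serves exactly one model, so its URL is used as-is. The function name resolve_url and its exact checks are assumptions and do not reproduce EasyLLM's code.

```python
import os
from typing import Optional

DEFAULT_API_BASE = "https://api-inference.huggingface.co/models"


def resolve_url(model: Optional[str] = None, api_base: Optional[str] = None) -> str:
    """Hypothetical sketch: pick the request URL from api_base, HUGGINGFACE_API_BASE, or the default."""
    base = api_base or os.environ.get("HUGGINGFACE_API_BASE") or DEFAULT_API_BASE
    if base == DEFAULT_API_BASE:
        if not model:
            raise ValueError("model is required when using the default Inference API")
        return f"{base}/{model}"
    # Custom endpoint (Inference Endpoint, localhost, ...): it serves a single model.
    if model:
        raise ValueError("leave model empty when api_base points to a custom endpoint")
    return base.rstrip("/")
```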

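5.4.3 Building prompts

By default, the huggingface client tries to read the HUGGINGFACE_PROMPT environment variable and map its value to a builder through the PROMPT_MAPPING dictionary. If the variable is not set, the default prompt builder is used. You can also set the builder manually.

Setting the prompt builder manually: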


        
from easyllm.clients import huggingface  
  
huggingface.prompt_builder = "llama2"  
  
res = huggingface.ChatCompletion.create(...)

Using an environment variable:


        
import os
os.environ["HUGGINGFACE_PROMPT"] = "llama2"

from easyllm.clients import huggingface
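The lookup described in this section is essentially a dictionary lookup keyed by a name. The sketch below uses two toy builders in place of the real PROMPT_MAPPING. It checks an explicit setting first, then HUGGINGFACE_PROMPT, then falls back to a default. The builder functions, the fallback choice, and accepting a callable directly are hypothetical, and EasyLLM's real default and error handling may differ.

```python
import os
from typing import Callable, Dict, List, Optional, Union

Message = Dict[str, str]
PromptBuilder = Callable[[List[Message]], str]


def toy_plain_prompt(messages: List[Message]) -> str:
    """Hypothetical default builder: one 'role: content' line per message."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


def toy_upper_prompt(messages: List[Message]) -> str:
    """Hypothetical second builder, only here to make the lookup visible."""
    return toy_plain_prompt(messages).upper()


# Stand-in for PROMPT_MAPPING (names -> builder functions).
TOY_MAPPING: Dict[str, PromptBuilder] = {"plain": toy_plain_prompt, "upper": toy_upper_prompt}


def resolve_builder(setting: Optional[Union[str, PromptBuilder]] = None) -> PromptBuilder:
    """Explicit setting first, then HUGGINGFACE_PROMPT, then the default builder."""
    setting = setting or os.environ.get("HUGGINGFACE_PROMPT")
    if setting is None:
        return toy_plain_prompt
    if callable(setting):
        return setting
    try:
        return TOY_MAPPING[setting]
    except KeyError:
        raise ValueError(f"unknown prompt builder {setting!r}; choose from {sorted(TOY_MAPPING)}") from None
```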

6. Migrating from OpenAI to Hugging Face

Migrating from OpenAI to Hugging Face is easy: change the import statement and the client you call, and optionally set a prompt builder.


        
- import openai  
+ from easyllm.clients import huggingface  
+ huggingface.prompt_builder = "llama2"  
  
  
- response = openai.ChatCompletion.create(  
+ response = huggingface.ChatCompletion.create(  
-    model="gpt-3.5-turbo",  
+    model="meta-llama/Llama-2-70b-chat-hf",  
    messages=[  
        {"role": "system", "content": "You are a helpful assistant."},  
        {"role": "user", "content": "Knock knock."},  
    ],  
)

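When switching clients (that is, switching to a different model or system), make sure your hyperparameters are still valid. For example, a temperature that works well for a GPT-3 model may not be appropriate for a Llama 2 model.

Hyperparameters are settings used in machine learning and deep learning to adjust a model's behavior and performance. A common one is temperature, which controls the diversity and randomness of the generated text. Different models can have different requirements or defaults for temperature. When switching models, check that your hyperparameter settings match the model in use so that you get the expected results.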
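One way to follow this advice is to keep generation settings per model instead of reusing one set across clients. The sketch below shows a hypothetical pattern that is not part of EasyLLM. The Llama 2 values come from the examples in this article, and the gpt-3.5-turbo values are OpenAI's documented defaults (temperature=1, top_p=1).

```python
from typing import Any, Dict

# Hypothetical per-model presets (values taken from this article and OpenAI's defaults).
GENERATION_PRESETS: Dict[str, Dict[str, Any]] = {
    "gpt-3.5-turbo": {"temperature": 1.0, "top_p": 1.0},
    "meta-llama/Llama-2-70b-chat-hf": {"temperature": 0.9, "top_p": 0.6, "max_tokens": 256},
}


def generation_kwargs(model: str, **overrides: Any) -> Dict[str, Any]:
    """Start from the preset for this model, then apply call-specific overrides."""
    if model not in GENERATION_PRESETS:
        raise KeyError(f"no generation preset for {model!r}; add one before switching clients")
    return {**GENERATION_PRESETS[model], **overrides}
```

The result could then be unpacked into a call such as huggingface.ChatCompletion.create(model=model, messages=messages, **generation_kwargs(model)).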


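7. Prompt Utilities

The prompt_utils module contains functions that convert message dictionaries into prompts that can be used with the ChatCompletion client.

Currently supported prompt formats: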

Llama 2
Vicuna
Hugging Face ChatML
WizardLM
StableBeluga2
Open Assistant

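prompt_utils also exports a mapping dictionary, PROMPT_MAPPING, that maps names such as "llama2" to prompt builder functions. This makes it possible to select the right prompt builder through an environment variable.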


        
PROMPT_MAPPING = {
    "chatml_falcon": build_chatml_falcon_prompt,
    "chatml_starchat": build_chatml_starchat_prompt,
    "llama2": build_llama2_prompt,
    "open_assistant": build_open_assistant_prompt,
    "stablebeluga": build_stablebeluga_prompt,
    "vicuna": build_vicuna_prompt,
    "wizardlm": build_wizardlm_prompt,
}

The following code sets the prompt builder for the Hugging Face client:


        
from easyllm.clients import huggingface

# Other options: vicuna, chatml_falcon, chatml_starchat, wizardlm, stablebeluga, open_assistant
huggingface.prompt_builder = "llama2"

7.1 Llama 2 Chat Builder


Creates prompts for Llama 2 chat conversations. See the Hugging Face blog to learn how to prompt Llama 2. Passing a message with an unsupported role raises an error.

Example model:

meta-llama/Llama-2-70b-chat-hf


        
from easyllm.prompt_utils import build_llama2_prompt  
  
messages=[  
    {"role": "system", "content": "You are a helpful assistant."},  
    {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},  
]  
prompt = build_llama2_prompt(messages)
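Here is a sketch of the Llama 2 chat template as described in the Hugging Face blog post on Llama 2, showing what such a builder produces. The system prompt is wrapped in <<SYS>> tags inside the first [INST] block, and each completed assistant turn is closed with </s>. This is an illustrative reimplementation. EasyLLM's build_llama2_prompt may differ in whitespace details.

```python
from typing import Dict, List


def llama2_prompt(messages: List[Dict[str, str]]) -> str:
    """Illustrative Llama 2 chat template (system prompt folded into the first [INST] block)."""
    system = ""
    if messages and messages[0]["role"] == "system":
        system = f"<<SYS>>\n{messages[0]['content']}\n<</SYS>>\n\n"
        messages = messages[1:]
    prompt = ""
    for i, message in enumerate(messages):
        if message["role"] == "user":
            prefix = system if i == 0 else ""
            prompt += f"<s>[INST] {prefix}{message['content'].strip()} [/INST]"
        elif message["role"] == "assistant":
            prompt += f" {message['content'].strip()} </s>"
        else:
            # Mirrors the documented behavior: unsupported roles raise an error.
            raise ValueError(f"unsupported role: {message['role']}")
    return prompt
```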

7.2 Vicuna Chat Builder


Creates prompts for Vicuna chat conversations. Passing a message with an unsupported role raises an error.

Example model:

ehartford/WizardLM-13B-V1.0-Uncensored


        
from easyllm.prompt_utils import build_vicuna_prompt  
  
messages=[  
    {"role": "system", "content": "You are a helpful assistant."},  
    {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},  
]  
prompt = build_vicuna_prompt(messages)
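For comparison, the sketch below follows the Vicuna v1.1 conversation template from FastChat. The template starts with a system line, followed by USER: and ASSISTANT: turns separated by a space, and each finished assistant turn is closed by </s>. This is an illustrative reimplementation. EasyLLM's build_vicuna_prompt may use different separators, such as newlines, so treat the exact whitespace as an assumption.

```python
from typing import Dict, List

# Default system line of FastChat's vicuna_v1.1 template.
DEFAULT_VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def vicuna_prompt(messages: List[Dict[str, str]]) -> str:
    """Illustrative Vicuna v1.1 layout: a system line, then USER/ASSISTANT turns."""
    system = DEFAULT_VICUNA_SYSTEM
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"]
        messages = messages[1:]
    prompt = system + " "
    for message in messages:
        if message["role"] == "user":
            prompt += f"USER: {message['content']} "
        elif message["role"] == "assistant":
            prompt += f"ASSISTANT: {message['content']}</s>"
        else:
            raise ValueError(f"unsupported role: {message['role']}")
    # Leave the assistant slot open so the model writes the next answer.
    return prompt + "ASSISTANT:"
```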

7.3 Hugging Face ChatML Builder


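Creates prompts for Hugging Face ChatML chat conversations. Hugging Face ChatML uses a different prompt for each example model, such as StarChat or Falcon. Passing a message with an unsupported role raises an error.

Example model: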

HuggingFaceH4/starchat-beta

7.3.1 StarChat


        
from easyllm.prompt_utils import build_chatml_starchat_prompt  
  
messages=[  
    {"role": "system", "content": "You are a helpful assistant."},  
    {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},  
]  
prompt = build_chatml_starchat_prompt(messages)
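The StarChat variant wraps each turn in role tokens. The sketch below assumes the layout shown on the HuggingFaceH4/starchat-beta model card, which uses <|system|>, <|user|>, and <|assistant|> headers and ends each turn with <|end|>. It is illustrative only and does not cover the Falcon variant.

```python
from typing import Dict, List

STARCHAT_ROLES = {"system", "user", "assistant"}


def starchat_prompt(messages: List[Dict[str, str]]) -> str:
    """Illustrative StarChat layout: '<|role|>', a newline, the content, then '<|end|>' per turn."""
    parts = []
    for message in messages:
        if message["role"] not in STARCHAT_ROLES:
            raise ValueError(f"unsupported role: {message['role']}")
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    # Open an assistant turn for the model to complete.
    return "".join(parts) + "<|assistant|>"
```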

7.3.2 Falcon


        
from easyllm.prompt_utils import build_chatml_falcon_prompt  
  
messages=[  
    {"role": "system", "content": "You are a helpful assistant."},  
    {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},  
]  
prompt = build_chatml_falcon_prompt(messages)

7.4 WizardLM Chat Builder


Creates prompts for WizardLM chat conversations. Passing a message with an unsupported role raises an error.

Example model:

WizardLM/WizardLM-13B-V1.2


        
from easyllm.prompt_utils import build_wizardlm_prompt  
  
messages=[  
    {"role": "system", "content": "You are a helpful assistant."},  
    {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},  
]  
prompt = build_wizardlm_prompt(messages)

7.5 StableBeluga2 Chat Builder


Creates prompts for StableBeluga2 chat conversations. Passing a message with an unsupported role raises an error.


        
from easyllm.prompt_utils import build_stablebeluga_prompt  
  
messages=[  
    {"role": "system", "content": "You are a helpful assistant."},  
    {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},  
]  
prompt = build_stablebeluga_prompt(messages)
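StableBeluga2 uses Markdown-style section headers instead of special tokens. The sketch below is modeled on the usage snippet from the StableBeluga2 model card, with "### System:", "### User:", and "### Assistant:" sections separated by blank lines. The exact whitespace is an assumption and may differ from EasyLLM's build_stablebeluga_prompt.

```python
from typing import Dict, List

# Headers modeled on the StableBeluga2 model card; exact whitespace is an assumption.
BELUGA_HEADERS = {"system": "### System:\n", "user": "### User: ", "assistant": "### Assistant:\n"}


def stablebeluga_prompt(messages: List[Dict[str, str]]) -> str:
    """Illustrative StableBeluga2 layout with System/User/Assistant sections."""
    prompt = ""
    for message in messages:
        header = BELUGA_HEADERS.get(message["role"])
        if header is None:
            raise ValueError(f"unsupported role: {message['role']}")
        prompt += f"{header}{message['content']}\n\n"
    # End with an open assistant section for the model to fill in.
    return prompt + "### Assistant:\n"
```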

7.6 Open Assistant Chat Builder


Example model:

OpenAssistant/llama2-13b-orca-8k-3319


        
from easyllm.prompt_utils import build_open_assistant_prompt  
  
messages=[  
    {"role": "system", "content": "You are a helpful assistant."},  
    {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},  
]  
prompt = build_open_assistant_prompt(messages)
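The Open Assistant format marks turns with special tokens. The sketch below assumes the layout described on the OpenAssistant/llama2-13b-orca-8k-3319 model card, which uses <|system|>, <|prompter|>, and <|assistant|> tags and terminates each message with </s>. EasyLLM's build_open_assistant_prompt may differ in details.

```python
from typing import Dict, List

# Role tags as assumed from the Open Assistant model card.
OA_TAGS = {"system": "<|system|>", "user": "<|prompter|>", "assistant": "<|assistant|>"}


def open_assistant_prompt(messages: List[Dict[str, str]]) -> str:
    """Illustrative Open Assistant layout: tag + content + </s> per message."""
    prompt = ""
    for message in messages:
        tag = OA_TAGS.get(message["role"])
        if tag is None:
            raise ValueError(f"unsupported role: {message['role']}")
        prompt += f"{tag}{message['content']}</s>"
    # Open an assistant turn for the model to complete.
    return prompt + "<|assistant|>"
```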

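8. Use Cases

Here are some examples to help you get started with the EasyLLM library:

| Example | Description |
| --- | --- |
| Detailed chat completion example (https://philschmid.github.io/easyllm/examples/chat-completion-api/) | Demonstrates how to use the ChatCompletion API to have a conversational chat with a model. |
| Streaming chat requests example (https://philschmid.github.io/easyllm/examples/stream-chat-completions/) | Demonstrates streaming multiple chat requests to chat with a model efficiently. |
| Streaming text requests example (https://philschmid.github.io/easyllm/examples/stream-text-completions/) | Demonstrates how to stream multiple text completion requests. |
| Detailed text completion example (https://philschmid.github.io/easyllm/examples/text-completion-api/) | Uses the TextCompletion API to generate text with a model. |
| Creating embeddings (https://philschmid.github.io/easyllm/examples/get-embeddings/) | Uses a model to embed text into vector representations. |
| Hugging Face Inference Endpoints example (https://philschmid.github.io/easyllm/examples/inference-endpoints-example/) | Shows how to use a custom endpoint, such as an Inference Endpoint or localhost. |
| Retrieval-augmented generation with Llama 2 (https://philschmid.github.io/easyllm/examples/llama2-rag-example/) | Shows how to use Llama 2 70B for in-context retrieval augmentation. |
| Llama 2 70B agent/tool use example (https://philschmid.github.io/easyllm/examples/llama2-agent-example/) | Shows how Llama 2 70B can interact with tools and act as an agent. |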

These examples cover EasyLLM's main capabilities: chat, text completion, and embeddings.

9. References