Can Evol-Instruct Really Generate Precise Domain-Specific Data? Practical Tips at a Glance!

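In the ever-evolving field of artificial intelligence, being able to fine-tune a model so that it understands and adapts to a specific domain is crucial. The process is like a musician tuning an instrument before a performance: the more precisely it is tuned, the better it sounds in a particular acoustic environment. Here, our "acoustic environment" is the specific domain where we want the AI to excel, whether that is medicine, finance, or customer service.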

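Having your own local large language model at school, at work, or even at home, one that can handle tasks that depend on specific context, would be a game changer. This blog explores how to use a technique called "Evol-Instruct" to create domain-specific datasets for large language models with relatively little effort. We will also give you a guide to setting up a local LLM, along with the models we recommend!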

Why Domain-Specific Models?

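General-purpose AI models are all-rounders. They handle a wide variety of tasks reasonably well, but often lack deep knowledge for specialized tasks. You cannot expect ChatGPT to answer detailed questions about your university's database or your personal savings without being given enough background, and supplying that context every time can be tedious. This is where domain-specific models come in. They give more accurate, reliable, and context-appropriate answers, which is essential for highly specialized tasks.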

Introducing Evol-Instruct: The Technique to Refine AI Prompts

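To reach this level of specialization, we use a technique called "Evol-Instruct". Rather than simply training the AI on large amounts of data, the method rewrites simple instructions into progressively more complex ones, so that a model trained on the results learns to handle both the depth and the breadth of a domain.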

The Mechanics of Evol-Instruct

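Evol-Instruct relates to AI prompt engineering the way detailed recipe development relates to the art of cooking. It turns a simple, straightforward prompt into one that pushes the AI to consider deeper implications, complex scenarios, and nuanced issues.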

How Evol-Instruct Works:

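1. Start with a Basic Prompt: This is like a basic recipe: it outlines roughly how to make the dish but does not go into the details needed for a gourmet result. For us, this might be a simple question asking what our field of study is and touching on some broad topics within it.
2. Transform Through Evol-Instruct: Here, the basic prompt is turned into a more complex and detailed question, much like improving a dish with particular techniques and unique ingredients to make it tastier. This is done through the prompt engineering discussed below.
3. Output a Refined Prompt: The result is a sophisticated prompt that drives the AI to produce content that is insightful and highly relevant to the specific domain.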
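The three steps above boil down to a small loop: start from a base prompt, rewrite it with an evolution method, and keep the result. The sketch below shows that loop in plain Python. It assumes a hypothetical `rewrite(prompt, method)` callable that wraps whatever model you use (for instance, a function like the `deepen_prompt` defined later in this post); `evolve`, `rounds`, and `seed` are illustrative names. The method for each round is picked at random (seeded for reproducibility), and setting `rounds` above 1 feeds each output back in for another evolution, which is how the WizardLM paper builds up increasingly complex instructions.

```python
import random
from typing import Callable, List

# Hypothetical signature: rewrite(prompt, method) -> evolved prompt text.
Rewriter = Callable[[str, str], str]


def evolve(base_prompt: str, rewrite: Rewriter, methods: List[str],
           rounds: int = 1, seed: int = 0) -> List[str]:
    """Return the chain [base, round 1, round 2, ...] of evolved prompts.

    Each round picks one evolution method at random and applies it to the
    output of the previous round (step 2), so complexity accumulates.
    """
    rng = random.Random(seed)       # seeded so runs are reproducible
    chain = [base_prompt]           # step 1: the basic prompt
    for _ in range(rounds):
        method = rng.choice(methods)
        chain.append(rewrite(chain[-1], method))
    return chain                    # step 3: chain[-1] is the refined prompt


if __name__ == "__main__":
    # Stub rewriter standing in for a real model call.
    def fake_rewrite(prompt: str, method: str) -> str:
        return f"{prompt} [{method}]"

    for step in evolve("How does AI impact surgical outcomes?", fake_rewrite,
                       ["Deepening", "Concretizing", "Increasing Reasoning"],
                       rounds=3):
        print(step)
```

Passing a stub rewriter, as in the `__main__` block, is a cheap way to test the rest of the data pipeline before loading a 7B model.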

(Figure: image from the WizardLM paper)

So how does Evol-Instruct actually make a prompt more "complex"? We mainly used the following methods:

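In-Breadth Evolving: The AI is encouraged to broaden the scope of the question and increase its complexity. Examples include turning a simple question about photosynthesis or the speed of light into one that covers wider aspects or more variables.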

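Deepening: This path pushes the AI to explore the topic more deeply. The questions become more specific and complex, seeking deeper understanding and analysis, as in the example question about the speed of light in a vacuum.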

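Concretizing: The AI starts from a general concept and narrows it down to a specific context or instance. This is illustrated by questions that start from a broad concept such as Goldbach's Conjecture and then ask for a specific proof or example.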

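Increasing Reasoning: This pushes the AI to pose questions that require logical thinking or complex problem solving. The examples shown prompt the AI to generate questions involving mathematical reasoning or hypothetical scenarios.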
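In practice, each of these four methods can be treated as a different guiding sentence slotted into the same rewriting instruction. The sketch below assumes that framing. The Deepening, Concretizing, and Increasing Reasoning sentences are taken from the templates used in the code later in this post; the In-Breadth sentence and the shared wrapper text are our own paraphrase, not the WizardLM paper's exact wording. `METHOD_HINTS` and `build_instruction` are illustrative names.

```python
from typing import Dict

# One guiding sentence per evolution method.
METHOD_HINTS: Dict[str, str] = {
    # Paraphrase (assumption): a new prompt in the same domain, on a rarer topic.
    "In-Breadth Evolving": ("Create a brand-new prompt in the same domain as the "
                            "given prompt, but on a rarer, less common topic."),
    "Deepening": ("If the given prompt contains inquiries about certain issues, "
                  "the depth and breadth of the inquiry can be increased."),
    "Concretizing": "Please replace general concepts with more specific concepts.",
    "Increasing Reasoning": ("If the given prompt can be solved with just a few simple "
                             "thinking processes, rewrite it to explicitly request "
                             "multiple-step reasoning."),
}


def build_instruction(base_prompt: str, method: str) -> str:
    """Fill a shared Prompt Rewriter wrapper (simplified, assumption) with one method."""
    if method not in METHOD_HINTS:
        raise ValueError(f"unknown method: {method!r}")
    return (
        "I want you to act as a Prompt Rewriter.\n"
        "Your objective is to evolve the given prompt using the following method:\n"
        f"{METHOD_HINTS[method]}\n"
        "Keep the evolved prompt concise and reasonable for a human to answer.\n"
        "Given Prompt:\n"
        f"{base_prompt}\n"
        "Evolved Prompt:\n"
    )


if __name__ == "__main__":
    print(build_instruction("How does AI impact surgical outcomes?", "Concretizing"))
```

Keeping the wrapper fixed and varying only the method sentence makes it easy to add or edit methods without touching the generation code.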

Implementation

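For fine-tuning, the model we recommend most is Meta's open-source LLaMA. You need to request access to it; visit https://llama.meta.com/llama2/ for details! In this blog we take the technique further and give you a general guide to generating data for any specific domain!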

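We mainly explored other open-source large language models, keeping in mind the computing resources a typical individual has, and also just for fun!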

Models We Don’t Recommend

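We initially chose GPT-Neo because it is easy to use and performs consistently across a variety of tasks. However, our experiments showed that although GPT-Neo answers questions well, it cannot reliably rephrase them. Its output tends to expand on the original prompt with background and extra information, rather than evolving the prompt in the structured way we wanted.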

Here is a brief look at how GPT-Neo handled our prompt:

Base Prompt: Discuss the implications of using AI in medical diagnostics.
Task: To Deepen
Evolved Prompt: This is a question about the medical diagnosis process, specifically how computer algorithms are used in the diagnosis of medical conditions. Answering this question requires understanding the process. As such, it is not a good fit for a test/tutorial type question, but it does highlight the challenges in doing such a detailed question. For this reason, we do not have a final answer. This question is intended to be an intro to the subject and as such it should not be taken as part of….

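Other models that behaved like GPT-Neo include GPT-2 variants and models such as Pegasus. These models look for the shortest path to a solution and usually default to giving an answer rather than rephrasing or complicating the question. They are built to be responsive and do well in conversation or when you need information quickly. But for a nuanced task like Evol-Instruct, which shapes prompts for deeper, domain-specific interactions, they fall short.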

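BERT-based models are known for their strong grasp of context and meaning, but they are not suited to Evol-Instruct. BERT is an encoder-only model trained with masked language modeling rather than left-to-right text generation, while this task calls for a generative approach that causal language models handle far better.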

Models We Recommend

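Models such as Mistral-7B, MiniCPM, and WizardLM are particularly well suited to this kind of work. Mistral-7B, for example, is good at refining prompts: it deepens and enriches the question without drifting into irrelevant content. It picks up the nuances needed to produce more targeted prompts that dig into domain-specific questions.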

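WizardLM, a fine-tuned version of LLaMA, has also shown strong ability here. It is the model introduced alongside Evol-Instruct in the WizardLM paper, fine-tuned on instructions evolved with this very technique, which makes it a very practical tool for anyone who wants to improve the relevance and specificity of their language model's responses.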

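For anyone who wants to apply these techniques to their own projects, models like Mistral-7B and WizardLM are well worth considering. They have the generative flexibility and depth needed to produce high-quality, domain-specific datasets, which can significantly improve the models you fine-tune on them.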

GPT-3.5 and GPT-4 also perform very well for the same reasons, but keep in mind that they are not open source and you have to pay to use their API!

In our example we used a fine-tuned version of Mistral-7B called Mistral-7B-Instruct-v0.2, and got very good results.

Here is a brief look at how Mistral handled our instruction:

Base Prompt: Discuss the implications of using AI in medical diagnostics.
Task: To Deepen
Evolved Prompt: How does the integration of artificial intelligence (AI) into medical diagnostics impact the accuracy, efficiency, and ethical considerations of patient care, and what are the potential long-term consequences for the doctor-patient relationship and the healthcare industry as a whole?

Prompts

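For each task we used the prompt templates from the WizardLM paper, and you can modify them to suit your needs! We have also included the same prompts in the code for reference.

(Figure: image from the WizardLM paper)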

Code!

Step 1: Prepare Your Environment

For Mistral-7B and WizardLM, we do not recommend Google Colab, because it does not provide enough memory.
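A back-of-the-envelope calculation shows why memory is the bottleneck. The sketch below counts only the model weights (activations, the KV cache, and beam-search overhead all add more) and assumes 7 billion parameters for a 7B model; Mistral describes Mistral-7B as a 7.3B-parameter model, so the real numbers are slightly higher. `weight_memory_gb` is an illustrative helper, not a library function.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("float32", "float16", "int8"):
        print(f"7B params in {dtype}: ~{weight_memory_gb(7e9, dtype):.0f} GB")
```

At 2 bytes per parameter a 7B model already needs about 14 GB just for its weights, and about 28 GB in 32-bit floats, which is more than a typical free notebook session comfortably provides.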

Install the Python libraries: you need Hugging Face's transformers library and PyTorch (torch).

pip install transformers torch

Step 2: Load Your Model

Choose a model that can follow instructions. For our purposes we use Mistral-7B as the example, because it handles complex instructions well.

from transformers import AutoTokenizer, AutoModelForCausalLM

# Replace 'mistralai/Mistral-7B-Instruct-v0.2' with your model of choice if necessary.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

Step 3: Define Your Base Prompts

Our main topic is AI in healthcare, and we have prepared 5 sample base prompts to evolve further!

These prompts should be broad enough to leave room for substantial evolution and variation.

base_prompts = [
    "Discuss the implications of AI in diagnosing diseases.",
    "Explain how AI can improve patient care management.",
    "How does AI impact surgical outcomes?",
    "Evaluate the role of AI in medical training and simulation.",
    "Assess the use of AI in patient data privacy and security."
]

Step 4: Evolve the Prompts

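Use the Evol-Instruct method to turn each prompt into a more complex question. This may involve deepening the content, making it more concrete, or requiring additional reasoning steps.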

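We applied deepening, concretizing, and increased reasoning to each base prompt. There are other ways to evolve prompts as well, and the WizardLM paper contains further prompt templates you can adapt and apply yourself!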

1. Deepening Prompt

def deepen_prompt(base_prompt):
    """
    Evolves the base prompt by deepening the content using the Evol-Instruct method.
    """
    # The Evol-Instruct method as per the provided guidelines
    instruction = f"""
    I want you to act as a Prompt Rewriter.
    Your objective is to evolve a given prompt into a more complex version to make those famous AI systems
    (e.g., ChatGPT and GPT-4) a bit harder to handle.
    But the evolved prompt must be reasonable and must be understood and responded by humans.
    You should complicate the given prompt using the following method:
    If the given prompt contains inquiries about certain issues, the depth and breadth of the inquiry can be increased.
    You should try your best not to make the evolved prompt become verbose, the evolved prompt can only add 10 to 20 words into the given prompt.
    'Given Prompt', 'Evolved Prompt', 'given prompt' and 'evolved prompt' are not allowed to appear in the Evolved Prompt
    Given Prompt:
    {base_prompt}
    Evolved Prompt:
    """
    # Put the token IDs on the same device as the model
    inputs = tokenizer.encode(instruction, return_tensors="pt").to(model.device)
    # max_new_tokens caps only the generated text; max_length would also count
    # the long instruction above and leave little or no room for the answer
    outputs = model.generate(inputs, max_new_tokens=150, num_beams=5, early_stopping=True,
                             pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated tokens so the instruction is not echoed back
    evolved_prompt = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    return evolved_prompt.strip()

2. Concretizing Prompt

def concretize_prompt(base_prompt):
    """
    Evolves the base prompt by concretizing the content using the Evol-Instruct method.
    """
    # The Evol-Instruct method as per the provided guidelines
    instruction = f"""
    A Concretizing Prompt
    I want you to act as a Prompt Rewriter.
    Your objective is to evolve a given prompt into a more specific and concrete version to challenge AI systems.
    Please replace general concepts with more specific concepts within the given prompt.
    You should try your best not to make the evolved prompt verbose and the evolved prompt can only add 10 to 20 words.
    'Given Prompt', 'Evolved Prompt', 'given prompt' and 'evolved prompt' are not allowed to appear in the Evolved Prompt
    Given Prompt:
    {base_prompt}
    Evolved Prompt:
    """
    # Put the token IDs on the same device as the model
    inputs = tokenizer.encode(instruction, return_tensors="pt").to(model.device)
    # max_new_tokens caps only the generated text, not the instruction
    outputs = model.generate(inputs, max_new_tokens=150, num_beams=5, early_stopping=True,
                             pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated tokens so the instruction is not echoed back
    evolved_prompt = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    return evolved_prompt.strip()

3. Increased Reasoning Prompt

def increase_reasoning_steps_prompt(base_prompt):
    """
    Evolves the base prompt by increasing reasoning steps using the Evol-Instruct method.
    """
    # The Evol-Instruct method as per the provided guidelines
    instruction = f"""
    An Increased Reasoning Steps Prompt
    I want you to act as a Prompt Rewriter.
    Your objective is to rewrite a given prompt into a version that requires multiple-step reasoning.
    If the given prompt can be solved with just a few simple thinking processes, you can rewrite it to explicitly request multiple-step reasoning.
    You should try your best not to make the evolved prompt verbose, and the evolved prompt can only add 10 to 20 words.
    'Given Prompt', 'Evolved Prompt', 'given prompt' and 'evolved prompt' are not allowed to appear in the Evolved Prompt
    Given Prompt:
    {base_prompt}
    Evolved Prompt:
    """
    # Put the token IDs on the same device as the model
    inputs = tokenizer.encode(instruction, return_tensors="pt").to(model.device)
    # max_new_tokens caps only the generated text, not the instruction
    outputs = model.generate(inputs, max_new_tokens=150, num_beams=5, early_stopping=True,
                             pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated tokens so the instruction is not echoed back
    evolved_prompt = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    return evolved_prompt.strip()

# Apply all three evolution methods to every base prompt
evolved_prompts = {}
for base_prompt in base_prompts:
    evolved_prompts[base_prompt] = {
        "Deepening": deepen_prompt(base_prompt),
        "Concretizing": concretize_prompt(base_prompt),
        "Increasing Reasoning": increase_reasoning_steps_prompt(base_prompt)
    }

Note on Parameters:

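• max_length=150: a cap on output length, chosen as a reasonable balance between detail and rambling. Note that in transformers `max_length` counts the prompt tokens as well as the output, so with an instruction as long as ours it can leave little or no room for the answer; `max_new_tokens=150` caps only the newly generated text and is the safer setting here.
• num_beams=5: more beams mean a more thorough search, but also a higher compute cost. If you need faster results or are short on resources, reduce the number of beams.
• early_stopping=True: in beam search, generation stops as soon as num_beams complete candidates have been found instead of continuing to look for better ones, which speeds up the process.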

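Adjust these settings to your compute budget and the level of detail you need. Experiment to find the configuration that works best for your situation.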
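To make `num_beams` and `early_stopping` less abstract, here is a toy beam search over a hand-written next-token probability table instead of a real model. It is a simplified sketch of the idea, not how transformers implements `generate` (which also handles length penalties, batching, and more): keep the `num_beams` most likely partial sequences at each step, move finished ones aside, and with early stopping halt once `num_beams` finished sequences exist. The table `NEXT` and its tokens are made up for illustration.

```python
import math
from typing import Dict, List, Tuple

END = "</s>"

# Made-up next-token probabilities standing in for a language model:
# NEXT[last_token] -> {candidate_next_token: probability}
NEXT: Dict[str, Dict[str, float]] = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"x": 0.55, "y": 0.45},
    "B": {"z": 0.9, "w": 0.1},
    "x": {END: 1.0}, "y": {END: 1.0}, "z": {END: 1.0}, "w": {END: 1.0},
}

Beam = Tuple[List[str], float]  # (tokens, total log-probability)


def beam_search(num_beams: int, max_new_tokens: int = 10,
                early_stopping: bool = True) -> Tuple[List[str], float]:
    """Simplified beam search; returns the best sequence and its probability."""
    beams: List[Beam] = [(["<s>"], 0.0)]
    finished: List[Beam] = []
    for _ in range(max_new_tokens):
        # Score every one-token extension of every live beam.
        candidates = [
            (tokens + [tok], score + math.log(p))
            for tokens, score in beams
            for tok, p in NEXT[tokens[-1]].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep only the num_beams best; completed ones leave the live set.
        beams = []
        for tokens, score in candidates[:num_beams]:
            if tokens[-1] == END:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        # early_stopping: stop once num_beams complete candidates exist.
        if (early_stopping and len(finished) >= num_beams) or not beams:
            break
    best_tokens, best_score = max(finished + beams, key=lambda c: c[1])
    return best_tokens, math.exp(best_score)


if __name__ == "__main__":
    for k in (1, 2):
        tokens, prob = beam_search(num_beams=k)
        print(f"num_beams={k}: {' '.join(tokens)} (p={prob:.2f})")
```

With `num_beams=1` the search is greedy: it commits to A early and ends with probability 0.6 × 0.55 = 0.33. With `num_beams=2` it keeps B alive and finds the more likely sequence B z, at 0.4 × 0.9 = 0.36, at the price of scoring more candidates per step. That is the cost trade-off described in the notes above.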

Step 5: Validate and Refine

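Review the evolved prompts to make sure they meet the complexity and specificity you expect for the medical domain. Adjust them where necessary so they better fit your domain's needs.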
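Part of this review can be automated before a human reads the results. The sketch below applies a few heuristic checks derived from the rewrite instructions used in Step 4: the evolved prompt must differ from the base, must not contain the phrases 'Given Prompt' or 'Evolved Prompt', and should add roughly 10 to 20 words. `check_evolved` and its default thresholds are our own assumptions, intended as a first-pass filter and not a replacement for actually reading the prompts.

```python
from typing import List

# Phrases the rewrite instructions forbid in the output (checked case-insensitively).
FORBIDDEN = ("given prompt", "evolved prompt")


def check_evolved(base: str, evolved: str,
                  min_added: int = 10, max_added: int = 20) -> List[str]:
    """Return a list of problems found; an empty list means the prompt passed."""
    problems = []
    text = evolved.strip()
    if not text:
        problems.append("empty output")
    if text.lower() == base.strip().lower():
        problems.append("unchanged from base prompt")
    lowered = text.lower()
    for phrase in FORBIDDEN:
        if phrase in lowered:
            problems.append(f"contains forbidden phrase: {phrase!r}")
    # Word-count growth relative to the base prompt.
    added = len(text.split()) - len(base.split())
    if not (min_added <= added <= max_added):
        problems.append(f"added {added} words (expected {min_added}-{max_added})")
    return problems
```

The Mistral output shown earlier grows its 9-word base prompt to 39 words, so it would be flagged with the default range; loosen `max_added` if you prefer longer evolutions.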

Step 6: Store Your Dataset

Once you are happy with the generated prompts, save them to a JSON file. This structured format makes the data easy to use for training, testing, and other purposes.

import json

dataset = {"base_prompts": base_prompts, "evolved_prompts": evolved_prompts}
with open('medical_domain_dataset.json', 'w', encoding='utf-8') as file:
    json.dump(dataset, file, indent=2, ensure_ascii=False)

# Print out a success message
print("Dataset generation complete. File saved as 'medical_domain_dataset.json'.")

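This dataset shows in detail how each prompt was deepened and concretized. For fine-tuning, though, data in this nested shape is awkward to work with, so we recommend going through the prompts one by one, generating a response for each, and storing the results in a separate JSON file, which makes fine-tuning much easier!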
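A minimal sketch of that recommendation, assuming the `evolved_prompts` dictionary built in Step 4 (base prompt → {method: evolved prompt}) and a hypothetical `answer_fn` that calls your model to answer a single prompt. It flattens everything into one instruction/response record per line (JSON Lines), a common input format for fine-tuning scripts. The function name and the field names `instruction`, `output`, `method`, and `base_prompt` are our own choices; rename them to whatever your training code expects.

```python
import json
from typing import Callable, Dict


def write_finetune_jsonl(evolved_prompts: Dict[str, Dict[str, str]],
                         answer_fn: Callable[[str], str],
                         path: str) -> int:
    """Write one {instruction, output} record per evolved prompt; return the count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for base_prompt, variants in evolved_prompts.items():
            for method, prompt in variants.items():
                record = {
                    "instruction": prompt,
                    "output": answer_fn(prompt),  # generate a response per prompt
                    "method": method,             # kept for later analysis
                    "base_prompt": base_prompt,
                }
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
                count += 1
    return count


if __name__ == "__main__":
    demo = {"How does AI impact surgical outcomes?": {
        "Deepening": "How does AI change complication rates in robotic surgery?"}}
    n = write_finetune_jsonl(demo, lambda p: "(model answer here)", "medical_sft.jsonl")
    print(f"wrote {n} records")
```

Writing one record per line means a long generation run can be inspected or resumed partway through, and most fine-tuning tooling can read the file directly.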

Conclusion

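Our deep dive into Evol-Instruct has given us valuable insight into optimizing language models for specific domains. We learned that a model's effectiveness depends not only on its size but also on how it is guided. While general-purpose models like GPT-Neo struggled with our complex task, models like Mistral-7B and LLaMA showed real promise when guided by carefully designed prompts.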

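In the end, the real magic happens when we tailor these AI tools precisely to our own needs. Doing so unlocks their full potential and helps them perform at their best in a particular domain. By using models like Mistral-7B and LLaMA, guided by well-designed prompts, we can significantly improve both performance and efficiency. This shift underlines the importance of domain-specific fine-tuning, which lets us deploy AI solutions that are not only powerful but also highly relevant and effective for complex tasks.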
