The "Agent Recipes" website, powered by Together AI, aims to help developers improve applications built on LLMs and AI agents by showcasing a set of common agent patterns and how to apply them. The patterns are inspired by Anthropic's article "Building effective agents" and cover a range of workflows; each pattern comes with a summary, use cases, and copy-ready code examples so developers can quickly understand and apply it.
| Core Content of the Site
The "Agent Recipes" site presents the following six main agent workflow patterns, each addressing a different class of task:
- Prompt Chaining Workflow
  - Summary:
    A sequential workflow in which the output of one LLM call becomes the input to the next, enabling structured reasoning and step-by-step task completion.
  - Use cases:
    - Generate marketing copy, then translate it into other languages.
    - Write a document outline, check that it meets a set of standards, then generate the full document from the outline.
    - Use one LLM to clean raw data, then pass the cleaned data to another LLM for summarization or visualization.
  - Code example:
    A Python script that solves a math problem step by step by chaining prompts, e.g. computing Sally's babysitting earnings.
from typing import List

from helpers import run_llm

def serial_chain_workflow(input_query: str, prompt_chain: List[str]) -> List[str]:
    """Run a serial chain of LLM calls to address the `input_query`
    using a list of prompts specified in `prompt_chain`.
    """
    response_chain = []
    response = input_query
    for i, prompt in enumerate(prompt_chain):
        print(f"Step {i+1}")
        # Feed the previous response into the next prompt
        response = run_llm(f"{prompt}\nInput:\n{response}", model='meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo')
        response_chain.append(response)
        print(f"{response}\n")
    return response_chain

# Example
question = "Sally earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?"

prompt_chain = [
    """Given the math problem, ONLY extract any relevant numerical information and how it can be used.""",
    """Given the numerical information extracted, ONLY express the steps you would take to solve the problem.""",
    """Given the steps, express the final answer to the problem.""",
]

responses = serial_chain_workflow(question, prompt_chain)
final_answer = responses[-1]
- Routing Workflow
  - Summary:
    A low-latency workflow that dynamically routes each input to the most suitable LLM or configuration based on its characteristics, optimizing for efficiency and specialization.
  - Use cases:
    - Route simple questions to a smaller model (e.g. Llama 3.1 8B) and complex ones to a more capable model (e.g. DeepSeek-V3) to balance cost and speed.
    - Route customer-service queries to different downstream processes or models depending on their type (e.g. refund requests vs. technical support).
  - Code example:
    A Python script that selects the best model for a given user input, e.g. choosing Qwen2.5-Coder-32B for a code-generation task.
from pydantic import BaseModel, Field
from typing import Literal, Dict

from helpers import run_llm, JSON_llm

def router_workflow(input_query: str, routes: Dict[str, str]) -> str:
    """Given an `input_query` and a dictionary of `routes` containing options and details for each,
    select the best model for the task and return the response from the model.
    """
    ROUTER_PROMPT = """Given a user prompt/query: {user_query}, select the best option out of the following routes:
    {routes}. Answer only in JSON format."""

    # Create a schema from the routes dictionary
    class Schema(BaseModel):
        route: Literal[tuple(routes.keys())]
        reason: str = Field(
            description="Short one-liner explanation why this route was selected for the task in the prompt/query."
        )

    # Call LLM to select route
    selected_route = JSON_llm(
        ROUTER_PROMPT.format(user_query=input_query, routes=routes), Schema
    )
    print(
        f"Selected route:{selected_route['route']}\nReason: {selected_route['reason']}\n"
    )

    # Use LLM on selected route.
    # Could also have different prompts that need to be used for each route.
    response = run_llm(user_prompt=input_query, model=selected_route["route"])
    print(f"Response: {response}\n")

    return response

prompt_list = [
    "Produce python snippet to check to see if a number is prime or not.",
    "Plan and provide a short itinerary for a 2 week vacation in Europe.",
    "Write a short story about a dragon and a knight.",
]

model_routes = {
    "Qwen/Qwen2.5-Coder-32B-Instruct": "Best model choice for code generation tasks.",
    "Gryphe/MythoMax-L2-13b": "Best model choice for story-telling, role-playing and fantasy tasks.",
    "Qwen/QwQ-32B-Preview": "Best model for reasoning, planning and multi-step tasks",
}

for i, prompt in enumerate(prompt_list):
    print(f"Task {i+1}: {prompt}\n")
    print(20 * "==")
    router_workflow(prompt, model_routes)
- Parallelization Workflow
  - Summary:
    A workflow that splits a task into independent parts, processes them simultaneously, and aggregates the results, suited to complex or large-scale operations.
  - Use cases:
    - Use one LLM to answer a user's question while another screens the input for inappropriate content.
    - Split a long document into sections, summarize each with a different LLM in parallel, then merge the summaries into a complete overview.
  - Code example:
    A Python script that calls several LLMs asynchronously and aggregates their results, e.g. to solve a math problem about how many apples were picked.
import asyncio
from typing import List

from helpers import run_llm, run_llm_parallel

async def parallel_workflow(prompt: str, proposer_models: List[str], aggregator_model: str, aggregator_prompt: str):
    """Run a parallel chain of LLM calls to address the `prompt`
    using a list of models specified in `proposer_models`.

    Returns output from final aggregator model.
    """
    # Gather intermediate responses from proposer models
    proposed_responses = await asyncio.gather(*[run_llm_parallel(prompt, model) for model in proposer_models])

    # Aggregate responses using an aggregator model
    final_output = run_llm(user_prompt=prompt,
                           model=aggregator_model,
                           system_prompt=aggregator_prompt + "\n" + "\n".join(f"{i+1}. {str(element)}" for i, element in enumerate(proposed_responses)))

    return final_output, proposed_responses

reference_models = [
    "microsoft/WizardLM-2-8x22B",
    "Qwen/Qwen2.5-72B-Instruct-Turbo",
    "google/gemma-2-27b-it",
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",
]

user_prompt = """Jenna and her mother picked some apples from their apple farm.
Jenna picked half as many apples as her mom. If her mom got 20 apples, how many apples did they both pick?"""

aggregator_model = "deepseek-ai/DeepSeek-V3"

aggregator_system_prompt = """You have been provided with a set of responses from various open-source models to the latest user query.
Your task is to synthesize these responses into a single, high-quality response. It is crucial to critically evaluate the information
provided in these responses, recognizing that some of it may be biased or incorrect. Your response should not simply replicate the
given answers but should offer a refined, accurate, and comprehensive reply to the instruction. Ensure your response is well-structured,
coherent, and adheres to the highest standards of accuracy and reliability.

Responses from models:"""

async def main():
    answer, intermediate_responses = await parallel_workflow(prompt=user_prompt,
                                                             proposer_models=reference_models,
                                                             aggregator_model=aggregator_model,
                                                             aggregator_prompt=aggregator_system_prompt)

    for i, response in enumerate(intermediate_responses):
        print(f"Intermediate Response {i+1}:\n\n{response}\n")

    print(f"Final Answer: {answer}\n")

asyncio.run(main())
- Orchestrator-workers Workflow
  - Summary:
    A central orchestrator LLM breaks a task into sub-tasks, assigns them to several worker LLMs to process in parallel, and then synthesizes the results into a complex, coordinated output.
  - Use cases:
    - Break a coding problem into sub-tasks, have different LLMs generate the code snippets, and have the orchestrator assemble them into a complete solution.
    - When writing a tutorial, split the task into introduction, steps, and examples, have worker LLMs handle each part, and merge them into the final document.
  - Code example:
    A Python script that generates product descriptions for an eco-friendly water bottle in different styles (formal, conversational, and hybrid).
import asyncio
import json
from pydantic import BaseModel, Field
from typing import Literal, List

from helpers import run_llm_parallel, JSON_llm

ORCHESTRATOR_PROMPT = """
Analyze this task and break it down into 2-3 distinct approaches:

Task: {task}

Provide an Analysis:
Explain your understanding of the task and which variations would be valuable.
Focus on how each approach serves different aspects of the task.

Along with the analysis, provide 2-3 approaches to tackle the task, each with a brief description:

Formal style: Write technically and precisely, focusing on detailed specifications
Conversational style: Write in a friendly and engaging way that connects with the reader
Hybrid style: Tell a story that includes technical details, combining emotional elements with specifications

Return only JSON output.
"""

WORKER_PROMPT = """
Generate content based on:
Task: {original_task}
Style: {task_type}
Guidelines: {task_description}

Return only your response:
[Your content here, maintaining the specified style and fully addressing requirements.]
"""

class Task(BaseModel):
    type: Literal["formal", "conversational", "hybrid"]
    description: str

class TaskList(BaseModel):
    analysis: str
    tasks: List[Task] = Field(default_factory=list)

async def orchestrator_workflow(task: str, orchestrator_prompt: str, worker_prompt: str):
    """Use an orchestrator model to break down a task into sub-tasks and then use worker models to generate and return responses."""
    # Use orchestrator model to break the task up into sub-tasks
    orchestrator_response = JSON_llm(orchestrator_prompt.format(task=task), schema=TaskList)

    # Parse orchestrator response
    analysis = orchestrator_response["analysis"]
    tasks = orchestrator_response["tasks"]

    print("\n=== ORCHESTRATOR OUTPUT ===")
    print(f"\nANALYSIS:\n{analysis}")
    print(f"\nTASKS:\n{json.dumps(tasks, indent=2)}")

    worker_model = ["meta-llama/Llama-3.3-70B-Instruct-Turbo"] * len(tasks)

    # Gather intermediate responses from worker models
    return tasks, await asyncio.gather(
        *[
            run_llm_parallel(
                user_prompt=worker_prompt.format(
                    original_task=task,
                    task_type=task_info["type"],
                    task_description=task_info["description"],
                ),
                model=model,
            )
            for task_info, model in zip(tasks, worker_model)
        ]
    )

async def main():
    task = """Write a product description for a new eco-friendly water bottle.
    The target_audience is environmentally conscious millennials and key product features are: plastic-free, insulated, lifetime warranty
    """

    tasks, worker_resp = await orchestrator_workflow(task, orchestrator_prompt=ORCHESTRATOR_PROMPT, worker_prompt=WORKER_PROMPT)

    for task_info, response in zip(tasks, worker_resp):
        print(f"\n=== WORKER RESULT ({task_info['type']}) ===\n{response}\n")

asyncio.run(main())
- Evaluator-optimizer Workflow
  - Summary:
    An iterative-refinement workflow: one LLM performs the task, and another evaluates whether the result meets all criteria. If it does not, the solution is revised repeatedly until it passes evaluation.
  - Use cases:
    - Generate code that must satisfy a specific runtime-complexity requirement, verified by the evaluator.
    - Write an article in a particular tone or style, with the evaluator ensuring the output conforms.
  - Code example:
    A Python script that repeatedly generates and evaluates a stack implementation with O(1) operations until all requirements are met.
from pydantic import BaseModel
from typing import Literal

from helpers import run_llm, JSON_llm

task = """
Implement a Stack with:
1. push(x)
2. pop()
3. getMin()
All operations should be O(1).
"""

GENERATOR_PROMPT = """
Your goal is to complete the task based on <user input>. If there is feedback
from your previous generations, you should reflect on it to improve your solution.

Output your answer concisely in the following format:

Thoughts:
[Your understanding of the task and feedback and how you plan to improve]

Response:
[Your code implementation here]
"""

def generate(task: str, generator_prompt: str, context: str = "") -> str:
    """Generate and improve a solution based on feedback."""
    full_prompt = f"{generator_prompt}\n{context}\nTask: {task}" if context else f"{generator_prompt}\nTask: {task}"

    response = run_llm(full_prompt, model="Qwen/Qwen2.5-Coder-32B-Instruct")

    print("\n## Generation start")
    print(f"Output:\n{response}\n")

    return response

EVALUATOR_PROMPT = """
Evaluate this following code implementation for:
1. code correctness
2. time complexity
3. style and best practices

You should be evaluating only and not attempting to solve the task.

Only output "PASS" if all criteria are met and you have no further suggestions for improvements.
Provide detailed feedback if there are areas that need improvement. You should specify what needs improvement and why.
Only output JSON.
"""

def evaluate(task: str, evaluator_prompt: str, generated_content: str) -> tuple[str, str]:
    """Evaluate if a solution meets requirements."""
    full_prompt = f"{evaluator_prompt}\nOriginal task: {task}\nContent to evaluate: {generated_content}"

    # Build a schema for the evaluation
    class Evaluation(BaseModel):
        evaluation: Literal["PASS", "NEEDS_IMPROVEMENT", "FAIL"]
        feedback: str

    response = JSON_llm(full_prompt, Evaluation)

    evaluation = response["evaluation"]
    feedback = response["feedback"]

    print("## Evaluation start")
    print(f"Status: {evaluation}")
    print(f"Feedback: {feedback}")

    return evaluation, feedback

def loop_workflow(task: str, evaluator_prompt: str, generator_prompt: str) -> str:
    """Keep generating and evaluating until the evaluator passes the last generated response."""
    # Store previous responses from generator
    memory = []

    # Generate initial response
    response = generate(task, generator_prompt)
    memory.append(response)

    # While the generated response is not passing, keep generating and evaluating
    while True:
        evaluation, feedback = evaluate(task, evaluator_prompt, response)
        # Terminating condition
        if evaluation == "PASS":
            return response

        # Add current response and feedback to context and generate a new response
        context = "\n".join([
            "Previous attempts:",
            *[f"- {m}" for m in memory],
            f"\nFeedback: {feedback}",
        ])

        response = generate(task, generator_prompt, context)
        memory.append(response)

loop_workflow(task, EVALUATOR_PROMPT, GENERATOR_PROMPT)
- Autonomous Agent Workflow (coming soon)
  - Summary:
    An agent-based workflow in which an LLM acts autonomously in a loop, interacting with its environment and refining its decisions and actions based on feedback.
  - Use cases:
    Not yet detailed, but expected to suit scenarios that require continuous learning and adaptation.
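The site has not yet published this recipe, so the following is only a toy illustration of the loop described above, not the site's implementation: the LLM (here a stand-in callable, `llm_step`) repeatedly chooses an action, the environment (a hypothetical tool table) executes it, and the observation is fed back until the model declares it is done. All names and the action format are assumptions of this sketch:

```python
from typing import Callable

# Toy tools standing in for a real environment.
TOOLS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def agent_loop(task: str, llm_step: Callable[[list[str]], str], max_steps: int = 10) -> str:
    """Autonomous loop: the LLM picks an action, the environment executes it,
    and the observation is appended to the transcript until the LLM finishes
    (or the step budget runs out)."""
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        # In a real agent this would be an LLM call on the full transcript.
        action = llm_step(transcript)  # e.g. "add 2 3" or "FINISH 20"
        if action.startswith("FINISH"):
            return action.split(maxsplit=1)[1]
        name, *args = action.split()
        result = TOOLS[name](*[int(a) for a in args])
        transcript.append(f"Action: {action}\nObservation: {result}")
    return "max steps reached"
```

The `max_steps` budget is the usual guard against an agent that never terminates; production agent loops add tool-error handling and richer action parsing on top of the same skeleton.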
| Features and Value of the Site

- Practicality:
  Every workflow pattern comes with a detailed summary, real-world use cases, and runnable code examples, so developers can get started quickly and apply the patterns in their own projects.
- Flexibility:
  The patterns offer varied solutions for different needs, such as efficiency, decomposition of complex tasks, and result refinement.
- Depth:
  The site also provides a deep-dive notebook with further examples and details for each pattern, for developers who want to explore further.
| Summary

The "Agent Recipes" website is a practical resource library for developers: by presenting six agent workflow patterns (prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer, and the upcoming autonomous agent), it helps them improve their LLM applications. Each pattern combines a conceptual description, real-world scenarios, and a code implementation, making the site an efficient tool for learning and application. Whether the job is a simple task or a complex operation, these patterns can improve an application's performance and flexibility, and they are a valuable reference for building intelligent systems.
Website:
