Ollama Update Adds Streaming Responses with Tool Calling

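Real-time interaction and immediate responses are central to the experience of AI applications, but blocking tool calls often break the flow of content and leave users waiting while the model talks to external tools. Ollama recently released v0.8, which adds streaming responses with tool calling. Chat applications built on Ollama can now call tools and show the results in real time, just as they stream ordinary text.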

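With this update, any chat application can call external tools while the model is still generating, and stream the whole process to the user, including the model's reasoning, the tool-call instructions, and the final text reply. The feature is fully supported in Ollama's Python and JavaScript libraries and in the REST API (shown here with cURL).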

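Key highlights of this update:

- Tool calls streamed alongside content: applications no longer have to wait for the model's complete response before handling tool calls; generated content and tool-call instructions are streamed together, chunk by chunk.
- New incremental parser: Ollama built a new parser that understands the structure of tool calls instead of merely searching for JSON. This lets Ollama:
  - Separate in real time: accurately detect, suppress, and parse tool-call tokens while streaming the user-facing content.
  - Work with a wide range of models: it works whether or not a model was trained with tool-specific tokens, can handle partial prefixes in the model's output, and falls back to JSON parsing when necessary.
  - Improve accuracy: prefix matching and state management make tool calling significantly more reliable and avoid the duplicated or incorrect parses that could occur before.
- Broad model support: many tool-calling models, including Qwen 3, Devstral, the Qwen2.5 series, Llama 3.1, and Llama 4.
- Developer-friendly integration: clear cURL, Python, and JavaScript examples make it quick to get started.
- Model Context Protocol (MCP) improvements: developers using MCP can now also benefit from streamed chat content and tool calls. Ollama recommends a larger context window (e.g. 32k) to further improve tool-calling performance and result quality.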
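The parser behaviour described above (hold back text that might be the start of a tool-call marker, suppress a tool call until it is complete, then parse it) can be sketched in a few lines. This is a simplified illustration, not Ollama's implementation: it assumes the model wraps each call in Qwen-style `<tool_call>...</tool_call>` tags containing JSON, and it omits the JSON-fallback path. The class and method names are hypothetical.

```python
import json
from dataclasses import dataclass

# Assumption: Qwen-style markers; other model templates use different tokens.
OPEN_TAG = "<tool_call>"
CLOSE_TAG = "</tool_call>"


def _partial_prefix_len(text: str, tag: str) -> int:
    """Length of the longest suffix of `text` that is a proper prefix of `tag`."""
    for n in range(min(len(text), len(tag) - 1), 0, -1):
        if text.endswith(tag[:n]):
            return n
    return 0


@dataclass
class IncrementalToolParser:
    buffer: str = ""
    in_tool: bool = False

    def feed(self, chunk: str) -> tuple[str, list[dict]]:
        """Consume one streamed chunk; return (text safe to show, completed tool calls)."""
        self.buffer += chunk
        shown: list[str] = []
        calls: list[dict] = []
        while True:
            if not self.in_tool:
                start = self.buffer.find(OPEN_TAG)
                if start == -1:
                    # Prefix matching: keep back a tail that could still become OPEN_TAG.
                    cut = len(self.buffer) - _partial_prefix_len(self.buffer, OPEN_TAG)
                    shown.append(self.buffer[:cut])
                    self.buffer = self.buffer[cut:]
                    break
                shown.append(self.buffer[:start])
                self.buffer = self.buffer[start + len(OPEN_TAG):]
                self.in_tool = True
            else:
                end = self.buffer.find(CLOSE_TAG)
                if end == -1:
                    break  # incomplete tool call: suppress it and wait for more tokens
                body = self.buffer[:end]
                self.buffer = self.buffer[end + len(CLOSE_TAG):]
                self.in_tool = False
                calls.append(json.loads(body))
        return "".join(shown), calls

    def flush(self) -> str:
        """At end of stream, release held-back text; an unterminated tool call is dropped."""
        rest = "" if self.in_tool else self.buffer
        self.buffer = ""
        return rest


parser = IncrementalToolParser()
for piece in ["Checking <tool", '_call>{"name": "get_current_weather", ',
              '"arguments": {"location": "Toronto"}}</tool_call> done']:
    text, calls = parser.feed(piece)
    print(repr(text), calls)
```

The state flag plays the role of the "state management" mentioned above: while it is set, tokens are withheld from the user instead of being echoed as content.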

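Developers can enable the feature in the following ways:

- REST API (cURL): in a request to `/api/chat`, set `"stream": true` and define the available tools in the `tools` array.
- Python: call `ollama.chat()` with `stream=True` and pass the tool definitions (which can be plain function objects) in the `tools` parameter.
- JavaScript: call `ollama.chat()` with `stream: true` and pass tool schema objects in the `tools` parameter.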
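Because the Python library accepts plain functions as tools, it has to derive a schema from each function's signature and docstring. The sketch below illustrates that idea with the standard library only; it is not the SDK's own conversion code. The type mapping and the choice of the first docstring line as the description are assumptions. The resulting dict has the same shape as the `tools` entries in the cURL example in this article.

```python
import inspect
import typing

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def function_to_tool(fn) -> dict:
    """Build a hypothetical /api/chat-style tool definition from a Python function."""
    hints = typing.get_type_hints(fn)
    properties = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unknown types fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults are required
    doc = inspect.getdoc(fn) or ""
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": doc.splitlines()[0] if doc else "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def add_two_numbers(a: int, b: int) -> int:
    """
    Add two numbers

    Args:
      a (int): The first number as an int
      b (int): The second number as an int
    """
    return a + b


print(function_to_tool(add_two_numbers))
```

Passing functions directly keeps the tool's name, parameters, and description in one place; writing the schema dict by hand, as the cURL example does, gives finer control over descriptions and enums.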

Here is a Python example that calls custom math functions:

  
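```python
from ollama import chat


# Define the Python functions exposed to the model as tools
def add_two_numbers(a: int, b: int) -> int:
    """
    Add two numbers

    Args:
      a (int): The first number as an int
      b (int): The second number as an int

    Returns:
      int: The sum of the two numbers
    """
    return a + b


def subtract_two_numbers(a: int, b: int) -> int:
    """
    Subtract two numbers

    Args:
      a (int): The first number as an int
      b (int): The second number as an int

    Returns:
      int: The difference of the two numbers
    """
    return a - b


messages = [{'role': 'user', 'content': 'what is three minus one?'}]

# With stream=True, chat() returns an iterator of ChatResponse chunks
response = chat(
    model='qwen3',
    messages=messages,
    tools=[add_two_numbers, subtract_two_numbers],  # the Python SDK accepts plain functions as tools
    stream=True,
)

for chunk in response:
    # Print model content as it streams
    print(chunk.message.content, end='', flush=True)
    # Print any tool call carried by this chunk
    if chunk.message.tool_calls:
        print(chunk.message.tool_calls)
```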

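Expected output (illustrative; it depends on how the model behaves and on whether the user's question matches an available tool):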

  
```text
<think>
Okay, the user is asking ...
</think>

[ToolCall(function=Function(name='subtract_two_numbers', arguments={'a': 3, 'b': 1}))]
```
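Ollama only returns the tool call; the application still has to run the function and send the result back. The sketch below shows that dispatch step. The registry and helper names are hypothetical, and `SimpleNamespace` stands in for the `ToolCall` object printed above, which exposes `.function.name` and `.function.arguments`. In a real loop you would append the assistant message and these `tool` messages to `messages` and call `chat()` again so the model can write its final answer; only the `role` and `content` fields are used here, since extra fields on tool messages may vary by library version.

```python
from types import SimpleNamespace


def add_two_numbers(a: int, b: int) -> int:
    return a + b


def subtract_two_numbers(a: int, b: int) -> int:
    return a - b


# Hypothetical registry: tool names the model may emit -> local Python functions.
AVAILABLE_TOOLS = {f.__name__: f for f in (add_two_numbers, subtract_two_numbers)}


def run_tool_calls(tool_calls, registry=AVAILABLE_TOOLS) -> list[dict]:
    """Execute each tool call and return 'tool' role messages for the next chat() turn."""
    results = []
    for call in tool_calls:
        fn = registry.get(call.function.name)
        if fn is None:
            # Report unknown tools back to the model instead of raising.
            output = f"error: unknown tool {call.function.name!r}"
        else:
            output = fn(**call.function.arguments)
        results.append({"role": "tool", "content": str(output)})
    return results


# Stand-in with the same shape as the ToolCall printed in the expected output.
example = SimpleNamespace(
    function=SimpleNamespace(name="subtract_two_numbers", arguments={"a": 3, "b": 1})
)
print(run_tool_calls([example]))  # [{'role': 'tool', 'content': '2'}]
```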

cURL example (querying the weather):

  
```shell
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [
    {
      "role": "user",
      "content": "What is the weather today in Toronto?"
    }
  ],
  "stream": true,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The location to get the weather for, e.g. San Francisco, CA"
            },
            "format": {
              "type": "string",
              "description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location", "format"]
        }
      }
    }
  ]
}'
```

Streamed output:

  
```json
...
{
  "model": "qwen3",
  "created_at": "2025-05-27T22:54:57.641643Z",
  "message": {
    "role": "assistant",
    "content": "celsius"
  },
  "done": false
}
{
  "model": "qwen3",
  "created_at": "2025-05-27T22:54:57.673559Z",
  "message": {
    "role": "assistant",
    "content": "</think>"
  },
  "done": false
}
{
  "model": "qwen3",
  "created_at": "2025-05-27T22:54:58.100509Z",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "get_current_weather",
          "arguments": {
            "format": "celsius",
            "location": "Toronto"
          }
        }
      }
    ]
  },
  "done": false
}
...
```
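On the wire, `/api/chat` streams one JSON object per line, and a tool call arrives in a chunk whose `content` is empty and whose `message` carries a `tool_calls` array, as in the output above. The sketch below separates the two using only the standard library. `split_stream` and `stream_chat` are hypothetical helper names, and the default URL is taken from the cURL example; `stream_chat` is defined but not called here because it needs a running Ollama server.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def split_stream(lines: Iterable[str]) -> tuple[str, list[dict]]:
    """Collect streamed text and tool calls from newline-delimited /api/chat chunks."""
    content: list[str] = []
    tool_calls: list[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank keep-alive lines
        chunk = json.loads(line)
        message = chunk.get("message", {})
        content.append(message.get("content", ""))
        tool_calls.extend(message.get("tool_calls", []))
        if chunk.get("done"):
            break  # the final chunk has "done": true
    return "".join(content), tool_calls


def stream_chat(payload: dict, url: str = "http://localhost:11434/api/chat") -> Iterator[str]:
    """Yield raw response lines from a local Ollama server (requires the server to be running)."""
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        for raw in response:  # HTTP responses iterate line by line
            yield raw.decode("utf-8")
```

Usage against a live server would be `text, calls = split_stream(stream_chat(body))`, where `body` is the same JSON object sent in the cURL request.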

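Ollama also notes that for scenarios that require highly accurate tool calls or complex interactions, you can try enlarging the model's context window through `num_ctx` in `options` (for example, to 32000), as shown below. A larger window increases memory usage.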

  
```shell
# Update the context window with options.num_ctx
curl -X POST "http://localhost:11434/api/chat" -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ],
  "options": {
    "num_ctx": 32000
  }
}'
```
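The request body must be valid JSON, so the context-window setting belongs under `options` with no inline comments. As a sketch, a small helper (hypothetical, not part of any Ollama library) can assemble the same `/api/chat` body and combine `num_ctx` with streaming and tools:

```python
import json
from typing import Optional


def build_chat_payload(model: str, messages: list[dict], tools: Optional[list[dict]] = None,
                       num_ctx: Optional[int] = None, stream: bool = True) -> dict:
    """Assemble an /api/chat request body; num_ctx is placed under "options"."""
    payload = {"model": model, "messages": messages, "stream": stream}
    if tools:
        payload["tools"] = tools
    if num_ctx is not None:
        # A larger context window uses more memory.
        payload["options"] = {"num_ctx": num_ctx}
    return payload


body = build_chat_payload(
    "qwen3",
    [{"role": "user", "content": "What is the weather today in Toronto?"}],
    num_ctx=32000,
)
print(json.dumps(body, indent=2))
```

With the Python library, the equivalent is passing `options={'num_ctx': 32000}` to `chat()` alongside `tools` and `stream=True`.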

If you're interested, download the latest version and try it out.

Download: https://ollama.com/download
