https://blog.composio.dev/improving-function-calling-accuracy-for-agentic-integrations/
Following the ClickUp documentation, we selected a few APIs and converted them into the OpenAI function schema format, as follows:
https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/clickup_space_schema.json
get_spaces(team_id:string, archived:boolean)
create_space(team_id:string, name:string, multiple_assignees:boolean, features:(due_dates:(enabled:boolean, start_date:boolean, remap_due_dates:boolean, remap_closed_due_date:boolean), time_tracking:(enabled:boolean)))
get_space(space_id:string)
update_space(space_id:string, name:string, color:string, private:boolean, admin_can_manage:boolean, multiple_assignees:boolean, features:(due_dates:(enabled:boolean, start_date:boolean, remap_due_dates:boolean, remap_closed_due_date:boolean), time_tracking:(enabled:boolean)))
delete_space(space_id:string)
get_space_tags(space_id:string)
create_space_tag(space_id:string, tag:(name:string, tag_fg:string, tag_bg:string))
delete_space_tag(space_id:string, tag_name:string, tag:(name:string, tag_fg:string, tag_bg:string))
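To evaluate the results effectively, we built a small benchmark dataset. Each item describes a situation that must be solved with one of the eight selected functions, ranging from simple to complex. Our evaluation is based on how accurately the function is called with the correct arguments.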
https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/clickup_space_benchmark.json
[
{
"prompt": "As the new fiscal year begins, the management team at a marketing agency decides it's time to archive older projects to make way for new initiatives. They remember that one of their teams is called \"Innovative Solutions\" and operates under the team ID \"team123\". They want to check which spaces under this team are still active before deciding which ones to archive.",
"solution": "get_spaces(team_id=\"team123\", archived=False)"
},
{
"prompt": "Ella, the project coordinator, is setting up a new project space in ClickUp for the \"Creative Minds\" team with team ID \"cm789\". This space, named \"Innovative Campaigns 2023\", should allow multiple assignees for tasks, but keep due dates and time tracking disabled, as the initial planning phase doesn't require strict deadlines or time monitoring.",
"solution": "create_space(team_id=\"cm789\", name=\"Innovative Campaigns 2023\", multiple_assignees=True, features=(due_dates=(enabled=False, start_date=False, remap_due_dates=False, remap_closed_due_date=False), time_tracking=(enabled=False)))"
},
...
]
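The scoring rule described above can be sketched as follows. This is a minimal illustration, not the blog's actual evaluation code: it assumes each benchmark solution has already been converted into a `(function_name, arguments)` pair and that a call only counts as correct when the name and every argument match exactly. The names `is_correct_call` and `accuracy` are hypothetical.

```python
def is_correct_call(predicted, expected):
    """Return True when the function name and all arguments match exactly."""
    pred_name, pred_args = predicted
    exp_name, exp_args = expected
    return pred_name == exp_name and pred_args == exp_args


def accuracy(predictions, expectations):
    """Fraction of benchmark items answered with an exactly matching call."""
    hits = sum(is_correct_call(p, e) for p, e in zip(predictions, expectations))
    return hits / len(expectations)


# Illustrative data only: one correct call and one with a wrong argument.
expected = [
    ("get_spaces", {"team_id": "team123", "archived": False}),
    ("get_space", {"space_id": "s1"}),
]
predicted = [
    ("get_spaces", {"team_id": "team123", "archived": False}),
    ("get_space", {"space_id": "wrong"}),
]
score = accuracy(predicted, expected)
```

Exact matching is the strictest possible choice; a looser rule (for example, ignoring optional arguments) would give different scores.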
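We first evaluate GPT-4's performance on its own, without the influence of any system prompt.
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
# `tools` holds the function schemas shown above; `bench_data` is the benchmark list
fcalling_llm = lambda fprompt : client.chat.completions.create(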
model="gpt-4-turbo-preview",
messages=[
{
"role": "system",
"content": """"""
},
{
"role": "user",
"content": fprompt
},
],
temperature=0,
max_tokens=4096,
top_p=1,
tools=tools,
tool_choice="auto"
)
response = fcalling_llm(bench_data[1]["prompt"])
We set temperature=0 to keep the output as deterministic as possible. Across three runs, the average accuracy was 30%.
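To score a response, the function name and arguments first have to be pulled out of it. The sketch below assumes the response has the shape returned by the OpenAI chat completions client, where `message.tool_calls` is a list (or `None`) and each call carries its arguments as a JSON string. The helper name `extract_tool_call` and the stand-in `fake_response` object are hypothetical and only serve to keep the example runnable offline.

```python
import json
from types import SimpleNamespace


def extract_tool_call(response):
    """Return (function_name, arguments_dict) for the first tool call, or None."""
    message = response.choices[0].message
    if not message.tool_calls:
        return None  # the model answered in plain text instead of calling a tool
    call = message.tool_calls[0].function
    return call.name, json.loads(call.arguments)


# Stand-in object that mimics the response shape, so no API call is needed.
fake_call = SimpleNamespace(
    function=SimpleNamespace(
        name="get_spaces",
        arguments='{"team_id": "team123", "archived": false}',
    )
)
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(tool_calls=[fake_call]))]
)
result = extract_tool_call(fake_response)
```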
Some functions require their arguments to be supplied as a nested structure. Here is an example:
{
  "name": "create_space",
  "description": "Add a new Space to a Workspace.",
  "parameters": {
    "type": "object",
    "properties": {
      "team_id": {
        "type": "string",
        "description": "The ID of the team"
      },
      "name": {
        "type": "string",
        "description": "The name of the new space"
      },
      "multiple_assignees": {
        "type": "boolean",
        "description": "Enable or disable multiple assignees for tasks within the space"
      },
      "features": {
        "type": "object",
        "description": "Enabled features within the space",
        "properties": {
          "due_dates": {
            "type": "object",
            "description": "Due dates feature settings",
            "properties": {
              "enabled": { "type": "boolean" },
              "start_date": { "type": "boolean" },
              "remap_due_dates": { "type": "boolean" },
              "remap_closed_due_date": { "type": "boolean" }
            }
          },
          "time_tracking": {
            "type": "object",
            "description": "Time tracking feature settings",
            "properties": {
              "enabled": { "type": "boolean" }
            }
          }
        }
      }
    },
    "required": ["team_id", "name", "multiple_assignees", "features"]
  }
}
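In our experience with LLMs, although the model (GPT-4) has been optimized for structured output, complex output structures can actually reduce the performance and accuracy of what it produces.
We therefore flatten these parameters, using the names of their parents as prefixes.
After flattening, the function above looks like this: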
{
  "description": "Add a new Space to a Workspace.",
  "name": "create_space",
  "parameters": {
    "properties": {
      "features__due_dates__enabled": {
        "description": "enabled__Due dates feature settings__Enabled features within the space__",
        "type": "boolean"
      },
      "features__due_dates__remap_closed_due_date": {
        "description": "remap_closed_due_date__Due dates feature settings__Enabled features within the space__",
        "type": "boolean"
      },
      "features__due_dates__remap_due_dates": {
        "description": "remap_due_dates__Due dates feature settings__Enabled features within the space__",
        "type": "boolean"
      },
      "features__due_dates__start_date": {
        "description": "start_date__Due dates feature settings__Enabled features within the space__",
        "type": "boolean"
      },
      "features__time_tracking__enabled": {
        "description": "enabled__Time tracking feature settings__Enabled features within the space__",
        "type": "boolean"
      },
      "multiple_assignees": {
        "description": "Enable or disable multiple assignees for tasks within the space__",
        "type": "boolean"
      },
      "name": {
        "description": "The name of the new space__",
        "type": "string"
      },
      "team_id": {
        "description": "The ID of the team__",
        "type": "string"
      }
    },
    "required": [
      "team_id",
      "name",
      "multiple_assignees",
      "features__due_dates__enabled",
      "features__due_dates__start_date",
      "features__due_dates__remap_due_dates",
      "features__due_dates__remap_closed_due_date",
      "features__time_tracking__enabled"
    ],
    "type": "object"
  }
}
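The flattening shown above can be sketched as a small recursive function. This is an illustration rather than the blog's own code: the name `flatten_schema` is hypothetical, and it assumes that a nested object without its own `required` list makes all of its children required whenever the object itself is required. Descriptions are stacked leaf-first with a `__` separator, and a leaf without a description falls back to its own name, which reproduces the strings in the flattened example.

```python
def flatten_schema(func_schema, sep="__"):
    """Flatten nested object parameters into prefixed top-level parameters."""
    flat_props, flat_required = {}, []

    def walk(props, required, prefix, parent_desc, parent_required):
        for name, spec in props.items():
            full_name = prefix + name
            is_required = parent_required and name in required
            # Stack this level's description (or name) in front of the parents'.
            desc = spec.get("description", name) + sep + parent_desc
            if spec.get("type") == "object" and "properties" in spec:
                # Assumption: no explicit "required" list means every child is required.
                child_required = spec.get("required", list(spec["properties"]))
                walk(spec["properties"], child_required, full_name + sep, desc, is_required)
            else:
                flat_props[full_name] = {"type": spec["type"], "description": desc}
                if is_required:
                    flat_required.append(full_name)

    params = func_schema["parameters"]
    walk(params["properties"], params.get("required", []), "", "", True)
    return {
        "name": func_schema["name"],
        "description": func_schema["description"],
        "parameters": {"type": "object", "properties": flat_props, "required": flat_required},
    }


# A trimmed-down version of the nested create_space schema.
nested = {
    "name": "create_space",
    "description": "Add a new Space to a Workspace.",
    "parameters": {
        "type": "object",
        "properties": {
            "team_id": {"type": "string", "description": "The ID of the team"},
            "features": {
                "type": "object",
                "description": "Enabled features within the space",
                "properties": {
                    "time_tracking": {
                        "type": "object",
                        "description": "Time tracking feature settings",
                        "properties": {"enabled": {"type": "boolean"}},
                    }
                },
            },
        },
        "required": ["team_id", "features"],
    },
}
flat = flatten_schema(nested)
```

Before the real ClickUp API is called, the flattened arguments returned by the model would need to be folded back into the nested form; that step is not shown here.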
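We concatenate each parameter name with the names of its parents, for example features__due_dates__enabled and features__due_dates__remap_due_dates. The results of three test runs followed here in the source (figures not captured in these notes).
The earlier calls used no system prompt, so the LLM was never told its role or how it should interact with the ClickUp API. Now let's add a simple system prompt.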
from openai import OpenAI
client = OpenAI()
fcalling_llm = lambda fprompt : client.chat.completions.create(
model="gpt-4-turbo-preview",
messages=[
{
"role": "system",
"content": """
You are an agent who is responsible for managing various employee management platform,
one of which is ClickUp.
When you are presented with a technical situation, that a person of a team is facing,
you must give the soulution utilizing your functionalities.
"""
},
{
"role": "user",
"content": fprompt
},
],
temperature=0,
max_tokens=4096,
top_p=1,
tools=tools,
tool_choice="auto"
)
response = fcalling_llm(bench_data[1]["prompt"])
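As we can see, adding a system prompt improves performance. Next, we add more detail to the prompt to check whether the improvement continues.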
You are an agent who is responsible for managing various employee management platform,
one of which is ClickUp.
You are given a number of tools as functions, you must use one of those tools and fillup
all the parameters of those tools ,whose answers you will get from the given situation.
When you are presented with a technical situation, that a person of a team is facing,
you must give the soulution utilizing your functionalities.
First analyze the given situation to fully anderstand what is the intention of the user,
what they need and exactly which tool will fill up that necessity.
Then look into the parameters and extract all the relevant informations to fillup the
parameter with right values.
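Beyond the role description, we also add what each tool does, to strengthen the system prompt further.
Below is the concise summary of the tools that we added to the prompt.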
get_spaces - View the Spaces available in a Workspace.
create_space - Add a new Space to a Workspace.
get_space - View the details of a specific Space in a Workspace.
update_space - Rename, set the Space color, and enable ClickApps for a Space.
delete_space - Delete a Space from your Workspace.
get_space_tags - View the task Tags available in a Space.
create_space_tag - Add a new task Tag to a Space.
delete_space_tag - Delete a task Tag from a Space.
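One way to append such a summary to the system prompt is sketched below. The helper name `build_system_prompt` and the "Tools available:" heading are assumptions made for illustration; the source does not show how the summary was joined to the rest of the prompt.

```python
# Tool summaries copied from the list above (only three shown for brevity).
tool_summaries = {
    "get_spaces": "View the Spaces available in a Workspace.",
    "create_space": "Add a new Space to a Workspace.",
    "get_space": "View the details of a specific Space in a Workspace.",
}


def build_system_prompt(role_text, summaries):
    """Append one 'name - summary' line per tool to the role description."""
    lines = [f"{name} - {summary}" for name, summary in summaries.items()]
    return role_text.strip() + "\n\nTools available:\n" + "\n".join(lines)


role_text = "You are an agent who is responsible for managing ClickUp."
system_prompt = build_system_prompt(role_text, tool_summaries)
```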
Now, let's make the function names more descriptive.
# Map each original function name to a more descriptive one.
schema_func_name_dict = {
    "get_spaces": "get_all_clickup_spaces_available",
    "create_space": "create_a_new_clickup_space",
    "get_space": "get_a_specific_clickup_space_details",
    "update_space": "modify_an_existing_clickup_space",
    "delete_space": "delete_an_existing_clickup_space",
    "get_space_tags": "get_all_tags_of_a_clickup_space",
    "create_space_tag": "assign_a_tag_to_a_clickup_space",
    "delete_space_tag": "remove_a_tag_from_a_clickup_space",
}
optimized_schema = []
for sc in flattened_schema:
    temp_dict = sc.copy()  # shallow copy is enough: only the top-level "name" changes
    temp_dict["name"] = schema_func_name_dict[temp_dict["name"]]
    optimized_schema.append(temp_dict)
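Because the benchmark solutions use the original function names, a call made under a renamed function has to be mapped back before it can be compared. The notes do not show how this was handled, so the following is only a guess at a minimal approach; `renamed` repeats two entries of the mapping above and `to_original` is a hypothetical helper.

```python
# Two entries of the rename mapping, repeated so the example is self-contained.
renamed = {
    "get_spaces": "get_all_clickup_spaces_available",
    "create_space": "create_a_new_clickup_space",
}

# Invert the mapping: descriptive name -> original name.
original_name = {new: old for old, new in renamed.items()}


def to_original(call_name):
    """Translate a descriptive function name back; unknown names pass through."""
    return original_name.get(call_name, call_name)


restored = to_original("create_a_new_clickup_space")
```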
# Rewrite the description of each function.
schema_func_decription_dict = {
    "get_spaces": "Retrives information of all the spaces available in user's Clickup Workspace.",
    "create_space": "Creates a new ClickUp space",
    "get_space": "Retrives information of a specific Clickup space",
    "update_space": "Modifies name, settings the Space color, and assignee management Space.",
    "delete_space": "Delete an existing space from user's ClickUp Workspace",
    "get_space_tags": "Retrives all the Tags assigned on all the tasks in a Space.",
    "create_space_tag": "Assigns a customized Tag in a ClickUp Space.",
    "delete_space_tag": "Deletes a specific tag previously assigned in a space.",
}
optimized_schema = []
for sc in flattened_schema:
    temp_dict = sc.copy()
    temp_dict["description"] = schema_func_decription_dict[temp_dict["name"]]
    optimized_schema.append(temp_dict)
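Earlier, we flattened the schema by stacking each nested parameter's description on top of its parents' descriptions until the structure was flat.
Now let's replace those stacked descriptions with the following: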
schema_func_params_dict = {
    'create_space': {
        'features__due_dates__enabled': 'If due date feature is enabled within the space. Default: True',
        'features__due_dates__remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. Default: False',
        'features__due_dates__remap_due_dates': 'If remapping due date feature in due dates is available within the space. Default: False',
        'features__due_dates__start_date': 'If start date feature in due dates is available within the space. Default: False',
        'features__time_tracking__enabled': 'If time tracking feature is available within the space. Default: True',
        'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space. Default: True',
        'name': 'The name of the new space to create',
        'team_id': 'The ID of the team'
    },
    'create_space_tag': {
        'space_id': 'The ID of the space',
        'tag__name': 'The name of the tag to assign',
        'tag__tag_bg': 'The background color of the tag to assign',
        'tag__tag_fg': 'The foreground(text) color of the tag to assign'
    },
    'delete_space': {
        'space_id': 'The ID of the space to delete'
    },
    'delete_space_tag': {
        'space_id': 'The ID of the space',
        'tag__name': 'The name of the tag to delete',
        'tag__tag_bg': 'The background color of the tag to delete',
        'tag__tag_fg': 'The foreground color of the tag to delete',
        'tag_name': 'The name of the tag to delete'
    },
    'get_space': {
        'space_id': 'The ID of the space to retrieve details'
    },
    'get_space_tags': {
        'space_id': 'The ID of the space to retrieve all the tags from'
    },
    'get_spaces': {
        'archived': 'A flag to decide whether to include archived spaces or not. Default: True',
        'team_id': 'The ID of the team'
    },
    'update_space': {
        'admin_can_manage': 'A flag to determine if the administrator can manage the space or not. Default: True',
        'color': 'The color used for the space',
        'features__due_dates__enabled': 'If due date feature is enabled within the space. Default: True',
        'features__due_dates__remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. Default: False',
        'features__due_dates__remap_due_dates': 'If remapping due date feature in due dates is available within the space. Default: False',
        'features__due_dates__start_date': 'If start date feature in due dates is available within the space. Default: False',
        'features__time_tracking__enabled': 'If time tracking feature is available within the space. Default: True',
        'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space. Default: True',
        'name': 'The new name of the space',
        'private': 'A flag to determine if the space is private or not. Default: False',
        'space_id': 'The ID of the space'
    }
}
import copy

optimized_schema = []
for sc in flattened_schema:
    # Deep copy so that editing nested parameter descriptions leaves flattened_schema untouched.
    temp_dict = copy.deepcopy(sc)
    temp_dict["description"] = schema_func_decription_dict[temp_dict["name"]]
    for func_param_name, func_param_description in schema_func_params_dict[temp_dict["name"]].items():
        temp_dict["parameters"]["properties"][func_param_name]["description"] = func_param_description
    optimized_schema.append(temp_dict)
Across all runs, we scored 75% or higher.
LLMs generally perform better with few-shot examples, so we add some example calls to the description of each tool.
schema_func_decription_dict = {
    "get_spaces": """\
Retrives information of all the spaces available in user's Clickup Workspace. Example Call:
```python
get_spaces({'team_id': 'a1b2c3d4', 'archived': False})
```
""",
    "create_space": """\
Creates a new ClickUp space. Example Call:
```python
create_space({
    'team_id': 'abc123',
    'name': 'NewWorkspace',
    'multiple_assignees': True,
    'features__due_dates__enabled': True,
    'features__due_dates__start_date': False,
    'features__due_dates__remap_due_dates': False,
    'features__due_dates__remap_closed_due_date': False,
    'features__time_tracking__enabled': True
})
```
""",
    # entries for the remaining tools are not included in these notes
}
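Unfortunately, the score seems to drop!
Since example calls in the function descriptions did not help, let's instead add example values to the function parameters, to give a clearer idea of what the input values look like. We adjust the parameter descriptions accordingly.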
schema_func_params_dict = {
    'create_space': {
        'features__due_dates__enabled': 'If due date feature is enabled within the space. \nExample: True, False \nDefault: True',
        'features__due_dates__remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. \nExample: True, False \nDefault: False',
        'features__due_dates__remap_due_dates': 'If remapping due date feature in due dates is available within the space. \nExample: True, False \nDefault: False',
        'features__due_dates__start_date': 'If start date feature in due dates is available within the space. \nExample: True, False \nDefault: False',
        'features__time_tracking__enabled': 'If time tracking feature is available within the space. \nExample: True, False \nDefault: True',
        'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space \nExample: True, False. Default: True',
        'name': 'The name of the new space to create \nExample: \'NewWorkspace\', \'TempWorkspace\'',
        'team_id': 'The ID of the team \nExample: \'abc123\', \'def456\' '
    },
    'create_space_tag': {
        'space_id': 'The ID of the space \nExample: \'abc123\', \'def456\'',
        'tag__name': 'The name of the tag to assign \nExample: \'NewTag\', \'TempTag\'',
        'tag__tag_bg': 'The background color of the tag to assign \nExample: \'#FF0000\', \'#00FF00\'',
        'tag__tag_fg': 'The foreground(text) color of the tag to assign \nExample: \'#FF0000\', \'#00FF00\''
    },
    'delete_space': {
        'space_id': 'The ID of the space to delete \nExample: \'abc123\', \'def456\''
    },
    'delete_space_tag': {
        'space_id': 'The ID of the space to delete \nExample: \'abc123\', \'def456\'',
        'tag__name': 'The name of the tag to delete \nExample: \'NewTag\', \'TempTag\'',
        'tag__tag_bg': 'The background color of the tag to delete \nExample: \'#FF0000\', \'#00FF00\', \'#0000FF\'',
        'tag__tag_fg': 'The foreground color of the tag to delete \nExample: \'#FF0000\', \'#00FF00\', \'#0000FF\'',
        'tag_name': 'The name of the tag to delete \nExample: \'NewTag\', \'TempTag\''
    },
    'get_space': {
        'space_id': 'The ID of the space to retrieve details \nExample: \'abc123\', \'def456\''
    },
    'get_space_tags': {
        'space_id': 'The ID of the space to retrieve all the tags from \nExample: \'abc123\', \'def456\''
    },
    'get_spaces': {
        'archived': 'A flag to decide whether to include archived spaces or not \nExample: True, False. Default: True',
        'team_id': 'The ID of the team \nExample: \'abc123\', \'def456\''
    },
    'update_space': {
        'admin_can_manage': 'A flag to determine if the administrator can manage the space or not \nExample: True, False. Default: True',
        'color': 'The color used for the space \nExample: \'#FF0000\', \'#00FF00\'',
        'features__due_dates__enabled': 'If due date feature is enabled within the space. \nExample: True, False \nDefault: True',
        'features__due_dates__remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. Default: False',
        'features__due_dates__remap_due_dates': 'If remapping due date feature in due dates is available within the space. Default: False',
        'features__due_dates__start_date': 'If start date feature in due dates is available within the space. Default: False',
        'features__time_tracking__enabled': 'If time tracking feature is available within the space. \nExample: True, False \nDefault: True',
        'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space \nExample: True, False. Default: True',
        'name': 'The new name of the space \nExample: \'NewWorkspace\', \'TempWorkspace\'',
        'private': 'A flag to determine if the space is private or not \nExample: True, False. Default: False',
        'space_id': 'The ID of the space to update \nExample: \'abc123\', \'def456\''
    }
}
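Descriptions in this style could also be generated rather than typed by hand, which keeps the "Example:" and "Default:" parts consistent. The helper below is a hypothetical sketch, not something the source uses; `describe` and its parameters are names made up for illustration, and the output format follows the most common pattern in the dictionary above.

```python
def describe(text, examples=None, default=None):
    """Build a parameter description with optional example values and default."""
    parts = [text]
    if examples:
        # repr() keeps quotes around strings and leaves booleans bare.
        parts.append("Example: " + ", ".join(repr(e) for e in examples))
    if default is not None:  # `is not None` so that a default of False is kept
        parts.append(f"Default: {default}")
    return " \n".join(parts)


team_id_desc = describe("The ID of the team", examples=["abc123", "def456"])
due_dates_desc = describe(
    "If due date feature is enabled within the space.",
    examples=[True, False],
    default=True,
)
```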