Function Calling - ihower's Notes

* OpenAI's launch blog post: https://openai.com/blog/function-calling-and-other-api-updates (2023/6/13)
* OpenAI official docs: https://platform.openai.com/docs/guides/function-calling
* Claude Tool use official docs: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
* Gemini official docs: https://ai.google.dev/gemini-api/docs/function-calling
* Gemma 3 Function Calling example: https://www.philschmid.de/gemma-function-calling
* Azure's docs are quite good: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/function-calling?tabs=python
  * Provide more detail in your function definitions
  * Provide more context in the system message
  * Instruct the model to ask clarifying questions
  * If you find the model generating calls to functions that were not provided, try adding a sentence like this to the system message: "Only use the functions you have been provided with."
* Exploring the possibilities of building an LLM agentic system (2024/12/29)
  * https://axk51013.medium.com/human-agent-computer-interaction-design-%E6%8E%A2%E7%B4%A2%E6%89%93%E9%80%A0-llm-agentic-system-%E7%9A%84%E5%90%84%E7%A8%AE%E5%8F%AF%E8%83%BD%E6%80%A7-c179b5521761
  * paper: Re-Invoke https://arxiv.org/abs/2408.01875
    * For each tool document, generate the hypothetical queries that tool is likely to face; each incoming user query is then matched against those hypothetical queries
  * paper: CodeAct https://arxiv.org/abs/2402.01030
    * Avoid designing tools so that the model needs several round trips to complete one task. In other words, we often have to give up flexibility in how a tool can be used and design it from an end-to-end perspective
  * paper: DRAFT https://arxiv.org/abs/2410.08197
    * Use each experience of the LLM calling a tool to improve the tool description: remove redundant or incorrect information, and add necessary information and examples
* Chip Huyen's Agents (2025/1/7)
  * https://huyenchip.com/2025/01/07/agents.html
  * Includes a section on function calling
- Function Calling is All You Need — Full Workshop, with Ilan Bigio of OpenAI (2025/4)
  - https://www.youtube.com/watch?v=KUEmEb71vzQ

## Function Calling with Reasoning Models

- [https://cookbook.openai.com/examples/reasoning_function_calls](https://cookbook.openai.com/examples/reasoning_function_calls) (2025/4/25)
- [https://cookbook.openai.com/examples/responses_api/reasoning_items](https://cookbook.openai.com/examples/responses_api/reasoning_items) (2025/5/11)
- [https://cookbook.openai.com/examples/o-series/o3o4-mini_prompting_guide](https://cookbook.openai.com/examples/o-series/o3o4-mini_prompting_guide) (2025/5/26)

## Evaluation

* Berkeley Function-Calling Leaderboard: https://gorilla.cs.berkeley.edu/leaderboard.html
  * https://twitter.com/shishirpatil_/status/1774928279599972822 (2024/4/2)
  * v3 https://x.com/shishirpatil_/status/1837205152132153803 (2024/9/21)
  * BFCL V3 • Multi-Turn & Multi-Step Function Calling Evaluation
    * https://gorilla.cs.berkeley.edu/blogs/13_bfcl_v3_multi_turn.html
* ToolTalk: https://github.com/microsoft/ToolTalk
  * https://txt.cohere.com/command-r-plus-microsoft-azure/
* GAIA: A Benchmark for General AI Assistants (2023/11)
  * https://arxiv.org/abs/2311.12983
* ComplexFuncBench (2025/1)
  * https://huggingface.co/papers/2501.10132
  * https://x.com/_philschmid/status/1883055262669349287
* τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains (2024/6)
  * https://arxiv.org/abs/2406.12045
  * https://hal.cs.princeton.edu/#leaderboards has more recent TAU evaluation results
  * 𝜏-Bench: Benchmarking AI agents for the real-world
    * https://sierra.ai/blog/benchmarking-ai-agents

## Handling a Large Number of Tools

* bigtool https://github.com/langchain-ai/langgraph-bigtool
  * https://x.com/LangChainAI/status/1905302614218305891 (2025/3/28)
* RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation (2025/5)
  * https://arxiv.org/abs/2505.03275v1
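
The common idea behind the large-toolset links above is to retrieve a short list of relevant tools first and pass only those to the model. Below is a minimal sketch of that step. It is not the implementation used by bigtool or RAG-MCP: the four-tool registry, the names `TOOLS` and `retrieve_tools`, and the bag-of-words cosine similarity (a stand-in for a real embedding model) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical tool registry: names and descriptions are illustrative only.
TOOLS = {
    "get_weather": "Get the current weather forecast and temperature for a city",
    "search_flights": "Search airline flights between two airports on a date",
    "convert_currency": "Convert an amount of money from one currency to another",
    "send_email": "Send an email message to a recipient with subject and body",
}


def tokenize(text: str) -> Counter:
    """Lowercased bag of words; an assumed stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_tools(query: str, tools: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k tools whose descriptions best match the query."""
    q = tokenize(query)
    ranked = sorted(
        tools,
        key=lambda name: cosine(q, tokenize(tools[name])),
        reverse=True,
    )
    return ranked[:k]


# Only the shortlisted tools would then be included in the model request.
shortlist = retrieve_tools("What is the weather forecast for Taipei?", TOOLS, k=2)
print(shortlist)
```

In a real system the descriptions would be embedded once and stored in a vector index, and the shortlist size `k` trades recall against prompt length. This also shows why detailed tool descriptions matter: the description is the only thing the retrieval step can match against.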