Glossary
Tool-RAG / Tool retrieval
Tool-RAG (or tool retrieval) is the application of RAG (Retrieval-Augmented Generation) principles to dynamic tool selection for an AI agent: embed the tool catalogue in a vector store, then retrieve the k most relevant tools for each conversational turn instead of exposing all of them to the model.
Also known as
- tool-RAG
- tool retrieval
When an agent has hundreds of tools (getchatsocial.com wires 206 Brandyze MCP tools), exposing the entire catalogue to the model on every turn is counterproductive: (1) token cost grows with every tool definition sent, (2) latency increases, (3) the model makes more routing mistakes by confusing similar tools. Tool-RAG addresses all three problems by exposing only the tools that are semantically relevant to the user's current intent.
A typical architecture pairs a vector index (pgvector, Qdrant) over tool descriptions with a lightweight re-ranker (BM25 keyword score plus a composite boost). On each turn: embed the query, retrieve the top-k tools (typically 10–20), union them with a Tier-1 always-on set (the strategic composite tools that are always available), then pass this subset to the model.
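The per-turn selection step can be sketched in a few lines of TypeScript. This is a minimal in-memory illustration, not the product's implementation: the `ToolEntry` shape, the `selectTools` function, the 0.7/0.3 weights, and the 0.05 boost are all assumptions, and a simple term-overlap score stands in for real BM25. In production the cosine ranking would come from the vector store rather than a loop over the catalogue.

```typescript
// Hypothetical shapes and names, for illustration only.
interface ToolEntry {
  name: string;
  description: string;
  embedding: number[]; // precomputed embedding of the description
  composite?: boolean; // composite tools receive a small boost (assumed)
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Simplified stand-in for BM25: fraction of query terms found in the description.
function keywordScore(query: string, description: string): number {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const words = new Set(description.toLowerCase().split(/\W+/).filter(Boolean));
  return terms.filter((t) => words.has(t)).length / terms.length;
}

// Rank the catalogue, keep the top k, then union with the always-on Tier-1 set.
function selectTools(
  query: string,
  queryEmbedding: number[],
  catalogue: ToolEntry[],
  tier1: string[],
  k = 12,
): string[] {
  const topK = catalogue
    .map((tool) => ({
      name: tool.name,
      score:
        0.7 * cosine(queryEmbedding, tool.embedding) + // assumed weight
        0.3 * keywordScore(query, tool.description) + // assumed weight
        (tool.composite ? 0.05 : 0), // assumed composite boost
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.name);
  // Set preserves insertion order, so Tier-1 tools come first and duplicates are dropped.
  return [...new Set([...tier1, ...topK])];
}
```

The returned names form the allow-list for the turn: the same list that is shown to the model should also be what the server-side guard checks against.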
Tool-RAG has a **security subtlety**: the `activeTools` option in the Vercel AI SDK is purely a visibility filter. The SDK can still execute an out-of-list tool call if one appears in the conversation history. A server-side guard that re-validates every execution against the same allow-list is therefore required.
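One way to build such a guard is to wrap every tool handler so that it checks the per-turn allow-list before doing any work. The sketch below is SDK-agnostic and uses hypothetical names throughout (`ToolHandler`, `ToolNotAllowedError`, `assertAllowed`, `guardTool`); it does not reproduce any Vercel AI SDK or MCP client API.

```typescript
// Hypothetical types and names, for illustration only.
type ToolHandler = (args: unknown) => Promise<unknown>;

class ToolNotAllowedError extends Error {
  constructor(toolName: string) {
    super(`Tool "${toolName}" is not in the allow-list for this turn`);
    this.name = "ToolNotAllowedError";
  }
}

// Throws unless the tool was selected for the current turn.
function assertAllowed(toolName: string, allowList: ReadonlySet<string>): void {
  if (!allowList.has(toolName)) {
    throw new ToolNotAllowedError(toolName);
  }
}

// Wraps a handler so the allow-list is re-checked on every execution,
// regardless of how the tool call reached the server.
function guardTool(
  toolName: string,
  allowList: ReadonlySet<string>,
  handler: ToolHandler,
): ToolHandler {
  return async (args) => {
    assertAllowed(toolName, allowList);
    return handler(args);
  };
}
```

The check runs at execution time rather than at registration time, so a tool call replayed from earlier history is rejected even if the model was never shown that tool on the current turn.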
In the getchatsocial.com product
getchatsocial.com uses a hybrid tool-RAG (pgvector cosine similarity plus BM25) over 206 Brandyze MCP tools. On each turn, the model receives the top 12 retrieved tools (top-k=12) plus the TIER1 set (the always-on brandyze_* composites). The server guard in `lib/mcp/client.ts` re-validates every execute() call before forwarding it to the MCP server.
FAQ
At what tool count does tool-RAG become necessary?
Beyond ~30 tools, the model's routing quality starts to degrade noticeably. Above 100, tool-RAG is essentially mandatory.
What top-k size should I use?
Between 8 and 20, depending on intent diversity. 12 is a good starting point. The signal to optimize on is the retrieved-vs-called ratio: the share of retrieved tools that were actually invoked. The ideal range is 30–50%.
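The ratio is straightforward to compute from per-turn logs. The sketch below assumes a hypothetical `TurnLog` record holding the tool names retrieved and the tool names called on each turn. It counts only retrieved tools in both numerator and denominator; whether the always-on Tier-1 set should be included is a choice the definition above leaves open.

```typescript
// Hypothetical log shape, for illustration only.
interface TurnLog {
  retrieved: string[]; // tools returned by retrieval for this turn
  called: string[]; // tools the model actually invoked on this turn
}

// Share of retrieved tools that were invoked, aggregated over all turns.
function retrievedVsCalledRatio(turns: TurnLog[]): number {
  let retrievedCount = 0;
  let invokedCount = 0;
  for (const turn of turns) {
    const called = new Set(turn.called);
    retrievedCount += turn.retrieved.length;
    invokedCount += turn.retrieved.filter((name) => called.has(name)).length;
  }
  return retrievedCount === 0 ? 0 : invokedCount / retrievedCount;
}
```

Read against the 30–50% range above, a ratio well below it suggests that k could be lowered, while a ratio well above it suggests retrieval may be too narrow and k could be raised.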