Glossary

Tool-RAG / Tool retrieval

Tool-RAG (or tool retrieval) is the application of RAG (Retrieval-Augmented Generation) principles to dynamic tool selection for an AI agent: embed the tool catalogue in a vector store, then retrieve the k most relevant tools for each conversational turn instead of exposing all of them to the model.
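
The retrieval step can be sketched as a brute-force cosine top-k over precomputed description embeddings. This is a minimal in-memory illustration that assumes the embeddings already exist. In practice the vector store (pgvector, Qdrant) performs this search. The `ToolEntry` type, the `retrieveTopK` function and the tool names are hypothetical.

```typescript
// Hypothetical tool record: the description embedding is assumed to be
// precomputed by whatever embedding model the system uses.
interface ToolEntry {
  name: string;
  description: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Return the names of the k tools whose embeddings are closest to the query.
function retrieveTopK(query: number[], catalogue: ToolEntry[], k: number): string[] {
  return catalogue
    .map((t) => ({ name: t.name, score: cosine(query, t.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.name);
}

// Usage with toy 3-dimensional embeddings (hypothetical tools).
const demoCatalogue: ToolEntry[] = [
  { name: "create_post", description: "Create a social post", embedding: [1, 0, 0] },
  { name: "schedule_post", description: "Schedule a post", embedding: [0.9, 0.1, 0] },
  { name: "get_analytics", description: "Fetch analytics", embedding: [0, 0, 1] },
];
console.log(retrieveTopK([1, 0, 0], demoCatalogue, 2)); // → ["create_post", "schedule_post"]
```

Only the k returned names are exposed to the model for that turn; the rest of the catalogue stays hidden.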

Also known as

  • tool-RAG
  • tool retrieval

When an agent has hundreds of tools (getchatsocial.com wires up 206 Brandyze MCP tools), exposing the entire catalogue to the model on every turn is counterproductive: (1) token cost explodes, because every tool definition is sent with every request; (2) latency increases; and (3) the model makes more routing mistakes because it confuses similar tools. Tool-RAG addresses all three problems by exposing only the tools that are semantically relevant to the user's current intent.

Typical architecture: a vector index (e.g. pgvector or Qdrant) over the tool descriptions, plus a lightweight re-ranker that combines BM25 keyword scoring with a boost for composite tools. On each turn: embed the query, retrieve the top-k tools (typically 10–20), union them with a Tier-1 always-on set (the strategic composite tools that must always be available), then pass this subset to the model.
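
The per-turn selection described above can be sketched as follows. It is an illustration under stated assumptions, not the production ranker. The blend weight `alpha`, the additive `boost` for composite tools, and the normalisation of BM25 by its maximum score are all assumed choices. The BM25 part is standard Okapi BM25 (k1 = 1.2, b = 0.75). The `Tool` type and `selectTools` function are hypothetical names.

```typescript
// Hypothetical tool record; `composite` marks tools that get the ranking boost.
interface Tool {
  name: string;
  description: string;
  embedding: number[];
  composite: boolean;
}

const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/[^a-z0-9_]+/).filter((w) => w.length > 0);

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Okapi BM25 score of each tool description against the query terms.
function bm25(query: string, tools: Tool[], k1 = 1.2, b = 0.75): number[] {
  const docs = tools.map((t) => tokenize(t.description));
  const avgLen = docs.reduce((s, d) => s + d.length, 0) / Math.max(docs.length, 1);
  const terms = tokenize(query);
  return docs.map((doc) => {
    let score = 0;
    for (const term of terms) {
      const n = docs.filter((d) => d.includes(term)).length; // document frequency
      if (n === 0) continue;
      const idf = Math.log((docs.length - n + 0.5) / (n + 0.5) + 1);
      const tf = doc.filter((w) => w === term).length;
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return score;
  });
}

// Blend cosine and max-normalised BM25, add the composite boost, keep the
// top-k, then union with the always-on Tier-1 names (Tier-1 listed first).
function selectTools(
  query: string,
  queryEmbedding: number[],
  tools: Tool[],
  tier1: string[],
  k = 12,
  alpha = 0.7, // weight of the semantic score (assumed)
  boost = 0.1, // additive bonus for composite tools (assumed)
): string[] {
  const lexical = bm25(query, tools);
  const maxLex = Math.max(...lexical, 1e-9);
  const ranked = tools
    .map((t, i) => ({
      name: t.name,
      score:
        alpha * cosine(queryEmbedding, t.embedding) +
        (1 - alpha) * (lexical[i] / maxLex) +
        (t.composite ? boost : 0),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((r) => r.name);
  return [...new Set([...tier1, ...ranked])];
}
```

The `Set` union means a Tier-1 tool that is also retrieved is exposed only once, so the subset holds at most k tools plus the Tier-1 names.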

Tool-RAG has a **security subtlety**: the `activeTools` option in the Vercel AI SDK is purely a visibility filter. The SDK can still execute a tool call outside that list if one appears in the conversation history, so a server-side guard that re-validates every execution against the same allow-list is required.
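
A server-side guard of this kind can be sketched as a wrapper that checks the allow-list inside every `execute()`. This is a minimal sketch, assuming a simplified `Executable` shape that stands in for the SDK's richer tool type (whose real `execute` signature takes additional arguments). `guardTools` is a hypothetical name.

```typescript
// Simplified stand-in for an executable tool; not the AI SDK's actual type.
type Executable = { execute: (args: unknown) => Promise<unknown> };

// Wrap every tool so execute() rejects any name outside this turn's allow-list,
// regardless of what the model was shown or what the history contains.
function guardTools(
  tools: Record<string, Executable>,
  allowList: ReadonlySet<string>,
): Record<string, Executable> {
  const guarded: Record<string, Executable> = {};
  for (const [name, tool] of Object.entries(tools)) {
    guarded[name] = {
      ...tool,
      execute: async (args: unknown) => {
        if (!allowList.has(name)) {
          throw new Error(`Tool "${name}" is not in this turn's allow-list`);
        }
        return tool.execute(args); // only reached for allowed tools
      },
    };
  }
  return guarded;
}
```

The allow-list passed here should be the same set used to build `activeTools`, so that what the model sees and what the server will run cannot drift apart.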

In the getchatsocial.com product

getchatsocial.com uses a hybrid tool-RAG (pgvector cosine similarity plus BM25) over its 206 Brandyze MCP tools. On each turn it retrieves the top-k=12 tools and unions them with the TIER1 set (the always-on brandyze_* composites). A server-side guard in lib/mcp/client.ts re-validates every execute() call before forwarding it to the MCP server.
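
One way to keep the visible set and the guard consistent is to derive both from a single per-turn set. The sketch below is a hypothetical illustration, not the code in lib/mcp/client.ts. It assumes the Tier-1 names and the retriever's ranked output are already available; `buildTurnPolicy` and `TurnPolicy` are invented names.

```typescript
// Hypothetical per-turn policy: one allow-list feeds both the model-facing
// tool list and the server-side execution check.
interface TurnPolicy {
  activeTools: string[];                // names the model is shown this turn
  isAllowed: (name: string) => boolean; // what the server agrees to execute
}

function buildTurnPolicy(
  tier1: readonly string[],  // always-on composite tool names
  ranked: readonly string[], // retriever output, best match first
  topK = 12,
): TurnPolicy {
  const allow = new Set<string>([...tier1, ...ranked.slice(0, topK)]);
  return {
    activeTools: [...allow],
    isAllowed: (name) => allow.has(name),
  };
}
```

Because `isAllowed` closes over the same `Set` that produced `activeTools`, a tool call replayed from history for a tool that is not in this turn's set fails the check.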

FAQ

  • At what tool count does tool-RAG become necessary?

    Beyond ~30 tools, the model's routing quality starts to degrade noticeably. Above 100, tool-RAG is essentially mandatory.

  • What top-k size should I use?

    Between 8 and 20, depending on how diverse user intents are; 12 is a good starting point. The signal to optimize is the retrieved-vs-called ratio: the share of retrieved tools that the model actually invokes. The ideal range is 30–50%.
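
The retrieved-vs-called ratio can be computed from per-turn logs as sketched below. This assumes each turn records the retrieved names and the invoked names. Pooling over all turns (rather than averaging per-turn ratios) is a design choice, and so is whether Tier-1 tools count as "retrieved". The `topKHint` mapping (below 30% suggests a smaller k, above 50% a larger one) is one plausible reading of the target band, not a rule stated above. `TurnLog`, `retrievedVsCalledRatio` and `topKHint` are hypothetical names.

```typescript
// Hypothetical per-turn log entry.
interface TurnLog {
  retrieved: string[]; // tools exposed by the retriever this turn
  called: string[];    // tools the model actually invoked this turn
}

// Share of retrieved tools that were invoked, pooled over all turns.
// Calls to tools that were not retrieved are ignored.
function retrievedVsCalledRatio(turns: TurnLog[]): number {
  let retrieved = 0;
  let used = 0;
  for (const t of turns) {
    const r = new Set(t.retrieved);
    retrieved += r.size;
    used += new Set(t.called.filter((n) => r.has(n))).size;
  }
  return retrieved === 0 ? 0 : used / retrieved;
}

// Assumed tuning heuristic around the 30–50% target band.
function topKHint(ratio: number): "decrease k" | "increase k" | "keep k" {
  if (ratio < 0.3) return "decrease k"; // many exposed tools go unused
  if (ratio > 0.5) return "increase k"; // the window may be too tight
  return "keep k";
}
```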