Tool-RAG / Tool retrieval — Definition

When an agent has hundreds of tools (getchatsocial.com wires 206 Brandyze MCP tools), exposing the entire catalogue to the model on every turn is counterproductive. First, token cost explodes, because every tool's name, description, and parameter schema is sent with each request. Second, latency increases. Third, the model makes more routing mistakes because it confuses similar tools. Tool-RAG addresses all three problems by exposing only the tools that are semantically relevant to the user's current intent.

Typical architecture: a vector index (pgvector, Qdrant) over tool descriptions, plus a lightweight re-ranking stage that combines a BM25 keyword score with a boost for composite tools. On each turn: embed the user's query, retrieve the top-k tools (typically 10–20), take the union with a Tier-1 always-on set (the strategic composite tools that are always available), and pass only this subset to the model.
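The per-turn selection can be sketched as below. This is a minimal in-memory illustration, not the production pipeline. It assumes each tool carries a precomputed embedding and a hypothetical `composite` flag. The vector stage is brute-force cosine similarity standing in for a pgvector/Qdrant nearest-neighbour query. A simple term-overlap score stands in for BM25. The weights, `poolSize`, and all tool names are illustrative assumptions.

```typescript
// Hypothetical catalogue entry. The embedding is assumed to be precomputed
// offline from the description by whatever embedding model backs the index.
interface ToolDoc {
  name: string;
  description: string;
  embedding: number[];
  composite: boolean; // assumption: marks composite tools that get a score boost
}

interface SelectOptions {
  k: number; // tools kept after re-ranking
  poolSize: number; // candidates taken from the vector stage
  vectorWeight: number;
  keywordWeight: number;
  compositeBoost: number;
}

const DEFAULTS: SelectOptions = { k: 12, poolSize: 50, vectorWeight: 0.7, keywordWeight: 0.3, compositeBoost: 0.1 };

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Stand-in for BM25: the fraction of query terms that occur in the description.
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
  if (terms.length === 0) return 0;
  const words = new Set(text.toLowerCase().split(/\W+/));
  return terms.filter((w) => words.has(w)).length / terms.length;
}

function selectTools(
  query: string,
  queryEmbedding: number[],
  catalogue: ToolDoc[],
  tier1: string[],
  opts: Partial<SelectOptions> = {},
): string[] {
  const o: SelectOptions = { ...DEFAULTS, ...opts };
  // Stage 1: nearest neighbours by cosine similarity (what the vector index would return).
  const pool = catalogue
    .map((tool) => ({ tool, sim: cosine(queryEmbedding, tool.embedding) }))
    .sort((x, y) => y.sim - x.sim)
    .slice(0, o.poolSize);
  // Stage 2: re-rank the pool with the keyword score plus the composite boost, keep top-k.
  const topK = pool
    .map(({ tool, sim }) => ({
      name: tool.name,
      score:
        o.vectorWeight * sim +
        o.keywordWeight * keywordScore(query, tool.description) +
        (tool.composite ? o.compositeBoost : 0),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, o.k)
    .map((r) => r.name);
  // Union with the always-on Tier-1 set: de-duplicated, Tier-1 first.
  return Array.from(new Set([...tier1, ...topK]));
}

// Hypothetical three-tool catalogue with toy 3-d embeddings.
const catalogue: ToolDoc[] = [
  { name: "post_schedule", description: "Schedule a social media post", embedding: [1, 0, 0], composite: false },
  { name: "analytics_report", description: "Build an engagement analytics report", embedding: [0, 1, 0], composite: true },
  { name: "image_resize", description: "Resize an image", embedding: [0, 0, 1], composite: false },
];
console.log(selectTools("schedule a post", [0.9, 0.1, 0], catalogue, ["brand_overview"], { k: 2 }));
// → [ 'brand_overview', 'post_schedule', 'analytics_report' ]
```

Because the Tier-1 set is added after the top-k cut, the final list can contain more than k tools. Whether Tier-1 should count against the k budget is a design choice this sketch leaves open.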

Tool-RAG has a **security subtlety**: the `activeTools` option in the Vercel AI SDK is purely a visibility filter. It limits which tools the model is offered, but the SDK can still execute a tool call outside that list if one appears in the conversation history. A server-side guard that re-validates every execution against the same allow-list is therefore required.
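Such a guard could look like the sketch below. It is deliberately independent of the AI SDK. `Executor`, `ToolCall`, `checkToolCall`, and `guardedExecute` are hypothetical names, not SDK exports. The idea is that every execution path calls the guard, so a tool call replayed from history cannot bypass the per-turn allow-list. The sketch returns rejections instead of throwing them, on the assumption that they will be fed back to the model as tool errors.

```typescript
// Hypothetical shapes; these are not types exported by the Vercel AI SDK.
type Executor = (args: unknown) => Promise<unknown>;

interface ToolCall {
  toolName: string;
  args: unknown;
}

type Check = { ok: true; exec: Executor } | { ok: false; error: string };
type ExecResult = { ok: true; result: unknown } | { ok: false; error: string };

// Pure check: the tool must be in this turn's allow-list AND be registered.
function checkToolCall(
  toolName: string,
  allowList: ReadonlySet<string>,
  registry: ReadonlyMap<string, Executor>,
): Check {
  if (!allowList.has(toolName)) {
    return { ok: false, error: `tool "${toolName}" is not in this turn's allow-list` };
  }
  const exec = registry.get(toolName);
  if (exec === undefined) {
    return { ok: false, error: `tool "${toolName}" is not registered` };
  }
  return { ok: true, exec };
}

// Every execution goes through the guard, whatever the model (or a replayed
// history message) asked for. Rejections are returned rather than thrown.
async function guardedExecute(
  call: ToolCall,
  allowList: ReadonlySet<string>,
  registry: ReadonlyMap<string, Executor>,
): Promise<ExecResult> {
  const check = checkToolCall(call.toolName, allowList, registry);
  if (!check.ok) return check;
  return { ok: true, result: await check.exec(call.args) };
}
```

In this sketch, `allowList` is meant to be the exact Tier-1 ∪ top-k set computed for the turn. It is stored server-side (for example, keyed by conversation and turn) rather than rebuilt from client-supplied messages, so the visibility filter and the execution guard cannot drift apart.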

FAQ

At what tool count does tool-RAG become necessary?

Beyond ~30 tools, the model's routing quality starts to degrade noticeably. Above 100, tool-RAG is essentially mandatory.

What top-k size should I use?

Between 8 and 20, depending on how diverse user intents are; 12 is a good starting point. The optimization signal is the retrieved-vs-called ratio: the fraction of retrieved tools that the model actually invoked. Ideal range: 30–50%.
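A minimal way to track this signal is sketched below, assuming each turn logs the retrieved top-k list and the tools actually called (the `TurnLog` shape is hypothetical). The `adjustK` rule is an illustrative assumption, not a documented policy. It reads a ratio below 30% as "k too large" and one above 50% as "k possibly too tight", and it stays within the 8–20 band.

```typescript
// Hypothetical per-turn log record.
interface TurnLog {
  retrieved: string[]; // top-k tools exposed by retrieval on that turn
  called: string[]; // tools the model actually invoked
}

// Fraction of retrieved tools that were actually invoked, aggregated over turns.
// Calls to tools outside the retrieved list (e.g. Tier-1 tools) are not counted.
function retrievedVsCalledRatio(turns: TurnLog[]): number {
  let retrieved = 0;
  let used = 0;
  for (const t of turns) {
    const r = new Set(t.retrieved);
    const hits = new Set(t.called.filter((name) => r.has(name)));
    retrieved += r.size;
    used += hits.size;
  }
  return retrieved === 0 ? 0 : used / retrieved;
}

// Illustrative tuning rule (assumption): shrink k when most retrieved tools go
// unused, grow it when the ratio exceeds the target band, clamped to [minK, maxK].
function adjustK(k: number, ratio: number, minK = 8, maxK = 20, step = 2): number {
  if (ratio < 0.3) return Math.max(minK, k - step);
  if (ratio > 0.5) return Math.min(maxK, k + step);
  return k;
}

const logs: TurnLog[] = [
  { retrieved: ["a", "b", "c", "d"], called: ["a", "b"] },
  { retrieved: ["a", "e", "f", "g"], called: ["a", "tier1_tool"] },
];
const ratio = retrievedVsCalledRatio(logs); // 3 / 8
console.log(ratio, adjustK(12, ratio)); // → 0.375 12
```

Aggregating over many turns before adjusting k smooths out single-turn noise. A per-intent breakdown would follow the FAQ's point that the right k depends on intent diversity.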