Efficient Tool Calling

AI Agents & Orchestration — Session 3

2026-04-26

A. The Routing Problem

When 50 tools is worse than 5

Agenda

  • A. The Routing Problem — Why tool selection matters (~10 min)
  • B. Classifier-Based Routing — LLM categorizes queries (~15 min)
  • C. Semantic Routing — Matching queries to tools by meaning (~10 min)
  • D. Programmatic Tool Calling — Batch tools without round-trips (~15 min)
  • E. Wrap-up — Key takeaways & lab preview (~5 min)

The Tool Proliferation Problem

Your agent starts with 3 tools. Then it grows:

  • search_web, search_news, search_academic
  • get_stock_price, get_financial_report, calculate_roi
  • read_pdf, summarize_document, extract_tables
  • send_email, create_calendar_event, query_database

Result: 20+ tools crammed into every context window.

Why More Tools ≠ Better Agent

Context Window Pollution

  • Every tool schema consumes tokens
  • 20 tools = ~2000 tokens per request
  • Less space for reasoning and memory
  • Higher latency, higher cost

Decision Confusion

  • Model sees too many options
  • May pick wrong tool
  • May call multiple similar tools
  • Inconsistent behavior

The Insight

The agent doesn’t need ALL tools for EVERY query. It only needs the RELEVANT ones.

The Solution: Dynamic Tool Routing

graph LR
    Q["User Query"] --> R["Router"]
    R -->|"Financial"| TF["Finance Tools<br/>(3 tools)"]
    R -->|"Academic"| TA["Academic Tools<br/>(4 tools)"]
    R -->|"General"| TG["General Tools<br/>(5 tools)"]
    TF --> A["Agent"]
    TA --> A
    TG --> A

    style R fill:#1C355E,stroke:#00C9A7,color:white
    style Q fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style A fill:#FF7A5C,stroke:#1C355E,color:#1C355E

The router selects a subset of tools before the agent runs.

B. Classifier-Based Routing

Using a small LLM to categorize queries

How Classifier Routing Works

sequenceDiagram
    participant U as User
    participant C as Classifier LLM
    participant T as Tool Registry
    participant A as Agent

    U->>C: "What's Apple's stock price?"
    C->>C: Classify domain
    C-->>C: Domain = "Financial"
    C->>T: Get tools for "Financial"
    T-->>C: [get_stock_price, calculate_roi]
    C->>A: Agent + 2 tools
    A->>U: "$178.72"

The Classifier Prompt

A small, fast model (GPT-4o-mini, Claude Haiku) classifies the query:

CLASSIFIER_PROMPT = """Classify this query into ONE domain.

Domains: financial, academic, general, technical

Query: {query}

Respond with ONLY the domain name, nothing else.
"""

Trade-off: Fast and cheap (~$0.0001 per query), but requires predefined categories.

Classifier Router Implementation

from litellm import completion  # unified chat-completions client (assumed setup)

class ClassifierRouter:
    def __init__(self, domain_tool_map: dict[str, list[str]]):
        self.domain_tool_map = domain_tool_map

    def classify(self, query: str) -> str:
        """Ask a small, cheap model for the query's domain label."""
        response = completion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": CLASSIFIER_PROMPT.format(query=query)}]
        )
        return response.choices[0].message.content.strip().lower()

    def select_tools(self, query: str) -> list[Tool]:
        """Map the classified domain to its tool subset (empty if the label is unknown)."""
        domain = self.classify(query)
        tool_names = self.domain_tool_map.get(domain, [])
        return [registry.get_tool(name) for name in tool_names]  # registry: shared tool registry
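To see the routing step end-to-end without an API key, here is a self-contained sketch: the LLM call in `classify` is replaced by a keyword-match stub, and the domain map reuses tool names from the proliferation example earlier in the deck. The stub and map are illustrative, not part of the course code.

```python
# Illustrative domain -> tool mapping, with classify() stubbed by keyword
# matching so the routing logic runs without an LLM call.
domain_tool_map = {
    "financial": ["get_stock_price", "get_financial_report", "calculate_roi"],
    "general": ["search_web", "search_news"],
}

def classify_stub(query: str) -> str:
    # Stand-in for the gpt-4o-mini call in ClassifierRouter.classify
    keywords = ("stock", "roi", "revenue")
    return "financial" if any(w in query.lower() for w in keywords) else "general"

def select_tools(query: str) -> list[str]:
    # Unknown domains fall back to an empty tool list, as in select_tools above
    return domain_tool_map.get(classify_stub(query), [])

print(select_tools("What's Apple's stock price?"))
# -> ['get_stock_price', 'get_financial_report', 'calculate_roi']
```

Only the three financial tools enter the context, instead of all of them.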

When Classifier Routing Works Well

| Good For | Not Good For |
| --- | --- |
| Clear domain boundaries | Overlapping domains |
| Fixed tool categories | Ad-hoc tool additions |
| Low-latency requirements | Nuanced query understanding |
| Budget constraints | Cross-domain queries |

Rule of thumb: If you can list your domains on one hand, classifier routing is a good starting point.

C. Semantic Routing

When classifiers aren’t enough

From Categories to Similarity

Problem with classifiers: You must define categories upfront.

Semantic routing: Match the query to tools based on meaning, not labels.

  • Embed each tool’s description at startup
  • Embed the user query at runtime
  • Retrieve the top-K most similar tools — inject only those into context
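The three steps above can be sketched as a runnable toy: a real system would call an embedding model (e.g., one of the `text-embedding-*` APIs), but here a bag-of-words vector stands in so the top-K similarity logic runs as-is. Tool descriptions are illustrative.

```python
# Toy semantic router: bag-of-words "embeddings" + cosine similarity.
import math
from collections import Counter

TOOL_DESCRIPTIONS = {
    "get_stock_price": "look up the current stock price for a ticker",
    "search_news": "search recent news articles on a topic",
    "read_pdf": "read and extract text from a pdf document",
}

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model call
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Step 1: embed each tool description once at startup
tool_vecs = {name: embed(desc) for name, desc in TOOL_DESCRIPTIONS.items()}

def top_k_tools(query: str, k: int = 2) -> list[str]:
    # Steps 2-3: embed the query at runtime, keep the k most similar tools
    qv = embed(query)
    ranked = sorted(tool_vecs, key=lambda n: cosine(qv, tool_vecs[n]), reverse=True)
    return ranked[:k]

print(top_k_tools("what is the stock price of AAPL"))
```

Swapping `embed` for a real embedding model (and the dict for a vector store) gives the production version; the selection logic is unchanged.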

More on RAG in Module 4

The underlying technique (embeddings + similarity search) is covered in depth in Module 4. The same pipeline applies here: swap documents for tool schemas.

The RoutedAgent Pattern

Combine routing with the agent loop for efficient execution:

class RoutedAgent:
    def __init__(self, router, base_agent):
        self.router, self.base_agent = router, base_agent

    def run(self, query: str) -> str:
        selected_tools = self.router.select_tools(query)   # Route first
        return self.base_agent.run(query, tools=selected_tools)  # Then act

Context Window Savings

| Scenario | Tools in Context | Tool Schema Tokens |
| --- | --- | --- |
| Standard Agent | 20 tools | ~2000 tokens |
| Routed Agent (top-5) | 5 tools | ~500 tokens |
| Savings | -15 tools | ~1500 tokens |

Impact: 75% less token overhead — more room for reasoning, lower latency and cost.
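The table's figures follow from the rough ~100 tokens per tool schema implied by the earlier "20 tools = ~2000 tokens" estimate; the constant below is that assumption, not a measured value.

```python
TOKENS_PER_SCHEMA = 100  # rough average; real schemas vary with parameter count

def schema_overhead(n_tools: int) -> int:
    # Tokens spent on tool schemas alone, before any reasoning or memory
    return n_tools * TOKENS_PER_SCHEMA

saved = schema_overhead(20) - schema_overhead(5)
print(f"Saved ~{saved} tokens per request "
      f"({saved / schema_overhead(20):.0%} of tool overhead)")
# -> Saved ~1500 tokens per request (75% of tool overhead)
```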

D. Programmatic Tool Calling

Eliminate round-trips for multi-tool workflows

The Round-Trip Problem

Traditional tool calling: N tools = N model round-trips

sequenceDiagram
    participant M as Model
    participant T as Tools
    
    M->>T: Call tool_1
    T-->>M: Result (in context)
    M->>T: Call tool_2
    Note over M,T: ...
    M->>T: Call tool_n
    T-->>M: Result (in context)
    M->>M: Final answer

Note

Each round-trip: latency + token overhead + cost

How Programmatic Tool Calling Works

Claude writes code that calls your tools, with no model round-trip per tool call:

sequenceDiagram
    participant M as Model
    participant C as Code Execution
    participant T as Tools
    
    M->>C: Write code to call tools
    C->>T: tool_1()
    T-->>C: Result
    Note over C,T: ...
    C->>T: tool_n()
    T-->>C: Result
    C->>C: Process/filter results
    C-->>M: Final output only
    M->>M: Response

Tip

Key insight: Tool results stay in code execution — only final output reaches context

The allowed_callers Pattern

Mark tools as callable from code execution:

tools = [
    {"type": "code_execution_20260120", "name": "code_execution"},
    {
        "name": "query_database",
        "description": "Execute a SQL query. Returns JSON.",
        "input_schema": {"type": "object", "properties": {"sql": {"type": "string"}}},
        "allowed_callers": ["code_execution_20260120"],  # <-- Key field
    },
]
| allowed_callers | Meaning |
| --- | --- |
| ["direct"] | Only model can call (default) |
| ["code_execution_20260120"] | Only from code execution |
| Both | Callable either way |

Example: Batch Processing

Query sales for 5 regions, return only the top performer:

# Claude writes this code internally
regions = ["West", "East", "Central", "North", "South"]
results = {}

for region in regions:
    data = await query_database(f"SELECT SUM(revenue) FROM sales WHERE region='{region}'")
    results[region] = data[0]["sum"]

top_region = max(results.items(), key=lambda x: x[1])
print(f"Top region: {top_region[0]} with ${top_region[1]:,}")

What reaches context: "Top region: West with $2,340,000" — not all 5 query results

Token & Latency Gains

| Metric | Traditional (5 tools) | Programmatic |
| --- | --- | --- |
| Model turns | 5 round-trips | 1 turn |
| Tool results in context | All 5 | None (filtered) |
| Latency | ~15 seconds | ~5 seconds |
| Tokens consumed | ~10,000 | ~2,000 |

The Multiplier Effect

Calling 10 tools directly uses ~10x the tokens of calling them programmatically and returning a summary.

Advanced Patterns

Early termination — stop as soon as success criteria met:

for endpoint in ["us-east", "eu-west", "apac"]:
    status = await check_health(endpoint)
    if status == "healthy":
        print(f"Found healthy: {endpoint}")
        break

Conditional tool selection — choose tool based on data:

file_info = await get_file_info(path)
if file_info["size"] < 10000:
    content = await read_full_file(path)
else:
    content = await read_file_summary(path)

When to Use Programmatic Calling

| Good For | Less Ideal |
| --- | --- |
| 3+ dependent tool calls | Single tool call |
| Large result filtering | Simple responses |
| Loops over many items | Need user feedback each step |
| Batch data processing | Very fast single operations |

Vendor Note

This pattern is currently Claude-specific (requires their code execution container). The concept generalizes — you can implement client-side with your own sandbox.

Alternative: Self-Managed Execution

Not using Claude? You can implement the pattern yourself:

  1. Give Claude a code execution tool (e.g., Python sandbox)
  2. Describe available functions in that environment
  3. Claude writes code → you execute → return result
# Your code execution tool
{
    "name": "execute_python",
    "description": "Run Python code. Available functions: search(), query_db(), send_email()",
    ...
}

Trade-off: More control, but you manage security and infrastructure.
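A minimal sketch of step 3, under deliberately simplified assumptions: the model's generated code runs via in-process `exec` against a whitelist of stubbed functions. A real deployment would use an isolated container, not `exec`; the function names mirror the `execute_python` description above and are stand-ins.

```python
import io
import contextlib

# Whitelisted stub functions available inside the "sandbox" (illustrative only)
def search(q): return f"results for {q!r}"
def query_db(sql): return [{"revenue": 42}]

SANDBOX = {"search": search, "query_db": query_db, "print": print, "__builtins__": {}}

def execute_python(code: str) -> str:
    """Run model-generated code; return captured stdout as the tool result."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, dict(SANDBOX))  # fresh namespace per call
    return buf.getvalue()

# What the model might send back as its tool-call input:
generated = 'rows = query_db("SELECT SUM(revenue) FROM sales")\nprint(rows[0]["revenue"])'
print(execute_python(generated))  # only this printed summary re-enters the model's context
```

The key property is the same as Claude's native version: intermediate tool results live only inside the execution environment, and just the final `print` output goes back to the model.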

E. Wrap-up

Key Takeaways

  1. Tool proliferation hurts performance — route to reduce context bloat
  2. Classifier routing uses a small LLM to categorize queries (fast, rigid)
  3. Semantic routing applies RAG on tool descriptions — covered in depth in Module 4
  4. Programmatic calling eliminates round-trips — Claude writes code that calls tools
  5. Combine patterns: route to select tools, programmatic calling to execute efficiently

Lab Preview: Efficient Tool Calling

Part 1: Classifier Router

  • Implement ClassifierRouter
  • Map domains to tool sets

Part 2: Semantic Router

  • Build SemanticToolSelector
  • Index tool descriptions

Part 3: Routed Agents

  • Create RoutedAgent wrapper
  • Compare context usage

Part 4: Programmatic Calling

  • Explore allowed_callers pattern
  • Measure token savings

Time: 75 minutes

Questions?

Session 3 Complete