Efficient Tool Calling

AI Agents & Orchestration — Session 3

2026-04-26

A. The Routing Problem

When 50 tools is worse than 5

Agenda

  • A. The Routing Problem — Why tool selection matters (~10 min)
  • B. Classifier-Based Routing — LLM categorizes queries (~15 min)
  • C. Semantic Routing — Matching queries to tools by meaning (~10 min)
  • D. Programmatic Tool Calling — Batch tools without round-trips (~15 min)
  • E. Wrap-up — Key takeaways & lab preview (~5 min)

The Tool Proliferation Problem

Your agent starts with 3 tools. Then it grows:

  • search_web, search_news, search_academic
  • get_stock_price, get_financial_report, calculate_roi
  • read_pdf, summarize_document, extract_tables
  • send_email, create_calendar_event, query_database

Result: 20+ tools crammed into every context window.

Why More Tools ≠ Better Agent

Context Window Pollution

  • Every tool schema consumes tokens
  • 20 tools = ~2000 tokens per request
  • Less space for reasoning and memory
  • Higher latency, higher cost

Decision Confusion

  • Model sees too many options
  • May pick wrong tool
  • May call multiple similar tools
  • Inconsistent behavior

The Insight

The agent doesn’t need ALL tools for EVERY query. It only needs the RELEVANT ones.

The Solution: Dynamic Tool Routing

graph LR
    Q["User Query"] --> R["Router"]
    R -->|"Financial"| TF["Finance Tools<br/>(3 tools)"]
    R -->|"Academic"| TA["Academic Tools<br/>(4 tools)"]
    R -->|"General"| TG["General Tools<br/>(5 tools)"]
    TF --> A["Agent"]
    TA --> A
    TG --> A

    style R fill:#1C355E,stroke:#00C9A7,color:white
    style Q fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style A fill:#FF7A5C,stroke:#1C355E,color:#1C355E

The router selects a subset of tools before the agent runs.

B. Classifier-Based Routing

Using a small LLM to categorize queries

How Classifier Routing Works

sequenceDiagram
    participant U as User
    participant C as Classifier LLM
    participant T as Tool Registry
    participant A as Agent

    U->>C: "What's Apple's stock price?"
    C->>C: Classify domain
    C-->>C: Domain = "Financial"
    C->>T: Get tools for "Financial"
    T-->>C: [get_stock_price, calculate_roi]
    C->>A: Agent + 2 tools
    A->>U: "$178.72"

The Classifier Prompt

A small, fast model (GPT-4o-mini, Claude Haiku) classifies the query:

CLASSIFIER_PROMPT = """Classify this query into ONE domain.

Domains: financial, academic, general, technical

Query: {query}

Respond with ONLY the domain name, nothing else.
"""

Trade-off: Fast and cheap (~$0.0001 per query), but requires predefined categories.

Classifier Router Implementation

from litellm import completion  # unified chat-completions client (assumed setup)

class ClassifierRouter:
    def __init__(self, domain_tool_map: dict[str, list[str]]):
        self.domain_tool_map = domain_tool_map

    def classify(self, query: str) -> str:
        """Ask a small, cheap model for the query's domain label."""
        response = completion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": CLASSIFIER_PROMPT.format(query=query)}]
        )
        return response.choices[0].message.content.strip().lower()

    def select_tools(self, query: str) -> list[Tool]:
        """Map the classified domain to its tool subset (empty if the label is unknown)."""
        domain = self.classify(query)
        tool_names = self.domain_tool_map.get(domain, [])
        return [registry.get_tool(name) for name in tool_names]  # registry: shared tool registry
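To see the routing step end-to-end without an API key, here is a self-contained sketch: the LLM call in `classify` is replaced by a keyword-match stub, and the domain map reuses tool names from the proliferation example earlier in the deck. The stub and map are illustrative, not part of the course code.

```python
# Illustrative domain -> tool mapping, with classify() stubbed by keyword
# matching so the routing logic runs without an LLM call.
domain_tool_map = {
    "financial": ["get_stock_price", "get_financial_report", "calculate_roi"],
    "general": ["search_web", "search_news"],
}

def classify_stub(query: str) -> str:
    # Stand-in for the gpt-4o-mini call in ClassifierRouter.classify
    keywords = ("stock", "roi", "revenue")
    return "financial" if any(w in query.lower() for w in keywords) else "general"

def select_tools(query: str) -> list[str]:
    # Unknown domains fall back to an empty tool list, as in select_tools above
    return domain_tool_map.get(classify_stub(query), [])

print(select_tools("What's Apple's stock price?"))
# -> ['get_stock_price', 'get_financial_report', 'calculate_roi']
```

Only the three financial tools enter the context, instead of all of them.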

When Classifier Routing Works Well

| Good For | Not Good For |
| --- | --- |
| Clear domain boundaries | Overlapping domains |
| Fixed tool categories | Ad-hoc tool additions |
| Low-latency requirements | Nuanced query understanding |
| Budget constraints | Cross-domain queries |

Rule of thumb: If you can list your domains on one hand, classifier routing is a good starting point.

C. Semantic Routing

When classifiers aren’t enough

From Categories to Similarity

Problem with classifiers: You must define categories upfront.

Semantic routing: Match the query to tools based on meaning, not labels.

  • Embed each tool’s description at startup
  • Embed the user query at runtime
  • Retrieve the top-K most similar tools — inject only those into context
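The three steps above can be sketched as a runnable toy: a real system would call an embedding model (e.g., one of the `text-embedding-*` APIs), but here a bag-of-words vector stands in so the top-K similarity logic runs as-is. Tool descriptions are illustrative.

```python
# Toy semantic router: bag-of-words "embeddings" + cosine similarity.
import math
from collections import Counter

TOOL_DESCRIPTIONS = {
    "get_stock_price": "look up the current stock price for a ticker",
    "search_news": "search recent news articles on a topic",
    "read_pdf": "read and extract text from a pdf document",
}

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model call
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Step 1: embed each tool description once at startup
tool_vecs = {name: embed(desc) for name, desc in TOOL_DESCRIPTIONS.items()}

def top_k_tools(query: str, k: int = 2) -> list[str]:
    # Steps 2-3: embed the query at runtime, keep the k most similar tools
    qv = embed(query)
    ranked = sorted(tool_vecs, key=lambda n: cosine(qv, tool_vecs[n]), reverse=True)
    return ranked[:k]

print(top_k_tools("what is the stock price of AAPL"))
```

Swapping `embed` for a real embedding model (and the dict for a vector store) gives the production version; the selection logic is unchanged.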

More on RAG in Module 4

The underlying technique (embeddings + similarity search) is covered in depth in Module 4. The same pipeline applies here: swap documents for tool schemas.

The RoutedAgent Pattern

Combine routing with the agent loop for efficient execution:

class RoutedAgent:
    def __init__(self, router, base_agent):
        self.router, self.base_agent = router, base_agent

    def run(self, query: str) -> str:
        selected_tools = self.router.select_tools(query)   # Route first
        return self.base_agent.run(query, tools=selected_tools)  # Then act

Context Window Savings

| Scenario | Tools in Context | Tool Schema Tokens |
| --- | --- | --- |
| Standard Agent | 20 tools | ~2000 tokens |
| Routed Agent (top-5) | 5 tools | ~500 tokens |
| Savings | -15 tools | ~1500 tokens |

Impact: 75% less token overhead — more room for reasoning, lower latency and cost.
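The table's figures follow from the rough ~100 tokens per tool schema implied by the earlier "20 tools = ~2000 tokens" estimate; the constant below is that assumption, not a measured value.

```python
TOKENS_PER_SCHEMA = 100  # rough average; real schemas vary with parameter count

def schema_overhead(n_tools: int) -> int:
    # Tokens spent on tool schemas alone, before any reasoning or memory
    return n_tools * TOKENS_PER_SCHEMA

saved = schema_overhead(20) - schema_overhead(5)
print(f"Saved ~{saved} tokens per request "
      f"({saved / schema_overhead(20):.0%} of tool overhead)")
# -> Saved ~1500 tokens per request (75% of tool overhead)
```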

D. Programmatic Tool Calling

Eliminate round-trips for multi-tool workflows

The Round-Trip Problem

Traditional tool calling: N tools = N model round-trips

sequenceDiagram
    participant M as Model
    participant T as Tools
    
    M->>T: Call tool_1
    T-->>M: Result (in context)
    M->>T: Call tool_2
    Note over M,T: ...
    M->>T: Call tool_n
    T-->>M: Result (in context)
    M->>M: Final answer

Note

Each round-trip: latency + token overhead + cost

How Programmatic Tool Calling Works

Claude writes code that calls your tools, with no model round-trip per tool call:

sequenceDiagram
    participant M as Model
    participant C as Code Execution
    participant T as Tools
    
    M->>C: Write code to call tools
    C->>T: tool_1()
    T-->>C: Result
    Note over C,T: ...
    C->>T: tool_n()
    T-->>C: Result
    C->>C: Process/filter results
    C-->>M: Final output only
    M->>M: Response

Tip

Key insight: Tool results stay in code execution — only final output reaches context

The allowed_callers Pattern

Mark tools as callable from code execution:

tools = [
    {"type": "code_execution_20260120", "name": "code_execution"},
    {
        "name": "query_database",
        "description": "Execute a SQL query. Returns JSON.",
        "input_schema": {"type": "object", "properties": {"sql": {"type": "string"}}},
        "allowed_callers": ["code_execution_20260120"],  # <-- Key field
    },
]
| allowed_callers | Meaning |
| --- | --- |
| ["direct"] | Only model can call (default) |
| ["code_execution_20260120"] | Only from code execution |
| Both | Callable either way |

Example: Batch Processing

Query sales for 5 regions, return only the top performer:

# Claude writes this code internally
regions = ["West", "East", "Central", "North", "South"]
results = {}

for region in regions:
    data = await query_database(f"SELECT SUM(revenue) FROM sales WHERE region='{region}'")
    results[region] = data[0]["sum"]

top_region = max(results.items(), key=lambda x: x[1])
print(f"Top region: {top_region[0]} with ${top_region[1]:,}")

What reaches context: "Top region: West with $2,340,000" — not all 5 query results

Token & Latency Gains

| Metric | Traditional (5 tools) | Programmatic |
| --- | --- | --- |
| Model turns | 5 round-trips | 1 turn |
| Tool results in context | All 5 | None (filtered) |
| Latency | ~15 seconds | ~5 seconds |
| Tokens consumed | ~10,000 | ~2,000 |

The Multiplier Effect

Calling 10 tools directly uses ~10x the tokens of calling them programmatically and returning a summary.

Advanced Patterns

Early termination — stop as soon as success criteria met:

for endpoint in ["us-east", "eu-west", "apac"]:
    status = await check_health(endpoint)
    if status == "healthy":
        print(f"Found healthy: {endpoint}")
        break

Conditional tool selection — choose tool based on data:

file_info = await get_file_info(path)
if file_info["size"] < 10000:
    content = await read_full_file(path)
else:
    content = await read_file_summary(path)

When to Use Programmatic Calling

| Good For | Less Ideal |
| --- | --- |
| 3+ dependent tool calls | Single tool call |
| Large result filtering | Simple responses |
| Loops over many items | Need user feedback each step |
| Batch data processing | Very fast single operations |

Vendor Note

This pattern is currently Claude-specific (requires their code execution container). The concept generalizes — you can implement client-side with your own sandbox.

Alternative: Self-Managed Execution

Not using Claude? You can implement the pattern yourself:

  1. Give Claude a code execution tool (e.g., Python sandbox)
  2. Describe available functions in that environment
  3. Claude writes code → you execute → return result
# Your code execution tool
{
    "name": "execute_python",
    "description": "Run Python code. Available functions: search(), query_db(), send_email()",
    ...
}

Trade-off: More control, but you manage security and infrastructure.
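A minimal sketch of step 3, under deliberately simplified assumptions: the model's generated code runs via in-process `exec` against a whitelist of stubbed functions. A real deployment would use an isolated container, not `exec`; the function names mirror the `execute_python` description above and are stand-ins.

```python
import io
import contextlib

# Whitelisted stub functions available inside the "sandbox" (illustrative only)
def search(q): return f"results for {q!r}"
def query_db(sql): return [{"revenue": 42}]

SANDBOX = {"search": search, "query_db": query_db, "print": print, "__builtins__": {}}

def execute_python(code: str) -> str:
    """Run model-generated code; return captured stdout as the tool result."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, dict(SANDBOX))  # fresh namespace per call
    return buf.getvalue()

# What the model might send back as its tool-call input:
generated = 'rows = query_db("SELECT SUM(revenue) FROM sales")\nprint(rows[0]["revenue"])'
print(execute_python(generated))  # only this printed summary re-enters the model's context
```

The key property is the same as Claude's native version: intermediate tool results live only inside the execution environment, and just the final `print` output goes back to the model.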

E. Wrap-up

Key Takeaways

  1. Tool proliferation hurts performance — route to reduce context bloat
  2. Classifier routing uses a small LLM to categorize queries (fast, rigid)
  3. Semantic routing applies RAG on tool descriptions — covered in depth in Module 4
  4. Programmatic calling eliminates round-trips — Claude writes code that calls tools
  5. Combine patterns: route to select tools, programmatic calling to execute efficiently

Lab Preview: Efficient Tool Calling

Part 1: Classifier Router

  • Implement ClassifierRouter
  • Map domains to tool sets

Part 2: Semantic Router

  • Build SemanticToolSelector
  • Index tool descriptions

Part 3: Routed Agents

  • Create RoutedAgent wrapper
  • Compare context usage

Part 4: Programmatic Calling

  • Explore allowed_callers pattern
  • Measure token savings

Time: 75 minutes

Questions?

Session 3 Complete