Tool Use Patterns for Production AI Agents

The patterns that make AI agent tool use reliable in production — and the anti-patterns that cause failures.

Tool use is what separates AI agents from chatbots. The ability to call APIs, query databases, and write back to systems is what makes agents useful in operations.

But tool use is also where most production failures happen.

The Core Problem

LLMs use tools by generating function calls. The model decides which tool to call, with which parameters, based on the context. This works remarkably well — until it doesn’t.

Failure modes:

  • Wrong tool selected for the situation
  • Correct tool, wrong parameters
  • Correct call, but called when it shouldn’t be (e.g., a write operation when only read was appropriate)
  • Infinite loops when a tool fails

Each of these is catastrophic in operational contexts. An agent that writes incorrect data back to your ERP doesn’t just fail — it corrupts production data.

Pattern 1: Tool Scope Constraints

Every agent should have the minimum set of tools needed for its task. Not the tools it might need. The tools it definitely needs.

This sounds obvious. It’s not how most teams build.

The tendency is to give agents broad tool access “in case they need it.” This is the equivalent of giving a new employee the keys to every system on day one.

Instead: define the tool scope for each specific task. The triage agent gets read-only tools. The action agent gets write tools, scoped to specific operations.
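A minimal sketch of per-task tool scoping, assuming a simple registry keyed by agent name. The tool and agent names here are illustrative, not from any specific framework; the important detail is that an unknown agent defaults to an empty scope, not a broad one.

```python
# Illustrative tool names for a logistics-style deployment.
READ_TOOLS = {"get_order", "get_shipment_status", "get_carrier_rates"}
WRITE_TOOLS = {"update_order_status", "add_order_note"}

AGENT_TOOL_SCOPES = {
    "triage_agent": READ_TOOLS,                # read-only
    "action_agent": READ_TOOLS | WRITE_TOOLS,  # scoped writes
}

def allowed_tools(agent_name: str) -> set[str]:
    # Default to no tools: an unknown agent gets nothing, not everything.
    return AGENT_TOOL_SCOPES.get(agent_name, set())

def dispatch(agent_name: str, tool_name: str, **params):
    if tool_name not in allowed_tools(agent_name):
        raise PermissionError(f"{agent_name} may not call {tool_name}")
    ...  # hand off to the actual tool implementation
```

The scope check lives in the dispatcher, outside the model's control, so a hallucinated function call fails closed rather than executing.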

Pattern 2: Tool Call Validation

Before any write operation executes, validate the parameters. This is a separate validation step, not just schema validation.

For an “update order status” tool, schema validation checks that order_id is a string and status is a valid enum. Business validation checks that the order exists, the status transition is legal, and the agent has permission for this specific order.

The validation layer lives between the LLM’s function call and the actual execution. It can reject calls, request clarification, or escalate to human review.
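As a sketch of that business-validation step for the "update order status" example: the transition table and permission model below are hypothetical stand-ins for whatever your ERP actually enforces.

```python
from dataclasses import dataclass

# Hypothetical legal status transitions; a real system would pull
# these from the order management domain model.
LEGAL_TRANSITIONS = {
    "pending": {"confirmed", "cancelled"},
    "confirmed": {"shipped", "cancelled"},
    "shipped": {"delivered"},
}

@dataclass
class ValidationResult:
    ok: bool
    reason: str = ""

def validate_update_order_status(order, new_status, agent_order_permissions):
    """Business validation, run only after schema validation has passed."""
    if order is None:
        return ValidationResult(False, "order does not exist")
    current = order["status"]
    if new_status not in LEGAL_TRANSITIONS.get(current, set()):
        return ValidationResult(False, f"illegal transition {current} -> {new_status}")
    if order["id"] not in agent_order_permissions:
        return ValidationResult(False, "agent lacks permission for this order")
    return ValidationResult(True)
```

A failed result carries a reason, which is what lets the layer request clarification or escalate instead of silently dropping the call.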

Pattern 3: Idempotency

Every write tool should be idempotent — calling it twice with the same parameters should have the same effect as calling it once.

Agents retry. Networks fail. Tools get called multiple times. Without idempotency, retries create duplicate operations — duplicate carrier claims, duplicate notifications, duplicate order updates.

Design your tools to be safe to retry. This usually means tracking call IDs and deduplicating on the receiver side.

Pattern 4: Graduated Write Access

Not all operations are equally risky. Reading carrier rates: low risk. Updating order status: medium risk. Initiating a return to shipper: high risk.

Structure your tools with graduated write access:

  • Tier 1: Safe writes (status updates, annotations)
  • Tier 2: Consequential writes (carrier bookings, customer notifications)
  • Tier 3: High-impact writes (financial adjustments, escalations)

Tier 1 can execute automatically. Tier 2 requires a confidence threshold. Tier 3 requires human approval.
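The tier routing above can be sketched as a small dispatch function. The tool-to-tier mapping and the 0.9 threshold are illustrative assumptions, not recommended values.

```python
from enum import IntEnum

class Tier(IntEnum):
    SAFE = 1           # status updates, annotations
    CONSEQUENTIAL = 2  # carrier bookings, customer notifications
    HIGH_IMPACT = 3    # financial adjustments, escalations

# Illustrative tool-to-tier assignments.
TOOL_TIERS = {
    "add_annotation": Tier.SAFE,
    "book_carrier": Tier.CONSEQUENTIAL,
    "issue_refund": Tier.HIGH_IMPACT,
}

CONFIDENCE_THRESHOLD = 0.9  # assumed Tier 2 threshold

def route(tool_name: str, confidence: float) -> str:
    tier = TOOL_TIERS[tool_name]
    if tier == Tier.SAFE:
        return "execute"
    if tier == Tier.CONSEQUENTIAL:
        return "execute" if confidence >= CONFIDENCE_THRESHOLD else "escalate"
    return "require_human_approval"  # Tier 3 always goes to a human
```

Keeping the routing in code, rather than in the prompt, means the high-impact path cannot be talked out of requiring approval.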

Pattern 5: Full Tool Call Logging

Every tool call — successful or failed — should be logged with:

  • Full input parameters
  • Full output (or error)
  • Calling agent and task context
  • Timestamp and latency
  • Associated token cost

This isn’t just for debugging. It’s for auditing. Operations teams need to be able to reconstruct exactly what an agent did and why.

Anti-Patterns to Avoid

Chained writes without checkpoints. If an agent makes 5 write operations in sequence and #4 fails, what’s the state? Design recovery paths for partial execution.
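One way to make partial execution recoverable is to checkpoint each completed step; a minimal sketch, with illustrative step names and an in-memory log standing in for durable storage:

```python
def run_with_checkpoints(steps, checkpoint_log):
    """Execute (name, fn) steps in order, recording each success.

    On failure, stop and report exactly which writes already happened,
    so a recovery path can compensate or resume from there.
    """
    for name, fn in steps:
        try:
            fn()
        except Exception:
            return {"completed": list(checkpoint_log), "failed_at": name}
        checkpoint_log.append(name)
    return {"completed": list(checkpoint_log), "failed_at": None}
```

If step 4 of 5 fails, the checkpoint log answers "what's the state?" directly: steps 1 through 3 happened, step 4 did not, step 5 was never attempted.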

Undifferentiated error handling. A tool that returns “error” for network failures, permission errors, and invalid inputs needs better error taxonomy. The agent needs to know why a call failed to handle it appropriately.
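A minimal error taxonomy can be sketched as a small exception hierarchy; the class names are illustrative, and the point is that each category carries its own handling policy.

```python
class ToolError(Exception):
    retryable = False

class TransientError(ToolError):
    # Network failures, timeouts: safe to retry (with idempotent tools).
    retryable = True

class PermissionDenied(ToolError):
    # Escalate; retrying will never succeed.
    retryable = False

class InvalidInput(ToolError):
    # The agent should fix its parameters, not blindly retry.
    retryable = False

def should_retry(exc: Exception) -> bool:
    return isinstance(exc, ToolError) and exc.retryable
```

With this in place, the agent's retry loop branches on the category rather than pattern-matching an opaque "error" string, which also closes off the infinite-retry failure mode mentioned earlier.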

Tools that do too much. A single tool that reads data, applies business logic, and writes a result is hard to test, hard to audit, and hard to retry safely. Keep tools small and focused.

The overhead of building reliable tool infrastructure pays back quickly. The alternative is an agent that works great in testing and causes incidents in production.
