Retry & Backoff Playground
Model endpoints fail. They rate-limit you with 429s, they 500 under load, and they time out exactly when you need them most. The reliability question isn't if a call fails — it's what your client does next. The same primitives that keep distributed systems alive apply directly to LLM traffic.
Drag the failure rate up, pick a strategy, and send traffic. Each block is one attempt: green succeeded, amber failed and triggered a retry, red means the circuit breaker fired.
Why the strategy matters
A naive client retries immediately, hammering an already-struggling endpoint and turning a blip into an outage. Backoff spreads attempts out; jitter stops every client in your fleet from retrying in lockstep; the circuit breaker stops calling entirely once an endpoint is clearly down, failing fast instead of burning latency budget.
| Strategy | Behaviour | Best when |
|---|---|---|
| Fixed | Constant delay between retries | Transient, uncorrelated failures |
| Exponential | Delay of base · 2ⁿ on retry n; grows quickly | Endpoint is overloaded |
| Backoff + jitter | Randomized exponential | Many clients retrying at once |
| Circuit breaker | Fail fast after N failures | Endpoint is down, not flaky |
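The first three rows differ only in how they compute the wait before the next attempt. A minimal sketch of those delay calculations follows. The function names, the `maxDelay` cap, and the injectable `random` parameter are assumptions added for illustration and are not part of the playground itself.

```javascript
// Delay in milliseconds before retry number `attempt` (0-based).
// All names and the 30-second default cap are illustrative assumptions.

function fixedDelay(baseDelay) {
  // Same wait before every retry.
  return baseDelay;
}

function exponentialDelay(baseDelay, attempt, maxDelay = 30000) {
  // base * 2^attempt, capped so the wait cannot grow without bound.
  return Math.min(maxDelay, baseDelay * 2 ** attempt);
}

function fullJitterDelay(baseDelay, attempt, maxDelay = 30000, random = Math.random) {
  // Uniform pick between 0 and the exponential delay, so clients that
  // failed at the same moment do not all retry at the same moment.
  return Math.round(random() * exponentialDelay(baseDelay, attempt, maxDelay));
}
```

With `baseDelay = 100`, the exponential waits for attempts 0 through 3 are 100, 200, 400, and 800 ms, while full jitter picks a value anywhere from 0 up to each of those. Passing `random` as a parameter is only there to make the jittered path deterministic under test.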
The cost of retries
Retries buy a higher success rate at the cost of latency. Watch the p95 wall-clock time climb as the failure rate rises: each retry adds the attempt latency plus the backoff wait. At high failure rates, a circuit breaker often wins on tail latency: it stops paying for doomed attempts and lets you fall back to a cheaper model or a cached response.
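To make the fail-fast behaviour concrete, here is a rough sketch of a circuit breaker. It assumes the breaker opens after a run of consecutive failures and allows a trial call once a cooldown has passed. The class name, `failureThreshold`, `cooldownMs`, and the injectable `now` clock are all hypothetical, and the playground may implement the idea differently.

```javascript
// Minimal circuit breaker sketch. Names and default thresholds are hypothetical.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 10000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.cooldownMs = cooldownMs;             // how long to stay open
    this.now = now;                           // clock, injectable for testing
    this.failures = 0;
    this.openedAt = null;                     // null means the circuit is closed
  }

  // True if a call may be attempted right now.
  canRequest() {
    if (this.openedAt === null) return true;
    // Once the cooldown has elapsed, let trial calls through (half-open).
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  // A success closes the circuit and clears the failure count.
  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  // A failure at or past the threshold opens (or re-opens) the circuit.
  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}
```

A caller would check `canRequest()` before each attempt and, when it returns `false`, go straight to the fallback instead of waiting on a call that is likely to fail. This sketch does not limit how many trial calls pass during the half-open window, which a production breaker would usually do.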
const wait = Math.round(Math.random() * baseDelay * 2 ** attempt); // full jitter
Numbers to watch
p50 is the typical request; p95 is the tail that defines your SLO. Total wall-clock is what an agent loop actually spends waiting on the model — and the reason a good retry policy is the difference between an agent that feels responsive and one that stalls.
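As a small illustration of how these three numbers can be derived from raw per-request latencies, the sketch below uses the nearest-rank percentile method. The sample latencies are made up, the `percentile` helper is an assumption rather than the playground's own code, and summing the latencies as total wall-clock time assumes the calls run one after another, as in a sequential agent loop.

```javascript
// Nearest-rank percentile over latencies in milliseconds.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort on a copy
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up latencies: mostly fast calls plus two that went through retries.
const latencies = [120, 130, 125, 140, 900, 135, 128, 132, 1500, 127];

const p50 = percentile(latencies, 50);                    // 130
const p95 = percentile(latencies, 95);                    // 1500
const total = latencies.reduce((sum, ms) => sum + ms, 0); // 3437

console.log({ p50, p95, total });
```

Two slow requests out of ten barely move p50 but account for most of the total, which is why the tail is the number to watch.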