Retry & Backoff Playground

Published Jun 20, 2026
Updated Jun 20, 2026
1 minute read

Model endpoints fail. They rate-limit you with 429s, they 500 under load, and they time out exactly when you need them most. The reliability question isn't if a call fails — it's what your client does next. The same primitives that keep distributed systems alive apply directly to LLM traffic.

Drag the failure rate up, pick a strategy, and send traffic. Each block is one attempt: green succeeded, amber failed and triggered a retry, red means the circuit breaker fired.

[Interactive playground: a failure-rate slider (shown at 70%), a strategy selector, and a "Send traffic" button. The attempt timeline for the current request uses the legend: success, retry / fail, circuit open, in flight. Readouts: requests, success rate, p50 wall, p95 wall, and total wall-clock per request.]

Why the strategy matters

A naive client retries immediately, hammering an already-struggling endpoint and turning a blip into an outage. Backoff spreads attempts out; jitter stops every client in your fleet from retrying in lockstep; the circuit breaker stops calling entirely once an endpoint is clearly down, failing fast instead of burning latency budget.
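The difference between the backoff strategies comes down to how the wait before each retry is computed. The sketch below is one minimal way to express that, not the playground's actual source: `BackoffStrategy`, `retryDelay`, and the injectable `rand` parameter are hypothetical names introduced for illustration, and attempts are assumed to be numbered from 0.

```typescript
// Illustrative names only; assumes attempts are numbered from 0.
type BackoffStrategy = "fixed" | "exponential" | "jitter";

// Wait in ms before the retry that follows failed attempt number `attempt`.
// `rand` is injectable so the jittered case can be tested deterministically.
function retryDelay(
  strategy: BackoffStrategy,
  attempt: number,
  baseDelay: number,
  rand: () => number = Math.random,
): number {
  if (strategy === "fixed") return baseDelay; // same wait every time
  const ceiling = baseDelay * 2 ** attempt; // doubles after each failed attempt
  if (strategy === "exponential") return ceiling;
  return Math.round(rand() * ceiling); // full jitter: random wait between 0 and ceiling
}

// Example schedule for the first four retries with a 100 ms base delay.
const exponentialSchedule = [0, 1, 2, 3].map((n) => retryDelay("exponential", n, 100));
console.log(exponentialSchedule); // 100, 200, 400, 800
```

A production client would usually also cap the ceiling so that late attempts do not wait unboundedly long; the cap is left out here to keep the three strategies easy to compare.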

Strategy          Behaviour                       Best when
Fixed             Constant delay between retries  Transient, uncorrelated failures
Exponential       base · 2ⁿ, backs off fast       Endpoint is overloaded
Backoff + jitter  Randomized exponential          Many clients retrying at once
Circuit breaker   Fail fast after N failures      Endpoint is down, not flaky
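The last row of the table, failing fast after N failures, can be sketched as a small state machine. This is a simplified illustration under stated assumptions rather than the playground's implementation: the class name `CircuitBreaker`, the `failureThreshold` and `cooldownMs` parameters, the half-open probing step, and the injectable `now` clock are all hypothetical choices.

```typescript
// Hypothetical sketch; thresholds and the half-open probe are assumptions.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number; // consecutive failures before opening
  private readonly cooldownMs: number; // how long to fail fast before probing
  private readonly now: () => number; // injectable clock for tests

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True if a call may be attempted right now; false means fail fast.
  allowRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // cooldown elapsed: allow probe traffic
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    // A failed probe reopens immediately; otherwise open once the threshold is hit.
    if (this.state === "half-open" || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  currentState(): BreakerState {
    return this.state;
  }
}
```

A caller would check `allowRequest()` before each attempt and, when it returns false, skip the call and go straight to a fallback. Note that this sketch lets every call through while half-open; a stricter version would admit only a single probe.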

The cost of retries

Retries buy success rate at the cost of latency. Watch the p95 wall climb as the failure rate rises: each retry adds the attempt latency plus the backoff wait. At high failure rates, a circuit breaker often wins on tail latency because it stops paying for doomed attempts and lets you fall back to a cheaper model or a cached response.
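The arithmetic behind that trade-off can be made concrete with a rough estimate. The sketch below assumes every attempt fails independently with the same probability, takes the same time, and is followed by a plain exponential wait with no jitter; real endpoints rarely behave that neatly. `estimateRetryCost` and its parameter names are hypothetical.

```typescript
// Rough model: independent failures, constant attempt latency, exponential waits.
interface RetryCost {
  successRate: number; // probability that some attempt succeeds
  expectedAttempts: number; // average attempts made per request
  expectedWallMs: number; // average wall-clock spent per request
}

function estimateRetryCost(
  failureRate: number,
  maxAttempts: number,
  attemptMs: number,
  baseDelay: number,
): RetryCost {
  let reach = 1; // probability that attempt k is made at all
  let expectedAttempts = 0;
  let expectedWallMs = 0;
  for (let k = 0; k < maxAttempts; k++) {
    // The first attempt starts immediately; later ones wait base * 2^(k-1) first.
    const waitBefore = k === 0 ? 0 : baseDelay * 2 ** (k - 1);
    expectedAttempts += reach;
    expectedWallMs += reach * (waitBefore + attemptMs);
    reach *= failureRate; // attempt k+1 happens only if attempt k failed
  }
  // After the loop, `reach` is the probability that every attempt failed.
  return { successRate: 1 - reach, expectedAttempts, expectedWallMs };
}
```

Under these assumptions, a 70% failure rate with four attempts succeeds with probability 1 - 0.7^4, about 76%, while the expected wall-clock grows with every extra attempt that the failure rate makes likely.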

const wait = Math.round(Math.random() * baseDelay * 2 ** attempt); // full jitter
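To see the full-jitter line in context, here is a small synchronous simulation of one request, in the spirit of the playground but not taken from it. `simulateRequest`, `RequestResult`, and the parameters are hypothetical, each attempt is assumed to take a constant `attemptMs`, and no real network call or timer is involved.

```typescript
// Simulation only: no network calls, no timers. Names are illustrative.
interface RequestResult {
  succeeded: boolean;
  attempts: number;
  wallMs: number; // simulated wall-clock: attempt latencies plus backoff waits
}

function simulateRequest(
  failureRate: number,
  maxAttempts: number,
  attemptMs: number,
  baseDelay: number,
  rand: () => number = Math.random,
): RequestResult {
  let wallMs = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    wallMs += attemptMs; // every attempt costs its own latency
    if (rand() >= failureRate) {
      return { succeeded: true, attempts: attempt + 1, wallMs };
    }
    if (attempt < maxAttempts - 1) {
      // Same formula as the article's line: wait a random time up to base * 2^attempt.
      const wait = Math.round(rand() * baseDelay * 2 ** attempt); // full jitter
      wallMs += wait;
    }
  }
  return { succeeded: false, attempts: maxAttempts, wallMs };
}
```

Calling this many times at a given failure rate and collecting `wallMs` gives a distribution from which p50 and p95 can be read.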

Numbers to watch

p50 is the typical request; p95 is the tail that defines your SLO. Total wall-clock is what an agent loop actually spends waiting on the model — and the reason a good retry policy is the difference between an agent that feels responsive and one that stalls.
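For reference, p50 and p95 can be computed from a list of per-request wall-clock times as follows. This sketch uses the nearest-rank definition of a percentile, which is one common convention among several; the sample values are invented for illustration and are not measurements from the playground.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile of an empty sample set");
  const sorted = [...samples].sort((a, b) => a - b); // copy so the input is not mutated
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Made-up wall-clock times in ms for ten requests, one of which hit several retries.
const wallTimes = [120, 80, 400, 95, 2300, 150, 110, 90, 105, 130];
const p50 = percentile(wallTimes, 50); // 110
const p95 = percentile(wallTimes, 95); // 2300
const totalWall = wallTimes.reduce((sum, ms) => sum + ms, 0); // 3580
```

In this made-up sample a single slow request leaves p50 untouched but sets p95 and accounts for most of the total, which is why the tail and the total are the figures worth watching.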