Retry & Backoff Playground

Published Jun 20, 2026 · Updated Jul 5, 2026 · 1 minute read

Model endpoints fail. They rate-limit you with 429s, they throw 500s under load, and they time out exactly when you need them most. The reliability question isn't whether a call fails; it's what your client does next. The same primitives that keep distributed systems alive apply directly to LLM traffic.

Drag the failure rate up, pick a strategy, and send traffic. Each block is one attempt: green means it succeeded, amber means it failed and triggered a retry, and red means the circuit breaker fired.
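As a rough mental model of what one timeline represents, here is a hypothetical simulation of a single request. It is not the playground's actual code. It assumes each attempt fails independently with probability `failureRate`, every attempt costs a fixed `attemptMs`, and a failed attempt waits `delayFor(attempt)` before the next one. All names and numbers are illustrative.

```javascript
// Hypothetical model of one request, NOT the playground's actual code.
// Each attempt fails independently with probability `failureRate` and costs
// `attemptMs`; a failed attempt waits `delayFor(attempt)` before the next one.
function simulateRequest({ failureRate, delayFor, attemptMs = 300, maxAttempts = 5, rng = Math.random }) {
  const timeline = []; // one entry per attempt, like one block in the timeline
  let wallMs = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    wallMs += attemptMs;
    const ok = rng() >= failureRate;
    timeline.push(ok ? "success" : "fail");
    if (ok) return { ok, timeline, wallMs };
    // Back off before the next attempt (no wait after the final one).
    if (attempt + 1 < maxAttempts) wallMs += delayFor(attempt);
  }
  return { ok: false, timeline, wallMs };
}

// 70% failure rate with a 200 ms exponential backoff base.
const r = simulateRequest({ failureRate: 0.7, delayFor: (n) => 200 * 2 ** n });
console.log(`${r.timeline.join(" → ")} | ${r.timeline.length} attempts, ${r.wallMs} ms`);
```

The model shows that wall-clock time is simply the sum of every attempt's latency plus every backoff wait. That is why the tail grows so quickly once failures become common.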


Why the strategy matters

A naive client retries immediately, hammering an already-struggling endpoint and turning a blip into an outage. Backoff spreads attempts out; jitter stops every client in your fleet from retrying in lockstep; the circuit breaker stops calling entirely once an endpoint is clearly down, failing fast instead of burning latency budget.

Strategy	Behaviour	Best when
Fixed	Constant delay between retries	Transient, uncorrelated failures
Exponential	Waits `base · 2ⁿ` before retry n; backs off fast	Endpoint is overloaded
Backoff + jitter	Exponential delay, randomized per attempt	Many clients retrying at once
Circuit breaker	Stops calling after N failures and fails fast	Endpoint is down, not flaky
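The three delay-based rows of the table can be sketched as small functions. This is a minimal sketch under assumptions: `attempt` counts retries from 0, the 200 ms base is illustrative, and the 10 s cap is a common addition that is not part of the table. The function names are hypothetical.

```javascript
// Hypothetical delay schedules for the delay-based strategies in the table.
// `attempt` is 0 for the first retry, 1 for the second, and so on.
// BASE_MS and CAP_MS are illustrative values, not recommendations.
const BASE_MS = 200;
const CAP_MS = 10000;

// Fixed: the same wait before every retry.
function fixedDelay(attempt, base = BASE_MS) {
  return base;
}

// Exponential: base · 2^attempt, capped so waits don't grow without bound.
function exponentialDelay(attempt, base = BASE_MS, cap = CAP_MS) {
  return Math.min(cap, base * 2 ** attempt);
}

// Full jitter: a uniformly random whole number of ms in [0, exponentialDelay(attempt)).
function fullJitterDelay(attempt, base = BASE_MS, cap = CAP_MS, rng = Math.random) {
  return Math.floor(rng() * exponentialDelay(attempt, base, cap));
}

for (let attempt = 0; attempt < 5; attempt++) {
  console.log(attempt, fixedDelay(attempt), exponentialDelay(attempt), fullJitterDelay(attempt));
}
```

With these values the exponential column reads 200, 400, 800, 1600, 3200 ms. The jitter column lands anywhere below that, which is what desynchronizes a fleet of clients.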

The cost of retries

Retries trade latency for success rate. Watch p95 wall-clock time climb as the failure rate rises; each retry adds another attempt's latency plus its backoff wait. At high failure rates, a circuit breaker often wins on tail latency: it stops paying for doomed attempts and lets you fall back to a cheaper model or a cached response.
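To make the fallback path concrete, here is a minimal circuit breaker sketch. Several pieces are assumptions for illustration: the failure threshold, the cooldown length, the injectable `now` clock (which keeps the sketch testable), and the choice to serve the fallback both while the circuit is open and when a call fails. The class and method names are hypothetical.

```javascript
// Minimal circuit breaker sketch. Thresholds, the cooldown, the injectable
// clock, and the fallback-on-failure behaviour are illustrative assumptions.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold; // N failures before the circuit opens
    this.cooldownMs = cooldownMs;             // how long to fail fast before probing again
    this.now = now;
    this.failures = 0;
    this.openedAt = null;                     // null while the circuit is closed
  }

  get state() {
    if (this.openedAt === null) return "closed";
    return this.now() - this.openedAt >= this.cooldownMs ? "half-open" : "open";
  }

  // Calls `call()` unless the circuit is open; serves `fallback` when it is
  // open or when the call fails.
  async run(call, fallback) {
    if (this.state === "open") return fallback(); // fail fast: no request sent
    try {
      const result = await call();
      this.failures = 0; // a success (including a half-open probe) closes it
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures++;
      // failures stays >= threshold until a success, so a failed half-open
      // probe re-opens the circuit and restarts the cooldown.
      if (this.failures >= this.failureThreshold) this.openedAt = this.now();
      return fallback(err);
    }
  }
}
```

In use, `call` would be the model request and `fallback` a cheaper model or a cached answer. In half-open state this sketch lets every concurrent caller through; a production breaker would typically admit only a single probe.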

```javascript
// Full jitter: wait a random number of ms in [0, baseDelay · 2^attempt] before retrying.
const wait = Math.round(Math.random() * baseDelay * 2 ** attempt);
```
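The full-jitter line above computes one wait. A complete retry loop around it might look like the following sketch. It rests on three assumptions. First, `call` is any async function, such as a request to a model endpoint. Second, failures throw an error carrying an HTTP `status`. Third, errors without a status count as network failures or timeouts. The cap (`maxDelay`) and all names are illustrative.

```javascript
// Sketch of a retry wrapper with capped full-jitter backoff. Assumes failed
// calls throw errors with an HTTP `status`; names and defaults are illustrative.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry rate limits (429), server errors (5xx), and errors with no status
// (treated here as network failures or timeouts); give up on other 4xx.
function isRetryable(err) {
  const status = err && err.status;
  return status === undefined || status === 429 || status >= 500;
}

async function withRetry(call, { maxAttempts = 5, baseDelay = 200, maxDelay = 10000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Out of attempts, or an error a retry can't fix: surface it.
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      // Full jitter over a capped exponential ceiling.
      const wait = Math.round(Math.random() * Math.min(maxDelay, baseDelay * 2 ** attempt));
      await sleep(wait);
    }
  }
}
```

Refusing to retry other 4xx errors matters as much as the backoff itself: a malformed request fails identically every time, so retrying it only burns latency. A production client would likely also honor a `Retry-After` header on 429 responses when the endpoint sends one.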

Numbers to watch

p50 is the median request; p95 is the tail, the latency that one request in twenty exceeds, and the number your SLO is usually written against. Total wall-clock is what an agent loop actually spends waiting on the model, which is why a good retry policy is the difference between an agent that feels responsive and one that stalls.
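Here is a small sketch of how these three numbers could be computed from per-request wall-clock times. It uses the nearest-rank percentile, which is one common definition; monitoring tools often interpolate instead and can report slightly different values. The function names and sample data are hypothetical.

```javascript
// Nearest-rank percentile over per-request wall-clock times (ms).
function percentile(samples, p) {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(0, rank - 1)];
}

function summarize(wallTimesMs) {
  return {
    requests: wallTimesMs.length,
    p50: percentile(wallTimesMs, 50),
    p95: percentile(wallTimesMs, 95),
    totalSeconds: wallTimesMs.reduce((sum, t) => sum + t, 0) / 1000,
  };
}

// Illustrative data: 18 fast requests and 2 that sat through several retries.
console.log(summarize([...Array(18).fill(400), 5200, 9800]));
// → { requests: 20, p50: 400, p95: 5200, totalSeconds: 22.2 }
```

The median stays at 400 ms while p95 jumps to 5.2 s. The two slow requests also account for 15 of the 22.2 seconds of total waiting, which is the cost an agent loop feels.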
