Retries
A transient failure - a 502 Bad Gateway, a dropped connection, an upstream timeout - is often gone by the time you try again a moment later. Without retries, that momentary blip surfaces to the storefront as a hard error on the very first attempt.
The Alokai middleware can retry these transient failures automatically for any integration. It's opt-in and works the same way for every integration regardless of transport (Axios, GraphQL, or a native SDK), because it operates on the normalized error after the request fails.
Quick Start
Add a retry field to the integration's config - the same file where its location and extensions already live (middleware.config.ts just composes these configs together):
export const config = {
configuration: { /* ... */ },
extensions: (extensions) => [...extensions /* ... */],
location: '@vsf-enterprise/sapcc-api/server',
retry: true, } satisfies Integration<Config>;
That's it. retry: true applies the default policy: up to 2 retries (3 attempts total) on transient errors, with exponential backoff between attempts. Integrations without a retry config are never retried.
What gets retried
By default, a failed call is retried only when the error is transient - something a retry could plausibly fix. That means one of these status codes (resolved from the normalized error):
| Status | Meaning |
|---|---|
408 | Request Timeout |
429 | Too Many Requests |
502 | Bad Gateway |
503 | Service Unavailable |
504 | Gateway Timeout |
Network and timeout failures (ETIMEDOUT, ECONNRESET, ECONNREFUSED, ENOTFOUND, …) are normalized into this range as well, so they're retried too.
Everything else fails immediately - 4xx business errors like 400 or 404, a deterministic 500, and a circuit breaker that has opened (it surfaces as a non-retryable error so a struggling backend isn't hammered further).
Retries are not method-aware by default
The default policy retries any method on a transient error, including mutations like placeOrder or addCartLineItem. If a write times out after the backend already committed it, a retry can submit it twice. If that matters for your integration, restrict retries with a custom retryCondition (see below).
Customizing the policy
Instead of true, pass an object to override any part of the default policy. Every field is optional.
export const config = {
configuration: { /* ... */ },
extensions: (extensions) => [...extensions /* ... */],
location: '@vsf-enterprise/sapcc-api/server',
retry: {
retries: 3,
retryCondition: (error, context) => { /* ... */ },
retryDelay: (attempt, error) => { /* ... */ },
},
} satisfies Integration<Config>;
| Field | Default | Description |
|---|---|---|
retries | 2 | Number of retries after the initial attempt. 0 disables retrying. |
retryCondition | transient errors | (error, context) => boolean - return true to retry. |
retryDelay | randomized exponential backoff | (attempt, error) => number - delay in milliseconds before the given retry. |
The context passed to retryCondition carries the functionName, integrationName, the optional extensionName, the 1-based attempt number, the configured retries budget, and a request-scoped logger - so you can tailor the decision per method and log why a call was or wasn't retried.
Retry reads only
To avoid retrying mutations, opt in by method name. The error passed to retryCondition has already been normalized to an HttpError, so you can read its statusCode directly. This example only retries transient failures on read-shaped methods:
import { HttpError } from '@alokai/connect/middleware';
const TRANSIENT = new Set([408, 429, 502, 503, 504]);
// inside the integration config:
retry: {
retryCondition: (error, context) =>
/^(get|search|list)/.test(context.functionName) &&
error instanceof HttpError &&
TRANSIENT.has(error.statusCode),
},
Custom backoff
retryDelay receives the attempt number (starting at 1) and returns milliseconds to wait. For a linear delay:
retry: {
retryDelay: (attempt) => attempt * 500, // 500ms, 1000ms, ...
},
Disabling retries
Retries are off unless you opt in, so removing the retry field is enough. You can also set it explicitly to make the intent clear:
retry: false,
Relationship with the circuit breaker
Retries and the circuit breaker work together. Retries handle short-lived blips; the circuit breaker handles sustained outages. The breaker wraps the retry loop, which gives two guarantees:
- A call that recovers within its retry budget counts as a single success - transient failures a retry papers over do not trip the breaker. Only a fully exhausted retry counts as one failure.
- When the breaker is open, it short-circuits immediately, so the request fails fast without retrying.
Because the breaker wraps the whole retry sequence, its timeout now bounds all attempts plus their backoff delays - if you raise retries or the delay substantially, make sure the breaker's timeout leaves room for them.
Observability
Every retry emits a debug log line naming the integration, method, attempt number, and delay, so you can confirm retries are happening and tune the policy from real traffic.
Summary
- Opt-in per integration, off by default
- Retries transient failures (
502/503/504/408/429and network/timeout errors) - Sensible defaults (2 retries, exponential backoff) with full control over count, condition, and delay
- Works for every integration through one shared mechanism
- Plays nicely with the circuit breaker - an open breaker fails fast