Logging & Troubleshooting
This page explains how to read circuit breaker logs and what to do when you see them. Each log line tells you two things: what state the breaker is in, and whether your traffic is being blocked right now.
Reading the logs
Every circuit breaker log carries an alokai.circuitBreaker.impact field so you can tell, at a glance, whether the breaker is just watching or actively blocking:
| Log | Severity | impact | What it means |
|---|---|---|---|
| observed an upstream error | warning | observing | The upstream returned an error. The breaker did not block anything - traffic is still flowing. It is only counting errors toward the trip threshold. |
| OPEN | error | blocking | The error rate crossed the threshold. The breaker is now blocking requests and failing them fast without calling the upstream. |
| blocked a request | warning | blocking | A request was blocked because the breaker is open. |
| HALF-OPEN | info | probing | The breaker is testing recovery by letting one trial request through. |
| CLOSED | info | recovered | The upstream recovered and normal traffic has resumed. |
Each log also includes a recommendedAction field (investigate_upstream or none) that you can filter on. The failure and reject logs are throttled to reduce noise. All rates and counts in the logs are measured over the breaker's rolling window (rollingCountTimeout), not since startup.
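To make the rolling-window point concrete, here is a minimal sketch of an error rate computed over a recent time window rather than since startup. The Sample shape and the rollingErrorRate function are hypothetical names for illustration. The breaker's real bookkeeping may differ, for example by aggregating into buckets.

```typescript
// Hypothetical record of one upstream call; `at` is a timestamp in milliseconds.
interface Sample {
  at: number;
  failed: boolean;
}

// Error rate (0..1) over the last `windowMs` milliseconds only.
// Calls older than the window are ignored, so old failures age out.
function rollingErrorRate(samples: Sample[], now: number, windowMs: number): number {
  const recent = samples.filter((s) => now - s.at < windowMs);
  if (recent.length === 0) return 0;
  const failures = recent.filter((s) => s.failed).length;
  return failures / recent.length;
}
```

With this approach a failure that happened before the window started no longer affects the reported rate, which is why a burst of errors stops showing up once rollingCountTimeout has elapsed.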
A "failure" log does not mean the breaker tripped
The "observed an upstream error" log is emitted at warning severity, not error. It fires when an upstream error passes through while the breaker is still closed and serving traffic. The breaker causes request failures only while it is OPEN, so the OPEN log is the one to alert on.
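The alerting rule above could be sketched as a small predicate over parsed log entries. Only the alokai.circuitBreaker.impact field name and its values come from the table. The severity field name, the BreakerLog shape, and the shouldPage function are assumptions made for this example, so adapt them to your actual log schema.

```typescript
// Impact values as listed in the log table.
type Impact = "observing" | "blocking" | "probing" | "recovered";

// Hypothetical shape of a parsed log entry (the `severity` key is assumed).
interface BreakerLog {
  severity: "info" | "warning" | "error";
  "alokai.circuitBreaker.impact": Impact;
}

// Page only for the OPEN log: error severity combined with a blocking impact.
// "observed an upstream error" (warning / observing) and
// "blocked a request" (warning / blocking) do not page.
function shouldPage(log: BreakerLog): boolean {
  return (
    log.severity === "error" &&
    log["alokai.circuitBreaker.impact"] === "blocking"
  );
}
```

Matching on both fields keeps the throttled "blocked a request" warnings out of the paging path while still catching the moment the breaker opens.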
Troubleshooting
The breaker is open ("Upstream temporarily unavailable")
The upstream had too many failures or timed out, so the breaker tripped to protect it. Requests fail fast while it's open.
Investigate the upstream - check its status, credentials, rate limits, and network. No restart or redeploy is needed: the breaker retries the upstream automatically after the reset timeout and resumes traffic once it recovers. If it keeps reopening, the upstream is still unhealthy.
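The automatic recovery described above follows the usual circuit breaker state cycle. The sketch below is a simplified illustration of that cycle, not the actual implementation: the SketchBreaker class, its method names, and the explicit `now` arguments are all hypothetical.

```typescript
type State = "CLOSED" | "OPEN" | "HALF-OPEN";

// Illustrative only: a simplified breaker that models the state transitions.
class SketchBreaker {
  state: State = "CLOSED";
  private openedAt = 0;
  private resetTimeoutMs: number;

  constructor(resetTimeoutMs: number) {
    this.resetTimeoutMs = resetTimeoutMs;
  }

  // Called when the error rate crosses the threshold.
  trip(now: number): void {
    this.state = "OPEN";
    this.openedAt = now;
  }

  // Decide whether a request may reach the upstream at time `now` (ms).
  allowRequest(now: number): boolean {
    if (this.state === "OPEN" && now - this.openedAt >= this.resetTimeoutMs) {
      // Reset timeout elapsed: let exactly one trial request through.
      this.state = "HALF-OPEN";
      return true;
    }
    // While OPEN or HALF-OPEN, all other requests fail fast.
    return this.state === "CLOSED";
  }

  // Report the outcome of the trial request sent while HALF-OPEN.
  onTrialResult(success: boolean, now: number): void {
    if (this.state !== "HALF-OPEN") return;
    if (success) {
      this.state = "CLOSED"; // upstream recovered, resume normal traffic
    } else {
      this.trip(now); // still unhealthy, reopen and wait again
    }
  }
}
```

A failed trial sends the breaker straight back to OPEN, which is the "keeps reopening" pattern you see in the logs when the upstream is still unhealthy.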
Error rate approaching the threshold
When the error rate climbs toward the trip threshold, the failure log calls it out. Investigate the upstream now to prevent the breaker from opening and blocking traffic.
Too many rejects
Try:
- raising the error threshold, so the breaker tolerates more errors before it opens
- reducing the timeout
- switching to a more tolerant preset (TOLERANT, RELAXED_DEBUG)
Not tripping when expected
Only infrastructure failures count. This includes 5xx errors, network issues, 408 (Request Timeout), and 429 (Too Many Requests). Other 4xx business errors are intentionally ignored.
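The classification rule above can be expressed as a small predicate. This is a sketch under stated assumptions: the countsTowardTrip name is hypothetical, and representing a network-level failure as an undefined status is a choice made for this example rather than how the breaker models it internally.

```typescript
// `status` is the upstream HTTP status code, or undefined when the request
// failed at the network level and no response was received (assumed encoding).
function countsTowardTrip(status: number | undefined): boolean {
  if (status === undefined) return true; // network issue
  if (status >= 500 && status <= 599) return true; // 5xx server errors
  return status === 408 || status === 429; // Request Timeout, Too Many Requests
}
```

Under this rule a steady stream of 400, 401, or 404 responses never moves the breaker toward OPEN, which is the most common reason it does not trip when you expect it to.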