Kubernetes Probes

Alokai Cloud customers' middleware and frontend apps are deployed in Kubernetes. This document explains the Kubernetes mechanisms used by Alokai Cloud to ensure that customers' applications are starting, running and exiting correctly. Implementing your application so that it works in accordance with those mechanisms ensures better detectability of issues and lower error rates of your deployment.

In order for some of those mechanisms to work, you - as the application developer - need to ensure certain REST endpoints exist in your application and that they respond with the correct status code. The sections below explain how to implement those endpoints.

Liveness probes

The purpose of liveness probes

Without any kind of probes set up for your application, the only failure scenario where your app will be considered as not working correctly is if the process exits on startup or any time during the application runtime. For example, if you run npm start but the application secrets are missing from the environment, most applications' processes will immediately exit. The operating system could also kill the process due to lack of memory.

Liveness probes can help you recover the application from a broken state when only a restart can solve the problem. The application gets probed every few seconds to determine if the server can respond. If a timeout or error response is received instead of a success status code, the app is considered to be in a dead state (e.g. caught in an infinite loop due to an edge case or developer error) and gets restarted.
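The pass/fail rule described above can be sketched as a small function. This is a minimal illustration, not Alokai Cloud's actual implementation: the ProbeOutcome type and the isProbeSuccessful name are hypothetical, and the 200-399 success range is the rule Kubernetes applies to HTTP probes in general.

```typescript
// Hypothetical model of a single HTTP probe attempt.
type ProbeOutcome =
  | { kind: 'response'; statusCode: number }
  | { kind: 'timeout' }
  | { kind: 'connectionError' };

// Kubernetes treats HTTP status codes from 200 up to (but not including) 400
// as success. Timeouts, connection errors and all other codes are failures.
function isProbeSuccessful(outcome: ProbeOutcome): boolean {
  if (outcome.kind !== 'response') {
    return false;
  }
  return outcome.statusCode >= 200 && outcome.statusCode < 400;
}
```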

This helps your application keep handling traffic despite an issue that causes it to lock, which gives you time to fix the underlying issue without impacting company operations (as much).

Implementation of liveness probes in your application

In order for your application to be covered by liveness probes, you need to ensure that it responds with an HTTP 200 OK status code to a GET request on a [your app URL]/healthz endpoint.

In the case of the Alokai middleware, a liveness probe is enabled by default as of version 3.0.0 of the @vue-storefront/middleware package. You can launch the middleware locally and send a GET request to the http://localhost:4000/healthz endpoint. The response will be an HTTP 200 OK containing the body "ok".

In the case of the frontend apps, /healthz endpoints are also present in the Nuxt and Next templates generated from the Alokai CLI. If you will be deploying your own custom frontend app to Alokai Cloud, and your app is not generated from the Alokai CLI, you will need to add the /healthz endpoint manually.

The instructions below help you create a simple /healthz endpoint that responds with an HTTP 200 OK status code and the text ok in the response body. You may be tempted to instead make /healthz a real application route in your app, which, if queried, will respond with the HTML of your actual application. At first glance it may seem more robust, as in theory it tests a larger part of your application stack.

In reality, requesting a full app route as part of a liveness check - especially during periods of heavy traffic - can lead to new connections being opened that will never be resolved. A full app route can take hundreds of milliseconds to respond and contain a few kilobytes of payload. The simple route described below will respond in a few milliseconds with a two-byte body.

Do not make /healthz or /readyz a full app route. Instead keep it as simple as possible, by following the instructions below. Do not consider liveness probes as a fully fledged application smoke test, but as a simple check if the app can serve a basic HTTP request.
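For a custom app that is not based on Nuxt or Next, the same idea can be sketched with Node's built-in http module. This is an illustrative sketch under assumptions: the resolveHealthRoute and createHealthServer names are hypothetical, and a real app would fall through to its own request handling instead of returning 404.

```typescript
import { createServer, Server } from 'node:http';

interface HealthResponse {
  statusCode: number;
  body: string;
}

// Decide how to answer a request. Only GET /healthz is handled here.
function resolveHealthRoute(
  method: string | undefined,
  url: string | undefined,
): HealthResponse | null {
  // Ignore any query string so that /healthz?x=1 still matches.
  const path = (url ?? '').split('?')[0];
  if (method === 'GET' && path === '/healthz') {
    return { statusCode: 200, body: 'ok' };
  }
  return null;
}

// Build the server without starting it, so the caller decides where to listen.
function createHealthServer(): Server {
  return createServer((req, res) => {
    const health = resolveHealthRoute(req.method, req.url);
    if (health) {
      res.writeHead(health.statusCode, { 'Content-Type': 'text/plain' });
      res.end(health.body);
      return;
    }
    // Placeholder: a real application would handle its own routes here.
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    res.end('not found');
  });
}
```

Calling createHealthServer().listen(3000) starts the server (the port is chosen only for illustration). The handler does no rendering and touches no external service, which keeps the probe cheap even under heavy traffic.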

Nuxt

In Nuxt, to create a /healthz endpoint, use server endpoints. Create a [Nuxt frontend app folder]/server/routes/healthz.ts file with the following contents:

server/routes/healthz.ts

export default defineEventHandler(() => 'ok');

Next: App router

If using Next.js with app router, use route handlers:

Create a [Next app directory]/app/healthz/route.ts file

Paste the following content inside:

app/healthz/route.ts

import { NextResponse } from 'next/server';

export function GET() {
  return NextResponse.json({ status: 'ok' }, { status: 200 });
}

Next: Pages router

If using Next.js with pages router, use rewrites together with API routes:

Add the following code to your next.config.js:

next.config.js

module.exports = {
  // ...
  async rewrites() {
    return [
      {
        source: '/healthz',
        destination: '/api/healthz',
      },
    ];
  },
}

This is necessary because by default Next's API routes have an /api subpath, but we need it to be /healthz and not /api/healthz.

Create a [Next app directory]/pages/api/healthz.ts file

Paste the following content inside:

pages/api/healthz.ts

import { NextApiRequest, NextApiResponse } from 'next';

export default function handler(_req: NextApiRequest, res: NextApiResponse) {
  res.status(200).send('ok');
}

Readiness probes

The purpose of readiness probes

Applications in Kubernetes are often hosted in such a way that multiple duplicate instances of an application exist side-by-side simultaneously. Such a duplicate instance is called a replica. Readiness probes allow an application replica to temporarily mark itself as unable to serve requests in a Kubernetes cluster. A liveness probe can pass while a readiness probe fails - meaning that in general, the application is up, but is still waiting for something to happen so that it can serve requests (e.g. waiting for some secondary, dependent service to become online, like Redis cache).

The /readyz endpoint of your application is queried automatically by Alokai Cloud every few seconds to check whether requests should be routed to the queried application replica. One case where traffic should stop being directed to a replica is when an application instance is being killed (if it receives a SIGTERM signal).
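How the two probes combine can be summarized in a small decision function. This is a conceptual sketch only: the ReplicaAction type and the decideReplicaAction name are hypothetical and do not correspond to any Kubernetes or Alokai Cloud API, and the sketch ignores details such as how many consecutive failures are tolerated before action is taken.

```typescript
type ReplicaAction = 'restart' | 'stopRoutingTraffic' | 'routeTraffic';

function decideReplicaAction(
  livenessPassed: boolean,
  readinessPassed: boolean,
): ReplicaAction {
  // A failed liveness probe means the replica is considered dead and is restarted.
  if (!livenessPassed) {
    return 'restart';
  }
  // Alive but not ready: the replica keeps running but receives no traffic.
  if (!readinessPassed) {
    return 'stopRoutingTraffic';
  }
  return 'routeTraffic';
}
```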

You can read more about Kubernetes readiness probes in the official documentation.

Built-in middleware readiness probes

As of version 5.0.0 of the @vue-storefront/middleware package, you can launch the middleware locally and send a GET request to the http://localhost:4000/readyz endpoint. The response will contain either a success message or a list of errors describing why the readiness probe failed.

To add custom readiness probes to the built-in @vue-storefront/middleware readiness probe feature, pass them to the readinessProbes property when calling createServer.

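import axios from 'axios';
import { createServer } from '@vue-storefront/middleware';

// Custom readiness probe: resolves when the dependency is healthy, throws otherwise.
const customReadinessProbe = async () => {
  // By default axios also rejects on network errors and non-2xx responses, which fails the probe as well.
  const response = await axios.get('http://someservice:3000/healthz');
  if (response.status !== 200) {
    throw new Error('Service that the middleware depends on is offline. The middleware is temporarily not ready to accept connections.');
  }
};

// `config` is your middleware configuration object.
const app = await createServer(config, { readinessProbes: [customReadinessProbe] });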

In order for custom readiness probes to be implemented correctly, they need to do two things:

They must all be async or return a promise. The return value is not checked and is expected to be void/undefined.
They must all throw an exception when you want a readiness probe to fail.
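The two rules above suggest how a set of probes could be evaluated: run each one, treat a rejection as a failure, and collect the error messages. The following is a sketch under that assumption, not the actual code of @vue-storefront/middleware. ReadinessProbe, ReadinessResult, runReadinessProbes and the two example probes are hypothetical names.

```typescript
type ReadinessProbe = () => Promise<void>;

interface ReadinessResult {
  ready: boolean;
  errors: string[];
}

async function runReadinessProbes(probes: ReadinessProbe[]): Promise<ReadinessResult> {
  // Wrapping each call in an async function turns synchronous throws into
  // rejections. allSettled never rejects, so one failure does not hide others.
  const results = await Promise.allSettled(probes.map(async (probe) => probe()));
  const errors: string[] = [];
  for (const result of results) {
    if (result.status === 'rejected') {
      const reason: unknown = result.reason;
      errors.push(reason instanceof Error ? reason.message : String(reason));
    }
  }
  return { ready: errors.length === 0, errors };
}

// Example probes: one passes by resolving, one fails by throwing.
const passingProbe: ReadinessProbe = async () => {};
const failingProbe: ReadinessProbe = async () => {
  throw new Error('Cache is not reachable yet.');
};
```

Calling runReadinessProbes([passingProbe, failingProbe]) resolves to a result with ready set to false and the single error message in errors.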

Implementation of readiness probes in your own application

Readiness probes are more difficult to implement than liveness probes. For liveness probes, a simple stateless REST endpoint handler is sufficient. Readiness probes, on the other hand, need to monitor the signals that the application process receives. In addition to that, you can write your own readiness conditions, such as checking if an external service that your application depends on is online.
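A dependency-free sketch of the signal-monitoring part could look like the following. It is an illustration under assumptions, not a production-ready implementation: ReadinessState, readyzStatusCode and watchSigterm are hypothetical names, and the 503 status code for the not-ready case is a common convention rather than something this document prescribes.

```typescript
// Tracks whether this replica should currently receive traffic.
class ReadinessState {
  private shuttingDown = false;

  markShuttingDown(): void {
    this.shuttingDown = true;
  }

  isReady(): boolean {
    return !this.shuttingDown;
  }
}

// Status code a /readyz handler would send: 200 when ready, 503 otherwise.
function readyzStatusCode(state: ReadinessState): number {
  return state.isReady() ? 200 : 503;
}

// Flip the state once the process is asked to terminate, so that later
// /readyz requests fail and traffic is routed to other replicas.
function watchSigterm(state: ReadinessState): void {
  process.once('SIGTERM', () => state.markShuttingDown());
}
```

A /readyz handler would call readyzStatusCode(state) on every request. Note that registering a SIGTERM listener replaces Node's default behavior of exiting on that signal, so the application then has to finish in-flight requests and exit on its own.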

If your application uses Node's http module to serve HTTP requests, you can use the @godaddy/terminus NPM package to implement readiness checks.