October 16, 2024

Kubernetes Health Checks: Best Practices for Configuring

Tania Duggal
Technical Writer

Hello everyone, and welcome to this article on health checks in Kubernetes.

In this article, we’ll discuss the types of probes, different types of health checks, how to troubleshoot, and best practices.

We perform Kubernetes health checks through probes. In Kubernetes, probes are mechanisms that monitor the health and status of containers. They ensure that your applications are running correctly and can detect issues to take appropriate actions. 

There are three types of probes: liveness, readiness, and startup.

1. Liveness Probe

The Liveness Probe is like a heartbeat check for your container. Its main job is to ensure that your application is still running and hasn't gone into a deadlock or some other irrecoverable state. If the liveness probe fails, Kubernetes will kill the container, and, depending on your restart policy, it may restart it. This is important for maintaining the health of your application over time.
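As a rough illustration of the behavior described above, a liveness probe might look like the fragment below. The /healthz path and port 8080 are assumed names, not taken from a real application.

```yaml
# Container spec fragment (hypothetical endpoint and port).
livenessProbe:
  httpGet:
    path: /healthz       # assumed health endpoint
    port: 8080           # assumed container port
  periodSeconds: 10      # probe every 10 seconds
  timeoutSeconds: 2      # each attempt must answer within 2 seconds
  failureThreshold: 3    # kill the container only after 3 consecutive failures
```

With these values, a single slow response does not trigger a restart; the container has to fail three probes in a row, spaced 10 seconds apart.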

2. Readiness Probe

The Readiness Probe is all about traffic management. It checks whether your container is ready to handle incoming requests. If the readiness probe fails, Kubernetes removes the Pod from the endpoints of the Services that select it, so it receives no traffic through those Services until the probe passes again. The container is not restarted. This ensures that only healthy instances of your application are serving requests. Another important feature of the readiness probe is related to the Rolling Update deployment strategy. Without the probe, a container is considered ready as soon as it is running. With the probe, the update won't progress until the new container actually passes the readiness probe.
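The interaction with rolling updates can be sketched as follows. This is a minimal example under assumed names (the web Deployment, the myapp:latest image, and the /ready endpoint are placeholders), not a recommended production setup.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                   # hypothetical name
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0       # never go below the desired number of ready Pods
      maxSurge: 1             # bring up one new Pod at a time
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: myapp:latest   # placeholder image
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /ready      # assumed readiness endpoint
            port: 8080
          periodSeconds: 5
```

Because maxUnavailable is 0, an old Pod is only taken down after its replacement has passed the readiness probe, so a broken release stalls the rollout instead of replacing healthy Pods.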

3. Startup Probe

The Startup Probe is designed for applications that take a while to initialize. It checks if your application has started up correctly. If the startup probe keeps failing until its failure threshold is exhausted, Kubernetes will kill the container and may restart it based on the restart policy. This probe is only active during the startup phase of the container, making it ideal for applications with long and unpredictable initialization times. Liveness and readiness probes are held back until the startup probe succeeds. This probe can be seen as a dynamic alternative to the static initialDelaySeconds parameter of the two other probe types.
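A minimal sketch of how a startup probe replaces a long initialDelaySeconds is shown below. The endpoint, port, and timing values are assumptions chosen for illustration.

```yaml
# Container spec fragment (hypothetical endpoint and port).
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # 30 attempts x 10 s = up to 300 s allowed for startup
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3    # only takes effect once the startup probe has succeeded
```

The application gets up to five minutes to start, but as soon as the startup probe passes once, the much stricter liveness probe takes over. A fixed initialDelaySeconds of 300 would instead delay liveness checking for the full five minutes on every start.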

Types of k8s Health Checks

Kubernetes offers mechanisms to ensure that your applications are running smoothly and can recover from failures. These mechanisms include HTTP requests, commands, TCP connections, and gRPC health checks, and each of them can be used with any of the three probe types. The examples below use readiness probes, which are important for managing traffic and making sure only healthy instances of your application are serving requests:

1. HTTP Requests

HTTP-based readiness probes are used to check if an application is ready to handle incoming requests by sending HTTP GET requests to a specified endpoint. If the endpoint returns a success status code (2xx or 3xx), the container is considered ready to serve traffic. This ensures that the application is fully initialized and ready to process requests before it starts receiving them.

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

In this example, Kubernetes sends an HTTP GET request to /ready on port 8080 every 5 seconds after an initial delay of 5 seconds. If the endpoint returns a success code, the container is considered ready to serve traffic.

2. Commands

Command-based readiness probes execute a command inside the container. If the command returns a zero exit code, the container is considered ready. Otherwise, it is considered not ready. It's important to handle zombie processes when using exec probes, because every probe run spawns a new process inside the container. A zombie process occurs when a child process has completed execution, but its parent has not yet read its exit status. This can be managed by ensuring that the parent process calls wait() or waitpid() to retrieve the child's exit status.

apiVersion: v1
kind: Pod
metadata:
  name: readiness-exec
spec:
  containers:
  - name: readiness
    image: registry.k8s.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/ready; sleep 30; rm -f /tmp/ready; sleep 600
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/ready
      initialDelaySeconds: 5
      periodSeconds: 5

In this example, Kubernetes runs the command cat /tmp/ready every 5 seconds after an initial delay of 5 seconds. If the file /tmp/ready exists, the command succeeds, and the container is considered ready.

3. TCP Connections

TCP-based readiness probes check if an application is ready by attempting to open a TCP connection to a specified port. If the connection is successful, the container is considered ready. 

apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: registry.k8s.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10

In this example, Kubernetes attempts to open a TCP connection to port 8080 every 10 seconds after an initial delay of 15 seconds. If the connection is successful, the container is considered ready.

4. gRPC

gRPC-based readiness probes use the gRPC Health Checking Protocol to check if an application is ready. If the gRPC endpoint returns a healthy status, the container is considered ready. 

apiVersion: v1
kind: Pod
metadata:
  name: grpc-server
spec:
  containers:
  - name: grpc-server
    image: my-grpc-server:latest
    command: [ "/usr/local/bin/my-grpc-server" ]
    ports:
    - containerPort: 50051
    readinessProbe:
      grpc:
        port: 50051
      initialDelaySeconds: 10
      periodSeconds: 10

In this example, Kubernetes uses the gRPC Health Checking Protocol to check the readiness of the grpc-server container on port 50051 every 10 seconds after an initial delay of 10 seconds. The successThreshold and failureThreshold parameters can be configured to determine how many consecutive successes or failures are required before the container is considered ready or not ready, respectively. The default values are 1 for successThreshold and 3 for failureThreshold.
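To make the threshold parameters concrete, here is the same readiness probe with both thresholds set explicitly. The values are illustrative assumptions, not recommendations.

```yaml
# Container spec fragment; the numbers are example values.
readinessProbe:
  grpc:
    port: 50051
  periodSeconds: 10
  timeoutSeconds: 1      # an attempt that takes longer than 1 s counts as a failure
  successThreshold: 2    # 2 consecutive successes before the Pod is marked ready again
  failureThreshold: 3    # 3 consecutive failures before the Pod is marked not ready
```

Note that successThreshold can only be raised above 1 on readiness probes; for liveness and startup probes it must be 1.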

Using All Types for Kubernetes Health Check

You can combine different types of probes to cover different aspects of your application's health. For example, you can use an HTTP probe for readiness, a command probe for liveness, and a TCP probe for startup.

apiVersion: v1
kind: Pod
metadata:
  name: health-check
spec:
  containers:
  - name: myapp
    image: myapp:latest
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 10
      periodSeconds: 10
    startupProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10

The configuration defines:

- The readiness probe uses an HTTP GET request to check if the application is ready to serve traffic.

- The liveness probe uses a command to check if the application is still running correctly.

- The startup probe uses a TCP connection to check if the application has started successfully.

By configuring these probes, you can ensure that your application is robust and can recover from various failure scenarios.


Common Troubleshooting Steps

Understanding and effectively troubleshooting health checks is important for maintaining the reliability and performance of your applications. The steps below help you narrow down why a probe is failing.

1. Check Container Status: Use the following command to see the status of your containers within pods.

kubectl get pods -n <namespace>

This will give you an overview of which pods are running, pending, or failed, and you can drill down to container statuses.

2. Describe the Pod: For detailed information about a specific pod and its containers, use:

kubectl describe pod <pod-name> -n <namespace>

This command provides insights into the pod’s status, including health check failures.

3. Check Container Logs: Logs can provide valuable information about why a container is failing health checks:

kubectl logs -f <pod-name> -n <namespace>

Look for error messages that indicate why the application within the container is not healthy.

4. Check Events: Kubernetes events can provide a timeline of what happened to your pods and their containers:

kubectl get events -n <namespace>

Events can help you understand the sequence of actions and identify where things went wrong.

5. Resource Limits: Check if the pod is running out of resources (CPU, memory) which might cause it to fail health checks. If it runs out of memory, it will get OOM killed. CPU throttling can cause a probe failure.

6. Configuration Errors: Verify that the health check configurations (endpoints, commands, ports) are correct.

By following these steps and understanding the underlying mechanisms of health checks, you can quickly identify and resolve issues, ensuring that your applications remain healthy and responsive.
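For step 5, it can help to look at the probe timeout and the resource settings side by side. The fragment below is a sketch with assumed values (the myapp image, the /ready endpoint, and all numbers are placeholders).

```yaml
# Pod spec fragment; all values are illustrative assumptions.
containers:
- name: myapp
  image: myapp:latest
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: 500m          # a tight CPU limit can throttle the app and slow probe responses
      memory: 512Mi      # exceeding this gets the container OOM killed
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 10
    timeoutSeconds: 3    # leaves some headroom for a throttled container to answer
```

If probes only fail under load, raising the CPU limit or the probe timeout is usually worth testing before changing the health endpoint itself.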

| Probe type | Usage | Advantages | Disadvantages |
| --- | --- | --- | --- |
| HTTP | Web services with HTTP endpoints | Simple to implement, detailed status codes | Slower, requires health endpoint |
| HTTPS | Secure communication for web services | Secure, ensures data integrity | SSL/TLS setup, more resource-intensive |
| TCP | Applications using non-HTTP protocols | Simple, lightweight | Only checks port, can be misleading |
| Command | Custom health checks | Highly customizable, flexible | Requires careful handling of exit codes, potential for zombie processes |
| gRPC | gRPC applications | Built-in support, detailed health status | Requires gRPC Health Checking Protocol implementation, more complex setup |
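The HTTPS variant listed above is not a separate probe type; it is an httpGet probe with the scheme field set. A minimal sketch, assuming the application serves TLS on port 8443 and exposes /ready:

```yaml
# Container spec fragment (hypothetical port and path).
readinessProbe:
  httpGet:
    path: /ready
    port: 8443           # assumed TLS port
    scheme: HTTPS        # the default is HTTP
  periodSeconds: 10
```

The kubelet skips certificate verification for HTTPS probes, so a self-signed certificate works here, but the probe also does not prove that the certificate is valid for clients.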


Best practices for Kubernetes health checks

1. When configuring health checks in Kubernetes, it's important to choose the protocol that best suits your application's requirements. HTTP probes are perfect for web services as they can provide detailed status information via health endpoints, although they might be slower. TCP probes are more suitable for applications that don't support either HTTP or gRPC but use other protocols (like databases). Command probes are ideal for custom checks that need to execute specific commands within the container, while gRPC probes are optimal for applications using the gRPC protocol, offering built-in support for health checks. Security is also a key consideration; for example, HTTP probes might need SSL/TLS encryption for secure communication, whereas TCP probes typically do not require authentication or encryption but should be configured to minimize exposure.

2. Enhancing efficiency through connection reuse is beneficial. Utilizing connection pools to reuse existing connections reduces the overhead of establishing new ones. Monitoring and adjusting connection pool settings can ensure optimal performance. Enabling HTTP keep-alive allows the reuse of the same TCP connection for multiple requests, further improving efficiency.

3. For command probes, custom scripts can manage complex health checks that standard probes cannot handle. These scripts should return appropriate exit codes to accurately indicate health status. Using environment variables or command-line arguments can make these scripts configurable and reusable. Documenting custom scripts and storing them in a version-controlled repository ensures easy maintenance and sharing.

4. Utilizing HTTP/2 can offer advantages, including multiplexing, server push, and header compression. Configuring the HTTP/2 server to handle health check requests reliably can enhance the robustness of your health checks. Using HTTP/2 over TLS ensures secure and dependable health checks.

5. Accurate health status indication requires defining appropriate HTTP response codes. Kubernetes treats any status code from 200 to 399 as success and anything else as failure, so return a code such as 200 (OK) when healthy and 503 (Service Unavailable) when not, and make sure your monitoring and alerting systems recognize the same codes. Combining HTTP response codes with other metrics, such as response time or error rate, provides a more complete health assessment.

6. To avoid overloading the system, it's important to limit resource consumption. Reducing the frequency of resource-intensive operations like network requests or custom scripts is recommended. Whenever possible, use simpler methods like HTTP or TCP probes.

Note: HTTP calls can be expensive.

7. For worker containers (not serving traffic), consider using a lease file mechanism. This involves touching a file during each iteration of the main loop and checking the timestamp of that file from an exec probe. This method is relatively easy and avoids the overhead of embedding an HTTP server. Writing logs and checking them from the probe also works.

8. Preventing common issues can save a lot of trouble. For example, using TCP health checks for HTTP applications can be misleading, as they might mark the application as healthy based solely on port binding. Implement proper health endpoints that check dependencies for HTTP applications. Always implement readiness checks to prevent the application from receiving traffic prematurely. For databases like Redis, avoid using TCP health checks; instead, use command probes to ensure the database is in the desired state. Lastly, avoid verifying unnecessary dependencies in health checks to prevent cascading failures.
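Practices 7 and 8 both rely on exec probes, and can be sketched as two small Pods. Everything here is an assumption made for illustration: the Pod names, the /tmp/lease path, the 60-second staleness limit, the busybox and redis:7 images, and a Redis instance without authentication. The probes also assume the images ship sh, date, stat, and redis-cli.

```yaml
# Worker that touches a lease file on every loop iteration.
apiVersion: v1
kind: Pod
metadata:
  name: lease-worker            # hypothetical name
spec:
  containers:
  - name: worker
    image: registry.k8s.io/busybox
    args:
    - /bin/sh
    - -c
    - while true; do touch /tmp/lease; sleep 10; done
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -c
        # Succeeds only if the lease file was modified less than 60 s ago.
        - '[ $(( $(date +%s) - $(stat -c %Y /tmp/lease) )) -lt 60 ]'
      periodSeconds: 15
---
# Redis checked with a command instead of a bare TCP connect.
apiVersion: v1
kind: Pod
metadata:
  name: redis-exec-probe        # hypothetical name
spec:
  containers:
  - name: redis
    image: redis:7
    ports:
    - containerPort: 6379
    readinessProbe:
      exec:
        command:
        - /bin/sh
        - -c
        # Succeeds only if the server actually answers PONG.
        - '[ "$(redis-cli ping)" = "PONG" ]'
      periodSeconds: 10
```

In the worker, a stuck main loop stops refreshing the file, so the liveness probe starts failing even though the process is still alive. In the Redis Pod, comparing the reply with PONG catches states where the port is open but the server is not answering commands normally.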

By following these best practices, you can ensure that your Kubernetes health checks are effective and contribute to the overall stability and reliability of your deployments.

As you’ve seen in the discussion above, health checks improve your app's reliability and uptime. But they can also be expensive, stealing resources from your application logic. To get maximum Kubernetes reliability at the lowest possible cost, check out PerfectScale. PerfectScale is designed to optimize and scale your Kubernetes environments effortlessly, ensuring that your clusters are always running at peak performance.
Our advanced algorithms and machine learning techniques ensure your services are precisely tuned to meet demand, cutting down on waste and optimizing every layer of your K8s stack. Join industry leaders like Paramount Pictures and Creditas who have already optimized their Kubernetes environments with PerfectScale. Sign up or Book a demo to experience the immediate benefits of automated Kubernetes cost optimization and management, ensuring your environment is always perfectly scalable.
