June 25, 2024

How to fix OOMKilled in Kubernetes

Tania Duggal
Technical Writer

Struggling with OOMKilled errors in your Kubernetes clusters? Learn what causes these issues, how to troubleshoot them, and best practices to prevent OOMKilled events for optimal container performance.

OOMKilled is one of the most common errors you'll encounter in Kubernetes. Before troubleshooting it, let's first review how Kubernetes manages memory.

Memory Management in Kubernetes

Memory management is a crucial concept in Kubernetes, ensuring that applications run smoothly without exhausting system resources. Each node has a set amount of CPU and memory available, and each pod requires a set of resources to run. When a pod is placed on a node, it consumes part of that node's resources. The kube-scheduler identifies the best node to place a pod on when memory requests are specified. When they are not, the scheduler treats the pod's request as 0 bytes of memory. If no node has sufficient resources, the pod remains in a Pending state.

To manage memory in Kubernetes, you specify two parameters in the manifest: requests and limits.

Requests: The minimum amount of memory requested by the container.

Limits: The maximum amount of memory the container is allowed to use.

apiVersion: v1
kind: Pod
metadata:
  name: simple-pod
spec:
  containers:
  - name: simple-container
    image: nginx:latest
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"

What is OOMKilled?

OOMKilled is an event in Kubernetes that occurs when a container tries to use more memory than the limit defined in its manifest, or attempts to consume more memory than is available on the node.

In this situation, the pod shows the OOMKilled status:

NAME         READY     STATUS        RESTARTS     AGE
test-pod     0/1       OOMKilled     0            5m9s

The total memory consumed by all pods on a node should be less than the node's available memory. Otherwise, Kubernetes terminates some pods to stabilize the node's memory usage.

Learn more about node out-of-memory behavior

OOMKilled is not a Kubernetes feature but a result of Linux's OOM Killer mechanism, a kernel process that becomes active when the system's available memory is exhausted. It terminates the processes that consume excess memory, with the main objective of keeping the system stable. Kubernetes relies on the OOM Killer because swap isn't enabled by default, to avoid performance degradation. By not relying on swap space, Kubernetes ensures more consistent performance and predictable resource allocation, which helps maintain the stability and reliability of the applications running in the cluster.
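
If you have shell access to a node, you can see the OOM Killer's activity in the kernel log. The commands below are a quick check; the exact log wording varies by kernel version and distribution:

dmesg | grep -i "out of memory"
journalctl -k | grep -i "oom"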



Common Causes of OOMKilled

There can be different causes of OOMKilled.

Let’s discuss:

Misconfigured Memory Limits: The most common cause of OOMKilled is misconfigured memory limits. Always verify your application's memory requirements. If your application needs more memory than is allocated, it will try to consume more, eventually leading to OOMKilled events.

Misconfigured Memory Requests: One common cause of OOMKilled events in Kubernetes is misconfigured memory requests. When memory requests are set too low, the Kubernetes scheduler may not allocate sufficient memory for the pod, leading to frequent restarts or crashes. Conversely, setting memory requests too high can result in inefficient resource utilization, preventing other pods from being scheduled.


Java applications, in particular, can exacerbate these issues due to their unique memory management requirements. The Java Virtual Machine (JVM) uses parameters like `-Xms` (initial heap size) and `-Xmx` (maximum heap size) to manage memory. If `-Xms` is set too high, the JVM will allocate a large amount of memory at startup, potentially causing OOM errors if the Kubernetes memory request is not set accordingly. Similarly, if `-Xmx` exceeds the Kubernetes memory limit, the pod will be terminated when it tries to allocate more memory than allowed. Properly aligning `-Xms` with memory requests and `-Xmx` with memory limits, along with continuous monitoring and adjustment, can help avoid these issues.
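
As a rough illustration, here is a minimal sketch of how JVM heap flags can be aligned with pod resources; the names and values are hypothetical, and the headroom needed for non-heap memory depends on your application:

apiVersion: v1
kind: Pod
metadata:
  name: java-app                      # hypothetical name
spec:
  containers:
  - name: java-container
    image: my-java-app:latest         # hypothetical image
    env:
    - name: JAVA_TOOL_OPTIONS         # picked up by the JVM at startup
      value: "-Xms512m -Xmx768m"      # keep -Xmx below the memory limit to leave room for non-heap memory
    resources:
      requests:
        memory: "768Mi"
      limits:
        memory: "1Gi"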

Memory Leaks in Applications: When an application or process does not release memory it no longer needs, memory usage gradually increases until it exhausts the system's memory, triggering OOMKilled events.

Node Memory Pressure:  When a Kubernetes node is under memory pressure, it means that the node's available memory is running low while the pods scheduled on it are consuming more memory than anticipated. This situation often arises when too many pods are scheduled on a single node, leading to an overcommitment of resources. One common cause of this issue is the absence of properly set memory requests for the pods. Memory requests inform the Kubernetes scheduler about the minimum amount of memory a pod needs to function correctly. Without these requests, the scheduler assumes that the pod requires zero memory, which can result in the node being overcommitted with more pods than it can handle. Consequently, when the node runs out of memory, it triggers Out of Memory (OOM) events, causing the pods to be terminated with OOMKilled errors.
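
To see whether a node is under memory pressure, you can inspect its conditions and allocated resources (kubectl top requires the metrics-server to be installed):

kubectl describe node <node-name>   # check the MemoryPressure condition and the "Allocated resources" section
kubectl top node <node-name>        # shows current memory usage; requires metrics-server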

Unbounded Resource Consumption: Unbounded resource consumption is another cause of OOMKilled events. It happens when an application or process consumes memory without any bound, either because memory limits were never set or because of a bug in the application that leads to unnecessary consumption.
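
One way to guard against containers that never declare limits is a LimitRange, which applies default requests and limits to containers in a namespace that don't set their own. Below is a minimal sketch; the name, namespace, and values are illustrative:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-memory-limits        # illustrative name
  namespace: dev
spec:
  limits:
  - type: Container
    defaultRequest:
      memory: "256Mi"                # request applied when a container sets none
    default:
      memory: "512Mi"                # limit applied when a container sets none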

Diagnosing and Debugging OOMKilled in Kubernetes

To diagnose and debug the OOMKilled error, follow the steps below:

Inspecting Logs and Events: To examine the problem properly, you can check the logs and events of your pod. Events provide information about exactly what happened.

To confirm the OOMKilled status, run the kubectl describe pod <pod-name> command:

Events:
  Type     Reason     Age   From     Message
  ----     ------     ----  ----     -------
  Warning  OOMKilled  1m    kubelet  Container test-container in pod test-pod was killed due to out of memory

Logs might not provide detailed information in such cases because the OOM killer sends a SIGKILL signal, causing the process to die immediately without the chance to log any final messages. However, you can still check the logs for any preceding information by running kubectl logs --previous <pod-name> -c <container-name>.
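
You can also read the termination reason straight from the container's last state; the jsonpath below assumes a single-container pod:

kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

An OOMKilled container typically also reports exit code 137 (SIGKILL) in the same lastState object.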


After examining the logs and events, it is clear why the pod is not functioning correctly. The events indicate an out-of-memory error, which aligns with the OOMKilled status. This situation typically occurs when the node is under memory pressure, leading to the termination of processes that exceed their memory limits. To address this issue, take remedial steps such as setting appropriate memory requests and limits for the pods. This ensures that the Kubernetes scheduler can make informed decisions about pod placement, preventing overcommitment of resources and maintaining node stability.

Examining Resource Quotas and Limits: Always check the Resource Quotas and Limits that you have set for your pods. Resource Quotas are set at the namespace level, and limits are set at the container level. Resource Quotas define how many resources all pods in a namespace can consume in total, while limits apply to each container within a pod.

If you find pods constantly consuming more memory than expected, inspect the Resource Quotas and Limits defined in your manifests. Always check the memory usage of the pods and ensure they do not exceed their limits.

To check the resource quotas set for a namespace, you can use the following command:

kubectl get resourcequota --namespace=<namespace-name>


To check the resource limits set for a specific pod, you can describe the pod using:

kubectl describe pod <pod-name> --namespace=<namespace-name>


If the above steps don't provide a clear picture, it's time to look at the application code. Look for the sections that consume excessive memory or where memory is allocated but never released. This can be due to a bug, a memory leak, or inefficient algorithms in your code.

Ensure that caching mechanisms have proper eviction policies to prevent unbounded memory growth. Memory profiling tools, such as kubectl-flame, can help you identify memory usage patterns and leaks.

Best Practices to Prevent OOMKilled Status

Some best practices can help prevent the OOMKilled error:

Properly Setting Memory Requests and Limits: There are four possible cases when setting requests and limits for your pods:

No Requests, No Limits: Without requests or limits, one pod can consume all the resources on the node and starve the other pods. This is not an ideal case.

No Requests but Limits Set: In this case, Kubernetes automatically sets the requests equal to the limits, so each pod is guaranteed the resources defined by its limits.

Both Requests and Limits: If both requests and limits are set, each pod is guaranteed its requested resources and can use up to the defined limit.

Requests but No Limits: Each pod is guaranteed the resources defined by its requests, but can consume as much memory as is available on the node since no limits are set.

You can choose any of these cases according to your requirements, but it's always preferable to set both requests and limits on your pods so that they can't use more or fewer resources than needed. If you don't set limits, a pod can consume unbounded memory, leading to OOMKilled events; if you set limits lower than what the pod actually needs, it will also fail. Always look for a balance between the two.
Refer to the Pod QoS model for more details.
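
These four combinations correspond to the Guaranteed, Burstable, and BestEffort QoS classes. To check which class a pod was assigned, you can run the following (pod name is a placeholder):

kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'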

Monitoring and Alerting: A key best practice is to set up monitoring for your cluster. You can use tools like Prometheus and Grafana to monitor memory usage and set up alerts for high memory consumption. Regular analysis helps you catch potential issues early and gain detailed insight into your cluster's performance.
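
As an example, here is a minimal sketch of a Prometheus alerting rule that fires when a container's working-set memory exceeds 90% of its limit. It assumes cAdvisor and kube-state-metrics metrics are being scraped; the group and alert names are illustrative:

groups:
- name: memory-alerts                # illustrative group name
  rules:
  - alert: ContainerMemoryNearLimit  # illustrative alert name
    expr: |
      sum by (namespace, pod, container) (container_memory_working_set_bytes{container!="", container!="POD"})
        /
      sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
        > 0.9
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is above 90% of its memory limit"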

Implementing Resource Quotas: It's always good to implement Resource Quotas at the namespace level. This limits the amount of memory each namespace can use and prevents a single namespace from exhausting the cluster's resources, which in turn helps prevent OOMKilled events. This is how you can set a Resource Quota:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: memory-quota
  namespace: dev
spec:
  hard:
    requests.memory: "4Gi"
    limits.memory: "8Gi"


Code Optimization and Testing: Regularly reviewing the code, ensuring all memory is properly released after use, implementing proper cache eviction policies, and setting limits on the size of caches can help prevent OOMKilled events.

Perform different types of testing on your application, observe memory usage behavior, and ensure proper cleanup to help you avoid OOMKilled events.

OOMKilled is a common error that can occur when the system consumes too much memory or does not release the allocated memory after use. By following the above-outlined steps and best practices, you can fix the error and ensure a more stable and efficient Kubernetes environment.

Solving errors with PerfectScale

Managing the Kubernetes environment takes time and is challenging, particularly when it comes to troubleshooting. Enter PerfectScale, a platform designed to transform the Kubernetes world.

If you are using the PerfectScale platform for your cluster visibility, you can just go to the alerts tab and quickly identify the errors resulting from your Kubernetes resource misconfigurations.

You can see various types of alerts in the dashboard and also integrate with Slack or Microsoft Teams to get alert notifications in your preferred communication channel.

If you are interested in checking out PerfectScale, use this link to sign up for a trial.
