Kubernetes Alerting: 10 Must-Have Alerts for Proactive Monitoring

Setting up Kubernetes alerting helps you quickly detect changes in relevant resilience indicators and get notified about them, so the underlying issues can be resolved before they impact the system.

Here are the top 10 alerts to configure for your Kubernetes cluster. All of them are included in PerfectScale's Kubernetes monitoring, so you are notified of identified risks in real time.

PerfectScale provides timely and reliable Alerts to inform you about any unusual system activity.

Resilience indicators

OOM

Out-of-Memory events usually occur in the following situations:

The memory limit for a container is set too low. An event is triggered when the container's memory usage reaches the defined limit.
The node is experiencing memory pressure and tries to evict some pods.

Official documentation

CPU Throttling

CPU Throttling occurs when a container reaches its defined CPU limit. It can add latency to application responses.

Kubernetes uses the CFS quota mechanism to implement the limit. The quota is defined as CPU time per period, not as a share of the available CPU power. cfs_period_us defines the period, which is 100000us (100ms) by default. For example, a container with a 1-core limit gets 100ms of CPU time per period. If it keeps two threads busy on a 2-core node, it is throttled after 50ms; with four busy threads on a 4-core node, it is throttled after 25ms. In both cases it then waits for the rest of the period.
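The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration of the quota math, assuming the default 100ms period; the function names are made up for this example and are not part of Kubernetes or PerfectScale.

```python
from typing import Optional

CFS_PERIOD_US = 100_000  # default CFS period: 100000us (100ms)


def cfs_quota_us(cpu_limit_cores: float, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in microseconds) a container may consume in one period."""
    return round(cpu_limit_cores * period_us)


def wall_time_until_throttled_us(
    cpu_limit_cores: float,
    busy_threads: int,
    period_us: int = CFS_PERIOD_US,
) -> Optional[float]:
    """Wall-clock time until the quota is used up within one period.

    Returns None when the quota lasts for the whole period, meaning the
    container is not throttled at this level of parallelism.
    """
    if busy_threads <= 0:
        raise ValueError("busy_threads must be positive")
    quota = cfs_quota_us(cpu_limit_cores, period_us)
    # Each busy thread burns quota in parallel, so the quota drains
    # busy_threads times faster than wall-clock time passes.
    wall_time = quota / busy_threads
    return None if wall_time >= period_us else wall_time


print(wall_time_until_throttled_us(1, 2))  # 50000.0 (50ms)
print(wall_time_until_throttled_us(1, 4))  # 25000.0 (25ms)
print(wall_time_until_throttled_us(1, 1))  # None: one thread never exceeds 1 core
```

The last call shows why a single-threaded workload with a 1-core limit is not throttled: it cannot consume more than 100ms of CPU time in a 100ms period.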

RestartsObserved

Frequent container restarts indicate an underlying problem that is likely to harm the desired SLA.

Eviction

Eviction is the forceful termination and removal of a running pod from a node. Eviction events usually occur when a node is under resource pressure, most commonly memory pressure, but also disk or PID pressure.

Official documentation

HPAAtMaxReplicasObserved

As demand for a service or application increases, HPA will scale the system to handle the additional load by dynamically adding more replicas. Once the maximum configured limit of replicas is reached, PerfectScale will raise the HPAAtMaxReplicasObserved indicator, which means the system cannot scale further based on the existing settings.

The severity of the indicator depends on how long a workload has been running at maximum replicas: the longer it runs at maximum replicas, the higher the severity.
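As a rough illustration of duration-based severity, here is a minimal Python sketch. The thresholds (30 and 120 minutes) and the severity labels are hypothetical values chosen for this example; they are not PerfectScale's actual thresholds.

```python
from typing import Optional

# Hypothetical thresholds: minimum minutes at max replicas -> severity.
# Ordered from the highest threshold to the lowest.
SEVERITY_THRESHOLDS = [(120, "high"), (30, "medium"), (0, "low")]


def hpa_at_max_severity(
    current_replicas: int,
    max_replicas: int,
    minutes_at_max: float,
) -> Optional[str]:
    """Return a severity label, or None if the HPA still has room to scale."""
    if current_replicas < max_replicas:
        return None
    for min_minutes, severity in SEVERITY_THRESHOLDS:
        if minutes_at_max >= min_minutes:
            return severity
    return None


print(hpa_at_max_severity(10, 10, 45))   # medium
print(hpa_at_max_severity(10, 10, 300))  # high
print(hpa_at_max_severity(3, 10, 300))   # None
```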

Limit/Request not set indicators

CpuRequestNotSet

Setting proper CPU requests helps the Kubernetes scheduler allocate the right amount of CPU for each container, making sure that the cluster's node capacity meets the demand.

MemRequestNotSet

Setting proper memory requests helps the Kubernetes scheduler allocate the right amount of memory for each container, making sure that the cluster's node capacity meets the demand.

MemLimitNotSet

Setting a proper memory limit helps protect your worker node from OOM, preventing the risk of memory over-allocation.

Unlike CPU, which is compressible (a throttled container simply waits for the next 100ms period), memory is incompressible: a container that needs more memory than is available cannot be slowed down, only killed.
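A check for these three indicators can be sketched as follows. This is an illustrative example, not PerfectScale's implementation: it assumes the container spec has already been loaded into a plain Python dict with the standard `resources.requests` and `resources.limits` fields, and the function name is made up. It also takes into account that Kubernetes uses the limit as the request when only a limit is specified.

```python
from typing import Dict, List


def not_set_indicators(container: Dict) -> List[str]:
    """Return the 'not set' indicators that apply to one container spec."""
    resources = container.get("resources") or {}
    requests = resources.get("requests") or {}
    limits = resources.get("limits") or {}

    # When only a limit is given, Kubernetes defaults the request to it,
    # so the effective request falls back to the limit here.
    cpu_request = requests.get("cpu", limits.get("cpu"))
    mem_request = requests.get("memory", limits.get("memory"))
    mem_limit = limits.get("memory")

    indicators = []
    if cpu_request is None:
        indicators.append("CpuRequestNotSet")
    if mem_request is None:
        indicators.append("MemRequestNotSet")
    if mem_limit is None:
        indicators.append("MemLimitNotSet")
    return indicators


container = {"name": "app", "resources": {"requests": {"cpu": "250m"}}}
print(not_set_indicators(container))  # ['MemRequestNotSet', 'MemLimitNotSet']
```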

UnderProvisioning indicators

UnderProvisionedCpuRequest

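Setting proper CPU requests helps the Kubernetes scheduler allocate the right amount of CPU for each container, making sure that the cluster's node capacity meets the demand. When the request is lower than the workload's actual usage, the scheduler may place the pod on a node without enough spare CPU, and the workload can be starved when the node is busy.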

UnderProvisionedMemRequest

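Setting proper memory requests helps the Kubernetes scheduler allocate the right amount of memory for each container, making sure that the cluster's node capacity meets the demand. When the request is lower than the workload's actual usage, the pod becomes a more likely candidate for eviction once the node runs short of memory.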

UnderProvisionedMemLimit

Setting a proper memory limit helps protect your worker node from OOM, preventing the risk of memory over-allocation. However, an under-provisioned memory limit can cause unwanted OOM events at the pod level, potentially harming the desired SLA.
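One simple way to reason about under-provisioning is to compare a high percentile of observed usage with the configured request or limit. The sketch below assumes a nearest-rank 95th percentile and made-up usage samples; the percentile choice and the function names are assumptions for illustration, not a description of how PerfectScale computes its indicators.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of the samples (pct between 1 and 100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def is_under_provisioned(
    usage_samples: Sequence[float],
    configured: float,
    pct: int = 95,
) -> bool:
    """True when the chosen usage percentile exceeds the configured value."""
    return percentile(usage_samples, pct) > configured


# Hypothetical memory usage samples in MiB.
usage_mib = [300, 320, 310, 480, 500, 490, 305, 315, 495, 510]
print(is_under_provisioned(usage_mib, configured=400))  # True
print(is_under_provisioned(usage_mib, configured=512))  # False
```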

Waste indicators

OverProvisionedCpuRequest

Setting proper CPU requests helps the Kubernetes scheduler allocate the right amount of CPU for each container, ensuring that the cluster's node capacity meets the demand. When CPU requests are over-provisioned, cloud resources are wasted because they are allocated but not utilized.

OverProvisionedMemoryRequest

Setting proper memory requests helps the Kubernetes scheduler allocate the right amount of memory for each container, ensuring that the cluster's node capacity meets the demand. When a memory request is over-provisioned, it wastes cloud resources, which are allocated but never used.
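To make "allocated but never used" concrete, the following sketch estimates the waste for one workload. The function names and the numbers are hypothetical and only illustrate the arithmetic; the same calculation applies to CPU cores or memory.

```python
def wasted_amount(requested: float, peak_usage: float, replicas: int = 1) -> float:
    """Resources requested but never used, summed over all replicas."""
    return max(0.0, requested - peak_usage) * replicas


def waste_ratio(requested: float, peak_usage: float) -> float:
    """Fraction of the request that is never used (0.0 means no waste)."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return max(0.0, requested - peak_usage) / requested


# Hypothetical workload: 4 replicas, each requesting 2 cores but peaking at 0.5.
print(wasted_amount(2.0, 0.5, replicas=4))  # 6.0 cores
print(waste_ratio(2.0, 0.5))                # 0.75
```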

Real-time Kubernetes alerts with PerfectScale by DoiT

Configuring these top K8s alerts, alongside robust Kubernetes monitoring and alerting practices, addresses common Kubernetes cluster issues and enables fast resolution before they impact application availability and performance. Continuously fine-tuning alert thresholds and reviewing the resulting actions is crucial for maintaining a healthy infrastructure. PerfectScale provides timely and reliable Alerts about any unusual system activity. The Alerts are designed to quickly identify and report changes in relevant indicators, so issues can be eliminated before they impact the system.

Real-time alerts without alert fatigue

PerfectScale allows users to easily set up alerts for their clusters and manage them efficiently with Alert Profiles. You can easily monitor and get notified about alerts that are relevant for your setup.

For faster updates, utilize Slack or MS Teams Integration Profiles to receive notifications when an Alert is generated.

Identify up to 30 different types of resilience risks

PerfectScale tracks up to 30 alerts tailored specifically for Kubernetes. They cover a wide range of potential issues, such as pod failures or resource exhaustion, eliminating the need for deep Kubernetes expertise.

Get actionable recommendations to eliminate risks

PerfectScale gives a detailed presentation of resources (memory and CPU), their current state versus actual usage, and recommendations for optimal performance at the lowest cost.

The PerfectScale recommendations show you the current resource requests and limits, as well as percentiles of real usage. Based on this data, PerfectScale calculates and provides safe recommendations to ensure the best performance for the lowest possible price.

To apply the recommendations, you can effortlessly copy the .yaml and deploy it to your cluster.


Happy building!
