Kubernetes alerting for cluster health and performance
Setting up Kubernetes alerting helps you quickly detect changes in key resilience indicators, so you can address them before they impact the system.
Here are the most important alerts to configure for your Kubernetes cluster, all included in PerfectScale's Kubernetes monitoring platform. With PerfectScale, you get notified of identified risks in real time.
PerfectScale provides timely and reliable alerts about any unusual system activity.
Resilience indicators
OOM
Out-of-memory (OOM) events usually occur in the following situations:
- The memory limit for a container is set too low. When the container's memory usage reaches that limit, it is OOM-killed and an event is triggered.
- The node is experiencing memory pressure and tries to evict some pods to reclaim memory.
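To see which containers were recently OOM-killed, one option is to inspect pod status. The sketch below assumes the input is the parsed JSON of `kubectl get pods --all-namespaces -o json` and looks for the `OOMKilled` termination reason in each container's current or last state. `find_oom_killed` is a hypothetical helper, not part of any library. Pods evicted because of node memory pressure show up differently (as `Evicted` pods) and are not matched here.

```python
def find_oom_killed(pod_list):
    """Return (namespace, pod, container) tuples for containers that were OOM-killed.

    Hypothetical helper. `pod_list` is assumed to be a parsed v1 PodList, e.g. the
    JSON printed by `kubectl get pods --all-namespaces -o json`.
    """
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        for status in pod.get("status", {}).get("containerStatuses", []):
            # Check the current state first, then the previous run (after a restart).
            for key in ("state", "lastState"):
                terminated = (status.get(key) or {}).get("terminated") or {}
                if terminated.get("reason") == "OOMKilled":
                    hits.append((meta.get("namespace"), meta.get("name"), status.get("name")))
                    break  # report each container only once
    return hits
```

For a live cluster, you could pass it `json.loads(subprocess.run(["kubectl", "get", "pods", "--all-namespaces", "-o", "json"], capture_output=True, text=True, check=True).stdout)`.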
CPU Throttling
CPU throttling occurs when a container reaches its defined CPU limit, and it can add latency to application responses.
Kubernetes uses the CFS quota mechanism to implement CPU limits. The quota is defined per time period, not in terms of available CPU power. The period is set by cfs_period_us and defaults to 100000us (100ms). For example, a container with a 1-core limit may use 100ms of CPU time in every 100ms period. If its threads run in parallel on 2 cores, it exhausts that quota after 50ms and is throttled for the rest of the period; on 4 cores, it is throttled after 25ms. This can happen even when the container's average CPU usage stays below its limit.
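The quota arithmetic in the example above can be reproduced in a few lines. This is a simplified model that assumes the container keeps `parallel_cores` cores fully busy from the start of each period; real workloads burst unevenly, so treat the numbers as an illustration of the quota math rather than a prediction. The function names are hypothetical.

```python
def time_until_throttled_us(limit_cores, parallel_cores, period_us=100_000):
    """Wall-clock time (us) a container can run in one CFS period before its quota runs out.

    Simplified model: the container keeps `parallel_cores` cores busy the whole time.
    Returns period_us when the quota is never exhausted within the period.
    """
    quota_us = limit_cores * period_us      # CPU time allowed per period
    busy_us = quota_us / parallel_cores     # quota drains N times faster on N cores
    return min(busy_us, period_us)


def throttled_us_per_period(limit_cores, parallel_cores, period_us=100_000):
    """Time (us) per period during which the container is throttled."""
    return period_us - time_until_throttled_us(limit_cores, parallel_cores, period_us)
```

With a 1-core limit, `time_until_throttled_us(1, 2)` gives 50ms and `time_until_throttled_us(1, 4)` gives 25ms, matching the example; `throttled_us_per_period(1, 4)` shows the container then waits the remaining 75ms of the period.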
RestartsObserved
Frequent container restarts indicate an underlying problem, such as repeated crashes or failing liveness probes, that is likely to harm the desired SLA.
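One simple way to turn "frequent restarts" into a number is to compare a container's `restartCount` (from `status.containerStatuses` in the pod object) between two observations. This sketch assumes you sample the count periodically; the threshold of 3 restarts per hour is a hypothetical value chosen for illustration, not PerfectScale's rule.

```python
from datetime import datetime, timedelta

# Hypothetical threshold for illustration; tune it to your own SLA.
MAX_RESTARTS_PER_HOUR = 3


def restarts_per_hour(prev_count, curr_count, prev_time, curr_time):
    """Restart rate between two observations of a container's restartCount."""
    hours = (curr_time - prev_time).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("curr_time must be later than prev_time")
    # restartCount starts again at 0 when a pod is replaced, so clamp negative deltas.
    return max(curr_count - prev_count, 0) / hours


def restarts_observed(prev_count, curr_count, prev_time, curr_time,
                      max_per_hour=MAX_RESTARTS_PER_HOUR):
    """True when the observed restart rate exceeds the (hypothetical) threshold."""
    return restarts_per_hour(prev_count, curr_count, prev_time, curr_time) > max_per_hour


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    later = start + timedelta(minutes=30)
    print(restarts_per_hour(2, 6, start, later), restarts_observed(2, 6, start, later))
```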
Eviction
Eviction is the forceful termination and removal of a running pod from a node. Eviction events usually occur when a node runs low on memory, disk space, or process IDs; CPU is compressible, so CPU pressure leads to throttling rather than eviction.
When an eviction is observed, an alert is triggered immediately to notify users. To receive timely notifications on a Slack or MS Teams channel, make sure you have configured an integration profile and assigned it to the cluster.
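Pods evicted by the kubelet under node pressure remain visible in the API with `status.phase` set to `Failed` and `status.reason` set to `Evicted`, and the kubelet records the cause in `status.message`. A minimal sketch that lists them from the parsed output of `kubectl get pods --all-namespaces -o json` might look like this. `find_evicted` is a hypothetical helper; it only detects evictions and does not send notifications, which PerfectScale handles through the integration profile.

```python
def find_evicted(pod_list):
    """Return (namespace, pod, message) for pods the kubelet marked as Evicted.

    Hypothetical helper. `pod_list` is assumed to be a parsed v1 PodList.
    """
    evicted = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Node-pressure evictions leave the pod object behind in the Failed phase.
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            meta = pod.get("metadata", {})
            evicted.append((meta.get("namespace"), meta.get("name"), status.get("message", "")))
    return evicted
```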
HPAAtMaxReplicasObserved
As demand for a service or application increases, the Horizontal Pod Autoscaler (HPA) handles the additional load by dynamically adding replicas. Once the configured maximum number of replicas (maxReplicas) is reached, PerfectScale raises the HPAAtMaxReplicasObserved indicator, which means the workload cannot scale further with the existing settings.
The indicator's severity depends on how long the workload has been running at maximum replicas: the longer it stays there, the higher the severity.
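The following sketch illustrates time-based severity for an HPA running at its maximum. It reads `spec.maxReplicas` and `status.currentReplicas` from a HorizontalPodAutoscaler object and assumes you track how long the workload has been at the maximum. The tier boundaries (1 hour and 6 hours) are hypothetical; PerfectScale's actual severity thresholds are not described here.

```python
from datetime import timedelta

# Hypothetical severity tiers, checked from longest to shortest duration.
SEVERITY_TIERS = [
    (timedelta(hours=6), "high"),
    (timedelta(hours=1), "medium"),
    (timedelta(0), "low"),
]


def at_max_replicas(hpa):
    """True when an HPA object (parsed JSON) is running at spec.maxReplicas."""
    return hpa.get("status", {}).get("currentReplicas", 0) >= hpa["spec"]["maxReplicas"]


def hpa_at_max_severity(hpa, time_at_max):
    """Return a severity label, or None when the HPA is below its maximum."""
    if not at_max_replicas(hpa):
        return None
    for threshold, label in SEVERITY_TIERS:
        if time_at_max >= threshold:
            return label
    return "low"
```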
Limit/Request not set indicators
CpuRequestNotSet
Setting proper CPU requests helps the Kubernetes scheduler allocate the right amount of CPU for each container, ensuring that the cluster's node capacity meets the demand.
MemRequestNotSet
Setting proper memory requests helps the Kubernetes scheduler allocate the right amount of memory for each container, ensuring that the cluster's node capacity meets the demand.
MemLimitNotSet
Setting a proper memory limit helps protect your worker node from OOM by preventing memory over-allocation.
Unlike CPU, which is compressible (its quota is replenished every 100ms period), memory is incompressible: it cannot be throttled, so a container that exceeds its memory limit is OOM-killed rather than slowed down.
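A quick pre-deployment check for these three indicators can be sketched by walking the `containers` list of a pod spec. This assumes you check manifests before they are applied: if a container sets a limit but no request, Kubernetes copies the limit into the request (unless an admission mechanism such as a LimitRange applies a default first), so a live pod may show a request that the manifest never set. The indicator names are reused as labels for readability, and `missing_resource_settings` is a hypothetical helper.

```python
def missing_resource_settings(pod_spec):
    """Return (container, indicator) pairs for missing CPU/memory requests and memory limits.

    Hypothetical helper. `pod_spec` is assumed to be the `spec` section of a pod
    manifest (or of a pod template) parsed into a dict.
    """
    findings = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        requests = resources.get("requests") or {}
        limits = resources.get("limits") or {}
        name = container.get("name")
        if "cpu" not in requests:
            findings.append((name, "CpuRequestNotSet"))
        if "memory" not in requests:
            findings.append((name, "MemRequestNotSet"))
        if "memory" not in limits:
            findings.append((name, "MemLimitNotSet"))
    return findings
```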
UnderProvisioning indicators
UnderProvisionedCpuRequest
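Setting proper CPU requests helps the Kubernetes scheduler allocate the right amount of CPU for each container. When a CPU request is under-provisioned, the scheduler can place more pods on a node than it can serve, and under contention the container receives CPU in proportion to its low request, which can slow the application down.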
UnderProvisionedMemRequest
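Setting proper memory requests helps the Kubernetes scheduler allocate the right amount of memory for each container. When a memory request is under-provisioned, nodes can be overcommitted, and under node memory pressure, pods whose usage exceeds their requests are the first candidates for eviction.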
UnderProvisionedMemLimit
Setting a proper memory limit helps protect your worker node from OOM by preventing memory over-allocation. However, an under-provisioned memory limit can cause unwanted OOM kills at the container level, potentially harming the desired SLA.
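Under-provisioning can be estimated by comparing configured values with observed usage. The sketch below assumes you already have usage samples for a container (CPU in millicores, memory in bytes) from a metrics source. It flags a request when 95th-percentile usage exceeds it, and flags the memory limit when peak usage comes within 10% of it. The percentile, the headroom factor, and the function names are hypothetical choices for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty sequence of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def under_provisioning_findings(cpu_request_m, mem_request_bytes, mem_limit_bytes,
                                cpu_usage_m, mem_usage_bytes,
                                pct=95, limit_headroom=0.9):
    """Return the under-provisioning indicators that apply (hypothetical thresholds)."""
    findings = []
    if percentile(cpu_usage_m, pct) > cpu_request_m:
        findings.append("UnderProvisionedCpuRequest")
    if percentile(mem_usage_bytes, pct) > mem_request_bytes:
        findings.append("UnderProvisionedMemRequest")
    # Peak usage close to the limit means the next spike may trigger an OOM kill.
    if max(mem_usage_bytes) > mem_limit_bytes * limit_headroom:
        findings.append("UnderProvisionedMemLimit")
    return findings
```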
Waste indicators
OverProvisionedCpuRequest
Setting proper CPU requests helps the Kubernetes scheduler allocate the right amount of CPU for each container, ensuring that the cluster's node capacity meets the demand. When CPU requests are over-provisioned, cloud resources are wasted: the CPU is reserved for the container but never used.
OverProvisionedMemoryRequest
Setting proper memory requests helps the Kubernetes scheduler allocate the right amount of memory for each container, ensuring that the cluster's node capacity meets the demand. When a memory request is over-provisioned, it wastes cloud resources that are allocated but never used.
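Waste can be sketched the same way, by measuring how much of a request goes unused. This example assumes usage samples in the same units as the request (millicores for CPU, bytes for memory) and treats a request as over-provisioned when 95th-percentile usage stays below half of it; both thresholds and the function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty sequence of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def over_provisioned_waste(request, usage_samples, pct=95, min_utilization=0.5):
    """Return the unused part of a request, or 0 when the request is reasonably used.

    Hypothetical rule: a request counts as over-provisioned when the chosen usage
    percentile stays below `min_utilization` of the request.
    """
    used = percentile(usage_samples, pct)
    if used < request * min_utilization:
        return request - used
    return 0
```

For example, a container requesting 1000m CPU whose 95th-percentile usage is 200m would be reported as wasting 800m.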
FAQ
1. What is Kubernetes alerting?
Kubernetes alerting notifies users when monitored indicators, such as OOM events, CPU throttling, or evictions, signal a potential issue within the cluster.
2. How can I set up alerting in Kubernetes?
Install a monitoring tool such as PerfectScale, choose the indicators that should trigger alerts, and connect a notification channel such as Slack or MS Teams.
3. What are the benefits of Kubernetes alerting?
Alerting surfaces problems such as OOM events, evictions, or under-provisioned resources early, so teams can resolve them before they affect the desired SLA.
4. How can I customize alerting rules in Kubernetes?
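Users can define and configure rules based on specific metrics, such as memory usage, CPU throttling, or restart counts, and set the thresholds at which each rule fires.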
5. What are some best practices for Kubernetes alerting?
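Best practices include alerting on resilience indicators (OOM, CPU throttling, restarts, evictions, and HPA at max replicas), flagging containers without CPU or memory requests and memory limits, and tracking under- and over-provisioned resources.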