In this blog, you’ll learn what the Kubernetes Node NotReady status is, what causes it, how to troubleshoot it, and how to prevent it.
Here are some of the main errors you might encounter in Kubernetes, and quick advice for how to resolve them.
- ImagePullBackOff: Troubleshooting Tips and Tricks
- The Ultimate Guide: Kubernetes CreateContainerConfigError and CreateContainerError
- Kubernetes CrashLoopBackoff: An Ultimate Guide
- How to fix OOMKilled in Kubernetes
What is the Kubernetes Node NotReady Error?
The "Node NotReady" error indicates that a node is currently unavailable, or not in a ready state to run workloads or pods. When a node is in the "NotReady" state, the Kubernetes control plane stops scheduling new pods onto that node and reschedules the existing pods to other healthy nodes to maintain the desired state of the cluster. The node controller continuously monitors the health of nodes; if a node fails to report back within a specific grace period, it is marked as NotReady.
Understanding the Kubernetes Node States:
Kubernetes nodes can be in various states, each indicating the node's health and availability. A node can be in one of the following states:
1. Ready: The node is healthy and ready to run pods. It means that the node is functioning properly and can host workloads.
2. NotReady: The node is not healthy and cannot run pods. It means the node is experiencing issues. Pods scheduled on this node may be evicted or rescheduled to other nodes.
3. SchedulingDisabled: When the node is in the SchedulingDisabled state, the node is marked as unschedulable, meaning no new pods will be scheduled on it.
4. Unknown: If the node controller has been unable to communicate with the node for the node-monitor-grace-period (default 40s), the node status is shown as Unknown.
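You can observe the SchedulingDisabled state yourself by cordoning a node. A minimal sketch, where `<node-name>` is a placeholder for one of your node names:

```shell
# Mark a node unschedulable; its STATUS gains "SchedulingDisabled"
kubectl cordon <node-name>

# Verify: the node now shows e.g. "Ready,SchedulingDisabled"
kubectl get nodes

# Re-enable scheduling on the node
kubectl uncordon <node-name>
```

Cordoning only affects scheduling of new pods; pods already running on the node keep running.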
Causes of Kubernetes Node NotReady Error:
Various issues can cause a Kubernetes node to show a NotReady status. Let's discuss the most common ones:
1. Lack of Resources: Nodes in a Kubernetes cluster require sufficient CPU, memory, and disk resources to function properly. When these resources are exhausted, the node may become unresponsive or unable to manage its workloads effectively, leading to a NotReady status. High CPU or memory usage can cause pods to be evicted or fail to start. Disk pressure occurs when disk usage exceeds a certain threshold, leading to the node being marked as NotReady.
2. Issues with the Kubelet: The Kubelet is the agent running on each node, responsible for managing pods and containers. If the Kubelet crashes or is misconfigured, it can't communicate with the API Server, and the node may become NotReady. The node status shows conditions like KubeletNotReady.
3. Issues with Kube-Proxy: Kube-Proxy is responsible for maintaining network rules on nodes. If kube-proxy crashes or is misconfigured, it can disrupt network traffic, leading to the node being marked as NotReady.
4. Connectivity Issues: Network connectivity is important for nodes to communicate with the control plane and other nodes in the cluster. Network misconfigurations can disrupt this communication, causing nodes to fail to report their status and leading to a NotReady state.
Note: During the initial phase of a node joining the cluster, it may temporarily display a NotReady status. This is a normal part of the process and should not be a cause for concern unless the node remains in this state for an extended period.
Diagnosing the Kubernetes Node NotReady Error:
1. Check the Node Status: To confirm that the error you are facing is due to an unhealthy node, run the kubectl get nodes command. This command shows the current status of every node. If a node is marked as NotReady, it indicates that the node is not functioning correctly and cannot schedule new pods.
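For example (node names and versions below are illustrative):

```shell
kubectl get nodes
# NAME       STATUS     ROLES           AGE   VERSION
# master     Ready      control-plane   10d   v1.29.0
# worker-1   NotReady   <none>          10d   v1.29.0
```

Here worker-1 is the node to investigate further.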
2. Check the Node's Details and Conditions: To gain more information about the node, you can use the kubectl describe node <name> command. This command provides detailed information about the node, including its conditions and events. By examining the conditions section, you can identify specific issues such as memory pressure, disk pressure, or network problems that might be causing the node to be NotReady.
MemoryPressure - indicates whether the node is running low on available memory.
DiskPressure - indicates whether the node is running out of disk space.
PIDPressure - indicates whether the node is running too many processes.
If any of these conditions are True, it typically means the node is under resource pressure and may not be able to handle workloads effectively. As a result, the node is marked as NotReady, indicating it cannot run pods. To learn more, refer to the Kubernetes documentation on node conditions.
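A sketch of what to look for; the condition values shown are illustrative:

```shell
kubectl describe node <node-name>
# Conditions:
#   Type             Status   Reason                       Message
#   MemoryPressure   False    KubeletHasSufficientMemory   kubelet has sufficient memory available
#   DiskPressure     True     KubeletHasDiskPressure       kubelet has disk pressure
#   PIDPressure      False    KubeletHasSufficientPID      kubelet has sufficient PID available
#   Ready            False    KubeletNotReady              ...

# Alternatively, extract just the condition types and statuses:
kubectl get node <node-name> \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
```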
3. Kubelet Issue: If all conditions show Unknown, it usually means the kubelet on the node is down, causing the node to go into a NotReady state.
The kubelet is the node's sole point of contact with the Kubernetes control plane, managing the lifecycle of containers on the node; if it is not running, the node cannot report its status. To diagnose this, check the kubelet logs on the node for any errors or issues.
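On the affected node, this diagnosis can be sketched as:

```shell
# Check whether the kubelet service is running
systemctl status kubelet

# Inspect recent kubelet logs for errors
journalctl -u kubelet --since "1 hour ago" --no-pager
```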
4. Check Kubernetes System Pods: To diagnose, check the status of the Kubernetes system pods using the kubectl get pods -n kube-system command. These pods are critical for the operation of the cluster, and if any of them are not running correctly, it can affect the status of the nodes.
To see only the system pods assigned to a specific node, you can filter by node name.
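A minimal sketch using a field selector, with `<node-name>` as a placeholder:

```shell
kubectl get pods -n kube-system --field-selector spec.nodeName=<node-name>
```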
5. Check for Connectivity: To diagnose connectivity problems, you can use the kubectl describe node <name> command and look for the NetworkUnavailable flag in the conditions section. If this flag is True, the node has a connectivity issue.
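You can also extract just that one condition with a jsonpath filter:

```shell
kubectl get node <node-name> \
  -o jsonpath='{.status.conditions[?(@.type=="NetworkUnavailable")].status}'
```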
Fixing Node NotReady Issues:
1. Resolve Lack of Resources: You can increase the resources available to the node, or reduce consumption by scaling down workloads or tuning the resource requests and limits for your pods. Use commands like top or htop on the node to monitor resource usage. Identifying and shutting down non-Kubernetes processes, running malware scans, upgrading the node, and checking for hardware issues or misconfigurations can also help conserve resources and resolve the issue.
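A quick sketch of resource checks, from the cluster and from the node itself:

```shell
# Cluster-level view of node CPU/memory usage (requires metrics-server)
kubectl top nodes

# On the node itself:
top        # live CPU/memory per process
df -h      # disk usage; watch for nearly full filesystems (DiskPressure)
free -h    # memory summary (MemoryPressure)
```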
2. Resolve Kubelet Issues: To resolve kubelet issues, SSH into the node and run systemctl status kubelet. The status can be: active (running), active (exited), or inactive (dead).
active (running): The kubelet is operational, and the issue might be elsewhere.
active (exited): The kubelet exited, possibly due to an error. Restart it using sudo systemctl restart kubelet.
inactive (dead): The kubelet crashed. Use journalctl -u kubelet to examine the logs and identify the cause.
If the kubelet service is running and has the necessary permissions but the node is NotReady, it also makes sense to look in the kubelet logs—it may be erroring but not crashing.
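The recovery steps above can be sketched as:

```shell
# Restart the kubelet and confirm it stays up
sudo systemctl restart kubelet
sudo systemctl status kubelet

# Follow the logs live to catch recurring errors
journalctl -u kubelet -f
```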
3. Resolve Kube-Proxy Issues: Check the logs of the kube-proxy pod to identify any errors or warnings, and make sure the kube-proxy DaemonSet is configured correctly. If you identify an issue with the kube-proxy pod, you can delete it to force a restart; the DaemonSet controller will automatically create a new pod.
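A sketch of those steps, assuming the standard k8s-app=kube-proxy label used by kubeadm-style clusters (your distribution may label it differently):

```shell
# Find the kube-proxy pod running on the affected node
kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide

# Inspect its logs for errors or warnings
kubectl logs -n kube-system <kube-proxy-pod-name>

# Delete it; the DaemonSet controller recreates it automatically
kubectl delete pod -n kube-system <kube-proxy-pod-name>
```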
4. Checking Connectivity: Check the network configuration on the node, ensure that the necessary ports are open, and verify that the network plugins are correctly installed and configured. You can use commands like ping or traceroute to test network connectivity from the node to other nodes or external endpoints.
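A few basic connectivity checks, run from the affected node (6443 is the default kube-apiserver port; adjust if your cluster uses another):

```shell
# Basic reachability to the control plane
ping <control-plane-ip>
traceroute <control-plane-ip>

# Check that the API server port is open and reachable
nc -zv <control-plane-ip> 6443
```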
We have discussed in detail the various causes of the Kubernetes Node NotReady state and their solutions. Whether it is a lack of resources, problems with the kubelet or kube-proxy, or network connectivity issues, by following the steps above you can identify the issue, resolve it effectively, and ensure a more stable and efficient Kubernetes environment.
Fix Kubernetes errors with PerfectScale
Managing the Kubernetes environment takes time and is challenging, particularly when it comes to troubleshooting. Enter PerfectScale, a platform designed to transform the Kubernetes world.
If you are using the PerfectScale platform for your cluster visibility, you can just go to the alerts tab and quickly identify the errors resulting from your Kubernetes resource misconfigurations.
You can see various types of alerts in the dashboard and also integrate with Slack or Microsoft Teams to get alert notifications in your preferred communication channel.