When setting up a Kubernetes cluster, the node size you choose affects both performance and cost efficiency. Nodes come in many sizes, ranging from small instances to high-performance machines. Deciding whether to use a few huge nodes or many smaller ones is an important architectural choice that shapes how your cluster behaves, from handling workloads to recovering from failures. In this article, we’ll discuss the pros and cons of using huge and small nodes in your clusters. Let’s dig in!
What is a Node in Kubernetes?
A node is a single machine—either physical or virtual—that runs your applications and is managed by the control plane. Each node contains the necessary components to execute pods, which are the smallest deployable units in Kubernetes.
Kubernetes Node Components
Every Kubernetes node runs the following components:
1. Kubelet: This agent ensures that containers are running in a pod. It receives pod specifications from the control plane and manages their lifecycle on the node.
2. Container Runtime: It is responsible for running the containers. Kubernetes supports various container runtimes, such as containerd and CRI-O.
3. Kube-proxy: A network proxy that maintains network rules on nodes. It facilitates communication between pods and services, managing the routing of network traffic.
Node Properties in Kubernetes
Node properties define the characteristics and metadata of a Kubernetes node. These properties help describe the node’s capabilities, configuration, and status. Let’s discuss the different Node Properties in Kubernetes:
1. Node Labels
Labels in Kubernetes are key-value pairs used to organize and identify nodes. They can help schedule pods onto specific nodes based on their labels.
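To illustrate how label-based scheduling works, here is a minimal Python sketch of the matching rule behind a pod's nodeSelector: a node qualifies only if it carries every key-value pair the pod asks for. The node names and labels are made up for the example, and the real scheduler applies many more checks (resources, taints, affinity) than this.

```python
def matches_node_selector(node_labels: dict, node_selector: dict) -> bool:
    """Return True if the node carries every label the selector requires."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical nodes and labels, for illustration only.
nodes = {
    "node-a": {"disktype": "ssd", "node-size": "huge"},
    "node-b": {"disktype": "hdd", "node-size": "small"},
    "node-c": {"disktype": "ssd", "node-size": "small"},
}

# A pod that asks for SSD-backed nodes.
selector = {"disktype": "ssd"}
eligible = [name for name, labels in nodes.items() if matches_node_selector(labels, selector)]
print(eligible)
```

Here only node-a and node-c are eligible, because node-b carries a different disktype value.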
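To label a node, use the kubectl label command: kubectl label nodes NODE_NAME KEY=VALUE (NODE_NAME, KEY, and VALUE are placeholders).
To view node labels, run: kubectl get nodes --show-labels
Labels can also be set directly in the node’s manifest, under metadata.labels.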
2. Node Annotations
Annotations are another way to add metadata to nodes. Unlike labels, annotations store non-identifying information that Kubernetes doesn’t use for scheduling.
Annotations can be added with kubectl annotate nodes NODE_NAME KEY=VALUE, or set in the node’s manifest under metadata.annotations.
3. Node Allocation and Capacity
Each node in Kubernetes has a set amount of CPU, memory, and storage. The capacity represents the total resources available on the node, while allocatable resources are what Kubernetes can use for workloads after reserving some for system processes.
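The relationship between capacity and allocatable can be sketched in a few lines of Python. Kubernetes derives allocatable by subtracting the kubelet's reservations (kube-reserved and system-reserved) and the eviction threshold from capacity. The node size and reservation values below are assumptions chosen for illustration, not defaults of any distribution.

```python
def allocatable(capacity: int, kube_reserved: int, system_reserved: int,
                eviction_threshold: int = 0) -> int:
    """Allocatable = capacity - kube-reserved - system-reserved - eviction threshold."""
    return max(capacity - kube_reserved - system_reserved - eviction_threshold, 0)


# Hypothetical 2 vCPU / 8 GiB node; all reservation values are assumed.
memory_mib = allocatable(capacity=8192, kube_reserved=512,
                         system_reserved=256, eviction_threshold=100)
cpu_millicores = allocatable(capacity=2000, kube_reserved=100, system_reserved=100)

print(f"allocatable memory: {memory_mib} MiB")
print(f"allocatable CPU: {cpu_millicores}m")
```

With these assumed values, 7324 MiB of memory and 1800m of CPU remain for pods. Because reservations are paid per node, the same total capacity split across more nodes yields less allocatable capacity overall.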
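You can view a node’s capacity and allocatable resources with kubectl describe node NODE_NAME, which lists them under Capacity and Allocatable.
The same information appears in the node’s manifest, under status.capacity and status.allocatable.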
4. Node Conditions
Node conditions show the status of various components of a node. The main conditions are:
a. Ready: It tells whether the node is ready to host workloads.
b. MemoryPressure: It indicates if the node is running low on memory.
c. DiskPressure: It indicates if the node’s storage is full or nearly full.
d. PIDPressure: It tells whether there are too many processes running on the node.
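As a rough sketch of how these conditions are read, the Python snippet below walks a list shaped like a node's status.conditions and reports which ones signal trouble. Ready is healthy when its status is "True", while the pressure conditions are healthy when their status is "False". The sample data is hypothetical.

```python
def node_problems(conditions: list) -> list:
    """List the condition types that signal trouble on a node."""
    problems = []
    for cond in conditions:
        if cond["type"] == "Ready":
            # Ready is healthy only when its status is "True".
            if cond["status"] != "True":
                problems.append("Ready")
        elif cond["status"] == "True":
            # Pressure conditions signal a problem when their status is "True".
            problems.append(cond["type"])
    return problems


# Hypothetical excerpt of a node's status.conditions.
conditions = [
    {"type": "Ready", "status": "True"},
    {"type": "MemoryPressure", "status": "True"},
    {"type": "DiskPressure", "status": "False"},
    {"type": "PIDPressure", "status": "False"},
]
print(node_problems(conditions))
```

For this sample, the node is Ready but reports MemoryPressure.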
5. Node Info
Node information includes details about the hardware and software environment of a node, such as the hardware architecture (e.g., x86_64), the container runtime version, the operating system version, and the versions of the Kubernetes components running on the node.
6. Node Status
The node status also lists the addresses assigned to the node, such as its hostname and its internal and external IP addresses.
What Do We Mean by Huge and Small Nodes?
Huge Nodes: These are large, high-capacity instances with large amounts of CPU, memory, and storage. Examples on AWS include m5.8xlarge (32 vCPUs, 128 GiB memory, $1.536 per hour On-Demand) and c5.18xlarge (72 vCPUs, 144 GiB memory, $3.06 per hour On-Demand); outside the cloud, these could be powerful physical servers. Huge nodes can handle a lot of pods, so you can fit more workloads on fewer nodes.
Small Nodes: These are smaller instances with fewer resources, such as AWS t3.small (2 vCPUs, 2 GiB memory, $0.0208 per hour On-Demand) or m5.large (2 vCPUs, 8 GiB memory, $0.096 per hour On-Demand). Each small node is cheaper, but covering the same total capacity means running many more nodes in your cluster.
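To compare these instance types on equal terms, it helps to normalize the quoted prices per vCPU and per GiB of memory. The short Python sketch below does that using only the figures listed above; prices vary by region and change over time, so treat the output as an illustration rather than current pricing.

```python
# Instance figures as quoted above (AWS On-Demand, price per hour).
instances = {
    "m5.8xlarge": {"vcpus": 32, "memory_gib": 128, "price": 1.536},
    "c5.18xlarge": {"vcpus": 72, "memory_gib": 144, "price": 3.06},
    "t3.small": {"vcpus": 2, "memory_gib": 2, "price": 0.0208},
    "m5.large": {"vcpus": 2, "memory_gib": 8, "price": 0.096},
}


def unit_costs(spec: dict) -> tuple:
    """Return (price per vCPU-hour, price per GiB-hour) for one instance."""
    return spec["price"] / spec["vcpus"], spec["price"] / spec["memory_gib"]


for name, spec in instances.items():
    per_vcpu, per_gib = unit_costs(spec)
    print(f"{name:12s} ${per_vcpu:.4f}/vCPU-h  ${per_gib:.4f}/GiB-h")
```

With these figures, m5.8xlarge and m5.large come out at the same price per vCPU and per GiB, so within one instance family the list price alone does not favor either size. Note that t3.small is a burstable instance type, so its per-vCPU price is not directly comparable with the others.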
Advantages of Huge Nodes in Kubernetes
1. Cost Efficiency
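Huge nodes can provide a better cost per usable resource, because the fixed overhead every node carries (the operating system, the kubelet, and any DaemonSet pods) is paid once per node rather than many times over. When you run high-performance workloads or require large, consistent allocations of CPU and memory, bigger nodes can be more cost-effective than running an equivalent capacity with many small nodes. Fewer nodes also mean that more pod-to-pod traffic stays on the same machine, which can reduce latency and boost performance depending on workload types.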
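One way to see where a cost-per-resource advantage can come from: every node carries some fixed overhead for system daemons and DaemonSet pods, and that overhead takes a much larger share of a small node. The Python sketch below makes this concrete. The 300m overhead figure is an assumption for illustration; measure your own cluster before relying on it.

```python
def usable_fraction(node_cpu_m: int, overhead_cpu_m: int) -> float:
    """Fraction of a node's CPU left for workloads after fixed per-node overhead."""
    return (node_cpu_m - overhead_cpu_m) / node_cpu_m


# Assumed fixed overhead per node (system daemons plus DaemonSet pods), in millicores.
OVERHEAD_M = 300

small = usable_fraction(node_cpu_m=2_000, overhead_cpu_m=OVERHEAD_M)   # 2 vCPU node
huge = usable_fraction(node_cpu_m=32_000, overhead_cpu_m=OVERHEAD_M)   # 32 vCPU node
print(f"small: {small:.1%} usable, huge: {huge:.1%} usable")
```

Under this assumption, the 2 vCPU node keeps 85% of its CPU for workloads while the 32 vCPU node keeps about 99%.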
2. Simplified Management
With fewer, larger nodes, there’s less infrastructure to manage. This simplifies tasks like monitoring, logging, and upgrading, as you have fewer nodes to track and fewer components to orchestrate. This can save time and reduce the chance of errors during maintenance.
3. Lower API Server Load
Each node interacts with the Kubernetes API server and adds load to it, especially in large clusters. Using fewer, larger nodes places less load on the API server, which can help overall cluster performance and stability.
4. Improved Resource Utilization
Huge nodes let you pack in more pods, which can mean fewer resources are left idle. When there are fewer nodes, Kubernetes has an easier time distributing workloads effectively, which can keep costs down by using resources more fully.
Disadvantages of Huge Nodes in Kubernetes
1. Higher Risk of Outages
When a huge node fails, it can affect many more pods than the failure of a small node would. This is often referred to as the “blast radius”—the larger the node, the larger the impact if something goes wrong. In terms of scheduling, it can also be harder to reschedule those pods to other nodes if the remaining nodes don’t have sufficient capacity to handle the load. This can result in downtime for applications until the failed node is restored.
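The blast radius argument can be put into numbers. The Python sketch below estimates the share of pods lost when one node fails and checks whether the surviving nodes have enough spare capacity to take over. It assumes identical nodes with pods spread evenly, which real clusters only approximate, and the pod counts and utilization are hypothetical.

```python
import math


def blast_radius(total_pods: int, node_count: int) -> float:
    """Share of pods lost when one node fails, assuming pods are spread evenly."""
    pods_per_node = math.ceil(total_pods / node_count)
    return pods_per_node / total_pods


def survivors_can_absorb(node_count: int, utilization: float) -> bool:
    """True if the remaining nodes have enough spare capacity for the failed node's load.

    Assumes identical nodes, each running at `utilization` (0..1) of its capacity.
    """
    if node_count < 2:
        return False
    spare = (node_count - 1) * (1 - utilization)
    return spare >= utilization


# Hypothetical cluster: 240 pods on 3 huge nodes versus 48 small nodes, at 70% utilization.
for count in (3, 48):
    print(f"{count} nodes: {blast_radius(240, count):.1%} of pods lost, "
          f"absorbable={survivors_can_absorb(count, 0.7)}")
```

In this example, losing one of three huge nodes takes out a third of the pods and the two survivors cannot absorb them, while losing one of 48 small nodes affects about 2% of pods and the rest of the cluster has room to spare.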
2. Increased Downtime for Upgrades
If you need to upgrade or patch huge nodes, the process can take more time. Fewer nodes mean that each upgrade has a larger impact on the overall cluster capacity. With autoscaling enabled, huge nodes can lead to more drastic scaling changes during updates, which might not be ideal in clusters with dynamic workloads.
3. Resource Wastage for Small Workloads
If you’re running small workloads on huge nodes, you may end up with wasted capacity—bits of CPU or memory that don’t get used because they can’t be filled with smaller tasks. Kubernetes isn’t always efficient at “packing” workloads on large nodes, so some resources may sit idle if workloads aren’t well-matched to node size.
Advantages of Small Nodes
1. High Resiliency
With small nodes, a single node failure affects fewer pods, so the impact is less, and recovery is faster. Smaller nodes allow for easier load balancing and can improve the cluster’s fault tolerance. When combined with multiple availability zones or failure domains, this setup can ensure high availability across nodes.
2. Flexibility
Small nodes are well-suited to clusters with workloads that change frequently or require different resource configurations. Kubernetes can add and remove nodes as needed, scaling up or down quickly with smaller, incremental changes. Autoscalers work well with small nodes because they enable granular scaling: each step adds only a small amount of capacity, close to what is actually required, which helps keep costs low and prevents over-provisioning.
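A small worked example shows what granular scaling means in practice. The Python sketch below computes how many nodes of a given size are needed to cover extra demand, and how much surplus capacity comes along with them. The demand figure is hypothetical, and the sketch ignores per-node reservations.

```python
import math


def scale_up(extra_cpu_needed: int, node_cpu: int) -> tuple:
    """Return (nodes to add, surplus vCPUs) when covering extra demand with one node size."""
    nodes = math.ceil(extra_cpu_needed / node_cpu)
    return nodes, nodes * node_cpu - extra_cpu_needed


# Hypothetical spike: the cluster needs 5 more vCPUs.
print(scale_up(5, 2))    # scaling with 2 vCPU nodes
print(scale_up(5, 32))   # scaling with 32 vCPU nodes
```

Covering 5 extra vCPUs takes three 2 vCPU nodes with 1 vCPU of surplus, or a single 32 vCPU node with 27 vCPUs of surplus.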
3. Efficient Resource Allocation for Different Workloads
In clusters with different workloads (such as CPU-heavy and memory-heavy apps), small nodes allow for efficient packing of workloads. The smaller increments make it easier for the Kubernetes scheduler to match workloads to nodes without significant leftover resources. In multi-tenant clusters, where different teams or apps need isolated resources, small nodes help avoid conflicts and resource competition.
Disadvantages of Small Nodes
1. Increased Management Overhead
More nodes mean more infrastructure to manage. Tasks like logging, monitoring, and upgrading become more complex with many nodes, and the control plane experiences more load as the node count grows. API calls and communication between nodes increase along with it, which can cause API throttling and performance issues in extremely large clusters.
2. Higher Network Overhead
More nodes lead to more network traffic between nodes. Applications that need a lot of cross-node communication may suffer from higher latency and increased data transfer costs. This is particularly relevant for data-heavy apps where pods need to frequently exchange data.
3. Idle Resource Wastage
While small nodes are great for spreading out workloads, they can also lead to inefficient resource usage if pods do not pack into them neatly. A small node has little headroom, so whatever is left over after placing a pod is often too small for the next one and sits idle. Across many nodes, these leftovers add up to wasted infrastructure, particularly if workloads are irregularly distributed.
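The effect of leftover capacity is easy to quantify. The Python sketch below computes how many pods of one size fit on a node and how much CPU is stranded afterwards. The 1500m pod request is an assumption for illustration, and the sketch ignores system reservations and memory.

```python
def stranded(node_cpu_m: int, pod_cpu_m: int) -> tuple:
    """Return (pods that fit on one node, millicores left over that no such pod can use)."""
    pods = node_cpu_m // pod_cpu_m
    return pods, node_cpu_m - pods * pod_cpu_m


# Hypothetical pod requesting 1500m CPU, placed on a 2 vCPU and a 32 vCPU node.
for size in (2_000, 32_000):
    pods, leftover = stranded(size, 1_500)
    print(f"{size}m node: {pods} pods, {leftover}m stranded ({leftover / size:.1%})")
```

Both nodes strand 500m, but on the 2 vCPU node that is a quarter of the machine, while on the 32 vCPU node it is under 2%.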
Choosing the Right Node Size in Kubernetes
Your choice between huge nodes and small nodes depends on the nature of your workloads and your priorities. Here are a few points to consider:
1. Use Huge Nodes When:
a. You have consistent, high-load workloads that require stable, large resource pools.
b. Cost optimization at scale is more important than maximum resilience.
c. You need to simplify management tasks or reduce control plane overhead.
2. Use Small Nodes When:
a. You want to run different, dynamic workloads with frequent scaling requirements.
b. High availability is important, and you want to minimize the blast radius of a node failure.
c. Flexibility, fault tolerance, and incremental scaling are top priorities for your workloads.
For most clusters, the ideal approach is a mix of node groups with different sizes. This allows Kubernetes to schedule workloads on the best node for the job. However, getting this mix exactly right is challenging and requires careful monitoring of workload patterns and node utilization.
Kubernetes Node Management with PerfectScale
PerfectScale, with its InfraFit plugin, simplifies node sizing by providing actionable insights into resource utilization. Here's how it helps:
1. Resource Utilization Analysis: InfraFit identifies underutilized or overprovisioned nodes, enabling efficient allocation of workloads.
2. Node Optimization: PerfectScale suggests adjustments to node types and sizes, ensuring the best performance at the lowest cost.
3. Continuous Monitoring: Tracks node usage trends, providing alerts and real-time suggestions to eliminate inefficiencies.
Here is the step-by-step guide on how you can use InfraFit for your Node Management:
1. Access Node Group Details
- Click on the InfraFit section in PerfectScale.
- Select a specific Node Group to open its detailed view.
2. Analyze Resource Utilization
- Use the Utilization filter to view CPU and memory utilization (allocated, requested, and used) at the desired percentile.
- Identify underutilized or overprovisioned nodes based on Idle Cost and utilization metrics.
3. Review Optimization Recommendations
- In the Node Group Data Table, check the Avg Cost/h, Total Cost, and Idle Cost columns for actionable insights.
- PerfectScale provides recommendations for resizing or replacing nodes, including cost-saving options.
4. Monitor Usage Trends
- Use the Cost per Node Type and Utilization charts to track resource usage trends over time.
- Enable continuous monitoring to receive real-time alerts and updated recommendations.
5. Export Data
- Easily export data as a .csv file for further analysis and team collaboration.
By integrating PerfectScale, teams can achieve up to 30-50% additional cost savings through smarter, data-driven node sizing decisions. Sign up or Book a demo with the PerfectScale team today!