Karpenter is an open-source, flexible, and high-performance Kubernetes cluster autoscaler developed by AWS. It is designed to optimize the provisioning and management of compute resources in a Kubernetes cluster. Unlike traditional autoscalers that rely on static configurations and predefined instance types, Karpenter dynamically provisions the right compute resources based on the specific needs of the workloads running in the cluster.
In this ultimate guide, we'll cover how Karpenter works, how to implement it for dynamic node provisioning, how to optimize it, and best practices for both Karpenter and NodePools.
Let’s dig in!
How Does Karpenter Work?
Karpenter is a Kubernetes-native autoscaler designed to dynamically adjust the size of your Kubernetes cluster based on real-time workload demands. At its core, Karpenter continuously monitors the state of your cluster, including metrics from both pods and nodes. This monitoring allows Karpenter to make informed decisions about scaling actions. When it detects that the current resources are insufficient to handle the workload, Karpenter initiates a scaling-up process. This involves provisioning new nodes with the appropriate instance types and sizes that best match the resource requirements of the pending pods. Conversely, when the workload decreases and nodes become underutilized, Karpenter safely scales down the cluster by de-provisioning these nodes, ensuring that running workloads are not disrupted.
One of the key strengths of Karpenter is its ability to optimize resource allocation, which helps in reducing operational costs. It achieves this by selecting the most cost-effective instance types and sizes and by efficiently packing workloads onto nodes to maximize resource utilization. However, it's important to note that Karpenter can only optimize resource allocation if the pods themselves are right-sized: it looks at container resource requests and scheduling constraints to perform node selection. PerfectScale can help with pod right-sizing, ensuring that Karpenter has accurate information to work with. For more information on how PerfectScale can enhance Karpenter's effectiveness, check out this post.
Karpenter's decision-making process is driven by a set of customizable policies and configurations. Users can define custom provisioning logic using NodePool Custom Resource Definitions (CRDs), specifying parameters such as instance types, zones, and resource limits. This allows for fine-grained control over how resources are allocated and managed within the cluster. Scaling policies can be defined to set minimum and maximum node counts, as well as cooldown periods to control the frequency of scaling actions.
Implementing Karpenter for Dynamic Node Provisioning
Prerequisites:
a. AWS CLI: Command-line interface for AWS.
b. kubectl: Kubernetes command-line tool.
c. eksctl (>= v0.180.0): CLI for creating and managing EKS clusters.
d. helm: Kubernetes package manager.
Configure the AWS CLI with a user that has sufficient privileges to create an EKS cluster. Verify the CLI can authenticate properly by running:
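```bash
aws sts get-caller-identity
```

This should return the account ID and ARN of the user you configured.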
Step 1: Set Environment Variables
Set the necessary environment variables for Karpenter and Kubernetes:
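For example (the version numbers and names below are placeholders; use values that match your environment):

```bash
export KARPENTER_VERSION="0.37.0"     # Karpenter release to install
export K8S_VERSION="1.30"             # EKS Kubernetes version
export CLUSTER_NAME="karpenter-demo"
export AWS_DEFAULT_REGION="us-west-2"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
```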
Step 2: Create a Cluster and install Karpenter
Use eksctl with a configuration file to create your EKS cluster and install Karpenter simultaneously. Here’s how:
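Save a ClusterConfig manifest as cluster.yaml. The sketch below relies on eksctl's built-in Karpenter support; the cluster name, region, versions, and node group sizing are illustrative:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: karpenter-demo
  region: us-west-2
  version: "1.30"
  tags:
    karpenter.sh/discovery: karpenter-demo   # used later by the EC2NodeClass selectors
iam:
  withOIDC: true                             # Karpenter authenticates via IRSA
karpenter:
  version: "0.37.0"                          # match KARPENTER_VERSION set earlier
  createServiceAccount: true
  withSpotInterruptionQueue: true            # creates the SQS queue for Spot interruptions
managedNodeGroups:
  - name: baseline-ng                        # small static node group for system pods
    instanceType: m5.large
    minSize: 2
    maxSize: 3
    desiredCapacity: 2
```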
Create the cluster using:
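```bash
eksctl create cluster -f cluster.yaml
kubectl get pods -n karpenter   # verify the Karpenter controller is up
```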
This configuration not only creates the cluster but also installs Karpenter and sets up the SpotInterruptionQueue, which allows Karpenter to replace Spot Instances before they are reclaimed.
It's important to note that even though Karpenter does dynamic node provisioning, we still need a predefined nodegroup with 2 nodes. This initial nodegroup serves as a baseline for running critical system pods and ensures that there's always a minimum capacity available for the cluster to function, even if Karpenter encounters issues.
Step 3: Create a NodePool
In this step, we're creating both a NodePool and an EC2NodeClass. The NodePool defines the requirements and limits for the nodes that Karpenter will provision, while the EC2NodeClass specifies the details of the EC2 instances that will be created. This NodePool uses securityGroupSelectorTerms and subnetSelectorTerms to discover resources for launching nodes.
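A minimal pair of manifests might look like this (a sketch using the v1beta1 API; the node role name and discovery tag are assumptions based on the cluster setup above):

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 100                                  # cap on total CPU this NodePool may provision
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-karpenter-demo"    # node IAM role from your cluster setup
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: karpenter-demo
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: karpenter-demo
```

Apply both with kubectl apply -f nodepool.yaml.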
Step 4: Scale Up Deployment
Deploy a sample application to test Karpenter's scaling capabilities:
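A common test workload is the pause-container "inflate" deployment used in Karpenter's getting-started guide:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1    # large enough that several replicas won't fit on the baseline nodes
```

Scale it up and watch Karpenter provision capacity:

```bash
kubectl apply -f inflate.yaml
kubectl scale deployment inflate --replicas 5
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
```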
Step 5: Scale Down Deployment
Delete the deployment to observe Karpenter's node consolidation:
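```bash
kubectl delete deployment inflate
```

With consolidation enabled, Karpenter notices the now-empty nodes and terminates them; the controller logs show the disruption decisions as they happen.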
Step 6: Manual Node Deletion
If you need to delete a node manually, Karpenter will handle the graceful shutdown:
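```bash
kubectl delete node "${NODE_NAME}"   # a node that Karpenter provisioned
```

Karpenter adds a finalizer to the nodes it provisions, so deleting the Node object triggers a cordon and drain before the underlying instance is terminated.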
Step 7: Clean Up
To avoid charges, remove the demo infrastructure:
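```bash
kubectl delete deployment inflate --ignore-not-found
eksctl delete cluster -f cluster.yaml
```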
Advantages of Karpenter
Karpenter does more than just scale nodes. Here are some of its key advantages:
1. Cost Optimization
Karpenter helps in cost optimization by dynamically provisioning the most cost-effective compute resources based on real-time workload demands. It supports a wide range of instance types, including spot instances, which are cheaper than on-demand instances. By intelligently selecting and scaling down underutilized nodes, Karpenter helps organizations minimize their cloud infrastructure costs.
It's important to note that pod right-sizing is essential for actual optimization of node selection. PerfectScale can help with this process, making sure that your pods are requesting the appropriate resources. Readers can use PerfectScale's InfraFit Plugin to evaluate and fine-tune their NodePool configurations. InfraFit provides a granular view of resource allocation across the various nodes supporting your Kubernetes clusters, helping you make informed decisions about your infrastructure.
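A NodePool along these lines illustrates the idea (a sketch using the v1beta1 API; the name and nodeClassRef are placeholders):

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: cost-optimized
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]       # prefer Spot, fall back to on-demand
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenUnderutilized    # actively repack and remove underutilized nodes
```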
This configuration allows both spot and on-demand instances to be used, ensuring flexibility in cost optimization. By enabling consolidationPolicy: WhenUnderutilized, Karpenter actively removes underutilized nodes to minimize costs.
2. Support for Diverse Workloads
Karpenter is designed to handle a variety of workloads, including machine learning (ML) and generative AI applications. These workloads often have unique resource requirements and can be highly variable in nature. Karpenter's flexible provisioning capabilities ensure that the right type and amount of resources are available to meet the specific needs of these complex workloads, as long as the pods have been right-sized. For right-sizing the pods, PerfectScale provides plugins and insights to optimize your workload resource allocation.
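For example, a GPU-oriented NodePool might look like this (a sketch; the instance types and timings mirror the description below):

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu-ml
spec:
  template:
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["p3.2xlarge", "p3.8xlarge"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 60s    # terminate nodes 60 seconds after they become empty
```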
This configuration ensures that only on-demand p3.2xlarge and p3.8xlarge instances are used. Nodes are terminated 60 seconds after becoming empty, conserving resources.
3. Simplified Upgrades and Patching
Managing upgrades and patches in a Kubernetes environment can be challenging. Karpenter simplifies this process by enabling node replacements and rolling updates. It can automatically provision new nodes with the latest patches and updates, and gracefully decommission old nodes. This automated approach reduces the operational burden and enhances cluster security and stability.
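For example, in a NodePool spec (excerpt):

```yaml
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h    # nodes older than 30 days are drained and replaced
```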
Here, expireAfter: 720h ensures nodes are replaced after 30 days, keeping your cluster secure and up-to-date.
4. Kubernetes Native
Being Kubernetes-native, Karpenter integrates seamlessly with the Kubernetes ecosystem. It leverages Kubernetes APIs and works in harmony with other Kubernetes components, such as the scheduler and controller manager. This native integration ensures that Karpenter can efficiently manage resources and scale applications without requiring significant changes to existing Kubernetes setups. It also benefits from the robustness and reliability of the Kubernetes platform.
5. Advanced Scheduling Capabilities
Karpenter provides advanced scheduling features like bin-packing and topology-aware scheduling. Bin-packing optimizes resource utilization by packing workloads onto fewer nodes, reducing the overall number of nodes required. Topology-aware scheduling ensures that workloads are distributed in a way that maximizes performance and resilience, taking into account factors like network latency and fault domains. These advanced scheduling capabilities help in achieving better resource efficiency and application performance.
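Two illustrative excerpts follow: a NodePool that enables consolidation, and a pod spec that requests zone-aware spreading (the app label is a placeholder):

```yaml
# NodePool excerpt: enable consolidation-driven bin-packing
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
```

```yaml
# Pod spec excerpt: spread replicas across zones; Karpenter honors this when provisioning
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app
```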
In the first excerpt, the consolidationPolicy setting enables bin-packing, which optimizes resource utilization by packing workloads onto fewer nodes.
6. Enhanced Scalability and Performance
Karpenter's real-time data analysis and integration with cloud provider APIs enable it to respond quickly to changing workload demands. This ensures that applications have the necessary resources to maintain optimal performance, even under varying loads. By dynamically scaling resources up and down, Karpenter enhances the overall scalability and responsiveness of Kubernetes clusters.
>> Take a look at how to get the most out of Karpenter with PerfectScale
Karpenter Optimization
Karpenter optimizes cost using a feature called consolidation. Let's say your Kubernetes cluster's worker nodes look like this: over time, you have four EC2 worker nodes running. The first worker node is running efficiently, with all pods tightly packed and no wasted capacity. The last three worker nodes, however, are far less efficient, with a significant amount of unused, wasted capacity.
You might wonder how the cluster worker nodes ended up like this. It can happen that over time, traffic increased and the Horizontal Pod Autoscaler created many pods, prompting Karpenter to provision more nodes. At one point, all four nodes had four pods each, assuming each EC2 instance can host up to four application pods. As traffic decreased, some pods were terminated, leaving the EC2 instances underutilized.
With Karpenter, you can enable consolidation in the NodePool by setting the disruption consolidationPolicy to WhenUnderutilized. Karpenter will then automatically detect underutilized EC2 instances and bin-pack their pods. For example, it can move pods from the last two EC2 instances onto the second one and then terminate the third and fourth EC2 instances. This leads to better utilization of worker nodes and reduced costs.
Consider another scenario where the second and third worker nodes are m5.xlarge instances. Even if Karpenter bin-packs their two pods onto one of these instances, capacity is still wasted, as 50% of that instance remains idle. Karpenter is smart enough to instead create a new, smaller m5.large instance and move the two pods from the larger instances onto it. This way, Karpenter gets rid of both larger nodes and launches a smaller node that accommodates both pods, leading to a better selection of worker nodes and reduced costs.
Karpenter works to reduce cluster costs by:
- Removing empty nodes.
- Removing nodes by moving pods to other underutilized nodes.
- Replacing nodes with cheaper variants.
You can turn on consolidation by specifying it under the disruption field, as shown in the example after this list. consolidateAfter determines how long Karpenter waits before disrupting an eligible EC2 instance, and the consolidationPolicy can be set to WhenEmpty or WhenUnderutilized.
- WhenEmpty: Karpenter will only consider nodes for consolidation that contain no workload pods. It's fine if system pods or DaemonSets are running; Karpenter treats the node as empty as long as no application pods are on it.
- WhenUnderutilized: Karpenter will attempt to remove or replace nodes whenever a node is underutilized and could be changed to reduce costs.
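A minimal example in a NodePool spec (excerpt; note that in the v1beta1 API, consolidateAfter can only be combined with WhenEmpty):

```yaml
spec:
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s    # wait 30 seconds after a node becomes empty before removing it
```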
Once enabled, consolidation will start saving you money. However, you might not want certain nodes or pods to be consolidated, even when a node is heavily underutilized or empty. There are ways to control the different kinds of disruption, such as consolidation, expiration, and drift; once you understand their default behavior, you can tune them to prevent unwanted disruptions.
Note: Karpenter's consolidation feature can lead to reliability issues for memory burstable pods. Consolidation and scheduling in general work by comparing the pods' resource requests vs. the amount of allocatable resources on a node. The resource limits are not considered. As an example, pods that have a memory limit that is larger than the memory request can burst above the request. If several pods on the same node burst at the same time, this can cause some of the pods to be terminated due to an out-of-memory (OOM) condition. Consolidation can make this more likely to occur as it works to pack pods onto nodes only considering their requests.
Karpenter Best Practices
1. Use Karpenter for Dynamic Workloads
Karpenter performs well in environments where workloads have fluctuating capacity needs. Unlike Amazon EC2 Auto Scaling Groups (ASGs) and Managed Node Groups (MNGs), which rely on AWS-level metrics like EC2 CPU load, Karpenter integrates more closely with Kubernetes-native APIs. This integration allows for more flexible and efficient scaling, especially for workloads that experience high, spiky demand or have diverse compute requirements. While ASGs and MNGs are suitable for static and consistent workloads, Karpenter is ideal for dynamic environments. You can also use a mix of dynamically and statically managed nodes to meet your specific requirements.
2. Run Karpenter Controller on EKS Fargate or a Dedicated Node Group
Karpenter is installed using a Helm chart, which deploys the Karpenter controller and a webhook pod as a Deployment. It is important to ensure that these components run on a stable environment. We recommend running the Karpenter controller on EKS Fargate or a dedicated node group. This setup ensures that Karpenter itself is not managed by Karpenter, thereby avoiding potential disruptions.
>> Take a look also at How to use Karpenter with AWS Reserved Instances
3. Exclude Unnecessary Instance Types
When configuring Karpenter, it is essential to exclude instance types that do not fit your workload requirements. For example, if your workloads do not require large Graviton instances, you can exclude them using the node.kubernetes.io/instance-type key. This exclusion helps in optimizing resource utilization and cost. PerfectScale's InfraFit plugin can provide valuable insights into which instance types are most suitable for your workloads, helping you make informed decisions about which types to include or exclude.
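For example, a requirements entry like this excludes specific types (the instance types listed are just for illustration):

```yaml
requirements:
  - key: node.kubernetes.io/instance-type
    operator: NotIn
    values: ["m6g.16xlarge", "r6g.16xlarge"]   # large Graviton types this workload never needs
```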
4. Enable Interruption Handling for Spot Instances
Karpenter supports native interruption handling, which is important for managing Spot Instances. Spot Instances can be interrupted with little notice, and Karpenter can handle these interruptions gracefully. By configuring the --interruption-queue CLI argument with the name of an SQS queue, Karpenter can taint, drain, and terminate affected nodes ahead of time, ensuring that workloads are moved to new nodes before the interruption occurs.
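With the eksctl configuration shown earlier, the interruption queue is created and wired up for you. If you install Karpenter with Helm yourself, you pass the queue name as a chart setting, roughly like this (setting names vary between chart versions, so verify against the version you run):

```bash
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter --create-namespace \
  --version "${KARPENTER_VERSION}" \
  --set settings.interruptionQueue="${CLUSTER_NAME}"
```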
5. Configure for Private EKS Clusters
If you are running an Amazon EKS cluster in a VPC without outbound internet access, you need to configure your environment according to the private cluster requirements. This configuration includes creating an STS VPC regional endpoint and an SSM VPC endpoint. These endpoints are necessary for Karpenter to function correctly in a private cluster.
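For example, the STS endpoint can be created like this (the VPC, subnet, and security group IDs are hypothetical; repeat with com.amazonaws.<region>.ssm for the SSM endpoint):

```bash
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name "com.amazonaws.${AWS_DEFAULT_REGION}.sts" \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0
```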
6. Overprovisioning to Improve Responsiveness
In scenarios where you expect a sudden surge in workload, such as a data pipeline process that needs to launch a large number of pods simultaneously, overprovisioning can significantly improve responsiveness. This involves deploying a "dummy" workload with a low PriorityClass to reserve capacity in advance. When the actual workload is deployed, the "dummy" pods are evicted, making room for the new pods to start almost immediately. However, it's important to note that this approach trades off resource efficiency for responsiveness. Overprovisioning can lead to increased costs and resource waste, as you are maintaining additional capacity that may not always be utilized. Therefore, use this strategy judiciously and monitor your resource usage and costs closely. To implement overprovisioning, follow these steps:
1. Deploy the "dummy" workload:
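A minimal sketch: a low-priority PriorityClass plus a pause-container deployment whose only job is to hold capacity (names and sizes are placeholders):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1                      # below the default (0), so these pods are evicted first
globalDefault: false
description: "Placeholder pods that reserve spare capacity."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9    # does nothing; just holds the requested resources
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
```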
2. Deploy the actual workload:
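Its pods, running at default or higher priority, preempt the placeholders and start almost immediately:

```bash
kubectl apply -f my-pipeline.yaml   # hypothetical manifest for your real workload
```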
3. Scale down the "dummy" workload if not needed:
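```bash
kubectl scale deployment overprovisioning --replicas=0
```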
>> Take a look at Guide to monitoring Karpenter with Prometheus
NodePool Best Practices
1. Create Multiple NodePools for Different Requirements
When different teams share a cluster, or when workloads have varying OS or instance type requirements, it is advisable to create multiple NodePools. For example, one team may require GPU instances for machine learning workloads, while another team may need general-purpose instances. By creating multiple NodePools, you can ensure that each team has access to the most appropriate resources.
2. Use Mutually Exclusive or Weighted NodePools
To provide consistent scheduling behavior, create NodePools that are either mutually exclusive or weighted. If multiple NodePools match a workload, Karpenter will choose one at random, which can lead to unexpected results. For example, you can create a NodePool for GPU instances with specific taints and another for general compute instances with node affinities. This setup ensures that workloads are scheduled on the appropriate nodes.
In a Kubernetes environment, managing costs and assigning them to the appropriate teams can be a complex task. One effective strategy is to use mutually exclusive NodePools, which can help in assigning cost ownership to different billing teams.
Let’s take an example scenario of different billing teams and how they can use mutually exclusive NodePools:
a. Define NodePools with Specific Constraints:
Create NodePools with constraints that match the resource requirements of each team. For example, you can create a NodePool for GPU instances and another for general-purpose instances.
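A sketch of two such NodePools (the names, taints, and instance families are illustrative):

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: team-a-gpu
spec:
  template:
    spec:
      taints:
        - key: team-a/gpu                     # only pods that tolerate this taint land here
          effect: NoSchedule
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["p3", "g5"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: team-b-general
spec:
  template:
    spec:
      labels:
        team: team-b                          # workloads select this label
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
```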
b. Deploy Workloads with Specific Affinities and Tolerations:
Ensure that the workloads are scheduled on the appropriate NodePools by using node affinities and tolerations.
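For example, Team A's GPU workload tolerates the GPU taint and selects a GPU instance family (the image and resource figures are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: team-a-training
spec:
  replicas: 1
  selector:
    matchLabels:
      app: team-a-training
  template:
    metadata:
      labels:
        app: team-a-training
    spec:
      tolerations:
        - key: team-a/gpu
          operator: Exists
          effect: NoSchedule
      nodeSelector:
        karpenter.k8s.aws/instance-family: p3
      containers:
        - name: trainer
          image: my-registry/trainer:latest   # hypothetical training image
          resources:
            requests:
              cpu: "4"
              memory: 16Gi
```

Team B's workloads would instead carry a nodeSelector such as team: team-b and no toleration, so they can never land on the tainted GPU nodes.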
With this, you can ensure that Team A's GPU workloads are scheduled on GPU instances, and Team B's general compute workloads are scheduled on general-purpose instances. This not only optimizes resource utilization but also ensures that the costs are accurately attributed to each team. For example, if Team A's GPU instances are more expensive, the cost will be reflected in their billing, allowing for better budget management and accountability. Similarly, Team B will only be charged for the general-purpose instances they use, preventing any unexpected cost overruns.
3. Use Timers to Automatically Delete Nodes
Karpenter allows you to set timers on provisioned nodes to automatically delete them when they are no longer needed. This feature is useful for upgrading nodes, as it enables you to retire and replace nodes with updated versions. You can configure node expiry using the spec.disruption.expireAfter field in the NodePool specification.
4. Avoid Overly Constraining Instance Types
When using Spot Instances, avoid placing too many constraints on the instance types that Karpenter can provision. Karpenter uses the Price Capacity Optimized allocation strategy to provision instances from the deepest pools with the lowest risk of interruption. By allowing Karpenter to use a diverse set of instance types, you can optimize the availability and cost of your Spot Instances.
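For instance, constraining by broad categories rather than explicit types leaves Karpenter room to choose from many Spot pools (an illustrative excerpt):

```yaml
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot"]
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["c", "m", "r"]    # many families and sizes remain eligible
```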
Enhancing Karpenter with PerfectScale
Combining PerfectScale with Karpenter can improve your Kubernetes cluster's efficiency and deliver an additional 30 to 50% in cost reductions on top of what can be achieved with Karpenter alone. Through a series of key steps, PerfectScale's insights empower Karpenter to make data-driven, intelligent decisions for node provisioning:
1. Resource Analysis: PerfectScale provides detailed insights into workload behavior and resource utilization, allowing teams to identify underutilized or overprovisioned pods.
2. Pod Right-Sizing: Using PerfectScale’s InfraFit plugin, you can fine-tune NodePool configurations to ensure pods are right-sized, aligning with real workload needs.
3. NodePool Optimization: With a granular view of resource allocation, PerfectScale helps teams adjust NodePools based on workload requirements, recommending instance types, and other settings for optimal cost and performance.
4. Continuous Monitoring: PerfectScale continuously monitors your workloads, alerting you to potential inefficiencies and offering actionable recommendations. This real-time insight enables proactive cluster management, maximizing the benefits of Karpenter’s dynamic scaling capabilities.
By combining Karpenter's dynamic provisioning with PerfectScale's insights, teams can achieve a truly optimized, cost-efficient Kubernetes environment. Try PerfectScale to see how it can enhance your Karpenter-managed cluster. Sign up or Book a demo to learn more.