Right-sizing Kubernetes nodes is a complex multi-layer task that involves resource utilization monitoring, analyzing the requirements of the workloads running on the node, and selecting the proper nodes to meet your application’s needs.
Tracking CPU and memory utilization at the node level with general-purpose monitoring and observability tools leaves several visibility gaps that make node selection and right-sizing risky. Over-sizing wastes your cloud budget, while under-sizing can lead to unschedulable or crashing pods due to unforeseen runtime constraints. The risk grows if the workloads running on the nodes are not optimized. This can drastically impact autoscalers (like Karpenter), leading to excessive waste whenever the cluster scales up.
Regularly right-sizing and optimizing your nodes is integral for maintaining lean and performant clusters. Holistic visibility across your nodes and associated workloads is crucial to making the data-driven decisions necessary to keep your applications resilient and cost-effective.
We are thrilled to introduce you to PerfectScale InfraFit - your solution for achieving a fine-tuned K8s environment, providing an advanced, granular view of node resource allocation and empowering you to optimize the underlying infrastructure of your workloads.
Let's explore it together!
InfraFit: What is under the hood?
InfraFit is an advanced feature that provides a comprehensive multidimensional view of the entire Kubernetes environment at the infrastructure level. It plays a pivotal role in understanding the behavior of distinct node groups and types, with insights that facilitate optimization of the underlying infrastructure of the workloads.
The new level of visibility empowers you to achieve the following:
- Gain granular visibility into the cost of your cluster compute, now enhanced by AWS CUR Integration, to align your node selection with your cloud budgeting goals.
- Identify where the idle capacity exists with data points that can help you understand and adjust node sizes.
- Take a multidimensional approach to K8s optimization, and evaluate whether your autoscaling is delivering the best possible results.
- Optimize the provisioning of your node autoscaler (for example, Karpenter) by selecting the node types with the best utilization rates.
How to use InfraFit?
Dive into InfraFit and start exploring your infrastructure optimization opportunities.
Node Resource Utilization Panel - Evaluate trends over time
Within this panel, you'll find a comprehensive visual breakdown of your infrastructure, showing resource utilization at the percentile you choose for a particular node group, node type, or the entire environment, segmented over the selected timeframe. This view offers detailed insights into actual resource usage compared to the requests, empowering you to pinpoint optimization opportunities and devise a strategy for enhanced efficiency.
Effortlessly toggle between node groups and node types, and customize your view by selecting the specific data to display. This gives you a precise picture of the cost per node group and node type, and of resource distribution (usage, requests, allocation), at any point in time.
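To make the percentile view concrete, here is a minimal Python sketch of the kind of calculation such a panel implies. It is not InfraFit's implementation: the sample values, the `percentile` helper, and the request and allocatable figures are all assumptions chosen for illustration.

```python
from statistics import quantiles


def percentile(samples, pct):
    """Return the pct-th percentile (1-99) of a list of numeric samples."""
    # quantiles(n=100) yields 99 cut points; index pct - 1 is the pct-th one.
    return quantiles(samples, n=100, method="inclusive")[pct - 1]


# Hypothetical CPU usage samples (in cores) for one node group over a timeframe.
cpu_usage_cores = [2.1, 2.4, 2.2, 3.0, 2.8, 2.5, 3.6, 2.9, 2.3, 2.7]
cpu_requested_cores = 6.0    # assumed sum of pod CPU requests in the group
cpu_allocatable_cores = 8.0  # assumed allocatable CPU of the group

p95_usage = percentile(cpu_usage_cores, 95)
usage_vs_request = p95_usage / cpu_requested_cores
request_vs_allocatable = cpu_requested_cores / cpu_allocatable_cores

print(f"p95 usage: {p95_usage:.2f} cores")
print(f"usage / requests: {usage_vs_request:.0%}")
print(f"requests / allocatable: {request_vs_allocatable:.0%}")
```

A low usage-to-request ratio points at over-provisioned workloads, while a low request-to-allocatable ratio points at nodes that are larger than the scheduled pods need.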
InfraFit Data Table - Identify and Prioritize Inefficiencies
The InfraFit Data Table provides a structured overview of the resource allocation across your infrastructure, facilitating easier management and continuous monitoring. With the ability to view by either Node Types or Node Groups, you can effortlessly track resource utilization, identify optimization opportunities, and prioritize actions based on their impact.
Want to know which node groups or types cost you the most? Wondering which nodes are the most under-utilized? PerfectScale has already organized the data and provided the answers for you!
Detailed analytics for Node Groups and Node Types help you to:
- Evaluate the effectiveness of your autoscaling (like Karpenter or Cluster Autoscaler).
- Quickly identify opportunities to right-size nodes.
- Identify the nodes with wasted workload resources throughout your environment.
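The prioritization idea behind the list above can be sketched in a few lines of Python. This is an illustration only, not how PerfectScale ranks data: the `NodeGroupStats` class, the group names, the costs, and the "idle cost" heuristic are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeGroupStats:
    name: str
    monthly_cost_usd: float
    cpu_utilization: float  # fraction (0..1) of allocatable CPU actually used
    mem_utilization: float  # fraction (0..1) of allocatable memory actually used

    @property
    def idle_cost_usd(self) -> float:
        # Assumed heuristic: the busier resource is the binding one,
        # so the remainder of the spend is treated as idle.
        binding = max(self.cpu_utilization, self.mem_utilization)
        return self.monthly_cost_usd * (1.0 - binding)


# Hypothetical node groups with made-up costs and utilization.
groups = [
    NodeGroupStats("spot-general", 4000.0, 0.30, 0.40),
    NodeGroupStats("on-demand-core", 6000.0, 0.70, 0.80),
    NodeGroupStats("gpu-batch", 3000.0, 0.20, 0.25),
]

# Highest idle spend first: these are the groups worth right-sizing first.
ranked = sorted(groups, key=lambda g: g.idle_cost_usd, reverse=True)
for g in ranked:
    print(f"{g.name}: ~${g.idle_cost_usd:,.0f} idle per month")
```

Ranking by idle spend rather than by raw cost or raw utilization keeps a cheap, badly utilized group from outranking an expensive, moderately utilized one.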
PodFit & InfraFit Connectivity - Take action!
To streamline the optimization process across the different layers of your Kubernetes stack, InfraFit and PodFit are seamlessly integrated into one simplified experience. When drilling down into individual node types or node groups, you can see the workloads running on each node, their run time, and resource utilization insights.
InfraFit and PodFit together offer a comprehensive solution that takes a multidimensional approach to reducing wasted resources in your nodes. Seamlessly pinpoint workloads that contribute most to resource waste, promoting more efficient bin-packing and ultimately reducing your overall node count while maximizing the benefits of your node autoscaling.
Exploring Usage Scenarios
Verifying Traditional Autoscaling Group Configurations
InfraFit can be particularly helpful for evaluating the cost and utilization trends of your autoscaling groups and identifying room for optimization. For example, if you use one Auto Scaling group (ASG) with Spot instances for your interruption-tolerant workloads and another with On-Demand instances for everything else, you may find that the Spot ASG is consistently underutilized, presenting an opportunity for optimization. By reconfiguring the Spot ASG to use smaller (and hence cheaper) instances, you could significantly increase utilization rates and reduce cost.
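A rough back-of-the-envelope model shows why smaller instances can raise utilization when a group has a minimum node count. Everything in this sketch is assumed for illustration: the `fleet` helper, the 18 vCPU demand, the minimum of 3 nodes, and the hourly prices are not real figures.

```python
import math


def fleet(demand_vcpu, vcpu_per_node, min_nodes, price_per_node_hour):
    """Return (node_count, utilization, hourly_cost) for one instance size."""
    # The group never shrinks below min_nodes, even if demand fits on fewer nodes.
    nodes = max(min_nodes, math.ceil(demand_vcpu / vcpu_per_node))
    utilization = demand_vcpu / (nodes * vcpu_per_node)
    return nodes, utilization, nodes * price_per_node_hour


# Hypothetical Spot group: 18 vCPU of steady demand, minimum of 3 nodes.
large = fleet(demand_vcpu=18, vcpu_per_node=16, min_nodes=3, price_per_node_hour=0.20)
small = fleet(demand_vcpu=18, vcpu_per_node=8, min_nodes=3, price_per_node_hour=0.10)

for label, (nodes, util, cost) in (("16 vCPU nodes", large), ("8 vCPU nodes", small)):
    print(f"{label}: {nodes} nodes, {util:.1%} utilized, ${cost:.2f}/hour")
```

In this made-up case the node count stays the same, but halving the instance size doubles utilization and halves the hourly spend.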
Choosing Optimal Instance Types for Workloads
Service latency is often critical, and when it is higher than usual, it's essential to verify whether the nodes hosting the service instances have sufficient resources.
InfraFit node grouping is a flexible feature that gives you granular visibility into how the instances behind specific workloads are utilized over time, enabling further analysis. For example, you may find that nodes in a specific group are at 99% CPU utilization (causing slowness) while only at 50% memory utilization. In this scenario, selecting a more suitable instance type for the workload, with more CPU and less memory, and reconfiguring the node group could improve performance and potentially lower node costs.
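The instance-shape reasoning in this scenario can be sketched as a small selection routine. The instance names, sizes, prices, and the 80% utilization target below are hypothetical; real instance catalogs and pricing would need to come from your cloud provider.

```python
# Observed utilization on the current (hypothetical) 8 vCPU / 64 GiB nodes.
current_vcpu, current_mem_gib = 8, 64
cpu_used = 0.99 * current_vcpu     # CPU-bound: about 7.92 vCPU in use
mem_used = 0.50 * current_mem_gib  # only 32 GiB in use

TARGET_UTILIZATION = 0.80  # assumed headroom target for both resources

# Hypothetical candidates: (name, vCPU, memory in GiB, price per hour in USD).
candidates = [
    ("current-8cpu-64gib", 8, 64, 0.50),
    ("big-16cpu-128gib", 16, 128, 1.00),
    ("cpu-lean-16cpu-48gib", 16, 48, 0.70),
    ("cpu-lean-12cpu-32gib", 12, 32, 0.55),
]


def fits(vcpu, mem_gib):
    """True if the observed usage stays under the target on this shape."""
    return (cpu_used <= vcpu * TARGET_UTILIZATION
            and mem_used <= mem_gib * TARGET_UTILIZATION)


viable = [c for c in candidates if fits(c[1], c[2])]
best = min(viable, key=lambda c: c[3])  # cheapest shape with enough headroom
print(f"suggested shape: {best[0]} at ${best[3]:.2f}/hour")
```

Here the cheapest viable shape has more CPU and less memory than the current one, which mirrors the scenario described above. In practice you would size against requests and peak usage rather than a single utilization snapshot.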
Verifying Karpenter (NAP) Effectiveness
Did you know that Karpenter (like GKE NAP) chooses nodes based on container requests and not on actual utilization? This means that even with perfectly fine-tuned autoscaling, eliminating waste is not guaranteed. To overcome this challenge, the combination of PerfectScale Automation and InfraFit can be particularly helpful.
The recipe is simple:
- Set up PerfectScale Automation to adjust resource requests based on actual container needs.
- Use InfraFit to monitor changes in your Karpenter NodePool compositions.
- Fine-tune Karpenter configuration by drilling down to specific node types and verifying each type’s utilization to potentially filter out nodes with lower utilization rates.
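The last step of the recipe could look roughly like the following Python sketch, which turns per-node-type utilization into an exclusion list. The utilization numbers and the 50% threshold are invented for illustration, and the resulting dictionary only mirrors the key/operator/values shape of a Karpenter NodePool requirement; treat it as an assumption to check against the Karpenter version you run.

```python
# Hypothetical utilization per instance type, e.g. as read from a node-type view.
utilization_by_type = {
    "m5.large": {"cpu": 0.72, "memory": 0.65},
    "c5.xlarge": {"cpu": 0.81, "memory": 0.58},
    "r5.2xlarge": {"cpu": 0.22, "memory": 0.35},
    "m5.4xlarge": {"cpu": 0.31, "memory": 0.28},
}


def low_utilization_types(stats, threshold=0.5):
    """Return instance types whose busiest resource stays below the threshold."""
    return sorted(
        name for name, u in stats.items()
        if max(u["cpu"], u["memory"]) < threshold
    )


excluded = low_utilization_types(utilization_by_type)

# Shaped like one entry of a NodePool's requirements list (assumed layout).
requirement = {
    "key": "node.kubernetes.io/instance-type",
    "operator": "NotIn",
    "values": excluded,
}
print(requirement)
```

Excluding types is a blunt tool: narrowing the pool too far can reduce Spot availability, so it is worth re-checking utilization after each change.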
Read more about how to get the most out of Karpenter here.
Enough talk, now's the perfect time to put it into action. Ready to dive in? Discover more in our Documentation Portal, or arrange a technical session with our team for a detailed InfraFit overview.
Still not with PerfectScale? Start for free today and enhance your K8s optimization journey with ease.