August 12, 2024

Kubernetes Vertical Pod Autoscaler (VPA)

Tania Duggal
Technical Writer

Kubernetes Vertical Pod Autoscaler (VPA) helps optimize resource utilization by dynamically adjusting the resource requests of pods.

Vertical Pod Autoscaler is one of the three primary autoscaling approaches in Kubernetes: 

a. Horizontal Pod Autoscaler: scaling the number of replicas of an application

b. Vertical Pod Autoscaler: adjusting the resource settings (requests and limits) of a container

c. Cluster Autoscaler: scaling the number of nodes in a cluster

In this article, we will explore the Vertical Pod Autoscaler (VPA): what it is, how it works, how it relates to HPA, how to implement it, how to use it alongside HPA, best practices, and limitations.

What is Kubernetes Vertical Pod Autoscaler (VPA)?

Kubernetes offers two primary scaling strategies: vertical and horizontal. Vertical scaling, often called "scaling up," involves increasing the resources allocated to individual pods. This means beefing up CPU, memory, or other resources for each pod to handle more load. On the other hand, horizontal scaling, or "scaling out," focuses on increasing the number of pod replicas. Instead of making each pod more powerful, we simply add more pods to distribute the workload. 

To automate the vertical scaling process, Kubernetes provides a component called the Vertical Pod Autoscaler (VPA). The K8s Vertical Pod Autoscaler is designed to automatically adjust the CPU and memory resource requests and limits for containers within pods. VPA aims to solve one of the most challenging aspects of Kubernetes resource management: accurately setting resource requests and limits. It continuously monitors the resource usage of your pods and makes recommendations or automatic adjustments to ensure your containers have the right amount of resources.

How does Kubernetes Vertical Pod Autoscaler work?


The inner workings of Vertical Pod Autoscaler Kubernetes involve several components and processes that work together to provide functionality. At the heart of VPA, there are three main components: the VPA Recommender, the VPA Updater, and the VPA Admission Controller. These components work together to collect data, analyze resource usage, generate recommendations, and apply changes to pod specifications.

The VPA Recommender is responsible for analyzing resource usage patterns and generating recommendations for CPU and memory settings. It continuously monitors the resource consumption of containers using metrics provided by the Kubernetes Metrics Server. This analysis helps VPA understand the true resource needs of each container over time. 
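If the Metrics Server is working, you can query the same usage data the Recommender consumes. As a quick sanity check (assuming your Deployment's pods carry an app=my-app label), run:

kubectl top pod -l app=my-app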

The VPA Updater is the component responsible for applying the recommendations generated by the Recommender. When operating in "Auto" mode, the Updater evicts pods that need updating so that their controller recreates them with the adjusted resource settings. It's important to note that Pod Disruption Budget (PDB) settings constrain this process, and a restrictive PDB can stall updates.
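A Pod Disruption Budget is defined separately from the VPA object. A minimal sketch, assuming the target pods are labeled app: my-app, might look like this:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1          # keep at least one pod running during VPA-driven evictions
  selector:
    matchLabels:
      app: my-app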

The VPA Admission Controller intercepts pod creation requests and modifies the resource requirements according to the VPA recommendations. This ensures that even newly created pods start with adjusted resource settings, improving overall cluster efficiency from the outset. 

One of the key features of K8s Vertical Pod Autoscaler is its ability to operate in different modes, providing flexibility to cluster administrators. In "Off" mode, VPA generates recommendations but does not apply them automatically, allowing for manual review and application. The "Initial" mode applies recommendations only when new pods are created, which can be useful in scenarios where pod restarts are not desirable. The "Auto" mode, as mentioned earlier, actively applies recommendations to both new and existing pods. 
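The mode is set through the updatePolicy field of the VPA object. As an illustrative sketch (assuming a Deployment named my-app), a recommendation-only configuration would look like this:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"      # generate recommendations only; never evict or mutate pods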

Kubernetes Vertical Pod Autoscaler incorporates several safety measures to prevent potential issues that could arise from frequent resource adjustments. It respects pod disruption budgets to ensure service availability, implements hysteresis in its decision-making process to avoid oscillations, and can be configured with min/max boundaries to prevent extreme resource allocations.

How Vertical Pod Autoscaler (VPA) Relates to Horizontal Pod Autoscaler (HPA)

While VPA and HPA may seem similar, they serve different purposes and can actually work together to provide comprehensive scaling solutions for Kubernetes workloads. Understanding the relationship between these two autoscalers is crucial for implementing effective resource management strategies in Kubernetes clusters. K8s Vertical Pod Autoscaler focuses on adjusting resources within individual pods, ensuring each container has the right amount of CPU and memory. It's useful for applications with varying resource needs or those that are difficult to scale horizontally.

Related: Explore what HPA is, how it works, the implementation of HPA, best practices, and limitations.

For example, a database pod might benefit more from vertical scaling (increasing resources) than horizontal scaling (adding more replicas). HPA, on the other hand, manages the number of pod replicas based on metrics like CPU utilization or custom metrics. It's ideal for stateless applications that can easily scale out to handle increased load. When traffic increases, HPA can add more pod replicas to distribute the load, and when traffic decreases, it can scale down the number of replicas to conserve resources.

The relationship between VPA and HPA is complementary. VPA ensures each pod is correctly sized, while HPA determines how many of these optimally sized pods are needed to handle the current workload. This combination can lead to more efficient resource utilization across the cluster.

Note: Using VPA in "Auto" mode with HPA can lead to conflicts. When VPA adjusts resource requests, it can trigger HPA to scale the number of replicas, causing unpredictable behavior. To avoid this, it's recommended to use VPA in "Initial" mode when used in conjunction with HPA. This allows VPA to optimize the initial resource allocation for new pods while letting HPA handle the scaling of pod replicas based on the overall workload.

Implementation of Kubernetes Vertical Pod Autoscaler

Pre-requisite:

1. Ensure your Kubernetes cluster is running version 1.11 or later.

2. You'll need to have the Metrics Server installed and configured in your cluster, as VPA relies on it for gathering resource utilization data.
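If the Metrics Server is not already present, it is typically installed from its official release manifest:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml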

K8s Vertical Pod Autoscaler is not part of the standard Kubernetes distribution and needs to be installed separately. You can deploy it using the manifests provided in the Kubernetes autoscaler repository:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
git checkout origin/vpa-release-1.0
./hack/vpa-up.sh

This script will deploy the necessary components, including the VPA Admission Controller, Recommender, and Updater. 
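These components run in the kube-system namespace by default, and you can verify they are up and running with:

kubectl get pods -n kube-system | grep vpa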

vpa-admission-controller-7467db745-kbhhc   1/1     Running                
vpa-recommender-597b7c765d-4sm2m           1/1     Running                 
vpa-updater-884d4d7d9-qk97b                1/1     Running 

Once VPA is installed, you need to create VPA objects for the deployments you want to autoscale.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: 1
        memory: 500Mi
      controlledResources: ["cpu", "memory"]

This YAML defines a VPA object that targets a Deployment named "my-app" and operates in "Auto" mode. It also specifies minimum and maximum allowed resources to prevent extreme scaling decisions.
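Save the manifest (assumed here to be named my-vpa.yaml) and apply it to the cluster:

kubectl apply -f my-vpa.yaml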

You can use kubectl to inspect the VPA object and its status:

kubectl describe vpa my-vpa
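The Status section of the describe output contains the Recommender's per-container suggestions. A trimmed excerpt, with purely illustrative values, looks roughly like this:

Status:
  Recommendation:
    Container Recommendations:
      Container Name:  my-app
      Lower Bound:
        Cpu:     100m
        Memory:  128Mi
      Target:
        Cpu:     250m
        Memory:  256Mi
      Upper Bound:
        Cpu:     1
        Memory:  512Mi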

Using Vertical Pod Autoscaler with Horizontal Pod Autoscaler

As mentioned earlier, using VPA and HPA together requires careful consideration to avoid conflicts. A recommended approach for combining these two autoscalers effectively is to use VPA in "Initial" mode, which allows it to adjust resource settings for new pods without interfering with HPA’s scaling decisions.

Here’s an example:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Initial"

Then, configure HPA to manage the number of replicas:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

This allows VPA to adjust the resource allocation for each pod while HPA manages the number of replicas based on CPU utilization. It's important to monitor both VPA and HPA behavior closely when using them together.
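A simple way to do that is to watch both objects and compare the recommendations and replica counts over time, for example:

kubectl get vpa my-vpa --watch
kubectl get hpa my-hpa --watch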

Combining K8s VPA for Memory Scaling with HPA for CPU Scaling

You might want to configure Kubernetes Vertical Pod Autoscaler to manage vertical scaling specifically for memory while allowing HPA to handle horizontal scaling based on CPU utilization. This approach ensures that VPA and HPA don’t conflict by controlling the same resource.

Configure HPA to scale based on CPU utilization:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Then, configure VPA to manage only memory scaling by specifying memory as the controlled resource:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: memory-scaler
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      controlledResources: ["memory"]

By setting controlledResources to only include memory, VPA will adjust memory allocations without affecting CPU resources, which are managed by HPA. This separation prevents conflicts and ensures efficient resource management.

Kubernetes Vertical Pod Autoscaler Best Practices

1. Kubernetes Vertical Pod Autoscaler offers three modes: "Off", "Initial", and "Auto". While "Auto" mode provides continuous adjustments, it can cause pod restarts. For critical workloads or stateful applications, consider using "Initial" mode to set initial resource requests without causing restarts during runtime.

2. Implement Pod Disruption Budgets (PDBs) to ensure that a minimum number of pods remain available during VPA-induced restarts, helping to maintain service availability.

3. Ensure your pods have well-configured liveness and readiness probes. This helps Kubernetes manage pod restarts more gracefully when VPA needs to apply new resource settings.

4. While K8s Vertical Pod Autoscaler primarily uses CPU and memory metrics, consider implementing custom metrics or resource estimation for applications with unique resource patterns or requirements.

5. Keep your Vertical Pod Autoscaler Kubernetes components up to date with the latest stable version. This ensures you benefit from bug fixes, performance improvements, and new features.

6. Configure VPA to manage both resource requests and limits. VPA maintains the limit-to-request ratio specified in container templates, ensuring proportional scaling of both values when applying recommendations; see the sketch after this list.
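Whether VPA touches limits is controlled per container with the controlledValues field. An illustrative excerpt of a VPA resourcePolicy that scales both requests and limits:

  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits   # the default; use RequestsOnly to leave limits untouched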

Vertical Pod Autoscaler Limitations

1. Kubernetes Vertical Pod Autoscaler focuses on pod resource usage without considering the available node resources. This can lead to recommendations that, while optimal for the pod, might not be feasible given the cluster's actual capacity, resulting in pods that can't be scheduled.

2. Java applications, with their complex memory management through the JVM, present a unique challenge. VPA may struggle to accurately gauge the true resource needs of these applications, leading to suboptimal scaling decisions. It also cannot detect memory leaks or account for the JVM's CPU bursts during initialization.

3. To implement resource changes, Kubernetes Vertical Pod Autoscaler needs to recreate pods. This process, while necessary for applying new configurations, can cause brief periods of unavailability for the affected workloads, which might be problematic for applications requiring high availability.

4. While K8s Vertical Pod Autoscaler works well in smaller environments, its performance in large, production-scale clusters with hundreds or thousands of nodes and pods is less well proven. This uncertainty can be a significant concern for enterprises considering VPA for their large-scale deployments.

5. By focusing primarily on CPU and memory, Vertical Pod Autoscaler overlooks other crucial resources like network bandwidth and disk I/O. In I/O-intensive applications, this oversight can lead to performance bottlenecks that VPA won't address or may even exacerbate.

6. For stateful applications, the pod restart process during Kubernetes Vertical Pod Autoscaler updates can be more disruptive and may require additional considerations, such as proper handling of data consistency and state management during restarts.

7. The K8s Vertical Pod Autoscaler operates based on historical resource usage data without considering workload revisions or updates. In modern, fast-paced environments where new versions of applications are frequently deployed, this can lead to suboptimal resource recommendations. VPA may apply the same resource adjustments to a newly deployed version that has different resource requirements than its predecessor. This limitation can result in unnecessary pod mutations and inappropriate resource allocations for new revisions, especially in environments with daily or frequent deployments.

Further reading: A Practical Guide to the Kubernetes Vertical Pod Autoscaler (VPA) by Daniele Polencic from LearnK8s.

Elevate your Kubernetes management with PerfectScale

PerfectScale’s automated Kubernetes optimization and governance platform simplifies the day-2 operations of production environments at scale. The platform empowers DevOps and Platform Engineering teams to safely and effortlessly rightsize their environments for peak resilience and availability while eliminating wasted resources and costs.

Let’s look at the benefits of PerfectScale’s Automation compared to VPA:

- Prioritizes stability: Automatically eliminates wasted resources while ensuring the workload remains performant.

- Accounts for seasonality and revision history: Ensures any changes will be made safely, and are aligned with the uniqueness of your environment.

- HPA-aware: Only takes automated action that will not impact your current autoscaling configurations.

- Resolves underprovisioning errors: Fixes throttling, OOM, and eviction issues OOTB, without the complex configuration required by VPA.

- Identifies potential memory leaks: Prevents recursive memory increases.

- Prescriptive and prioritized automation: Accounts for node costs, PodDisruptionBudgets (PDBs), and workload load patterns.

- Comprehensive optimization: Including Spark, Flink, and other ephemeral workloads.

- Automatic workload grouping: Provides automated actions uniformly across replicas. (VPA can result in replicas having different resource allocations leading to unpredictable behavior).

- Impact-aware: Only takes automated actions that will substantially reduce cost or improve performance (VPA is known for restarting pods for only minor changes).

- Configurable maintenance windows: Complete control of when automated actions will occur.

PerfectScale is built to help you ‘scale K8s responsibly’ by continually balancing resilience and stability with cost-effectiveness. Join industry leaders like Paramount Pictures and Rapyd who have already optimized their Kubernetes environments with PerfectScale. Start a free trial now and experience the immediate benefits of automated Kubernetes cost optimization and management, ensuring your environment is always perfectly scalable.
