What is HorizontalPodAutoscaler in Kubernetes?

Kubernetes has emerged as a standard for container orchestration, providing a powerful platform for managing and scaling containerized applications.

Scalability is essential in software delivery, and Kubernetes offers built-in features that automatically scale application resources to match demand. As a result, applications can accommodate increased traffic without compromising performance. HorizontalPodAutoscaler (HPA) is one of these features.

This article explains what horizontal pod autoscaling is, how it works, and walks through an example of using HPA in Kubernetes.

HorizontalPodAutoscaler is a scaling-on-demand feature provided by Kubernetes as an alternative to manually scaling individual pods. HPA automatically scales the number of running pods up or down based on resource metrics, such as pod CPU or memory utilization, or on custom metrics, such as client requests per second.

How does HorizontalPodAutoscaler work?

Kubernetes requires the installation of the Metrics Server to enable autoscaling. The Metrics Server gathers the necessary metrics, such as CPU and memory utilization for the nodes and pods in your cluster. Its major function is to provide resource utilization metrics to Kubernetes autoscaler components.

The Metrics Server continuously monitors resource usage across the application workload. The observed metrics are compared against the target metrics declared in the HPA manifest.

If observed utilization rises above the target, the HPA scales the application up to meet demand; if it falls back below the target, the HPA scales it down again.
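Under the hood, the HPA controller computes the desired replica count using the formula documented by Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A quick shell sketch of that arithmetic, with made-up numbers for illustration:

```shell
# HPA scaling formula: desired = ceil(current * observed / target)
current_replicas=2
observed_cpu=90   # average CPU utilization across pods, percent (hypothetical)
target_cpu=60     # target utilization from the HPA spec (hypothetical)

# Integer ceiling division: ceil(a/b) == (a + b - 1) / b
desired=$(( (current_replicas * observed_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # 2 * 90 / 60 = 3
```

With two pods running at 90 percent average utilization against a 60 percent target, the controller would scale out to three replicas.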

Deploying HPA in Kubernetes

This section is a walkthrough example of how HPA can be set up to automatically scale application pods based on CPU utilization.

We will learn how to:

  • Deploy a Metrics Server on kind.

  • Create HPA for our applications.

  • Test the HPA setup.

1. Deploy the Metrics Server

HPA relies on the Metrics Server to collect and expose resource utilization metrics. To deploy the Metrics Server on kind, run the following command:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.5.0/components.yaml

Note that kind nodes serve the kubelet API with self-signed certificates, so you will typically need to add the --kubelet-insecure-tls flag to the metrics-server container arguments before the deployment becomes ready.

Verify the deployment by running:

kubectl get svc -n kube-system
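Before moving on, you can also confirm that the metrics API is actually serving. Both commands below assume the cluster set up in the steps above:

```shell
# Check that the resource metrics API service is registered and available
kubectl get apiservice v1beta1.metrics.k8s.io

# Once the Metrics Server is ready, resource metrics become queryable
kubectl top nodes
```

If `kubectl top nodes` returns CPU and memory figures for your nodes, the Metrics Server is working.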

2. Create a sample deployment

You need a scaling target to demonstrate HorizontalPodAutoscaler. Create a sample deployment that includes CPU resource requests, since HPA computes utilization relative to what each pod requests.

The manifest below defines an Nginx deployment that runs a container from the latest Nginx image, along with a service that exposes it on port 80.

apiVersion: v1
kind: Service
metadata:
  name: my-nginx-service
  labels:
    app: web
spec:
  ports:
  - port: 80
  selector:
    app: web
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-deployment
  labels:
    app: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
nginx.yaml

Run the following command to apply the deployment:

kubectl apply -f nginx.yaml

3. Configure HPA

Now that the application is up and running, create the autoscaler. You can use a manifest file or the kubectl command.

To autoscale with the kubectl method, run the following command:

kubectl autoscale deploy my-nginx-deployment --min=1 --max=5 --cpu-percent=60

The command creates an autoscaler that maintains between one and five replicas of my-nginx-deployment, scaling to keep average CPU utilization around 60 percent.

To use a manifest file:

  1. Create the manifest "hpa.yaml" with the content below.

  2. Apply the deployment.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-nginx-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
hpa.yaml

nano hpa.yaml
kubectl apply -f hpa.yaml

You can verify the HPA deployment by running the command kubectl get hpa.

4. Check if HPA works

To test the autoscaler, you'll need to deploy another pod that will serve as a load generator for the sample application.

  1. Create the manifest "load-generator.yaml" with the content below.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: load-requests
  labels:
    app: load-requests
spec:
  replicas: 1
  selector:
    matchLabels:
      app: load-requests
  template:
    metadata:
      labels:
        app: load-requests
    spec:
      containers:
      - command:
        - "/bin/sh"
        - "-c"
        - "while true; do wget -q -O /dev/null my-nginx-service; done"
        name: load
        image: busybox:1.28
A simple load generator
  2. Run the following command to start the deployment:

kubectl apply -f load-generator.yaml

You can monitor the Nginx deployment CPU usage by running:

kubectl top pods

Wait a minute and check how HPA responds to the load. Run kubectl get hpa again to see the current utilization and replica count.
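To observe the scale-up as it happens, you can also stream status updates. The resource name below matches the HPA created earlier in this walkthrough:

```shell
# Watch the HPA status refresh as load pushes CPU above the 60% target
kubectl get hpa my-nginx-hpa --watch
```

As utilization crosses the target, you should see the replica count climb toward the configured maximum of five.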


Conclusion

By automating the process of scaling pods, HPA ensures that applications stay responsive even when workloads increase. Organizations can improve the performance and efficiency of containerized applications by utilizing Kubernetes' HPA.


Copyright ©2025 Educative, Inc. All rights reserved