Kubernetes has emerged as a standard for container orchestration, providing a powerful platform for managing and scaling containerized applications.
Scalability is essential in software delivery, and Kubernetes offers built-in features that automatically scale application resources to match demand. As a result, applications can accommodate increased traffic without compromising performance. HorizontalPodAutoscaler (HPA) is one of these features.
This article explains what horizontal pod autoscaling is and how it works, and provides a walkthrough example of how to use HPA in Kubernetes.
HorizontalPodAutoscaler is a scaling-on-demand feature provided by Kubernetes as an alternative to manually scaling individual pods. HPA automatically scales the number of running pods up or down based on resource metrics like pod CPU or memory utilization, or on custom metrics like client requests per second.
Kubernetes requires the installation of the Metrics Server to enable autoscaling. The Metrics Server gathers the necessary metrics, such as CPU and memory utilization for the nodes and pods in your cluster. Its major function is to provide resource utilization metrics to Kubernetes autoscaler components.
The Metrics Server continuously collects resource usage metrics from the application workloads. The observed values are compared against the target metrics defined in the HPA manifest. If the observed metrics exceed the targets, the application is scaled up to meet demand; when they fall back below the targets, it is scaled down again.
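The scaling decision itself can be sketched as a simple calculation. The following is a simplified model of the rule described in the Kubernetes documentation (desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)); the real controller adds stabilization windows, readiness checks, and min/max clamping on top of this:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA scaling rule:
    desired = ceil(current * (currentMetric / targetMetric)).
    Within the tolerance band (10% by default) no scaling occurs."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    return math.ceil(current_replicas * ratio)

# Two pods averaging 90% CPU against a 60% target -> scale up to 3
print(desired_replicas(2, 90, 60))  # 3
# Load drops to an average of 20% -> scale down to 1
print(desired_replicas(3, 20, 60))  # 1
```

Note the tolerance band: small deviations from the target do not trigger scaling, which prevents the replica count from oscillating.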
This section is a walkthrough example of how HPA can be set up to automatically scale application pods based on CPU utilization.
We will learn how to:
Deploy a Metrics Server on kind.
Create HPA for our applications.
Test the HPA setup.
HPA relies on the Metrics Server to collect and expose resource utilization metrics. To deploy the Metrics Server, follow these steps:
To enable the Metrics Server on kind, run the following command:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.5.0/components.yaml
Verify the deployment by running:
kubectl get svc -n kube-system
You need an HPA target to demonstrate HorizontalPodAutoscaler. Create a sample deployment that specifies CPU resource requests.
The manifest below defines an Nginx deployment that runs a container using the latest Nginx image, along with a Service that exposes it on port 80.
apiVersion: v1
kind: Service
metadata:
  name: my-nginx-service
  labels:
    app: web
spec:
  ports:
  - port: 80
  selector:
    app: web
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-deployment
  labels:
    app: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
Run the following command to apply the deployment:
kubectl apply -f nginx.yaml
Now that the application is up and running, create the autoscaler. You can use a manifest file or the kubectl command.
To autoscale with the kubectl method, run the following command:
$kubectl autoscale deploy my-nginx-deployment --min=1 --max=5 --cpu-percent=60
The command creates an autoscaler that keeps a minimum of one pod and a maximum of five pods, scaling out when my-nginx-deployment reaches a CPU utilization of 60 percent.
To use a manifest file instead:
Create the manifest "hpa.yaml":
$ nano hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-nginx-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
Apply the deployment:
$ kubectl apply -f hpa.yaml
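Note that averageUtilization is measured against the pods' CPU requests (200m in the sample deployment), not their limits. A quick sketch of that calculation, using hypothetical usage numbers for illustration:

```python
def average_utilization_pct(usages_m: list, request_m: int) -> float:
    """Average CPU utilization across pods, expressed as a percentage
    of the per-pod CPU request (the quantity HPA compares to its target).
    Usages and the request are given in millicores."""
    return 100 * sum(usages_m) / (len(usages_m) * request_m)

# Two pods using 150m and 250m against a 200m request average 100%,
# which is above the 60% target, so the HPA would add replicas.
print(average_utilization_pct([150, 250], 200))  # 100.0
```

This is why a pod can trigger scaling while still well under its 500m limit: the 60% target refers to 60% of the 200m request, i.e. 120m of CPU.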
You can verify the HPA deployment using the command kubectl get hpa.
To test the autoscaler, you'll need to deploy another pod that will serve as a load generator for the sample application.
Create the manifest "load-generator.yaml" with the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: load-requests
  labels:
    app: load-requests
spec:
  replicas: 1
  selector:
    matchLabels:
      app: load-requests
  template:
    metadata:
      labels:
        app: load-requests
    spec:
      containers:
      - command:
        - "/bin/sh"
        - "-c"
        - "while true; do wget -q -O /dev/null my-nginx-service; done"
        name: load
        image: busybox:1.28
Run the following command to start the deployment:
kubectl apply -f load-generator.yaml
You can monitor the Nginx deployment's CPU usage by running the command kubectl top pods.
Wait a minute, then check how the HPA responds to the load by running kubectl get hpa.
By automating the process of scaling pods, HPA ensures that applications stay responsive even when workloads increase. Organizations can improve the performance and efficiency of containerized applications by utilizing Kubernetes' HPA.