How to Configure Cluster Autoscaler for Savings

Struggling with high cloud costs? Kubernetes Cluster Autoscaler (CAS) can help you save money by automatically adjusting your cluster size based on real-time workload demands. Here's a quick overview of how it works and why it matters:

  • Scale Up and Down Automatically: CAS adds nodes when workloads increase and removes underutilised nodes to cut costs.
  • Avoid Over-Provisioning: Research shows 65% of containers use less than half their requested capacity. CAS rightsizes resources to prevent waste.
  • Use Cheaper Spot Instances: CAS works with AWS Spot instances, which are up to 90% cheaper than On-Demand options.
  • Save Thousands Annually: For a business spending £10,000/month on cloud resources, CAS can save £2,000 monthly - or £24,000 annually.

Key Setup Tips:

  • Use supported Kubernetes versions (1.27+ for AWS).
  • Configure IAM permissions and node group consistency.
  • Analyse workload patterns to optimise scaling behaviour.
  • Set cost-saving goals, e.g., reduce monthly compute costs by 20–30%.

Advanced Features:

  • Fine-tune settings like scale-down-utilization-threshold for aggressive cost savings.
  • Use Spot instances for non-critical workloads and Mixed Instance Policies for diversity.
  • Protect critical services with Pod Disruption Budgets.


Prerequisites and Planning for Cluster Autoscaler

Getting your Cluster Autoscaler setup right from the start is essential if you want it to be cost-efficient. The key lies in nailing the technical groundwork, understanding how your workloads behave, and setting clear financial goals. Without these steps, your autoscaler might end up being a costly experiment rather than a money-saving tool.

Technical Requirements

First things first - check version compatibility. If you're using AWS, you’ll need Kubernetes version 1.27 or later [4]. For other cloud providers supported by the Cluster API, Kubernetes version 1.16 or higher is required [3].

Next, make sure you’re working with a supported cloud provider like AWS, Azure, or Google Cloud Platform (GCP) [4]. For AWS users, this means setting up an OIDC provider and tagging your Auto Scaling Groups correctly with both of the following tags [6]:

  • k8s.io/cluster-autoscaler/enabled = true
  • k8s.io/cluster-autoscaler/<cluster-name> = owned
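
As a hedged example, the tags can be applied from the AWS CLI; the Auto Scaling Group name (my-cluster-nodes) and cluster name (my-cluster) below are placeholders to replace with your own:

# Placeholder ASG and cluster names - substitute your own
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-cluster-nodes,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=false" \
  "ResourceId=my-cluster-nodes,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/my-cluster,Value=owned,PropagateAtLaunch=false"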

IAM permissions are another critical step. Assign only the permissions needed for managing node pools and other resources [6]. This involves creating specific IAM policies, service accounts, and roles to allow scaling decisions while keeping security intact.
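
For illustration only, a minimal policy along the lines commonly attached to the autoscaler's role is sketched below; the policy and file names are placeholders, and the action list is an assumption to validate against your provider's current guidance before use.

cluster-autoscaler-policy.json (hypothetical file name):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": "*"
    }
  ]
}

aws iam create-policy \
  --policy-name ClusterAutoscalerPolicy \
  --policy-document file://cluster-autoscaler-policy.json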

Consistency across node groups is also important. Make sure they share similar scheduling properties - like labels, taints, and resource allocations - to ensure predictable scaling [6]. Defining resource requests for each pod provides the autoscaler with accurate data, helping it make better scaling decisions [5][6].
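
As a minimal sketch (names, image, and values are all illustrative), explicit requests and limits on every container look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api                          # illustrative workload name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
        - name: web-api
          image: example.com/web-api:1.0 # placeholder image
          resources:
            requests:
              cpu: 250m                  # what the scheduler and autoscaler plan around
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi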

Once you’ve got the technical setup sorted, it’s time to dive into workload analysis.

Understanding Workload Patterns

The next step is to study your workload patterns to spot areas for improvement. For example, look at average resource utilisation - maybe your CPU usage averages 10% while memory sits at 23%. Also identify peak times, like weekday mornings or bank holidays in the UK, to set parameters such as unneededTime and utilizationThreshold [8]. Dynamic workloads might need a longer buffer to avoid constant scaling, while predictable workloads can handle tighter thresholds [7].
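
A quick way to sanity-check those utilisation figures (assuming the metrics-server add-on is installed) is:

# Point-in-time CPU and memory usage per node and per pod
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=memory

For longer windows, pull the same data from your monitoring stack rather than relying on point-in-time snapshots.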

Keep an eye on traffic spikes too. For instance, if your application sees a surge in demand during morning commutes or seasonal sales, you’ll need to decide whether to scale aggressively or adopt a more gradual approach [7].

Workload analysis can also highlight the benefits of using multiple node groups for cost efficiency [5]. Different workloads might perform better on specific instance types - CPU-optimised nodes for compute-heavy tasks or high-memory nodes for applications that need a lot of RAM.

Setting Cost-Saving Goals

To align your autoscaler configuration with financial objectives, start by setting measurable cost-saving targets in pounds. Begin with an audit of your current cloud spend. If you’re using a mix of On-Demand and Spot Instances, you could potentially save an average of 59%. Spot-only clusters can push those savings up to 77% [8].

Based on your findings, set monthly savings targets. For instance, if your compute costs are £15,000 per month with noticeable underutilisation, aim for a 20–30% reduction. That’s a monthly saving of £3,000–£4,500, or £36,000–£54,000 annually.

Reserved Instances are another way to cut costs, offering up to 72% savings compared to pay-as-you-go rates [10]. One European company managed to save over £100,000 per month - more than £1.2 million annually - by aligning Reserved Instance purchases with their actual needs [10]. Just remember, these discounts work on a use it or lose it basis [10].

When setting your goals, consider factors specific to the UK. For example, many businesses operate on a standard schedule, so scaling down during evenings, weekends, and bank holidays can lead to significant savings [10].

If your organisation has sustainability goals, efficient autoscaling can also help reduce your carbon footprint. This aligns with the UK’s target of net-zero greenhouse gas emissions by 2050 [9]. Plus, it’s worth noting that 77% of consumers prefer buying from brands that prioritise sustainability [9].

Finally, track your progress using cost management tools from your cloud provider. Set budgets and alerts to ensure your autoscaler configuration stays on track with your savings goals and to uncover further optimisation opportunities [10].

With these prerequisites and strategies in place, you’re ready to configure the autoscaler to maximise cost efficiency.

Step-by-Step Guide to Configuring Cluster Autoscaler

Setting up Cluster Autoscaler on your cloud platform can help you manage resources effectively and save costs. The process varies depending on the provider, so follow the commands below to get started.

Enabling Cluster Autoscaler

For Azure AKS, you can either enable autoscaling during cluster creation or add it to an existing setup.

To create a new AKS cluster with autoscaling:

az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --node-count 1 \
  --vm-set-type VirtualMachineScaleSets \
  --load-balancer-sku standard \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 3 \
  --generate-ssh-keys

To enable autoscaling on an existing cluster:

az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 3

For Google GKE, you can enable autoscaling when creating a new cluster:

gcloud container clusters create CLUSTER_NAME \
  --enable-autoscaling \
  --num-nodes NUM_NODES \
  --min-nodes MIN_NODES \
  --max-nodes MAX_NODES \
  --location=CONTROL_PLANE_LOCATION

Or, enable it for an existing node pool:

gcloud container clusters update CLUSTER_NAME \
  --enable-autoscaling \
  --node-pool=POOL_NAME \
  --min-nodes=MIN_NODES \
  --max-nodes=MAX_NODES \
  --location=CONTROL_PLANE_LOCATION

For AWS EKS, enabling autoscaling involves setting up IAM roles and tagging Auto Scaling Groups. Managed Node Groups simplify this by automating provisioning and lifecycle management while supporting various Spot instance types [1]. If you're working across multiple availability zones, ensure you use one node pool per zone and enable --balance-similar-node-groups for even scaling [2].
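
On EKS the autoscaler itself is something you deploy. As a rough sketch using the community Helm chart (the cluster name and region are placeholders, and the chart values shown are assumptions to adapt to your setup):

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update

# Placeholder cluster name and region
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-eks-cluster \
  --set awsRegion=eu-west-2 \
  --set extraArgs.balance-similar-node-groups=true \
  --set extraArgs.expander=least-waste

The autoscaler's service account also needs to be linked to the IAM role created earlier, which on EKS is typically done with an eks.amazonaws.com/role-arn annotation.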

Once enabled, you can fine-tune configurations to optimise scaling behaviour and minimise costs.

Key Configuration Parameters

The right configuration settings can make a big difference in cost management.

Here are some key parameters to consider:

  • scale-down-unneeded-time: This defines how long a node must stay idle before being eligible for removal. While the default is 10 minutes, you might extend this to 15–20 minutes for businesses with predictable daily patterns, reducing unnecessary churn during short lulls.
  • scale-down-utilization-threshold: The default threshold is 0.5 (50%), meaning a node can be removed if its resource usage drops below this level. Lowering it to 0.3 or 0.4 can help consolidate resources more aggressively.

The table below highlights essential AKS settings:

| Setting | Description | Default Value | Cost-Optimised Suggestion |
| --- | --- | --- | --- |
| scan-interval | How often the cluster is reevaluated for scaling | 10 seconds | 10 seconds (keep default) |
| scale-down-delay-after-add | Delay before scaling down resumes post scale-up | 10 minutes | 15 minutes (reduces churn) |
| scale-down-unneeded-time | Idle time before a node is eligible for removal | 10 minutes | 15–20 minutes (business hours) |
| scale-down-utilization-threshold | Utilisation threshold for node removal | 0.5 | 0.3–0.4 (more aggressive) |
| balance-similar-node-groups | Balances nodes across similar pools | false | true (better distribution) |
| expander | Strategy for selecting node pools during scale-up | random | least-waste (cost-efficient) |
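
On AKS, these settings are applied through the cluster autoscaler profile. A sketch applying the cost-optimised values above to the cluster from the earlier examples:

az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --cluster-autoscaler-profile \
    scan-interval=10s \
    scale-down-delay-after-add=15m \
    scale-down-unneeded-time=15m \
    scale-down-utilization-threshold=0.4 \
    balance-similar-node-groups=true \
    expander=least-waste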

The expander setting is particularly important. The least-waste option chooses the node group that would leave the least unused CPU and memory once the pending pods are scheduled, while bin-packing-style approaches aim to fill nodes more densely to reduce costs [7]. For businesses with steady workloads, least-waste strikes a good balance between cost and performance.

The max-graceful-termination-sec flag determines how long the autoscaler waits for pods to terminate during scale-down. The default is 600 seconds (10 minutes), but for applications that shut down quickly, reducing this to 300–400 seconds can speed up cost savings.

Enabling ignore-daemonsets-utilization is also advisable if your cluster uses DaemonSets for monitoring or logging. This stops DaemonSet pods from counting towards node utilisation, so nodes running little more than monitoring or logging agents can still be considered for removal.
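
If you run the autoscaler yourself (common on EKS), the same tuning is passed as command-line flags on the cluster-autoscaler container. A hedged excerpt of the Deployment spec, with the cluster name as a placeholder:

# Excerpt from the cluster-autoscaler container spec (placeholder cluster name)
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-eks-cluster
  - --expander=least-waste
  - --balance-similar-node-groups=true
  - --scale-down-unneeded-time=15m
  - --scale-down-utilization-threshold=0.4
  - --max-graceful-termination-sec=300
  - --ignore-daemonsets-utilization=true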

Node Labels and Taints for Workload Placement

Using node labels and taints helps allocate resources efficiently, ensuring that workloads are matched with appropriately configured nodes. This prevents lightweight tasks from running on high-spec, costly nodes and maximises the use of Spot instances.

  • Taints: These act as barriers, ensuring only pods with matching tolerations can be scheduled on certain nodes. For example:
    • Apply gpu=true:NoSchedule to reserve GPU-enabled nodes for machine learning tasks.
    • Use memory-optimised=true:NoSchedule for memory-intensive applications.
    • For Spot instances, apply spot-instance=true:NoSchedule to ensure only fault-tolerant workloads are placed there. This can save up to 90% compared to On-Demand instances [1].

In maintenance scenarios, you can temporarily apply a taint like node.kubernetes.io/unreachable:NoSchedule to prevent new pods from being scheduled on nodes undergoing updates or repairs [11].

Node affinity can further refine scheduling by expressing preferences for specific node types without enforcing strict rules. This approach complements taints and tolerations to optimise workload placement.
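
Pulling those pieces together, here is a hedged sketch for Spot capacity. The spot-instance taint and label key is the illustrative one used above, not a standard Kubernetes name, and normally you would set it at the node-group level rather than per node:

# Taint a Spot node by hand (SPOT_NODE_NAME is a placeholder)
kubectl taint nodes SPOT_NODE_NAME spot-instance=true:NoSchedule

# Pod spec excerpt: tolerate the taint and prefer, but not require, Spot nodes
spec:
  tolerations:
    - key: spot-instance
      operator: Equal
      value: "true"
      effect: NoSchedule
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: spot-instance       # assumes your Spot nodes carry this label
                operator: In
                values: ["true"]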

When using Mixed Instance Policies, select a diverse range of instance types with consistent CPU, memory, and GPU specifications [1]. Keep your taint and toleration strategy well-documented, and always test it in a staging environment before deploying to production. Overuse of taints can complicate configurations, so start simple and scale complexity gradually.

For better deployment performance, consider overprovisioning with temporary pods that have negative priority. These placeholder pods are evicted when resources are needed, slightly increasing costs but improving scheduling speed [2].
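
One common pattern (a sketch, not a prescribed setup) is a negative-priority PriorityClass plus a small deployment of pause containers sized to the headroom you want; names, replica count, and resource sizes below are illustrative:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10                   # below the default priority of 0, so these pods are evicted first
globalDefault: false
description: Placeholder pods that reserve spare capacity
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2                # how much headroom to hold; tune to your workloads
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # does nothing; just holds the reservation
          resources:
            requests:
              cpu: 500m
              memory: 512Mi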

Finally, protect critical pods from being evicted during scale-down by annotating them with cluster-autoscaler.kubernetes.io/safe-to-evict=false. This ensures essential services remain uninterrupted.
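
The annotation sits on the pod itself, typically via the workload's pod template. A quick sketch with placeholder names:

# One-off, on a running pod (CRITICAL_POD_NAME is a placeholder)
kubectl annotate pod CRITICAL_POD_NAME cluster-autoscaler.kubernetes.io/safe-to-evict="false"

# Or permanently, in the Deployment's pod template
spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"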


Advanced Methods for Cost Efficiency

Once you've set up Cluster Autoscaler, there are more advanced techniques to help you trim costs further. These methods go beyond the basics, fine-tuning your cluster's performance to align with cost-saving objectives. For businesses in the UK, these strategies can make a substantial difference in managing cloud expenses.

Right-Sizing Node Pools

Right-sizing is all about tailoring your node pool configurations to match your workload demands. It ensures you’re using the most cost-effective node types for your needs while avoiding over-provisioning.

Understanding your resource usage is crucial here. By analysing how your containers use CPU and memory, you can adjust requests and limits to reflect actual consumption. As Azure documentation points out:

requests and limits that are higher than actual usage can result in overprovisioned workloads and wasted resources [13].

Node Auto-Provisioning (NAP) takes this concept further by dynamically creating node pools that fit scheduled workloads. While this reduces resource waste compared to static setups, it can slightly delay autoscaling when new node pools are created [12].
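
On GKE, for example, node auto-provisioning is enabled with resource limits that cap how far it can scale; the values below are illustrative and the placeholders match the earlier gcloud examples:

gcloud container clusters update CLUSTER_NAME \
  --enable-autoprovisioning \
  --min-cpu 1 --max-cpu 64 \
  --min-memory 1 --max-memory 256 \
  --location=CONTROL_PLANE_LOCATION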

Here are some optimisation tips for specific cloud platforms:

  • Google Cloud: Opt for E2 machine types, which can save up to 31% compared to N1 machine types [12].
  • Azure: Use preset configurations like Dev/Test or Production Economy in the Azure portal [14].
  • Multi-architecture: Arm64 nodes can deliver up to 50% better price-performance for scale-out workloads compared to x86-based VMs [14].

Regular audits are essential. Remove idle node pools, unattached volumes, and enforce tagging to keep your environment lean [13]. Setting budgets and alerts will help you monitor spending and spot inefficiencies.

Once your node pools are optimised, the next step is exploring alternative pricing models.

Using Spot or Preemptible Nodes

Spot instances are a great way to cut costs, offering steep discounts [2]. However, these instances can be interrupted when cloud providers need the capacity back, so they’re best suited for fault-tolerant workloads.

To minimise the impact of interruptions, diversify your instance selection. Using a mix of instance families allows you to tap into multiple spot capacity pools, improving your chances of scaling successfully while reducing disruptions [2]. Mixed Instance Policies can help you achieve this diversity without creating too many node groups [2].

Isolation strategies are key to managing the unpredictable nature of spot instances. Keep on-demand and spot capacity in separate Auto Scaling groups, and use taints to ensure only specific pods can tolerate preemption. Tools like nodeSelector, taints, or affinity rules can help you assign workloads appropriately [15].

When setting up Mixed Instance Policies, choose a variety of instance types with similar CPU, memory, and GPU configurations [1]. Tools like the EC2 Instance Selector can help identify compatible types [2]. Be cautious, though: the autoscaler’s scheduling simulator prioritises the first instance type in the policy. If later types are larger, you risk wasting resources; if smaller, pods might fail to schedule due to insufficient capacity [2].
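
As one way to express this, assuming you provision node groups with eksctl, a nodegroup excerpt along these lines spreads a Spot group across several similar instance types; the field names follow eksctl's config schema, so verify them against your installed version:

# eksctl ClusterConfig excerpt (illustrative nodegroup name and instance types)
nodeGroups:
  - name: spot-workers
    minSize: 0
    maxSize: 10
    instancesDistribution:
      instanceTypes: ["m5.xlarge", "m5a.xlarge", "m4.xlarge"]   # similar CPU and memory profiles
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0                    # 100% Spot
      spotInstancePools: 3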

For spot instance configurations, consider conservative timers like scale-down-unneeded-time and scale-down-delay-after-add, especially for stateful or slow-start workloads [15]. Using the least-waste expander strategy can further optimise costs by prioritising the most efficient scaling options [2].

Configuring Pod Disruption Budgets

Pod Disruption Budgets (PDBs) are crucial for maintaining stability during disruptions. They ensure a minimum number of pods remain available, striking a balance between resource efficiency and application reliability [16].

Poorly configured PDBs can lead to inefficiencies. Setting minAvailable too high can block necessary scale-down operations, leaving nodes underutilised and driving up costs [16]. On the other hand, overly relaxed PDBs might allow too many disruptions, risking performance issues or downtime [16].

Best practices include using percentages for minAvailable or maxUnavailable rather than fixed numbers. This approach scales automatically with your deployments, eliminating the need for manual adjustments [16] [18]. Tools like Prometheus and Grafana can help you monitor PDB impacts, visualising disruptions and resource usage [16].
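
A minimal example using a percentage; the PDB name and app label are placeholders:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-api-pdb
spec:
  minAvailable: 80%          # scales with the replica count, so no manual updates needed
  selector:
    matchLabels:
      app: web-api           # must match the pods you want to protect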

PDBs are widely used across industries. Financial services, telecommunications, SaaS, and healthcare sectors rely on them to maintain stability during maintenance or scaling events [17].

Regularly review and update your PDB configurations to ensure they align with your current deployment needs. Test changes in staging environments before applying them to production, and set alerts to notify your team of any PDB-related issues [16]. This proactive approach ensures your cluster stays balanced between cost efficiency and reliability.

Monitoring and Management

Once your Cluster Autoscaler is set up, keeping an eye on its performance and managing it effectively is crucial for maintaining cost efficiency. Without consistent monitoring, configurations can drift, leading to unexpected expenses. UK businesses, in particular, benefit from monitoring strategies that strike a balance between detailed visibility and operational cost control.

Monitoring Autoscaler Performance

A layered approach to monitoring works best, combining real-time updates, trend analysis, and diagnostic tools for troubleshooting.

Start with Kubernetes Events and the autoscaler's status ConfigMap, which offer real-time updates at no extra cost. Both give quick insight into scaling decisions and potential issues and are accessible via standard kubectl commands.
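
For example (the Deployment name depends on how the autoscaler was installed, so treat it as an assumption):

# Scaling decisions and current state, written by the autoscaler itself
kubectl -n kube-system describe configmap cluster-autoscaler-status

# Recent events in kube-system, including scale-up and scale-down activity
kubectl -n kube-system get events --sort-by=.lastTimestamp

# Autoscaler logs, assuming it runs as a Deployment named cluster-autoscaler
kubectl -n kube-system logs deployment/cluster-autoscaler --tail=100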

For tracking patterns over time, a managed Prometheus service (such as Azure Monitor managed service for Prometheus) is a good choice. It focuses on essential metrics, helping you monitor scaling trends, resource usage, and cost patterns, and it supports alerting and integrates with Grafana for visualisation.

Reserve diagnostic logging for in-depth problem-solving. While it provides detailed data, it can be costly and requires expertise to manage effectively.

| Monitoring Method | Advantages | Limitations | Cost Considerations |
| --- | --- | --- | --- |
| Kubernetes Events & ConfigMap | Free, real-time updates, lightweight | Limited history, basic information | No additional cost |
| Logs via Diagnostic Settings | Detailed insights, historical data, queries | Expensive at scale, verbose, complex setup | Use targeted logs and limit retention |
| Metrics via Managed Prometheus | Trend analysis, alerting, Grafana support | Requires setup, less detailed than logs | Focus on key metrics, minimise data ingestion |

When setting up alerts, prioritise metrics that directly impact costs, such as scaling frequency, failed attempts, and resource wastage. Avoid overwhelming your team by fine-tuning thresholds for actionable notifications.
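
As a sketch, assuming Prometheus already scrapes the autoscaler's metrics endpoint, rules along these lines catch pods stuck unschedulable and failed scale-ups. The metric names are those exposed by the upstream Cluster Autoscaler, so verify them against your version:

groups:
  - name: cluster-autoscaler
    rules:
      - alert: PodsUnschedulableTooLong
        expr: cluster_autoscaler_unschedulable_pods_count > 0   # assumed upstream metric name
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Pods have been unschedulable for 15 minutes
      - alert: AutoscalerScaleUpFailures
        expr: increase(cluster_autoscaler_failed_scale_ups_total[1h]) > 0   # assumed upstream metric name
        labels:
          severity: warning
        annotations:
          summary: Cluster Autoscaler recorded failed scale-up attempts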

The tighter your Kubernetes scaling mechanisms are configured, the lower the waste and costs of running your application. – Cast AI [19]

These practices help you catch inefficiencies early, keeping your operations streamlined and cost-effective.

Common Troubleshooting Scenarios

Understanding frequent issues with Cluster Autoscaler can save time and prevent costly disruptions. Most problems fall into a few common categories:

  • Pod status issues: Pods stuck in CrashLoopBackOff are frequently being killed for exceeding their memory limits, so raising those limits is often the fix. Also verify the autoscaler’s IAM permissions to ensure smooth scaling.
  • Scaling failures: Check autoscaler logs for errors, review deployment resource settings, and confirm that Horizontal Pod Autoscaler (HPA) configurations are correct. Misconfigured resource requests or limits are often the root cause.
  • Pod placement problems: When pods can’t be scheduled, it’s usually due to node capacity issues or mismatched resource requirements. Examine node and pod statuses, review autoscaler logs, and confirm access to the Kubernetes API server.
  • Configuration errors: These can silently undermine cost-saving efforts. Regularly audit your configuration files for syntax issues and ensure scaling thresholds and node group settings align with your goals.
  • Node labelling issues: Incorrect or missing labels can lead to inefficient workload distribution. Verify that labels on new nodes meet your pod scheduling needs.
  • Metrics availability problems: Scaling decisions rely on accurate metrics. Ensure your metrics server is operational and that HPA configurations are set up properly.

Addressing these issues promptly helps maintain smooth operations and avoids unnecessary expenses.

Regular Audits and Expert Support

Routine audits are essential for spotting inefficiencies and staying on top of costs. As Kubernetes adoption grows, effective management becomes even more critical for UK organisations.

  • Monthly resource reviews: Assess node sizing against actual workloads and adjust configurations to avoid over-provisioning or under-utilisation [20].
  • Quarterly configuration audits: Examine scaling policies, storage usage, and resource limits. Update quota settings based on workload changes and explore savings opportunities, such as using spot instances [1].
  • Cost monitoring automation: Automate the cleanup of unused storage volumes and set alerts for unexpected spending patterns [21].
  • Version compatibility checks: Ensure your Cluster Autoscaler version matches your cluster version to avoid inefficiencies caused by mismatches [2].

Hokstad Consulting offers tailored solutions for UK businesses looking to optimise cloud costs and streamline operations. Their expertise in cloud cost engineering and DevOps transformation can help align your technical setup with your financial goals.

Cost Optimisation is achieving your business outcomes at the lowest price point. – AWS Documentation [22]

To further enhance cost management, consider adopting FinOps frameworks. These frameworks promote financial accountability and help bridge the gap between technical operations and business objectives [21]. By combining technical monitoring with financial oversight, you can ensure your Cluster Autoscaler continues to deliver value as your infrastructure evolves.

Conclusion

From planning to configuration and monitoring, the Cluster Autoscaler offers a powerful way for UK businesses to manage cloud costs effectively. Achieving this, however, requires careful preparation, precise implementation, and consistent oversight to unlock meaningful savings.

Key Takeaways

Effective planning is the foundation for success. By analysing workload patterns, setting clear cost objectives, and defining technical requirements, businesses can avoid expensive missteps. Precise configuration is equally important - this means setting accurate resource requests and limits for pods, consolidating resources into larger NodeGroups, and using Mixed Instance Policies thoughtfully [1]. Real-world examples, such as the savings from AWS Spot and E2 machine types, highlight the potential for cost reductions [1][12].

Monitoring is the glue that holds it all together. Regular audits, performance checks, and proactive adjustments help prevent configuration drift, which can erode cost savings over time. Companies that adopt these practices often see impressive results - CAST AI clients, for instance, report an average 63% reduction in Kubernetes costs through effective autoscaler management [6].

When properly configured, combining node-level autoscaling with workload-level autoscaling such as the Horizontal Pod Autoscaler can take cost efficiency to the next level. Features like Pod Disruption Budgets ensure a balance between resource optimisation and maintaining high availability, especially when using cost-efficient Spot instances.

Armed with these strategies, UK businesses can take meaningful steps towards optimising their cloud environments.

Next Steps for UK Businesses

Focus on visibility first. Before making changes, analyse your current Kubernetes costs and resource usage patterns. This baseline will help you measure the impact of any adjustments and guide future decisions.

Adopt a phased approach. Start by configuring Cluster Autoscaler for non-critical workloads. Once the setup has been tested and validated, extend it to production environments. Create dedicated node groups for Spot instances to handle workloads that don’t need On-Demand capacity, and gradually introduce advanced autoscaler features [1].

Commit to ongoing reviews. Schedule monthly resource evaluations to align node sizes with actual workloads and perform quarterly audits of scaling policies. These regular check-ins will help identify new opportunities for cost savings and maintain optimal configurations.

For businesses ready to take cloud efficiency to the next level, expert advice can make all the difference. Hokstad Consulting offers specialised support in cloud cost engineering and DevOps transformation, helping organisations align their technical infrastructure with financial goals. Their tailored solutions can help you achieve the savings that a well-managed Cluster Autoscaler delivers.

FAQs

How does the Cluster Autoscaler work with AWS Spot Instances to reduce costs?

The Cluster Autoscaler works seamlessly with AWS Spot Instances by automatically adjusting the size of node groups that include these cost-effective resources. Spot Instances utilise AWS's surplus capacity, offering savings of up to 90% compared to On-Demand pricing.

By dynamically adding or removing Spot Instances based on workload needs, the Cluster Autoscaler ensures efficient scaling while keeping costs low. To make the most of these savings and maintain system reliability, it's wise to diversify Spot Instance pools, plan for potential interruptions, and use Auto Scaling groups strategically.

What challenges might arise when setting up Cluster Autoscaler, and how can they be addressed?

Configuring the Cluster Autoscaler can sometimes be tricky. You might encounter nodes that don't scale as expected, pods that fail to find a home, or delays in provisioning resources. These hiccups often arise from misconfigurations or the unique constraints of certain environments.

To tackle these issues, start by ensuring the autoscaler is set up properly to handle your specific workloads. Keep an eye on its logs - these can be a goldmine for spotting problems early. If you're working in environments with strict regulations or air-gapped systems, make sure you're aware of their limitations. Testing how the autoscaler behaves under different conditions is also key to fine-tuning its performance. By sticking to best practices and troubleshooting proactively, you can reduce the chances of scaling problems and keep things running smoothly.

How can UK businesses configure Cluster Autoscaler to meet their financial and sustainability goals?

To make your Cluster Autoscaler setup align with both financial targets and environmental goals, start by right-sizing your clusters. This means ensuring they’re not over-provisioned (wasting resources) or under-provisioned (lacking capacity). Consider using energy-efficient ARM-64 nodes, which can help lower energy usage and reduce your carbon footprint. Pairing autoscaling with scheduled scaling is another smart move - it helps optimise resource use and keeps costs in check.

To stay on top of your spending, leverage tools that offer cost allocation and budgeting insights. These tools allow you to track expenses closely and make better decisions. By adopting these approaches, you can cut down on cloud infrastructure costs while promoting sustainable practices, striking a balance between financial savings and environmental care.