Balancing spot and on-demand instances in Kubernetes node pools is key to cutting costs while ensuring reliability. Spot instances are cheaper but can be terminated unexpectedly, making them ideal for flexible, non-critical tasks. On-demand instances, while more expensive, provide stable performance for essential workloads. Combining both types creates a cost-effective, resilient infrastructure. Here's how to get started:
- Spot Instances: Cost up to 77% less but can be reclaimed with little notice. Best for tasks like batch processing or testing.
- On-Demand Instances: Offer consistent availability at a higher price. Ideal for critical operations like databases or customer-facing platforms.
- Hybrid Approach: Use spot instances for low-priority tasks and on-demand for essential services. Automate failover to handle interruptions seamlessly.
UK businesses can further optimise this balance by aligning instance usage with business hours, using diverse instance types, and leveraging automation tools like Kubernetes Cluster Autoscaler or Karpenter. Proper workload scheduling with taints, tolerations, and affinity rules ensures tasks are assigned to the right nodes. Regular monitoring and testing help maintain performance during interruptions.
This strategy can reduce infrastructure costs by 30–50% while maintaining stability. Keep reading for detailed steps on configuring and managing mixed-instance node pools effectively.
AWS re:Invent 2019: How Ticketmaster runs Kubernetes for 80% less without managing VMs (CON308-S)
Spot and On-Demand Instances Explained
Understanding the differences between spot and on-demand instances is essential for creating a cost-effective and reliable cloud strategy. Each type serves distinct purposes and has unique traits that influence both your operational reliability and cloud spending. Let’s break down these two instance types.
What Are Spot Instances?
Spot instances are essentially excess cloud capacity offered at heavily discounted rates. Cloud providers sell this spare capacity at a lower price, but there’s a trade-off: these instances can be reclaimed by the provider at short notice, typically with just two minutes’ warning.
The savings can be substantial - up to 77% lower compute costs [1]. For businesses in the UK, this could mean reducing a £1,000 monthly workload cost to just £230. Spot instances are perfect for workloads that can tolerate interruptions, such as batch processing, data analysis, development environments, and stateless web apps. The key is ensuring your application can recover quickly and avoid data loss if an instance is suddenly terminated.
Spot pricing isn’t fixed; it fluctuates based on supply and demand within each availability zone. Prices can edge closer to on-demand rates during peak times but drop significantly during off-peak periods. This variability requires careful monitoring and the ability to scale workloads flexibly.
What Are On-Demand Instances?
On-demand instances follow the pay-as-you-go model, offering guaranteed capacity at a fixed hourly rate. These instances remain available until you decide to terminate them, making them ideal for workloads that demand high reliability and consistent availability.
The pricing is stable and predictable, which simplifies budgeting. For example, a medium-sized instance might cost around £0.10 per hour. UK businesses should factor in exchange rates and potential fees when converting costs to GBP.
On-demand instances are best suited for mission-critical applications, such as production databases, customer-facing platforms, or workloads that require persistent connections and real-time transactions. They’re also ideal for storing critical state information that can’t be easily recovered. The trade-off is cost - on-demand instances are more expensive because they guarantee both performance and availability. However, for businesses with steady, predictable workloads, this premium is often worth it for the peace of mind it provides.
Why Use Both Instance Types Together?
Combining spot and on-demand instances creates a hybrid cloud strategy that balances cost savings with operational stability. This approach allows you to reduce expenses without compromising performance or reliability.
A hybrid setup leverages spot instances for non-critical, interruption-tolerant tasks, while on-demand instances handle critical workloads and unexpected demand spikes. If spot instances are interrupted, workloads can seamlessly shift to on-demand capacity, ensuring uninterrupted performance.
This approach also provides natural load balancing. During normal operations, spot instances manage most of the workload at a reduced cost. When demand surges or spot capacity becomes unavailable, on-demand instances step in to maintain performance.
The key advantage here is risk management. By distributing workloads across both instance types based on their criticality and tolerance for interruptions, you create a resilient infrastructure. This setup adapts to fluctuations in both cost and availability, ensuring stability.
For UK businesses, this strategy aligns well with the ebb and flow of daily and seasonal demand. Spot instances can handle predictable low-demand periods, while on-demand instances ensure capacity for peak times and essential operations. This combination offers flexibility and cost efficiency tailored to dynamic business needs.
Setting Up Node Pools with Mixed Instances
Creating node pools with a mix of instance types requires careful planning around ratios, regional distribution, and scaling automation. This approach helps balance cost efficiency with reliability. Below, we’ll explore how to determine the right instance ratios, configure types and zones, and automate scaling for smooth operations.
Choosing the Right Instance Ratio
Finding the right balance between spot and on-demand instances depends on your workload's nature and your tolerance for interruptions. For less critical tasks like development or testing, you can lean more heavily on spot instances since brief interruptions won’t significantly impact productivity. Similarly, batch processing workloads that can handle pauses are well-suited for a higher reliance on spot instances. However, web applications or services requiring consistent performance often demand a greater proportion of on-demand capacity.
In the UK, aligning instance ratios with business hours can optimise efficiency. For example, you might favour spot instances during off-peak hours and increase on-demand capacity between 09:00 and 17:00 GMT when demand is highest. Start conservatively, monitor performance and interruption rates, and tweak your setup based on real-world data to strike the right balance.
Instance Types and Availability Zone Setup
Using a variety of instance types reduces the risk of interruptions caused by spot market fluctuations. Instead of relying on a single instance family, distribute your workload across multiple instance types with comparable performance. This strategy increases the chances of maintaining capacity even if one type becomes unavailable.
Spreading nodes across multiple availability zones also strengthens resilience. For instance, if you’re deploying in the London-based eu-west-2 region, you could distribute your instances across eu-west-2a, eu-west-2b, and eu-west-2c. This not only reduces the risk of simultaneous interruptions but also bolsters disaster recovery efforts.
When it comes to instance sizes, smaller instances often have better spot availability but can increase management complexity. Larger instances, on the other hand, may be more efficient but are sometimes more vulnerable to interruptions during peak demand. A balanced approach - mixing general-purpose and compute-optimised types - can help maintain consistent performance despite fluctuations in availability.
Automation Tools for Scaling
Automation is key to managing mixed-instance node pools effectively. Tools like the Kubernetes Cluster Autoscaler can dynamically add nodes when demand grows beyond current capacity. In mixed-instance setups, you can create separate node groups for spot and on-demand instances and use pod specifications to allocate workloads accordingly.
For more advanced scaling, tools like Karpenter can be configured to automatically select the best instance types and availability zones based on current conditions. By defining provisioner specifications, you can outline preferences for spot or on-demand instances, acceptable instance types, and regional constraints, allowing Karpenter to optimise decisions for cost and performance.
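As a rough sketch of what such a specification can look like, the fragment below defines a Karpenter pool that permits both capacity types across several instance types and the London zones. The API group and version vary by Karpenter release (older releases use a `Provisioner` resource instead of `NodePool`), and fields such as `nodeClassRef` and resource limits are omitted for brevity, so treat this as illustrative rather than a drop-in configuration:

```yaml
# Illustrative Karpenter NodePool - check the CRD versions installed in
# your cluster before using; nodeClassRef and limits are omitted here.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: mixed-capacity
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "c5.large"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["eu-west-2a", "eu-west-2b", "eu-west-2c"]
```

With a pool like this, Karpenter will generally favour spot capacity when it satisfies the requirements and fall back to on-demand when it does not.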
Some organisations also use custom scaling scripts to monitor spot interruption rates and dynamically adjust the balance between instance types. These scripts can temporarily shift workloads to on-demand instances during periods of high spot market volatility. Pairing such scripts with automated health checks and alert systems ensures that your team can respond quickly to capacity issues, keeping performance steady even during challenges.
These automation strategies complement earlier decisions around instance ratios and placement, laying the groundwork for efficient workload scheduling in the next steps.

Workload Scheduling Across Node Types
Once you've set up mixed-instance node pools, it's time to ensure workloads are directed to the right nodes. Kubernetes scheduling makes this possible by assigning critical applications to stable on-demand instances and letting non-critical tasks take advantage of spot capacity. Here's how you can make it happen.
Using Taints and Tolerations
Taints and tolerations in Kubernetes act like a lock-and-key mechanism. Taints, applied to nodes, repel pods that lack a matching toleration, while tolerations on pods allow them to schedule onto those nodes. This method is particularly effective for keeping critical and non-critical workloads separate.
For example, you can taint spot nodes with something like `node-type=spot:NoSchedule`. This ensures only pods with the matching toleration can run on those nodes. Critical workloads, like production databases or payment processing systems, should avoid tolerations for spot nodes entirely, ensuring they remain on untainted on-demand nodes. Meanwhile, less critical tasks - like development environments, batch jobs, or stateless web apps - can include the matching toleration in their configurations to utilise spot capacity.
Taint effects provide additional control. Use `PreferNoSchedule` to let pods without tolerations run on spot nodes only when no other capacity is available - handy during traffic spikes. Alternatively, `NoExecute` immediately evicts pods without a matching toleration, clearing spot capacity for other tasks.
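A minimal sketch of the pod side of this pairing, assuming spot nodes carry the `node-type=spot:NoSchedule` taint described above (the pod name and image are placeholders):

```yaml
# A batch workload that opts in to spot capacity.
# The taint would be applied to nodes with:
#   kubectl taint nodes <spot-node-name> node-type=spot:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  tolerations:
    - key: node-type        # must match the taint's key...
      operator: Equal
      value: spot           # ...and value...
      effect: NoSchedule    # ...and effect
  containers:
    - name: worker
      image: batch-worker:latest   # placeholder image
```

Pods without this toleration are simply never scheduled onto the tainted spot nodes, which is what keeps critical services on on-demand capacity by default.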
Once you've set up taints and tolerations, you can refine scheduling further with affinity rules.
Setting Up Affinity and Anti-Affinity Rules
Affinity rules let you fine-tune where pods are placed, helping to optimise performance and resilience.
With node affinity, you can specify either strict or flexible preferences. For instance:
- Use `requiredDuringSchedulingIgnoredDuringExecution` for strict rules, such as ensuring compliance-sensitive workloads run only on on-demand instances in specific zones.
- Use `preferredDuringSchedulingIgnoredDuringExecution` for softer preferences, allowing workloads to favour certain nodes but remain flexible.
Labelling nodes makes this approach even more effective. Beyond standard labels like instance type, you can add custom ones - e.g., `workload-tier=production` or `cost-priority=savings` - to create more targeted scheduling policies. These labels make it easier to manage workloads during incidents or capacity planning.
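A hedged sketch of both rule types in one deployment fragment - the `node-lifecycle` label is illustrative, so substitute whatever label your node groups actually carry:

```yaml
# Require on-demand nodes (hard rule), prefer two London zones (soft rule).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      affinity:
        nodeAffinity:
          # Hard requirement: never schedule onto spot-labelled nodes.
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-lifecycle        # illustrative custom label
                    operator: In
                    values: ["on-demand"]
          # Soft preference: favour particular zones when capacity allows.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 50
              preference:
                matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values: ["eu-west-2a", "eu-west-2b"]
      containers:
        - name: api
          image: payments-api:latest   # placeholder image
```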
Pod anti-affinity ensures replicas of the same application aren't clustered on a single node or zone. This is vital for maintaining uptime during spot interruptions. By spreading replicas across zones and node types, you reduce the risk of losing all replicas due to a single failure.
For even greater control, use topology spread constraints. These allow you to define rules like ensuring no more than two replicas of a service run on spot instances or requiring at least one replica per zone on on-demand nodes. This is especially useful in UK-based deployments, where service continuity is critical during regional disruptions.
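The zone-spreading idea above can be expressed as a pod-template fragment like this (the `app: web-frontend` selector is a placeholder for your own labels):

```yaml
# Spread replicas evenly across availability zones,
# tolerating at most one replica of skew between zones.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway   # soft; use DoNotSchedule for a strict rule
    labelSelector:
      matchLabels:
        app: web-frontend
```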
Be cautious not to over-constrain your scheduler. Start with broad rules and adjust based on performance and interruption patterns. Monitoring scheduling failures is key - overly restrictive rules can prevent pods from starting when resources are tight.
Backup Plans for Spot Interruptions
Because spot instances can be reclaimed with just two minutes' notice, it's essential to have backup strategies in place to maintain service continuity.
- Diversify node groups: Use a mix of instance types and sizes to reduce the risk of losing all spot capacity at once. For example, if your primary spot capacity uses `m5.large` instances, create additional groups with `m5.xlarge`, `c5.large`, or `t3.medium` instances.
- Pod disruption budgets (PDBs): Set these to ensure at least 50% of replicas remain available during spot interruptions, giving autoscalers time to adjust.
- Horizontal pod autoscaling (HPA): Combine HPA with cluster autoscaling to adapt quickly to capacity changes. When spot nodes are terminated, HPA can temporarily increase replica counts on remaining nodes, maintaining performance.
- Node termination handlers: These tools can cordon off affected nodes, drain pods to healthier instances, and trigger alerts - all within the two-minute warning period.
- Emergency on-demand capacity: Keep reserved on-demand node groups as a fallback. Use affinity rules to activate them only when spot capacity falls below acceptable levels, balancing cost and reliability.
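The pod disruption budget mentioned above is a short manifest; this sketch pins the 50% floor to a hypothetical `web-frontend` deployment:

```yaml
# Voluntary disruptions (node drains included) will not proceed
# if they would take availability below half the replicas.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb
spec:
  minAvailable: 50%
  selector:
    matchLabels:
      app: web-frontend
```

Note that a PDB only governs voluntary evictions such as drains; a hard spot reclamation can still remove pods, which is why the PDB works best alongside a termination handler that drains nodes inside the warning window.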
Finally, consider adding circuit breakers to your applications. These can temporarily reduce feature complexity or defer non-essential tasks during capacity constraints, ensuring core functionality remains intact even during disruptions. This approach helps maintain a seamless experience for users, even under challenging conditions.
Trade-offs and Implementation Tips
Building on the node pool setup and workload scheduling strategies, there are several trade-offs and practical tips to enhance your hybrid deployment approach.
When balancing spot instances and on-demand instances, you’re essentially trading cost savings for performance reliability. Spot instances are budget-friendly but come with the risk of interruptions, while on-demand instances provide stability at a higher price. To navigate this, align instance types with workload criticality. For example, use spot instances for non-critical tasks and reserve on-demand instances for essential operations. Automating failover procedures is key here - manual intervention isn’t practical and can lead to delays [2][3].
Monitoring and Testing Requirements
Effective monitoring is essential for managing hybrid deployments. Use cloud-native tools and custom dashboards to track spot interruption frequencies and critical performance metrics. Analysing historical data allows you to identify trends and set precise alert thresholds, helping you respond to issues quickly.
Before rolling out to production, conduct automated failover tests. Simulate simultaneous interruptions to verify your recovery mechanisms are robust. This step complements the automation strategies we’ve discussed earlier, ensuring scaling and interruption handling are seamless.
UK Business Considerations
When deploying in the UK, it’s crucial to factor in local compliance and business practices. For instance, UK data protection regulations may influence decisions around data residency and the selection of availability zones. Scheduling maintenance and failover tests during off-peak hours can help minimise disruptions to critical operations.
For businesses looking to optimise their cloud infrastructure while achieving a balance between resilience and cost-effectiveness, consulting experts can make a significant difference. Hokstad Consulting offers specialised services to guide you through this process and ensure your deployment strategy aligns with both technical and regulatory demands.
Conclusion
Striking the right balance between spot and on-demand instances in Kubernetes node pools comes down to matching the instance types with the needs of your workloads. Spot instances work well for tasks that can handle interruptions, while on-demand instances are better suited for critical services that demand consistent availability. Getting this balance right helps create node pools that are both resilient and cost-efficient.
To make this strategy effective, focus on setting an appropriate spot-to-on-demand ratio, leveraging taints and tolerations for smarter scheduling, and automating failover mechanisms to handle spot instance interruptions seamlessly. Regularly testing failover processes ensures your deployments stay robust under real-world conditions.
For businesses in the UK, this approach can reduce infrastructure costs by 30–50%, all while adhering to compliance standards and operational requirements. By fine-tuning your node pools and keeping a close eye on monitoring, you can strike a manageable balance between cost savings and stable performance.
FAQs
What’s the best way to balance spot and on-demand instances in Kubernetes node pools?
When deciding the right mix of spot instances and on-demand instances, it all comes down to your workload's stability and how you prioritise cost versus reliability. A good starting point is often 70% spot instances and 30% on-demand instances, but this can be adjusted as you assess performance and operational requirements.
Spot instances are a budget-friendly choice, though they come with the risk of interruptions, making them ideal for tasks that are flexible or not critical. On the other hand, on-demand instances offer the dependability needed for essential operations. Keep an eye on your workloads regularly and tweak this balance to ensure you're getting the best mix of cost savings and system reliability.
If you're looking for personalised advice, experts like Hokstad Consulting can help. They specialise in fine-tuning cloud infrastructure and cutting hosting costs for businesses.
What are the best practices for handling spot instance interruptions automatically?
When dealing with spot instance interruptions, it’s crucial to have measures in place to keep your systems running smoothly. One effective approach is automating failover processes. Tools like AWS Node Termination Handler can help by detecting interruptions early. Pair this with autoscaling groups configured with capacity-optimised allocation strategies to ensure workloads seamlessly shift to on-demand instances, reducing the risk of downtime.
Another smart move is incorporating load balancers into your setup. They can dynamically adjust tasks, scaling them up or down as needed. This keeps your applications running smoothly, even during interruptions, ensuring consistent availability and reliability.
How can UK businesses stay compliant with data protection laws when using a mix of spot and on-demand instances?
UK businesses can stay on the right side of data protection laws by keeping sensitive data stored and processed within UK borders. This approach aligns with data sovereignty and localisation rules. To safeguard this data, organisations should focus on strong security practices like encryption, strict access controls, and constant monitoring - whether using spot or on-demand cloud services.
When choosing a cloud provider, it's essential to confirm they comply with UK regulations such as GDPR and the Cyber Resilience Bill. Providers should also adhere to the National Cyber Security Centre's (NCSC) security principles. To strengthen compliance and build stakeholder trust, businesses should conduct regular audits, establish solid governance frameworks, and ensure clear, well-defined Data Processing Agreements are in place.