Spot Instance Savings: Lessons from 5 Companies | Hokstad Consulting

Spot Instance Savings: Lessons from 5 Companies


Want to cut cloud costs by up to 90%? Spot Instances could be the answer. These discounted cloud resources help businesses save big by using spare capacity from providers like AWS, Google Cloud, and Azure. But they come with a catch: interruptions. Here's how five companies - Chegg, Samsung SDS, CAST AI, Salesforce, and Lyft - leveraged Spot Instances to reduce expenses without compromising performance.

Key Takeaways:

  • Savings Potential: Spot Instances can slash costs by 50–90%.
  • Best for Flexible Workloads: Ideal for batch processing, CI/CD pipelines, and machine learning tasks that tolerate interruptions.
  • Automation is Key: Tools like Elastigroup and CAST AI automate instance management, ensuring smooth operations even with interruptions.
  • Real Results:
    • Chegg: Saved 70% on EC2 costs (£84,000 annually).
    • Samsung SDS: Enabled clients to save up to 90% on AWS EC2.
    • CAST AI: Reduced monthly AWS costs by 90%.
    • Salesforce: Balanced cost savings with uninterrupted service for global clients.
    • Lyft: Cut compute costs by 75% with minimal code changes.

Spot Instances are a powerful way to cut cloud expenses, but managing them requires careful planning and automation. Whether you're a small business or a global enterprise, these examples prove the savings are worth the effort.


How Spot Instances Work for Cost Savings

Spot Instances offer a smart way for UK businesses to cut cloud computing costs.

Dynamic Pricing Based on Supply and Demand

Amazon EC2 sets Spot Instance prices based on long-term supply and demand for spare capacity, providing discounts of up to 90% compared to on-demand rates [3]. AWS retired its auction-style bidding model in 2017, so prices now change gradually: you pay the current Spot price for as long as your instance runs, and you can optionally set a maximum price you are willing to pay. An instance may be interrupted when AWS needs the capacity back or when the Spot price rises above your maximum [4].

Maximum Price Strategy

Businesses can set an optional maximum hourly price, but they only pay the current, lower Spot price. For instance, if you’re willing to pay up to £0.10 per hour but the Spot price is just £0.03, you’ll pay the lower rate. AWS quotes Spot prices in US dollars, so the pound figures here are illustrative equivalents. This approach works best for workloads that can handle occasional interruptions.
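The pay-the-current-rate rule described above can be sketched in a few lines of Python. This is an illustrative model rather than an AWS API: `hourly_charge` and its parameters are hypothetical names, and the prices are the example figures from the text.

```python
from typing import Optional


def hourly_charge(spot_price: float, max_price: Optional[float] = None) -> Optional[float]:
    """Return the hourly charge for a Spot Instance, or None if it cannot run.

    Simplified model (an assumption for illustration): you pay the current
    Spot price, never your maximum. If the Spot price is above the maximum
    you set, the instance is not eligible to run at all.
    """
    if max_price is not None and spot_price > max_price:
        return None
    return spot_price


# Willing to pay up to 0.10 per hour, Spot price is 0.03: the charge is 0.03.
print(hourly_charge(0.03, max_price=0.10))  # 0.03
```

Leaving `max_price` unset means the instance is only ever limited by capacity, not by price.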

Workload Suitability and Flexibility

Spot Instances are ideal for tasks that don’t require continuous operation. These include data analysis, batch processing, and background jobs that can tolerate brief pauses [4]. Their flexibility makes them a great fit for non-time-sensitive applications.

Managing Interruptions for Critical Tasks

To handle potential interruptions, it’s crucial to design systems that are fault-tolerant. Techniques like checkpointing, saving states, and distributing workloads across multiple availability zones can help minimise disruptions. For mission-critical applications, advanced orchestration tools can even migrate workloads automatically to maintain uninterrupted service [4].
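As a rough illustration of checkpointing, the sketch below records progress after each item so that a replacement instance can resume where the interrupted one stopped. The function name, the JSON checkpoint layout, and the `stop_after` parameter (which only simulates an interruption) are assumptions made for this example. A real deployment would keep the checkpoint on storage that outlives the instance.

```python
import json
import os
import tempfile


def process_with_checkpoint(items, checkpoint_path, handle, stop_after=None):
    """Process items in order, saving progress so a restart can resume.

    stop_after simulates an interruption after that many items in this run.
    Returns the index of the next unprocessed item.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    done_this_run = 0
    for i in range(start, len(items)):
        if stop_after is not None and done_this_run >= stop_after:
            return i  # simulated interruption
        handle(items[i])
        # Write to a temporary file, then rename, so an interruption
        # mid-write never leaves a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
        os.replace(tmp_path, checkpoint_path)
        done_this_run += 1
    return len(items)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint.json")
        processed = []
        # First run is "interrupted" after two items; the second run resumes.
        process_with_checkpoint(["a", "b", "c", "d"], path, processed.append, stop_after=2)
        process_with_checkpoint(["a", "b", "c", "d"], path, processed.append)
        print(processed)  # ['a', 'b', 'c', 'd']
```

Each item is handled exactly once across both runs, which is the property that makes an interruption cheap.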

Automation and Management Tools

Automation plays a key role in successfully using Spot Instances. Tools designed for cost optimisation can handle tasks like requesting capacity, distributing workloads, tracking pricing trends, and shifting operations to the most cost-effective instance types or regions. These tools are particularly useful for managing workloads across different time zones and fluctuating demand periods, ensuring resources are scaled efficiently [4].

1. Chegg

Chegg, an online education platform, turned to Spot Instances to significantly lower its cloud expenses. By transitioning from a traditional monolithic application to a microservices architecture, Chegg demonstrated how careful planning and automation can lead to notable cost reductions.

Cost Savings Achieved

Back in 2016, Chegg managed to slash its EC2 costs by an impressive 70% after adopting Spot Instances[5]. To put that into perspective, their monthly expenses dropped from £10,000 to just £3,000 - saving roughly £7,000 each month, which adds up to £84,000 annually. This quick and substantial cost reduction showcases how effectively Spot Instances can optimise cloud spending when managed properly.
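The arithmetic behind these figures is simple enough to check in code. The helper below is a hypothetical convenience function, not part of any tool mentioned in this article; it takes a monthly cost before and after and returns the monthly saving, the annual saving, and the percentage reduction.

```python
def savings_summary(monthly_before: float, monthly_after: float) -> dict:
    """Summarise a cost reduction from two monthly figures (same currency)."""
    if monthly_before <= 0:
        raise ValueError("monthly_before must be positive")
    monthly_saving = monthly_before - monthly_after
    return {
        "monthly_saving": round(monthly_saving, 2),
        "annual_saving": round(monthly_saving * 12, 2),
        "percent_reduction": round(100 * monthly_saving / monthly_before, 1),
    }


# Figures quoted above: 10,000 per month before, 3,000 after.
print(savings_summary(10_000, 3_000))
```

With the figures quoted above, this gives a monthly saving of 7,000, an annual saving of 84,000, and a 70% reduction.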

Workload Types Migrated

Chegg moved its microservices, which were running on stateless containers, to Spot Instances[5]. With the help of AWS ECS (Elastic Container Service), their infrastructure expanded from about 600 to around 1,200 EC2 instances. The stateless design of these containerised workloads made them ideal for handling the occasional interruptions that come with Spot pricing.

Automation and Tools Used

To simplify the management of Spot Instances for ECS, Chegg adopted Spot Elastigroup[5]. This tool automated the process, minimising the challenges of managing both Spot and reserved instances. They also utilised Spot Cluster Roll, which allowed them to adjust AMIs and startup scripts dynamically based on real-time workload demands. This level of automation streamlined infrastructure updates and reduced operational hurdles, making the system more efficient.

Key Challenges and Solutions

One of the main challenges Chegg faced was maintaining high availability while managing both Spot and reserved instances. Automation played a crucial role in overcoming this complexity. Steve Evans, Chegg's VP of Engineering Services, shared his team's experience:

Spot abstracts away all the nitty-gritty details of managing spot and reserved instances for ECS, with their support team and online resources providing us with in-depth ECS expertise. When we want best practices and tips for standing up and managing ECS, Spot is our first call.[5]

He also highlighted:

Chegg's successful adoption of microservices and containers, in large part, can be attributed to Spot keeping our infra cost and management to a bare minimum.[5]

2. Samsung SDS


Samsung SDS has integrated Spot Instance technology into its cloud management platform, GOV (Global One View), enabling customers to achieve substantial cost reductions.

Cost Savings Achieved

With this technology, customers can save up to 90% on AWS EC2 and 84% on Azure VMs [6]. For example, a business spending £10,000 per month on AWS EC2 instances could cut its costs to approximately £1,000, a monthly saving of £9,000 and an annual reduction of £108,000. These savings are made possible by leveraging surplus compute capacity across multiple cloud providers, even for enterprises without deep technical expertise.

Automation and Tools Used

Samsung SDS uses Elastigroup by Spot [6] to optimise costs. This tool automates the management of spare cloud instances and integrates seamlessly with platforms like AWS, Azure, Kubernetes, ECS, Elastic Beanstalk, and Jenkins, supporting a wide range of workloads.

Additionally, Samsung SDS has developed Zero Touch Mobility (ZTM), which automates device configuration and deployment by connecting with ServiceNow. Businesses using ZTM have reported a 50% reduction in support costs and a 40% decrease in total cost of ownership for managing enterprise device lifecycles [7].

Key Challenges and Solutions

One of the main challenges Samsung SDS faced was the rising costs and unpredictability of spare capacity instances [6]. To tackle this, Elastigroup automates instance management and ensures uptime through a Service Level Agreement (SLA). As Samsung SDS noted:

Spot provides the SLA that enterprises need for mission-critical and production applications. [6]

This solution has allowed Samsung SDS to consistently deliver impressive cost savings while maintaining the performance required for critical workloads. Their approach demonstrates a practical path to achieving efficiency and reliability, paving the way for more success stories in the future.

3. CAST AI


CAST AI takes cloud cost management to the next level by leveraging intelligent Spot Instance deployment. Their approach zeroes in on identifying the right workloads and automating the handling of interruptions and migrations. Like other examples discussed earlier, CAST AI highlights how automation plays a crucial role in getting the most out of Spot Instances.

Cost Savings Achieved

CAST AI has achieved impressive cost reductions. For instance, they slashed monthly AWS costs from $691.20 to $65.01 (around £560 to £53), representing a staggering 90% saving. Over the course of a year, this equates to savings exceeding £6,000[8].

On a broader scale, Kubernetes clusters optimised with partial Spot Instance usage saw an average cost reduction of 59%. For clusters running entirely on Spot Instances, the savings jumped to an average 77% cut in compute costs[8].

Workload Types Migrated

The platform excels at pinpointing workloads that are well-suited for migration to Spot Instances[9]. It supports a wide array of workloads, including:

  • Batch processing jobs
  • Containers and microservices
  • High-performance computing (HPC)
  • CI/CD operations
  • Distributed databases

One standout example is Yotpo, a SaaS marketing solutions provider. CAST AI automated Yotpo's Spot Instance lifecycle, seamlessly managing cost-effective provisioning and transitioning workloads between Spot and on-demand instances as needed[10].

Automation and Tools Used

The backbone of CAST AI's success lies in its robust automation capabilities. Notably, their Container Live Migration technology allows even stateful workloads to be migrated to Spot Instances[11]. The platform also automates transitions between Spot and on-demand instances, ensuring cost efficiency and uninterrupted operations[10].

Their suite of tools includes features like autoscaling, rightsizing, bin packing, and cluster scheduling, all designed to optimise cloud costs[9]. These tools proved invaluable during high-demand periods. As Achi Solomon, Director of DevOps at Yotpo, noted:

After integrating Cast, we didn't have to do anything during Black Friday, which is amazing. We gained not just compute cost reduction but also a reduction in engineer workload.[10]

One of the biggest hurdles in using Spot Instances is managing interruptions. CAST AI tackles this challenge with intelligent automation, ensuring smooth transitions to on-demand instances whenever Spot Instances become unavailable. By selecting the most cost-effective options, the platform helps maintain continuity. This approach has delivered 30–40% cost savings on Kubernetes workloads while also reducing the workload for engineering teams[10].
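The general idea of preferring Spot capacity and falling back to on-demand can be sketched as follows. This is not CAST AI's algorithm, which the source does not describe in detail. The `Offer` type, its fields, and the instance names and prices are all hypothetical, chosen only to illustrate the selection rule.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Offer:
    instance_type: str
    lifecycle: str       # "spot" or "on-demand"
    hourly_price: float
    available: bool


def choose_offer(offers: Sequence[Offer]) -> Optional[Offer]:
    """Pick the cheapest available Spot offer, else the cheapest on-demand one."""
    for lifecycle in ("spot", "on-demand"):
        candidates = [o for o in offers if o.lifecycle == lifecycle and o.available]
        if candidates:
            return min(candidates, key=lambda o: o.hourly_price)
    return None  # nothing available at all


offers = [
    Offer("type-a", "spot", 0.03, available=False),
    Offer("type-b", "spot", 0.04, available=True),
    Offer("type-a", "on-demand", 0.10, available=True),
]
print(choose_offer(offers).instance_type)  # type-b
```

If every Spot offer were unavailable, the same call would return the on-demand offer, which is the continuity behaviour the paragraph above describes.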


4. Salesforce


Salesforce has established itself as a trailblazer in using Spot Instances as part of its cloud infrastructure strategy [2]. As a global powerhouse in customer relationship management (CRM) and cloud computing, Salesforce showcases how large enterprises can balance cost-effective cloud solutions with uninterrupted service delivery.

By combining its leadership in the tech space with advanced automation, Salesforce maximises the efficiency of Spot Instances.

Automation and Tools Used

Salesforce implements an impressive volume of changes, making nearly 250,000 production updates every week. These updates are managed through automated deployment systems designed to uphold safety standards and minimise errors [13].

Adding to this, Salesforce relies on its AIOps Agent to handle operational issues. This intelligent system proactively detects 82% of incidents within its core CRM products, resolves 61% of these issues automatically, and efficiently directs 43% of unresolved cases to the right service teams [13]. On top of that, the company employs a customised Java Development Kit (JDK) to simplify compliance and streamline its operations [13].

These tools not only enhance operational efficiency but also tackle the complexities of managing large-scale systems.

Key Challenges and Solutions

Adopting cost-saving measures like Spot Instances comes with its own set of challenges, particularly in maintaining security. Salesforce addresses these concerns by using phishing-resistant multi-factor authentication to safeguard its automated systems [13].

But Salesforce’s automation strategy goes beyond infrastructure management. As David Schmaier, President and Chief Product Officer of Salesforce, puts it:

CEOs now focus on connecting with customers in simpler, cost-effective ways. The investments they make now will determine their success today and for the next decade. [12]

This vision is evident in Salesforce’s ability to transition over 85% of its customers to a new platform while maintaining service quality and keeping costs under control [13]. It’s a clear example of how they manage large-scale infrastructure transformations without compromising on performance.

5. Lyft

Lyft shows how even small adjustments can lead to massive savings. By switching to Spot Instances, the ride-sharing company managed to cut its monthly compute costs by a staggering 75%. This goes to show that cloud cost optimisation doesn’t always have to involve complicated changes.

Cost Savings Achieved

Lyft’s move to Spot Instances resulted in a 75% reduction in monthly compute costs[14]. To put this into perspective, a company spending £100,000 a month on compute resources would save £75,000 monthly - adding up to £900,000 in savings over a year.

Automation and Tools Used

What’s remarkable is how simple the change was. Lyft achieved this by tweaking just four lines of code[14]. They relied on automation tools to handle interruptions and manage workload distribution. These tools ensured a seamless transition to On-Demand instances whenever Spot Instances were unavailable.

Key Challenges and Solutions

One of the main challenges with Spot Instances is that the provider can reclaim them at short notice, with interruption rates of around 5%[15]. Lyft tackled this by building a fault-tolerant, stateless infrastructure capable of quickly recovering from interruptions. Their automated system redistributed workloads seamlessly, allowing them to take full advantage of the cost-saving benefits of Spot Instances.
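To show why stateless workloads recover so easily, here is a minimal sketch of redistributing an interrupted worker's tasks across the survivors. It is not Lyft's implementation, which the source does not describe. The function name, worker names, and task names are hypothetical.

```python
from typing import Dict, List


def redistribute(assignments: Dict[str, List[str]], interrupted: str) -> Dict[str, List[str]]:
    """Return new assignments with the interrupted worker's tasks reassigned.

    Because the tasks are stateless, any surviving worker can pick them up.
    Orphaned tasks are dealt round-robin, starting with the least-loaded worker.
    """
    survivors = {w: list(tasks) for w, tasks in assignments.items() if w != interrupted}
    if not survivors:
        raise RuntimeError("no surviving workers; replacement capacity is needed first")
    orphaned = assignments.get(interrupted, [])
    order = sorted(survivors, key=lambda w: len(survivors[w]))
    for i, task in enumerate(orphaned):
        survivors[order[i % len(order)]].append(task)
    return survivors


before = {"a": ["t1", "t2"], "b": ["t3"], "c": ["t4", "t5"]}
print(redistribute(before, "c"))  # {'a': ['t1', 't2', 't5'], 'b': ['t3', 't4']}
```

The input mapping is left unchanged, so the same function can be retried safely if a second interruption arrives mid-recovery.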

Cost Savings Comparison

When comparing these five companies, the impact of Spot Instances across different industries and workloads becomes clear. The savings achieved highlight how automation and strategic deployment can significantly cut costs. The table below provides a detailed breakdown of each company's performance.

| Company | Workload Type | Cost Savings | Automation Methods | Key Challenges |
| --- | --- | --- | --- | --- |
| Chegg | Educational platform services | 70% reduction in EC2 costs | Auto scaling and diversified instance families | Managing interruptions during peak study periods |
| Samsung SDS | Enterprise applications & batch processing | Up to 90% on AWS EC2 and 84% on Azure VMs | Kubernetes with automated failover and multi-AZ deployment | Ensuring high availability for enterprise clients |
| CAST AI | E-commerce applications | 90% reduction (monthly cost dropped from $691.20 to ~$65) | Automated Spot Instance lifecycle management | Handling interruptions for customer-facing applications |
| Salesforce | CRM and cloud services | Estimated 70% savings | Containerised workloads managed with Kubernetes | Maintaining service reliability across global infrastructure |
| Lyft | Ride-sharing compute workloads | 75% reduction in monthly compute costs | Minimal code adjustments with automated redistribution | Building fault-tolerant, stateless infrastructure |

This comparison underscores a key takeaway: strategic use of Spot Instances can lead to significant cost reductions across industries. CAST AI stands out with a 90% cost reduction, cutting monthly expenses from $691.20 to approximately $65 by employing aggressive Spot Instance policies [2]. Lyft also achieved impressive savings, reducing costs by 75% with only minimal code changes.

Automation plays a pivotal role in these success stories. Companies that invest in advanced automation tools consistently secure greater savings while maintaining reliability. Stateless and batch processing workloads tend to benefit the most, as they can handle interruptions more effectively. Meanwhile, customer-facing applications require robust fault-tolerance measures to achieve similar financial advantages, as seen in the case studies.

We've improved the resilience of our application and reduced the chance of outages using diversified Amazon EC2 Spot Instances. – Isaac Gittins, Cloud Architect, amaysim [16]

Samsung SDS’s example highlights how savings of up to 90% on AWS EC2 can translate into substantial annual benefits for enterprises. At this scale, every percentage point of savings provides meaningful budget flexibility, reinforcing the value of optimised cloud strategies.

Expert Help for Spot Instance Setup

As these case studies show, cutting costs with Spot Instances is a real possibility. But to make it work seamlessly, expert guidance can make all the difference.

Switching to Spot Instances isn't as simple as flipping a pricing switch. It involves managing automated workloads, ensuring fault tolerance, and navigating dynamic cloud environments to see actual savings. This level of complexity often requires a professional touch to fully tap into the potential savings Spot Instances offer.

Hokstad Consulting specialises in helping UK businesses tackle these challenges through tailored cloud cost optimisation services. Their methods focus on slashing cloud expenses by 30–50% by implementing Spot Instances strategically and optimising overall infrastructure [18].

The process kicks off with a free assessment, where Hokstad Consulting analyses your current cloud setup. This step identifies workloads that can move to Spot Instances, assesses automation needs, and pinpoints opportunities for cost reduction while maintaining service reliability.

From there, DevOps transformation becomes the core of the strategy. Hokstad Consulting creates automated CI/CD pipelines and monitoring systems to ensure smooth failover during Spot Instance interruptions, keeping your operations running without a hitch.

To tackle the challenges of dynamic workloads, their custom automation development service builds lifecycle management systems. These systems are designed to keep services available while squeezing the most out of Spot Instance savings.

Strategic cloud migration ensures that transitioning to Spot Instance architectures is smooth and downtime is kept to a minimum. By using multi-availability zone deployments and automated failover mechanisms, Hokstad Consulting guarantees high availability throughout the process.

What’s more, Hokstad Consulting offers a No Savings, No Fee pricing model, where fees are capped as a percentage of the savings achieved [18]. This ensures their success is directly tied to how much they save you.

Their ongoing support includes continuous monitoring, performance tweaks, security checks, and infrastructure updates as your workloads evolve, ensuring your cloud environment stays optimised over time.

For UK businesses looking to cut cloud costs, professional help can turn Spot Instance adoption into a dependable strategy. With technical know-how, proven automation techniques, and a results-driven pricing model, Hokstad Consulting offers a solid path to achieving meaningful cloud savings.

Conclusion

Case studies highlight how Spot Instances can significantly lower cloud costs for UK businesses. Take SmartNews, for example: they cut expenses for their main compute workloads by up to 50%. Similarly, Rippling managed to save 60% on EC2 compute costs and slash total cloud expenses by 50%.

However, achieving success with Spot Instances isn’t just about implementation - it’s about strategy. These examples show that careful planning and execution are non-negotiable. Companies that thrive with Spot Instances start by evaluating workload suitability, setting up strong automation systems, and ensuring thorough monitoring.

Using Spot Instances has been a success story for the company. We already know what capacity we will need depending on the criticality of the news, and we scale up the systems appropriately. – Ankit Singhal, Group Head of Core Infrastructure at SmartNews [17]

Although Amazon reports interruption rates of less than 5% [1], managing Spot Instances effectively requires fault-tolerant architectures that can respond quickly to any disruptions.

For businesses aiming to tap into these savings, having expert support is crucial. Hokstad Consulting specialises in creating optimised cloud infrastructures, offering tailored solutions to design automated, resilient Spot Instance strategies that maximise savings while maintaining service reliability.

For UK businesses ready to reduce costs and improve efficiency, Spot Instances, combined with professional guidance, can become a powerful tool for operational success.

FAQs

What are Spot Instances, and how do they differ from on-demand cloud instances?

Spot Instances offer a budget-friendly option compared to on-demand cloud instances by taking advantage of unused cloud resources. The cost savings can be substantial - sometimes as much as 90% less. This makes them a great choice for tasks that don't require constant uptime, like batch processing, data analysis, or testing environments.

That said, Spot Instances come with a trade-off. Since they rely on surplus capacity, they can be interrupted by the cloud provider if those resources are needed elsewhere. On the other hand, on-demand instances, while pricier, provide consistent availability and are a better fit for critical applications where reliability is non-negotiable.

How can businesses effectively handle interruptions when using Spot Instances?

To handle interruptions with Spot Instances effectively, businesses can use a few smart strategies. One of the most important is automation - leveraging tools that can automatically retry tasks, redistribute workloads, and manage failovers. This helps reduce downtime and keeps operations running smoothly.

Another approach is designing workloads to be interruption-tolerant. This means keeping services stateless where possible, checkpointing long-running jobs, and spreading capacity across multiple availability zones so that work continues when individual instances are interrupted.

You should also make use of proactive notifications. Spot Instance interruption notices and rebalance recommendations allow you to anticipate changes and adjust resources before disruptions occur. By integrating these alerts into your workflows, you can respond dynamically and maintain stability while still enjoying the cost benefits of Spot Instances.
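On AWS, a Spot interruption notice appears in the instance metadata under `spot/instance-action` as a small JSON document containing an `action` and a UTC `time`. The sketch below only parses such a document and works out how much time is left. Fetching it from the metadata service is left out so the example runs anywhere, and `parse_interruption_notice` is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone
from typing import Tuple


def parse_interruption_notice(notice_json: str, now: datetime) -> Tuple[str, float]:
    """Return (action, seconds_remaining) for a Spot interruption notice.

    notice_json is the body returned by the spot/instance-action metadata
    path, for example: {"action": "terminate", "time": "2017-09-18T08:22:00Z"}
    now must be a timezone-aware datetime.
    """
    notice = json.loads(notice_json)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return notice["action"], (deadline - now).total_seconds()


body = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
print(parse_interruption_notice(body, now))  # ('terminate', 120.0)
```

A handler would typically use the remaining seconds to stop accepting new work, flush a checkpoint, and deregister from any load balancer before the deadline.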

What workloads are ideal for Spot Instances, and how can businesses check if theirs are compatible?

Spot Instances work best for stateless, fault-tolerant, and flexible workloads. Typical examples include big data analytics, containerised applications, web servers, high-performance computing (HPC), and development or testing environments. They're especially useful for tasks that can handle occasional interruptions and adjust their scale as needed.

To see if Spot Instances are a good fit, businesses should assess whether their workloads can tolerate interruptions, function without persistent storage, and adapt to fluctuating availability. These instances are a budget-friendly option for non-critical tasks where constant reliability isn't a top priority.
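The three checks above can be written down as a simple checklist. The sketch below is a hypothetical helper, not a formal assessment method. The field names are assumptions that mirror the criteria in the text, and an empty result means no blocker was found.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    name: str
    tolerates_interruption: bool
    needs_persistent_local_storage: bool
    adapts_to_changing_capacity: bool


def spot_blockers(workload: Workload) -> List[str]:
    """List the reasons a workload is a poor fit for Spot Instances."""
    blockers = []
    if not workload.tolerates_interruption:
        blockers.append("cannot tolerate interruptions")
    if workload.needs_persistent_local_storage:
        blockers.append("depends on persistent local storage")
    if not workload.adapts_to_changing_capacity:
        blockers.append("cannot adapt to fluctuating availability")
    return blockers


batch = Workload("nightly-batch", True, False, True)
database = Workload("primary-db", False, True, False)
print(spot_blockers(batch))          # []
print(len(spot_blockers(database)))  # 3
```

Returning the reasons, rather than a yes or no, makes it clearer what would need to change before a workload could move.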