April 8, 2023
Five cloud billing horror stories and the reason I created Qumulus

I started Qumulus to build a better cloud, one that offers better performance, usability, and security and, most importantly, is free of bill shock. It’s not uncommon to see memes about surprise cloud bills floating around developer forums. People laugh (because no one thinks they’re that stupid) and move on without giving it much thought.

Look beyond the memes, though, and you’ll see a serious problem, which I’ll illustrate with five real AWS bill-shock stories.

1 - $200 bill for a machine-learning tutorial

Let’s start small. In 2021, a student accidentally racked up a $200 bill while running a tutorial on AWS SageMaker. The post didn’t explain exactly how this happened, but the student most likely forgot to shut down their session after finishing the tutorial. Because AWS bills active sessions by the hour, charges keep accruing for as long as a session stays open. It’s also possible that the student exceeded the 250 free hours provided by the free tier and simply racked up the charge through extended use.

2 - $4,600 for a recursive loop

A developer found himself with a $4,600 bill after accidentally creating a recursive loop (a serverless function that kept calling itself). Even though he had set up an AWS billing alarm at $300, he had already racked up $1,484 by the time he received the first alert, and $4,600 by the second. He learned the hard way that billing alarms can lag well behind actual spend, because AWS needs time to collate billing data from various regions.
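To put rough numbers on why a delayed alarm fires so late, here is a minimal Python sketch. It rests on a deliberately simple model that is my assumption, not how AWS actually computes or reports charges: spend grows at a constant hourly rate, and the alarm only sees figures that are some number of hours old. Under that model, the spend on the meter when the alert finally arrives is the threshold plus everything accrued during the lag. The function name and the rate and lag values are hypothetical.

```python
def spend_when_alarm_fires(threshold: float, rate_per_hour: float, lag_hours: float) -> float:
    """Actual spend at the moment a delayed billing alarm fires.

    Simplified model (an assumption, not AWS's billing pipeline): spend grows
    at a constant hourly rate, and the alarm only sees figures that are
    `lag_hours` old. The alarm therefore fires `lag_hours` after the real
    bill crosses `threshold`, and spend keeps accruing in the meantime.
    """
    if rate_per_hour < 0 or lag_hours < 0:
        raise ValueError("rate and lag must be non-negative")
    return threshold + rate_per_hour * lag_hours


if __name__ == "__main__":
    # Hypothetical scenario: a $300 alarm and a runaway function costing $100/hour.
    for lag in (1, 6, 12, 24):
        spent = spend_when_alarm_fires(300, 100, lag)
        print(f"{lag:>2}h lag -> ${spent:,.0f} spent before the first alert")
```

With a $300 threshold, a hypothetical $100-an-hour loop and a 12-hour reporting lag put $1,500 on the bill before the first alert. A loop that fans out, with each invocation triggering several more, grows faster than linearly, so this constant-rate model is, if anything, optimistic.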

3 - $20,000 for a forgotten AWS free-tier account

Another student was slapped with a surprise bill after trialling an AWS free-tier account. The account was apparently compromised and ran up a $20,000 bill. Though AWS eventually waived the entire amount, the episode took a few months to resolve. The takeaway is clear: anyone, even a free-tier user, can end up with a massive bill if they’re not careful. There are no guarantees.

4 - $65,345 for an auto-scaling misconfiguration

A co-founder of a 3D animation startup woke up to a five-figure bill caused by an auto-scaling misconfiguration. “We render a lot of video, and we used AWS to scale up and down with demand. However, something broke, and as we scaled, resources never got deployed. So it just scaled, and scaled again,” he said in a LinkedIn post. AWS eventually agreed to reduce the bill to $26,000 after six months of review.
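The failure the co-founder describes, a scaler that keeps adding capacity because the new capacity never shows up as deployed, is easy to reproduce in a toy model. The Python sketch below is hypothetical: the controller logic, the `max_capacity` parameter, and the numbers are my own illustration, not AWS Auto Scaling’s algorithm or the startup’s actual setup. It counts the instances being billed after each scaling step, with and without a hard cap.

```python
from typing import List, Optional


def simulate_runaway_scaling(demand: int, steps: int,
                             max_capacity: Optional[int] = None) -> List[int]:
    """Return the number of billed instances after each scaling step.

    Hypothetical failure mode: the controller sizes the fleet from *healthy*
    capacity, but launched instances never become healthy, so it sees the
    same shortfall every step and launches another batch.
    """
    launched = 0  # instances we are paying for
    healthy = 0   # instances actually serving work; stuck at 0 here
    history: List[int] = []
    for _ in range(steps):
        shortfall = demand - healthy
        if shortfall > 0:
            to_launch = shortfall
            if max_capacity is not None:
                # A hard cap limits total launched instances, not just healthy ones.
                to_launch = max(0, min(to_launch, max_capacity - launched))
            launched += to_launch
        # Nothing is ever added to `healthy`: that is the misconfiguration.
        history.append(launched)
    return history


if __name__ == "__main__":
    print("no cap:", simulate_runaway_scaling(demand=10, steps=5))
    print("cap=15:", simulate_runaway_scaling(demand=10, steps=5, max_capacity=15))
```

Without a cap, billed capacity grows by a full batch every step for as long as the fault persists. The cap stops the growth but still leaves you paying for instances that do no work, so it limits the damage rather than fixing the fault.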

5 - $1,300 in a day for unauthorised attempted writes to a private S3 bucket

More recently, a developer was hit with a $1,300 bill for his empty, private AWS S3 bucket. The charge came from a burst of 100 million S3 PUT requests sent by external systems trying to store their backups in that bucket. How did these systems find a private S3 bucket? It happened to share its name with a popular open-source tool. Notably, the developer was charged even though the incoming requests were unauthorised, creating a sort of financial DDoS.

As you can see, bill shock is almost impossible to prevent without moving off AWS or a similar public cloud service. 

The AWS billing system is just too complex, and billing alarms that lag by several hours (or even a full day) are almost useless. AWS offers a free tier, but you’ll most likely use multiple services, some of which won’t be free. Plus, the bill won’t stop running just because your credit card is declined.

Even if you know what you’re doing, there’s always a non-zero chance of you getting hit through no fault of your own. And when you do get hit, it will be a highly stressful, months-long process to overturn the charge, after which you’ll promptly delete your AWS account.

While I don’t think companies can forgo the public cloud completely, particularly for production workloads, I do think there are a lot of non-critical, developmental workloads (e.g., batch processing, stateful applications, staging environments) that could be migrated to a fixed-rate private cloud to minimize the risk of bill shock.

With Qumulus, I’m hoping to reverse the order of adoption: build on a private cloud by default, then scale up to the public cloud. Hopefully, we’ll see far fewer of these horror stories on the internet when that happens.