Rudolf Vavruch


Preparing to leave

Should your company move off cloud?

The other day someone suggested Dragonfly move off the cloud ‘the way 37Signals did’. Investigating 37Signals’ move from the cloud onto their own hardware, I learned that they not only describe how easy the move was, but also come across as recommending it. Despite their endorsement, I’m not sure it’s the correct decision for everyone.

In the days before the cloud

In the early noughties, at my first real job, I worked at a boutique web development company headquartered in a smallish office near the centre of Cape Town.

At the back of the office next to the kitchen was an unremarkable door that was always kept locked. As the lowly front-end developer, I was not permitted inside unaccompanied. Because behind that locked door, in an air-conditioned room, were the computers that ran our company’s software. Not just our email and source control but, more importantly, the web servers that served our client work to the internet.

In those days, to deploy software to the internet, we copied it across our internal network to a server less than 30 metres away.

Back then, this was normal and there was a skill set required inside every company to support it. You needed someone who knew which computers to buy, someone to connect those computers to the internet, and someone to install, configure, and maintain all the required software: the operating system, web server, mail server, database, firewalls …

A decade later, when I started my second startup, the landscape had changed drastically. Gone was the locked, air-conditioned room, and in its place was ‘the cloud’. You still managed your own server, but it was virtual, securely sharing slices of the compute, storage, and memory of a much larger physical server with other virtual servers. The physical server itself was owned and maintained by a completely different company.

To deploy software to the internet, you copied it across thousands of kilometres of undersea cables to a server in a distant country. You would need a passport just to stand next to the building that housed the server. If the cloud company were to let you inside, it would be a non-trivial task for them to work out which, of the hundreds of servers in there, currently contained your own virtual server.

Technology has continued to improve and cloud companies now offer even more holistic services. Today, to release bespoke software to the internet, all a developer needs to know is how to code and use source control. Everything else is managed for them, often by layers of companies. Databases, email and other services are available at the click of a button as fully automated add-ons.

Today, cloud services empower individual developers to achieve – at a massively reduced cost – what two decades ago it took a small knowledgeable team and a locked air-conditioned room full of computers to do.

So it came as a surprise when 37Signals seemed to take a few steps back and moved off the cloud and completely onto their own physical servers.

Putting 37Signals’ move off the cloud into context

Founded in 1999, 37Signals has made a name for itself as a highly opinionated, contrarian and – perhaps because of that – a very successful SaaS company. They produce Basecamp, HEY.com, and other software projects, boasting an estimated seven to sixteen million users.

As a company, they’re technically very strong. Co-owner and CTO David Heinemeier Hansson created Ruby on Rails, a seminal web framework. Among other projects, they recently released Kamal – a software tool to simplify deploying containerised software.

Since moving off the cloud, Hansson has written a lot about the ease with which they achieved the move as well as the cost benefits, suggesting that owning the hardware is a much preferable option to the cloud.

But it’s worth bearing in mind that Basecamp was founded seven years before AWS launched, meaning one can assume that part of their software ran on their own physical hardware from the start, and perhaps never left it.

In 2022, 37Signals Ops Site Reliability Engineer Fernando Álvarez confirmed as much by saying: ‘The current Basecamp, which is our biggest application by far, as well as the older Basecamp 2, both run almost entirely on our own servers, including application, database, and caching servers.’1

In fact, looking at the number of physical servers, by my estimate their move off the cloud increased their total physical server count by only 10%2. As such, this move appears to be less of a change in how they operate and more a paradigm shift in their thinking, from ‘we should have some presence on the cloud’ to ‘we don’t need to be on the cloud at all’.

While they highlight the savings of buying their own physical servers, they never mention the costs of running them. I suspect they don’t mention these as they were already existing costs that didn’t change with their move. They were already renting full server racks2 and they already had IT guys on retainer as well as a DevOps team servicing their existing servers3.

The move made sense for 37Signals, but does it make sense for everyone?

Free falling

Is moving off the cloud right for you?

37Signals is an inspirational company, but the reality is that many companies would not be able to make the move as easily as they did. To better understand if the move would be right for your company we’ll walk through four questions:

  1. Have you prepared?
  2. Will you save money?
  3. Is your hardware usage predictable?
  4. Do you know how to run a private cloud?

1. Have you prepared?

Before 37Signals made the move, they first went through a process to optimise and refine their existing cloud infrastructure3. This process not only reduces your current costs and familiarises you with them, it also helps you establish what your actual requirements are.

Infrastructure often grows organically. From time to time, it’s worthwhile reviewing and pruning what you have. This is especially true if you’re looking to move from the cloud back to ‘metal’ (hardware). You should reconsider your architecture in light of new technological developments and how your business may have changed. Lastly, your cloud provider may also offer suggestions of where cost improvements can be made.

Once this is complete you’ll be set up to answer the next question.

2. Will you save money?

A core reason for 37Signals’ move off the cloud is that they would save millions4. But, of course, you’ve got to be spending millions to save millions. That’s not to say your annual infrastructure costs need to be in the millions for a move like this to make sense, but hardware will work out more expensive in some cases.

37Signals has mostly presented the move as a financial decision. Indeed, besides bragging rights, there’s no other clear benefit to be gained from it. The negatives are easier to see. Unless you already have a significant portion of your system on metal, building your own private cloud will increase the cognitive load on your company. This additional complexity may still be worth it, but only if you save enough as a result of the move.

To put it plainly, if you’re not going to be saving significant amounts of money from the move, the decision is clear: don’t move off the cloud.

The costs you should consider are servers, a data centre, and staff. Below, I’ve laid out a rough MVP case as an example. Take a look, tweak it to match your requirements, and if your cloud spend is above your final figure, you can start to build a case for moving off the cloud.

Servers

In the cloud, if the hardware your software is running on is faulty it’s someone else’s problem. In fact, you’re paying for it to be someone else’s problem. But on your own hardware it’s your problem. If a server fails you will need to replace it. Since this can take weeks, you’ll need redundancy to switch over to keep your services up while you wait for reinforcements to arrive. This means you’ll need more than one server to carry the load. This is a very different experience from the cloud, where spinning up a new virtual server takes minutes, without needing to make sure someone plugs the right cables into the right sockets.

To determine the size of the server you’ll need, look at your current usage. One large physical server can support many smaller virtual machines. You can assign two virtual CPUs to one physical CPU core, or one per hyperthread. 1GB of virtual machine memory maps to 1GB of physical memory. You also need to set aside at least 1GB of RAM and a CPU core for the host operating system and KVM (the software that manages the virtual servers).
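As a rough sketch of that sizing rule (the server specs in the example are hypothetical, but the ratios follow the two-vCPUs-per-core, 1:1 memory, and one-core/1GB host reservation described above):

```python
def vm_capacity(physical_cores: int, physical_ram_gb: int,
                vcpus_per_vm: int = 2, ram_per_vm_gb: int = 4) -> int:
    """Rough count of identical VMs one physical host can hold.

    Assumes two vCPUs per physical core, 1:1 virtual-to-physical
    memory, and one core plus 1GB of RAM reserved for the host
    operating system and KVM.
    """
    usable_cores = physical_cores - 1       # reserve for host OS + KVM
    usable_ram_gb = physical_ram_gb - 1
    by_cpu = (usable_cores * 2) // vcpus_per_vm
    by_ram = usable_ram_gb // ram_per_vm_gb
    return min(by_cpu, by_ram)              # tightest constraint wins

# A hypothetical 32-core, 256GB server running 4-vCPU / 16GB VMs:
print(vm_capacity(32, 256, vcpus_per_vm=4, ram_per_vm_gb=16))  # → 15
```

In practice you would also subtract whatever headroom you want to keep for future growth.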

Finally, unless you’re sure you won’t be growing in the next five years, you should leave some slack to create additional virtual machines in the future.

After some negotiation5, 37Signals paid around $15,000 per server3. To simplify, we’ll use the same number. Since you’ll be buying at least two servers for redundancy, you’ll pay around $30,000 in total. We can amortise that over five years which is the expected lifespan of a physical server.

$30,000 / 5 years / 12 months = $500 / month

As touched on above, you should plan on replacing each physical server every five years. Either you can take the once-off knock or you can budget for this by adding another $500 / month, which would bring the total to $1,000 / month.
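The amortisation above, as a small function you can tweak (the numbers are the 37Signals figures quoted in this section):

```python
def monthly_server_cost(price_per_server: float, servers: int = 2,
                        lifespan_years: int = 5,
                        budget_replacement: bool = True) -> float:
    """Amortised monthly hardware cost; optionally pre-fund the
    replacement fleet due at the end of the lifespan."""
    monthly = price_per_server * servers / (lifespan_years * 12)
    return monthly * 2 if budget_replacement else monthly

print(monthly_server_cost(15_000, budget_replacement=False))  # → 500.0
print(monthly_server_cost(15_000))                            # → 1000.0
```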

This is a very lightweight and generalised solution. If you’re seriously thinking about this, you should get a professional to design a server configuration that matches your company’s unique workload.

An alternative to buying your own hardware is renting hardware from a hosting provider. This is simpler but restricts you to the hardware configurations they offer and are willing to maintain. Most likely there will be a premium on the rental too, which goes against the original goal of reducing costs.

Data centre

If this is the first time you’ve considered this, it’s unlikely that you’re in a position to build your own data centre like Facebook or Google do. Instead, you should look to rent rack space in an existing data centre.

37Signals rents racks in two separate cities. This redundancy builds geographic resilience into their infrastructure – if one location sinks below the ocean they can shift all their traffic to the other. This is the on-prem equivalent of running in more than one cloud region. However, while recommended, most companies don’t do this. The complexity of keeping databases in sync across multiple regions alone is a strong deterrent.

So, to establish a baseline cost, I’ll assume that you’re only considering a single location. You may double the costs of everything if you want to use two locations from the beginning.

If you’re starting out, you may only need to rent rack space for a few individual servers. Rack space is measured in Us (also known as RUs – rack units). One U is equal to 44.45mm in height. Most servers are 1 or 2Us tall, while some are as much as 4Us (177.8mm). These servers are wide and flat, designed to slot into a rack.

Above, we decided we needed two servers. The new 37Signals servers were 2Us each, so we need to rent 4Us in total. Costs and services between data centres vary considerably, but for our purposes we’ll use the lower end of $50 all-in for 1U. Data centres often include a certain amount of electricity and bandwidth in the price of a U. So we’re adding to the budget at least $50 x 4U = $200 / month. You should reach out to your friendly neighbourhood data centre for more accurate pricing.

Staff

There are two distinct skill sets needed to run on metal.

Firstly, the ability to manhandle the physical computers into the rack and plug the correct cables into the correct sockets. This can be handled by IT support staff. As some data centres offer this as a service, I’ll leave it out of the calculation. 37Signals has hired a separate company to perform this task for them.

Secondly, you’ll need at least two senior DevOps Engineers and they’ll need a skill set which includes architecting a system which can run on metal. They’ll need to be able to install and maintain any other software you intend to use, which may include but is not limited to: KVM, Linux, databases, web servers, a cache, CI/CD pipelines, firewalls, central logging, queues, and Kubernetes. They should not only be able to detect if a server catches on fire, but also have a plan in place for what to do when that happens. They should also get very excited about security and resiliency.

These are all important considerations, because this is broader than the typical DevOps skill set today. Most DevOps engineers work only in the cloud, where they don’t need to know about any of these things because it’s all done for them.

Lastly, they’ll need to be on call 24 hours a day. Servers are mercurial and may not choose convenient daylight hours to fall over. Having two DevOps engineers means they can work in rotating shifts. This is a bare-bones team and should be grown, for the sake of the engineers’ health if nothing else. Generally speaking, a team that needs to be on call 24 hours a day, seven days a week, should have at least eight people.

In South Africa, a senior DevOps person costs around R90,000 / month = $5,000 / month, so for two you are looking at $10,000 / month. This is a simple calculation so we’ll leave it at that, but it should be noted that the budget could be higher for someone with the rare skill set described above. It’s also very likely that this skill set will be even more expensive in other countries.
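Pulling the three line items of this rough MVP case together (servers with a replacement fund, a single location’s rack space, and two engineers, all using the figures above):

```python
# Rough MVP monthly budget, single location, using the figures above.
servers = 1_000      # two $15k servers amortised over 5 years, x2 to fund replacements
rack    = 4 * 50     # 4U of rack space at roughly $50 all-in per U
staff   = 2 * 5_000  # two senior DevOps engineers (South African rates)

total = servers + rack + staff
print(f"${total:,} / month")  # → $11,200 / month
```

If your current cloud bill is comfortably below that figure, the financial case for the move evaporates before it starts.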

Descent slowing

3. Is your hardware usage predictable?

A few years ago, I was speaking to the CTO of an e-commerce company that had daily sales. Every morning at 9am the sale email would go out to all their subscribers and their traffic would spike as shoppers piled onto the site. Over the course of the day, the traffic would die down to almost nothing until the next morning’s spike.

He told me that just before 9am every morning, they would pre-emptively scale up their infrastructure to deal with the expected morning traffic. Over the course of the day, it would auto-scale down as the traffic dwindled. This process saved them a significant amount in server costs as opposed to always running at peak capacity. But this kind of scaling isn’t possible to do cost-effectively with hardware – a server costs the same whether it’s running at full capacity or lying unplugged in a cupboard.
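A toy model makes the difference concrete. The traffic curve and unit cost below are invented, but the shape mirrors the daily-sale pattern described above: hardware must be sized for the 9am peak around the clock, while cloud autoscaling pays roughly in proportion to load.

```python
# Hourly load as a percentage of peak, over 24 hours (hypothetical):
# quiet overnight, a 9am spike, then a slow decline.
hourly_load = [5] * 8 + [100, 60, 40, 25, 15, 10, 8, 6] + [5] * 8
cost_per_peak_hour = 1.0  # hypothetical cost of full capacity for one hour

hardware = 24 * cost_per_peak_hour                    # always sized for the peak
cloud = sum(pct / 100 * cost_per_peak_hour for pct in hourly_load)

print(f"hardware: {hardware:.0f} peak-hours, cloud: ~{cloud:.2f} peak-hours")
```

In this toy model the autoscaled bill is roughly a seventh of the always-on one; flatten the traffic curve and the two converge, which is exactly when metal starts to win.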

There are other use cases where hardware isn’t going to make sense. As Hansson puts it: ‘The cloud excels at two ends of the spectrum … The first end is when your application is so simple and low traffic that you really do save on complexity by starting with fully managed services … It remains a fabulous way to get started when you have no customers, and it’ll carry you quite far even once you start having some. The second is when your load is highly irregular. When you have wild swings or towering peaks in usage. When the baseline is a sliver of your largest needs.’3

To build on his first point, if your company is an early-stage startup, hardware is not for you. You need the flexibility provided by the cloud to experiment. You don’t need the additional cognitive load that comes with hardware; all you should be thinking about is finding and growing your product-market fit.

It also doesn’t make sense to incur upfront costs that need to be amortised over years when you only have months of runway. As Hansson further notes, ‘When you have no idea whether you need ten servers or a hundred. There’s nothing like the cloud when that happens, like we learned when launching HEY, and suddenly 300,000 users signed up to try our service in three weeks instead of our forecast of 30,000 in six months.’3

The lesson we can take from HEY is that, if you’re launching something new, do it on the cloud. Only once the product usage has become predictable should you consider moving to metal.

Since they’ve been in operation for more than two decades, 37Signals’ traffic has been consistent enough that they were able to predict how much they would save over five years by leaving the cloud6.

Another way to think about it is this: If you’re willing to pay for an AWS Reserved Instance upfront, you’re already taking advantage of discounts afforded by longer term commitments to capacity, and therefore hardware is something you could consider.

4. Do you know how to run a private cloud?

At FlexClub we explored using a managed database versus self-managing in the cloud. We concluded that for us it made more sense to self-manage. It would be more cost effective in the long run (after the initial developer time investment to perform the set-up we would mostly only be paying for server costs), and we would have more control over some functionality such as backups.

That decision worked out as planned for us, but only because we already had a lot of experience in-house managing databases. I wouldn’t recommend self-managing databases to everyone. If your team doesn’t have experience with databases, it could be a time-consuming and frustrating experience which may lead to substandard outcomes for your developers and customers. In that case, it’s simply better business sense to use a managed database, despite the higher cost.

Hansson makes a point that even the cloud requires skilled people to operate it: ‘This is the central deceit of the cloud marketing, that it’s all going to be so much easier that you hardly need anyone to operate it. I’ve never seen it. Not at 37signals, not from anyone else running large internet applications.’3

This is true: you do still need people to manage the cloud. But you need fewer people, with less specialised skills and experience, to operate successfully in the cloud.

37Signals has a powerhouse of a team, with broad experience in this domain and the support of external companies. This is not a small, overworked and underappreciated IT department that needs to upskill massively to run on bare metal, but a team of hardened veterans who were executing on something they were already familiar with. Migration aside, it’s just another Tuesday for 37Signals.

For a company with a technical team lacking experience in certain areas, using the services offered by most cloud providers, such as a managed database, is a game changer. You don’t need to have someone who knows how to install, configure, back up, or replicate a database. You just need someone who knows which buttons to press, and then you can go back to focusing on your product.

Are you using managed services, such as RDS, ElastiCache, or EKS?

If your company is using any of the managed services offered by your cloud provider the first question is: why?

Often these services are more expensive than if you were to run them yourself in the cloud. If it’s because you don’t have the skills in-house to manage these services, you should consider this a red flag for switching from the cloud to metal. These services will not exist on-prem unless you put them there, so you’ll need to know how to install and maintain them to complete a successful move.

There are solutions for the skills gap, such as hiring or outsourcing for these skills. Upskilling, while attractive, will take longer and, due to a lack of experience, the result is unlikely to be optimal. If your team does want to upskill (and if they’re a good team they should), then let them learn from experienced engineers as they build the systems for you.

The result of meticulous planning, hard work and extreme bravery – a safe landing.

Conclusion

The success of 37Signals’ move speaks more to the technical powerhouse that 37Signals is than to the actual difficulty of the procedure.

Yes, moving off cloud can be cheaper – significantly so – if you have all the same boxes checked that 37Signals did. But moving onto metal is not going to save your company, unless the biggest strategic problem your company has is that it’s spending too much on the cloud.

Unless you were already running part of your infrastructure on metal, moving there will introduce a new level of complexity to your business. Ultimately this is the type of move a mature business makes when it’s looking for cost-cutting opportunities. Even then, it’s not for the faint of heart.

If you are considering this, I recommend that your first step be to rent dedicated servers from a hosting provider and attempt to create a secure staging environment for your software. This will teach you a lot about what is required and help you identify any gaps you may have as a company.

Ultimately, this lightweight experiment will set you up for future success – whether that success means leaving the cloud or staying on it.