Post - Get Your Head In the Cloud | Thomas Chia

Everybody loves to talk about being cloud native. Like so many things in technology, the cloud has evolved organically over a relatively short period of time, leading (at times) to a slightly confused identity. In this post, I share how I see the cloud and my views on working with — and in — it.

The Cloud is not just Someone Else's Computer

To some of you, this may be stating the obvious, but I've often heard people talk about becoming cloud native as lifting and shifting an on-premise system into the cloud — in other words, taking a bunch of installed third-party and/or custom-written software running on your computer(s) and loading it up on a virtual machine in the cloud. Let me caveat that I am not saying this is a bad thing; in fact, it may be the best thing to do, for example as part of a cloud migration strategy. After all, it's the first R in the 5Rs of Cloud Migration. Regardless of whether you take this approach as part of a migration or a new build, if you simply view the cloud as a bunch of other computers to run stuff in, I do not believe you are truly embracing the approach of the cloud native. A cloud tourist perhaps, but you'll need to work a little harder to get that residency status. The cloud is not just someone else's computer. It may have started that way, it may still look that way from certain angles, but in this day and age it is so much more than that, and if you don't treat it as such, you are just a tourist hitting up the boring old mainstream landmarks — most definitely not native.

What is it then?

So if dumping everything into VMs and letting her rip doesn't cut it, what does? When was the last time you built or installed an object storage application on a VM? Probably a long time ago (maybe never). AWS' S3 came out in 2006 as a managed object storage service and since then has become ubiquitous in the object storage space. Even if you don't use it, you know of it (and more to the point are probably still using a managed object storage service, just from someone else). Most likely you chose to use it because it's much less resource intensive to use something that someone else has already built than to build it yourself, provided that theirs is better than yours. This example is almost contrived, but it highlights the foundation of my point in this post: in addition to the cloud being someone else's computers, it also has someone else's applications. By using these applications where appropriate, you start to leverage more of what the cloud has to offer. The cloud is full of managed services: storage, queues, machine learning, authentication, you name it. The cloud native embraces their existence and makes full and proper use of them wherever she can.

I'm not going to cover multi-cloud setups here, as that is an entire topic on its own. In a single-cloud provider design, you have the ability to make use of built-in integrations between managed services that further reduce the need for you to write integration and orchestration logic. Examples in AWS include integrating API Gateway with Cognito for authentication and authorisation of endpoints, SNS with SQS for event-based fan-out designs, or Lambda with... pretty much anything. These are not just a hodgepodge of different services tossed together (like some enterprise software "suites"); their creators have made sure these applications work together coherently, and that they can be better than the sum of their parts.

In spite of all this talk about using "someone else's applications" (i.e. managed services), at some point you still need to sit down and write something. The cloud does not know what your business is, and probably never will, so some part of your system will still need a general-purpose execution environment where you can do what you need to fill in the gaps. Does this mean that some part of the cloud tourist design endures? Maybe, but maybe not.

Serverless

Serverless used to be looked upon with disdain as something of a buzzword. I never viewed it as such; sure it had teething problems and I think a lot of people got disillusioned because they believed it was a silver bullet (hint: there are no silver bullets). Serverless is actually a broad term, and many of the examples of managed services I gave earlier are also serverless, but one in particular is the mother lode, the one which lets you fill in the gap that all the others, by virtue of them being single-purpose, leave you with: FaaS or Functions as a Service. Also known as AWS Lambda, GCP Cloud Functions, Azure Functions etc. With this, you could, if you wanted to, leave all semblance of your old life as a cloud tourist behind. Embracing FaaS means blurring the lines between infrastructure and application — you still have to define a few things like a runtime and some environmental configuration, but for the most part you just write application code and forget about everything else: scaling, right-sizing, OS patching etc.
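To make "you just write application code" a little more concrete, here is a minimal sketch of what a function might look like, using the `handler(event, context)` shape that AWS Lambda's Python runtime calls. The `GREETING` environment variable and the `{"name": ...}` event shape are assumptions made up for this illustration; real events depend on whatever triggers the function.

```python
import json
import os

# Hypothetical setting for this sketch. On a FaaS platform it would live in
# the function's environment configuration rather than in the code.
GREETING = os.environ.get("GREETING", "Hello")


def handler(event, context):
    """Entry point the platform calls once per invocation."""
    # Assumed event shape: {"name": "..."}; fall back to a default if absent.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"{GREETING}, {name}!"}),
    }
```

Notice what is absent: no web server, no process manager, no scaling logic. Because the entry point is a plain function, it can also be called directly in a test, for example `handler({"name": "cloud"}, None)`.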

Side note: I am not advocating the unconditional use of managed services/serverless; nor am I claiming that you must use them to be cloud native. What I am advocating is the awareness of such possibilities, and the deliberate adoption or otherwise.

I mentioned that serverless/FaaS had teething problems. A lot of these issues have two sides to them, an implementation side and a design side. Take a common complaint: slow cold starts. On the one hand, this is something that can be (and has been) improved. It's possible that what was an unacceptable cold start time to someone 5 years ago has improved to a level that is acceptable. On the other hand, the need for a cold start is sort of inevitable with the design of FaaS, and it is possible that this nature of FaaS makes it untenable for some use cases. Another, less common complaint is cost. FaaS is typically pay-as-you-use, which is great for low to moderate workloads, maybe even high ones. But with infinite scaling comes infinite costs, and while yes, this could be "improved" by FaaS becoming cheaper, at the time of writing I fully accept that workloads exist which would be much more expensive than non-FaaS implementations.

Side note: measure, or at least estimate before coming to the conclusion that FaaS is too expensive. Maybe even start off with FaaS and refactor when the time is right. The savings on operational overhead are much more significant in the early, make-or-break days of a company's existence.
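A back-of-the-envelope estimate along these lines might look like the sketch below. It assumes a simple pricing model (a charge per request plus a charge per GB-second of compute), and the default prices are placeholders rather than any provider's published rates, so substitute your own numbers. The function and class names are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaasPricing:
    # Placeholder prices for illustration only, NOT real provider rates.
    per_million_requests: float = 0.20
    per_gb_second: float = 0.0000167


def faas_monthly_cost(requests, avg_seconds, memory_gb, pricing=FaasPricing()):
    """Estimate a monthly FaaS bill from request volume and function size."""
    request_cost = requests / 1_000_000 * pricing.per_million_requests
    compute_cost = requests * avg_seconds * memory_gb * pricing.per_gb_second
    return request_cost + compute_cost


def break_even_requests(fixed_monthly_cost, avg_seconds, memory_gb,
                        pricing=FaasPricing()):
    """Monthly request count at which FaaS costs as much as a fixed-price
    alternative (for example, an always-on VM)."""
    cost_per_request = (pricing.per_million_requests / 1_000_000
                        + avg_seconds * memory_gb * pricing.per_gb_second)
    return fixed_monthly_cost / cost_per_request
```

Comparing `break_even_requests(...)` against your expected traffic gives a rough sense of whether cost is a real concern yet. The sketch ignores free tiers, data transfer and the operational cost of running the alternative, all of which can shift the answer.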

Making it Happen

One of the big challenges when adopting such a cloud-centric design is that it's hard to replicate these services locally. In spite of a growing number of products and tools that help you to do this, I find that generally it's a game of catch up, and even small omissions or bugs can be showstopping. This is in contrast to a cloud tourist design, in which you can usually develop everything locally, then flash it up into your VM or container. At most you might have to write a stubbed implementation of some cloud service to run locally, which is replaced by the real thing when deployed. To me, there is an obvious solution — develop in the cloud directly. All engineers working on cloud native applications should have access to a cloud environment containing the actual services they plan to use. I think it is counterintuitive and backwards to withhold this. Personally, I love AWS, and between features such as Organizations and billing alerts, you can safely set up sandbox accounts for individual engineers.

As a corollary to this, the other thing that I believe is essential is the use of Infrastructure as Code (IaC). Aside from the usual reason of being able to track the history of your infrastructure changes, there are two more reasons for using IaC. The first is that playing around in the cloud can make it hard to keep track of things you have spun up, whereas with IaC everything you have created is written down. This isn't a terrible issue, but it can make for unnecessary costs and mess (and as engineers we hate mess). The second reason is more significant: with IaC, you can reliably move your system between cloud accounts. This makes it feasible for engineers to work on their "local" instance of the application (in their personal sandbox account) and propose changes to the infrastructure using an application-style methodology (i.e. open a PR with changes to the IaC). These changes can be merged and deployed predictably to common environments like staging and production. Given the ability to do this, engineers can focus on making the changes that they need without having to tediously reproduce said changes when it's time to introduce them to a higher environment. If you are making use of serverless, then most of your resources can be configured identically across environments, and you will pay very little for the tiny amount of traffic running through a developer's sandbox account. If you have non-serverless resources like VMs, IaC typically lets you define different configurations (such as instance size/memory) for different environments, meaning you aren't over-provisioning.
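As a rough illustration of the per-environment idea, the sketch below renders a CloudFormation-style template from one shared definition. The environment names, resource names and sizes are all assumptions made up for this example, and the template is deliberately incomplete: a deployable one needs further properties (code location, IAM role, machine image and so on). In practice you would more likely use your IaC tool's own parameter mechanism than hand-rolled Python.

```python
import json

# Hypothetical per-environment settings. The serverless function is sized
# identically everywhere; only the VM differs between environments.
ENVIRONMENTS = {
    "sandbox": {"instance_type": "t3.micro", "function_memory_mb": 128},
    "staging": {"instance_type": "t3.small", "function_memory_mb": 128},
    "production": {"instance_type": "m5.large", "function_memory_mb": 128},
}


def render_template(env):
    """Build a (partial) template dict for the named environment."""
    settings = ENVIRONMENTS[env]
    return {
        "Resources": {
            # Illustrative resource names, not from any real system.
            "ApiFunction": {
                "Type": "AWS::Lambda::Function",
                "Properties": {"MemorySize": settings["function_memory_mb"]},
            },
            "WorkerInstance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {"InstanceType": settings["instance_type"]},
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(render_template("sandbox"), indent=2))
```

The point of the design is that the structure of the system is defined once, and a change proposed in a PR flows to every environment, while the handful of values that genuinely differ stay in one small table.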

Summary

There are 4 key points that I've tried to make in this post. These pillars form the "what" and "how" of the cloud as I see it.

The cloud is not just a bunch of someone else's computers sitting in a data centre with better locks and better air conditioning than your office. It's a plethora of higher level applications which can be leveraged for greater performance and efficiency.
Use managed services where possible. These are probably better than ones you could write yourself, and cheaper than self-hosting enterprise or open source ones.
Use serverless architectures where possible. In many cases, it's cost effective and has a relatively tiny operational overhead.
Use IaC to describe your infrastructure. This enables engineers to get their heads in the cloud completely, building things in the way that they will eventually run, and port those changes into shared environments predictably.

By accepting and embracing these pillars, we find ourselves adopting the way of the cloud native.