In this post, I want to share some insights from my advisory engagements on cloud spend. I have advised over 20 companies on cloud cost optimization. Unfortunately, what we envisaged as short-term engagements, where I would analyze bloated costs and share tips and tricks, turned into multi-quarter engagements requiring continued support. This was highly inefficient!
How did we end up here?
To answer this, let’s take a step back and examine our DevOps practices and tools in more detail.
While investigating cloud spend, we would dig through spend dashboards, discover insights, sort them by priority, assign them to teams, and then measure the results again, every week. The toll it took on teams was immense. It was like Groundhog Day.
We had success on many occasions and celebrated the pure dollar savings. However, this made me question our fundamental approach: how can we think preemptively about this?
Make no mistake, continuous verification and auditing are absolutely necessary practices. We all need to routinely audit costs and practices, but auditing should not be the only way.
Expanding on this, I realized that the problem isn't limited to cloud cost. We take this "by-audit" approach in almost all of our DevOps practices.
I call this approach of taking stock retrospectively "by-audit". These days, when I come across a DevOps tool or process, I classify it by whether it fixes things "by-audit" or "by-design".
Let’s look closer at how we approach practices when you think by-audit:
This applies to tools as well.
For instance, suppose you have a tool that detects exposed database credentials. A database expert then analyzes the report, identifies the services connecting to the database, finds their owners, and informs them so they can re-examine the credentials.
With the "by-design" approach we ensure adoption of best practices and principles before implementation.
To continue the above example of the credentials for the database, a "by-design" approach would ensure that there is a formal way to request a credential for each of the services beforehand.
It would also require a policy on credential creation, isolation, storage, and rotation that is applied uniformly and programmatically whenever a request is fulfilled. It is equally important to ensure that this is the only way credentials can be created. If all of this is taken care of by design, the need for verification tools goes down significantly. The codified policy and code can be statically verified, and that verification can be moved upstream to the CI pipeline.
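To make the idea of a codified, statically verifiable credential policy concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the CredentialRequest fields, the 90-day rotation limit, and the single approved store named "vault" are hypothetical and do not come from any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical policy values, chosen only for illustration.
MAX_ROTATION_DAYS = 90
ALLOWED_STORES = {"vault"}


@dataclass(frozen=True)
class CredentialRequest:
    """A declarative request for one service to access one database."""
    service: str               # service asking for access
    database: str              # database it wants to reach
    store: str                 # where the secret will be kept
    rotation_days: int         # how often the secret is rotated
    shared_with: tuple = ()    # other services reusing this credential


def policy_violations(req: CredentialRequest) -> list[str]:
    """Return every rule the request breaks; an empty list means compliant."""
    violations = []
    if req.store not in ALLOWED_STORES:
        violations.append(f"{req.service}: store '{req.store}' is not approved")
    if not 0 < req.rotation_days <= MAX_ROTATION_DAYS:
        violations.append(
            f"{req.service}: rotation must be between 1 and {MAX_ROTATION_DAYS} days"
        )
    if req.shared_with:
        violations.append(f"{req.service}: credentials must be isolated per service")
    return violations


def check_all(requests: list[CredentialRequest]) -> list[str]:
    """Collect violations across all requests, e.g. as a CI step."""
    violations = []
    for req in requests:
        violations.extend(policy_violations(req))
    return violations
```

In a CI pipeline, a step could load the requests committed to the repository, call check_all, and fail the build when the returned list is not empty. Because the check runs on the request itself rather than on a deployed system, non-compliant credentials are rejected before they exist.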
Similarly, AWS Cost Explorer, where you analyze costs per team and identify the teams consuming the most resources, is a "tool of verification".
AWS Budgets, on the other hand, where you allocate a budget to each team and combine it with automation that prevents teams from going over budget, is a "tool of design".
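The difference between reporting on spend and preventing overspend can be sketched in a few lines of Python. This is not the AWS Budgets API; TeamBudget, BudgetExceeded, and admit are hypothetical names, and the sketch only illustrates the idea of a gate that checks a request against an allocated budget before anything is provisioned.

```python
from dataclasses import dataclass


@dataclass
class TeamBudget:
    team: str
    monthly_limit: float     # budget allocated to the team, in dollars
    committed: float = 0.0   # spend already committed this month


class BudgetExceeded(Exception):
    """Raised when a request would push a team over its budget."""


def admit(budget: TeamBudget, estimated_monthly_cost: float) -> None:
    """Admit a provisioning request only if it fits within the team's budget."""
    if estimated_monthly_cost < 0:
        raise ValueError("estimated cost must not be negative")
    projected = budget.committed + estimated_monthly_cost
    if projected > budget.monthly_limit:
        # Reject up front instead of discovering the overrun in a later report.
        raise BudgetExceeded(
            f"{budget.team}: projected {projected:.2f} exceeds "
            f"limit {budget.monthly_limit:.2f}"
        )
    budget.committed = projected
```

A team with a limit of 1000 could have a request estimated at 600 admitted, while a following request estimated at 500 would raise BudgetExceeded and leave the committed amount unchanged. The overrun never reaches the bill, so there is nothing left to audit for that request.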
If you have a burning problem like cost bloating, then the by-audit process comes in handy. You would need to put a team together, pull out all reports, break them down and assign them to developers to fix them. However, more often than not, these practices become the norm. By-design processes provide long-term benefits and ensure things stay fixed. You may need audits still, but the workload dramatically reduces.
Typically, a central team is responsible in the by-audit approach. Here, the governance is said to be centralized, because ownership is with one team. Generally, they use certain tools and analyze reports. This is fine for the short term.
However, with the by-design approach the goals are long-term. Ownership is given to multiple teams, so governance is decentralized and teams get more autonomy. This ensures clear ownership boundaries, so that every team is aware of its share of tasks.
Apart from routine audits, by-audit processes are usually useful in case of anomalies, e.g., a security breach or a cost spike. While it's easy to tackle these anomalies with audits, audits are not a sustainable way to improve baselines. By-design processes ensure that the baseline improves overall, i.e., your default security posture, observability coverage, and cost baselines.
While moving from by-audit to by-design processes, the focus must shift from tool thinking to platform thinking. The questions that need to be answered are:
Platform teams of today must shift away from stitching tools together and relying on audit-driven practices. Indeed, platform engineering is centered around a by-design mindset.
Platform teams need to devise ways in which the well-architected aspects of cloud and its practices are ensured by default rather than left to be audited.