Scuba, Stucco and SRE

Clockwork of the Holy Cross Church in Dülmen, North Rhine-Westphalia, Germany (Wikimedia Commons, 2019)

I know, I know. The title reeks of “Dejan took his Scuba equipment into a house with a stucco-finished exterior. You would not believe what happened next.” I promise there is a non-clickbait-y moral to the story. Let me prove it to you.

Sherwood vs Scubapro

The first story comes from my early Scuba days when I was trying to pick a regulator. For the uninitiated, ‘regulator’ is the thing that you breathe through – a thing that goes into your mouth, attaches to the gas tank through a hose, makes funny bubbles that scare the fishes and generally keeps you not dead at 80 feet. Which is a good thing, not being dead at 80 feet.

My local store carried a number of brands, including Sherwood Scuba and Scubapro. Scubapro is a high-end brand with a large range of fantastic-breathing regulators. They are the BMW of regulators, so to speak: expensive, lots of parts, and in need of regular maintenance with special tools. Then there is the Sherwood Brute. Not as fancy as Scubapro, not as nice breathing. But it has only one serviceable part. You just cannot break a Brute. Maintenance takes about five minutes: replace the one moving part, adjust it, done. A cold water version (Blizzard) can be taken under ice here in Canada. Not that I ever wanted to, but you can. Store owners hate servicing Scubapro and love Sherwood. Customers (particularly new ones) are attracted to the impressive and shiny Scubapro, but over years of ownership they learn the darker side of the story. You cannot beat simplicity.

Stucco vs exposed brick

The second story comes from a contractor doing some work on the entrance of my first house. It was like many houses in Canada: exposed red brick. Not even fancy red brick, but uneven, in all shades of red, and showing the toll of many a Toronto winter (the bricks probably came from a quarry called Brickworks, which is now a great bike ride from downtown Toronto). I asked the contractor if it would be a good idea to cover the bricks with stucco. At the time, stucco looked to me like a nicer façade, certainly nicer than my bricks. He said something that stuck with me:

Why would you do that? This is Toronto, with cold winters and hot and humid summers – with those extremes you would be setting yourself up for constant repairs and maintenance. Whenever you do something, the goal should be less maintenance, not more.

Friendly contractor

Number of moving parts and SRE

OK, let me bring it home to microservices and the Cloud. When you build a modern microservice system, whether you intend to run it yourself or sell it as software for customers to run, your ultimate goal is Quality of Service. That means high availability, uptime, and few bugs that do the opposite of “delighting the users”.

Now, every system can be brought to the desired QoS given enough time and resources. But Murphy’s laws taught us that there is never enough of either. In a more realistic scenario, you will be closer to the desired QoS in a system with fewer moving parts.
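One way to make the “fewer moving parts” argument concrete is the textbook rule for components in series: if a request needs every component on its path to be up, and the components fail independently, the availability of the whole path is the product of the individual availabilities. The sketch below assumes a hypothetical 99.9% availability for every part; real components are neither identical nor fully independent, so treat the numbers as illustrative rather than as a model of any particular system.

```python
from math import prod


def serial_availability(component_availabilities):
    """Availability of a request path that needs every component to be up.

    Simplifying assumption: components are in series and fail independently.
    An empty path (no components) is trivially always available (1.0).
    """
    return prod(component_availabilities)


# Hypothetical figure, not from any real system: each moving part is up 99.9% of the time.
PER_COMPONENT = 0.999
HOURS_PER_YEAR = 365 * 24

for parts in (3, 5, 10, 20):
    a = serial_availability([PER_COMPONENT] * parts)
    downtime_hours = (1 - a) * HOURS_PER_YEAR
    print(f"{parts:>2} parts: {a:.4f} available, ~{downtime_hours:.0f} h downtime/year")
```

Under these assumptions, five parts give roughly 99.5% availability (about 44 hours of downtime a year), while twenty parts drop to about 98% (about 174 hours). Every part you add multiplies in another factor below 1, which is the arithmetic behind preferring fewer moving parts.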

As this is a modern Cloud system and not a brick wall or a scuba regulator, the ‘number of moving parts’ is a multi-dimensional indicator. You can increase the complexity of the system in many different ways:

  1. More complex architecture – you need a REST API, so you create a microservice that powers it. This API needs persistence, so you add a database. But the database is struggling with query performance, so you add a cache in front of the API to speed it up. Now, other parts of the system keep polling the API to stay in sync with the state. This is too much for the API, so you add a message broker to push changes out instead of having clients pull them. Before you know it, you have all these extra things to scale, keep working, and reason about.
  2. Different part types – one team likes Postgres, another likes CouchDB. One is relational, the other is document-based NoSQL. The third team hates both and uses CockroachDB and MongoDB. These databases cannot be optimised, deployed, debugged or scaled together. Skills cannot be reused.
  3. Drinking too much microservice Kool-Aid – each new microservice eats into resources (even when cold), needs to be scaled for HA, monitored, debugged, deployed with zero downtime, and needs to pass security and other certifications. More microservices just increase the number of things that can go wrong and that you need to manage.
  4. Too many UIs – with or without new microservices, do you really need that new page? Each UI page needs to be designed, implemented, debugged, translated, made accessible, it may need to support ‘dark mode’, it should be responsive, it should be fast etc. That’s just another dimension that increases system cost and opens up another QoS problem vector.
  5. Too much Kubernetes – your system may have too complex an Ingress, too many operators/custom resources/pods being spawned dynamically, doing ‘interesting’ things in a way that is really hard to track and manage at any given time.

What to do?

I guess at this point you are looking for a magic bullet. Sadly, there isn’t one: this is a complex discipline and there are too many variables. However, the point of this article is to suggest that reducing system complexity is almost always a move in the right direction. Far from being the Marie Kondo of system design and suggesting you remove microservices that don’t spark joy, I mean that it is healthy for a seasoned architect to have a constant urge to avoid ‘moving parts creep’. Based on the incomplete list of problems above, here is an (also incomplete) list of remedies:

  1. Use the simplest architecture that does the job. There are no points for complexity – you are after QoS, not parts count.
  2. Reduce the number and type of databases in the system – Martin Fowler and Polyglot Persistence notwithstanding, there is a real cost to proliferation. Try to standardize around one relational database, one NoSQL database and maybe one cache, and call it a day (and of course, there is no need to use all three if the system does not require it).
  3. Make every microservice earn the right to exist – you can aggregate REST API endpoints, as well as micro-frontends. Keep the number of microservices at the level you can realistically debug, monitor, manage and deploy.
  4. Do not get carried away with Kube.

In the end, have Sherwood Brute as a reference – do not go after impressive and shiny, but rather after something that gets the job done with minimal fuss. Heed the wisdom of my contractor: architect with an eye towards less maintenance, and that almost always means – as few moving parts as possible.

© Dejan Glozic, 2021
