Angular.js 2.0, Index Investing and Micro-Services

Beuckelaer: Girl with a basket of eggs, Wikimedia Commons.

Now here is somebody with all her eggs in one basket, literally. I used the painting to illustrate what index investing tries to avoid. I thought of index investing while reading the bitter and often hilarious reactions to the announced changes in Angular.js 2.0 on Reddit. I also thought about my experience with Google services. Watch me tie all these things together in one magic feat.

First off, let me be the first to acknowledge that righteous indignation about changes to a free product or service is always a bit rich. “I used this for my benefit for months and now they changed it – I demand they fix it and continue to invest real money so that I can continue using it for free”. Right.

With that off the table, here is my amusing experience with Google services. A while ago I amassed a number of feeds I wanted to keep up with every morning over breakfast. I created a nice multi-page dashboard using iGoogle. It worked, and it was even responsive – it loaded fast and worked well on my iPhone.

Then one day Google pulled the plug on it. After venting my, you guessed it, righteous indignation for a while, I looked for a replacement and found that Google Reader could be used for that. So I moved all my feeds to it. You can guess what happened next – they pulled the plug on it too, and I had to move my feeds to Feedly, where they remain to this day.

Apart from making me feel like the first of the three little piggies, forced to change its address often due to a certain wolf, this experience taught me how Google feels about its free services. While it has a number of exciting and often groundbreaking products in the air at any point, you had better steer clear if long-term stability is important to you. Google engineers are not sentimental about their software and change their minds with alarming frequency, which moves things forward but also leaves a lot of victims in their wake.

The moment I learned about Angular.js and how it was bestowed on the world and maintained by Google, my first thought was ‘uh, oh’. It had Google fingerprints all over it:

  1. A vertically integrated, opinionated framework that attempts to solve all your needs.
  2. It does not play well with other well-loved and popular libraries.
  3. A lot of the approaches take getting used to and don’t look like anything you have seen before.
  4. As a result, the learning curve is steep, and once you have climbed it, you feel personally invested to a degree that is not healthy.

And now, with the announcement of Angular.js 2.0, the other shoe has dropped – Google’s famous impatience with continuity and careful evolution. Many people on Reddit evangelized Angular 1.x in their companies and now feel betrayed. Others are contemplating switching to Knockout or Backbone, or leaving Web development altogether.

Index Investing

To change the pace a bit, let’s look at index investing. It is an investment technique that openly gives up on picking stock market winners and losers. Through bitter experience, some people discovered that their stock picks were worse than if they had let a blindfolded monkey choose their investments by throwing darts. Instead, they decided to invest in a basket of investments in each category, using low-fee instruments such as ETFs (Exchange Traded Funds). All empirical evidence suggests most people can’t pick winners if their lives depended on it, and even those who can cannot sustain that track record over any length of time. Full disclosure – I moved all my investments to index funds and my results are way better than in my dart-throwing years.

Index investing is all about diversification, asset allocation and risk containment. It applies to Web development more than you think.

The trouble with frameworks

I was picking on Angular.js, but I didn’t need to go that far outside my own company. We at IBM have written a hot mess of Web UIs using the Dojo framework. Truth be told, a few years ago it was a pretty decent option and had some solutions that mattered to enterprises (i18n, important widgets such as sorting tables and trees, and support for all the god-awful IE browsers known to man). The problem is that once you write all that code on top of it, you are stuck – you are forever bound to it. Dojo was a basket we put all our eggs in, like the girl above.

In the investment analogy, we bought one stock with all our money. Any investor will tell you that such a strategy is crazy – way too much risk for a good night’s sleep. At the first opportunity to reflect on our strategy and devise something saner, we decided to use stable, standards-based protocols for integration, and confine stack choices to individual services. While we can change our minds on the implementation of one service, the other services can continue to work because the integration protocols are stable.

We also learned the hard way that frameworks tend to generate additional work – the source of accidental complexity. If you find yourself spending a nontrivial amount of time ‘feeding the framework’, i.e. writing code not because it makes sense for your project but because the framework needs it done a certain special way, you are a victim of accidental complexity.

As a result, we also developed a strong preference for toolkits over frameworks. Toolkits allow us to maintain control and give us a better chance of avoiding nasty surprises such as Angular.js 2.0.

Micro-services and risk minimization

Our current love affair with micro-services has several reasons, many of which I have written about in previous posts. However, one of the reasons is closely related to the subject of this post: risk control. As in investing, our ability to pick the right framework has a dismal track record. Therefore, with our switch to micro-services, we focused first on the way they communicate with each other. We invested in stable REST APIs, and in message brokers that pass messages around using open protocols such as MQTT and AMQP 1.0. Due partially to the glacial pace of protocol standardization, the danger of them changing overnight is much lower.

Our approach to individual micro-service implementation is then to confine risk to the service boundary. If the service is small enough, picking the wrong framework (if you even need a framework) will doom only that one service. A small service can be rewritten if need be. The entire system cannot be, not without a world of pain.

Essentially our approach is to officially declare that we will not assemble a committee to choose one framework to rule them all. We will apply the following mitigation rules instead:

  1. Use the simplest approach you can get away with.
  2. If you can get away with server-side generated content, do it.
  3. If you can get away with server-side content + jQuery + Bootstrap, do it.
  4. If you need a bit of MV* magic, try Backbone combined with isomorphic templates (e.g. Dust.js partials that are reused on both server and client – a small sketch follows this list).
  5. If you must use Angular.js 1.3, do it, but you are on the hook to keep up with Google, and you need a contingency rewriting plan.
  6. We will NOT base any of the integration code on any of the frameworks du jour. Instead, we will use REST, AMQP/MQTT, JSON, HTML5, CSS3 and vanilla JS.
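
To illustrate point 4, here is a minimal, hypothetical sketch of an isomorphic Dust.js partial using dustjs-linkedin. The template source, partial name and data are made up for the example; what matters is the compile/load/render flow, because the same compiled template can be shipped to the browser and rendered there by a Backbone view:

// A minimal, hypothetical example of an isomorphic Dust.js partial.
// The same compiled template can be registered on the server (as below)
// and in the browser, so the markup is written only once.
var dust = require('dustjs-linkedin');

// 'project-row' is a made-up partial name; the source would normally live in a .dust file
var source = '<li class="project">{name} &mdash; {description}</li>';
dust.loadSource(dust.compile(source, 'project-row'));

// Server-side render, e.g. inside an Express route handler
dust.render('project-row', { name: 'Orion', description: 'Build service' },
  function (err, html) {
    if (err) throw err;
    console.log(html); // <li class="project">Orion &mdash; Build service</li>
  });

On the client, the same compiled source is loaded with a script tag and rendered with an identical dust.render call, which is what makes the partial ‘isomorphic’.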

Be pessimistic

Someone once said that we are all writing legacy code every day, so we should strive to make it the best legacy code we can muster. Angular 1.3 turned from shiny to legacy in a flash (even if in ultra slow motion, since 2.0 will only arrive in early 2016). Our approach may be pessimistic, but it will help us sleep better in the years to come, and will make those who come after us curse us a bit less. Micro-services help in this regard because they confine the risk, the way oil tankers break up their cargo space into compartments. If you ensure that you can change your mind about the implementation of each service, the risk and importance of choosing the right framework diminish.

The right question is not “should I use Backbone.js, Angular.js, Ember.js or something else”. The question is: will I be able to recover when the ADD-suffering maintainers of my framework of choice inevitably lose interest?

Right now, a starry-eyed college dropout is writing the next shiny framework to take the world by storm. With the micro-service approach, you will be able to give it a shot without betting the farm on it. You are welcome.

© Dejan Glozic, 2014


Socket.io: Mind the Gap

Wikimedia Commons, ‘Mind the Gap’, 2008, Clicsouris

Welcome to our regular edition of ‘Socket.io version 1.0 watch’ or ‘Making sure Guillermo Rauch is busy working on Socket.io 1.0 instead of whatever he does to pay the rent that does nothing for me’. I am happy to inform you that Socket.io 1.0 is now available, with the new logo and everything. Nice job!

With that piece of good news, back to our regular programming. First, a flashback. When I was working on my doctoral studies in London, England, one of the most memorable bits of trivia was the dramatic voice on the London Underground PA system warning me to ‘Mind the Gap’. Since then I seldom purchase my clothes at The Gap, choosing its more upmarket sibling Banana Republic. J.Crew is fine too. A few years ago a friend went to London to study and she emailed me that passengers are still reminded about the dangers of The Gap.

We have recently experienced a curious problem in our usage of WebSockets – our own gap to mind, as it were. It involves a topology I have already written about. You will most likely hit it too, so here it goes:

  1. A back-end system uses a message queue to pass messages about state changes that affect the UI.
  2. A micro-service serves a Web page containing a Socket.io client that turns around and establishes a connection with the server once the page has been loaded.
  3. In the time gap between the page being served and the client calling back to establish a WebSockets connection, new messages arrive that are related to the content on the page.
  4. By the time the WebSockets connection has been established, any number of messages will have been missed – a message gap of sorts.
WebSockets gap: in the period of time from the HTTP GET response to the establishment of the WebSockets connection, msg1 and msg2 were missed. The client will receive messages starting from msg3.

The gap may or may not be a problem for you depending on how you are using the message broker to pass messages around micro-services. As I have already written in the post about REST/MQTT mirroring, we are using MQTT to augment the REST API. This augmentation mirrors the CRUD verbs that result in state change (CUD). The devil is in the details here, and the approach taken will decide whether the ‘message gap’ is going to affect you or not.

When deciding what to publish to the subscribers using MQ, we can take two approaches:

  1. Assume subscribers have made a REST call to establish the baseline state, and only send deltas. The subscribers will work well as long as they took the baseline and didn’t miss any of the deltas for whatever reason. This is similar to showing a movie on a cable channel in a particular time slot – if you miss it, you miss it.
  2. Don’t assume subscribers have the baseline state. Instead, assume they may have been down or not connected. Send the complete state of the resource alongside the message envelope. This approach is similar to breaking news being repeated many times during the day on a news channel. If you are just joining, you will be up to date soon.

The advantage of the first approach is smaller message payloads. There is no telling how big JSON resources can be (a problem recently addressed by Tim Bray in his fat JSON blog post). Imagine we are tracking a build resource and it is sending us updates on the progress (20%, 50%, 70%). Do we really want to receive the entire Build resource JSON alongside each of these messages?
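
To make the contrast concrete, here is roughly what each style could look like for that build resource, using the ‘event’/’changes’/’state’ envelope conventions from our REST/MQTT mirroring post (the property names are illustrative). The delta-only message carries just the change:

{
  "event": "modified",
  "changes": { "progress": 50 }
}

while the full-state message repeats the entire Build resource on every update:

{
  "event": "modified",
  "state": { "id": "build-17", "name": "nightly-build", "status": "running", "progress": 50 }
}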

On the other hand, the second approach is not inconsistent with the recommendation for PUT and PATCH REST responses. We know that the newly created resource is returned in the response body for POST requests (alongside the Location header). However, it is considered good practice to do the same in the responses to PUT and PATCH requests. If somebody moves the progress bar of a build by using PATCH to update the ‘progress’ property, the entire build resource will thus be returned in the response body. The service fielding this request can just take that JSON string and also attach it to the message under the ‘state’ property, as we are already doing for POST requests.

Right now we haven’t made up our minds. Sending around entire resources in each message strikes us as wasteful. This message will be copied into each queue of the subscribers listening to it, and if it is durable, it will also be persisted. That’s a lot of bytes to move around while using a protocol whose main selling point is that it is light on resources. Imagine pushing these messages to a native mobile client over the air. Casually attaching entire JSON resources to messages is not something you want to do in these situations.

In the end, we solved the problem without changing our ‘baseline + deltas’ approach. We tapped into the fact that messages have unique identifiers attached to them as part of the envelope. Each service that is handling clients via WebSockets keeps a little buffer of messages that are published by the message broker. When we send the page to the client, we also send the ID of the last known message, embedded in the HTML as data. When the WebSockets connection is established, the client will communicate (emit) this message ID to the server, and the server will check the buffer to see whether new messages have arrived since then. If so, it will send those messages immediately, allowing the client to catch up – to ‘bridge the gap’. After the client has caught up, the message traffic resumes as usual.

As a bonus, this approach works for cases where the client drops the WebSockets connection. When the connection is re-established, the client can use the same approach to catch up on the messages it has missed.

The fix: the service sends the ‘message marker’ (the last message id). The client echoes the marker when connecting with WebSockets. Detecting the hole in the message sequence, the service immediately sends the missing messages, allowing the client to catch up.
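
For the curious, a stripped-down sketch of this handshake could look like the following. The buffer, the ‘catch-up’ event name and the data attribute are our own conventions, shown here purely for illustration, and the example assumes message ids that can be compared for ordering:

// Server side (Node.js + Express + Socket.io 1.0)
var app = require('express')();
var http = require('http').createServer(app);
var io = require('socket.io')(http);

// Small rolling buffer of messages received from the message broker,
// each carrying the unique id from its envelope
var buffer = [];

io.on('connection', function (socket) {
  // The client echoes the marker that was embedded in the served page
  socket.on('catch-up', function (lastSeenId) {
    buffer
      .filter(function (msg) { return msg.id > lastSeenId; })
      .forEach(function (msg) { socket.emit('message', msg); });
  });
});

http.listen(3000);

// Client side (in the served page, e.g. <body data-last-message-id="42">)
var socket = io();
socket.on('connect', function () {
  socket.emit('catch-up', Number(document.body.getAttribute('data-last-message-id')));
});
socket.on('message', function (msg) {
  // update the page, e.g. move a progress bar
});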

As you can see, we are still learning and evolving our REST/MQTT mirroring technique, and we will most likely encounter more face-palm moments like this. The solution is not perfect – in an extreme edge case, the WebSockets connection can take so long that the service message buffer fills up and old messages start dropping off. A solution in those cases is to refresh the browser.

We are also still intrigued by sending the state in all messages – there is something reassuring about it, and the similarity to PATCH/PUT behavior only reinforces the mirroring aspect. Perhaps our resources are not that large, and we are needlessly fretting over the message sizes. On the other hand, when making a REST call, callers can use ‘fields’ and ’embed’ to control the size of the response. Since we don’t know what any potential subscriber will need, we have no choice but to send the entire resource. We need to study that approach more.

That’s it from me this week. Live long, prosper and mind the gap.

© Dejan Glozic, 2014

Micro-service APIs With Some Swag (part 2)

London Cries: A Man Swaggering, Paul Sandby, 1730

Read part 1 of the article.

Last week I delved into the problem of presenting a unified API doc for a distributed system composed of micro-services. In this second installment, we will get our hands dirty with the help of Swagger by Wordnik.

A quick recap: the problem we are trying to solve is how to document all the APIs of the system when these APIs are an aggregation of endpoints contributed by individual micro-services. To illustrate the possible solution, we will imagine a distributed system consisting of two micro-services:

  • Projects – this service provides APIs to query, update and delete projects (very simplified, of course).
  • Users – this service provides APIs to query and create user profiles. The Projects API will reference users when listing project members.

For simplicity, we will choose only a handful of properties and methods. The Project model is defined in Swagger like this:

"Project": {
   "id": "Project",
   "description": "A single project model",
   "properties": {
      "name": {
         "type": "string",
         "description": "name of the project",
         "required": true
      },
      "description": {
         "type": "string",
         "description": "description of the project"
      },
      "avatar": {
         "type": "string",
         "description": "URL to the image representing the project avatar"
      },
      "owner": {
         "type": "string",
         "description": "unique id of the projects' owner",
         "required": true
      },
      "members": {
         "type": "array",
         "description": "array of unique ids of project members",
         "items": {
            "type": "string"
         }
      }
   }
}

Users are provided by another service and have the following model, defined the same way:

"User": {
   "id": "User",
   "description": "A single user model",
   "properties": {
      "id": {
         "type": "string",
         "description": "unique user id",
         "required": true
      },
      "name": {
         "type": "string",
         "description": "user name",
         "required": true
      },
      "email": {
         "type": "string",
         "description": "user email",
         "required": true
      },
      "picture": {
         "type": "string",
         "description": "thumbnail picture of the user"
      }
   }
}

The key to the proposed solution lies in Swagger’s feature that allows the composite API document to be assembled from APIs coming from multiple places. The entry point to the document definition will look like this:

{
    "apiVersion":"1.0",
    "swaggerVersion":"1.2",
    "apis":[
        {
            "path": "http://localhost:3000/api-docs/projects.json",
            "description":"Projects"
        },
        {
            "path": "http://localhost:3001/api-docs/users.json",
            "description":"Users"
        }
    ],
    "info":{
        "title":"30% Turtleneck, 70% Hoodie API Example",
        "description":"This API doc demonstrates how API definitions can be aggregated in a micro-service system, as demonstrated at <a href=\"http://dejanglozic.com\">http://dejanglozic.com</a>."
    }
}

Each API group with its set of endpoints can be sourced from a different URL, giving us the flexibility to have that part of the document provided by the actual micro-service that owns that portion of the API.

Each individual API document will list the endpoints and resources it handles. For each endpoint and verb combination, it will list parameters, request/response bodies, as well as data models and error responses. This and much more is fully documented in the Swagger 1.2 specification.

{
  "path": "/users/{id}",
  "operations": [
    {
      "method": "GET",
      "summary": "Returns a user profile",
      "notes": "Implementation notes on GET.",
      "type": "User",
      "nickname": "getUser",
      "authorizations": {},
      "parameters": [
        {
          "name": "id",
          "description": "Unique user identifier",
          "paramType": "path",
          "type": "string"
        }
      ],
      "responseMessages": [
        {
          "code": 404,
          "message": "User with a provided id not found"
        }
      ]
    }
  ]
}

Swagger handles most of its specification through the parameter list, which is fairly clever. In addition to query parameters, they can be used to define path segments, as well as request body for POST, PUT and PATCH. In addition, request and response body schemas are linked to the model specifications further down the document.
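
To make the ownership aspect concrete, here is a hypothetical sketch of how the Users micro-service could serve its own portion of the doc from the URL referenced in the aggregated document above (Express is assumed, and the structure follows a Swagger 1.2 API declaration):

// users-service.js - serves both the Users API and its own Swagger resource
var express = require('express');
var app = express();

var usersApiDoc = {
  apiVersion: '1.0',
  swaggerVersion: '1.2',
  basePath: 'http://localhost:3001',
  resourcePath: '/users',
  apis: [ /* the '/users/{id}' operations shown above */ ],
  models: { /* the 'User' model shown earlier */ }
};

// This is the URL listed in the composite document's "apis" array
app.get('/api-docs/users.json', function (req, res) {
  res.json(usersApiDoc);
});

app.listen(3001);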

Needless to say, I skipped over huge swaths of the specification dealing with authentication. Suffice it to say that the Swagger specification currently supports Basic Auth, API key and OAuth 2 as authentication options.

At this point in the article, I realized that I cannot show you the actual JSON files without making the article long and unwieldy. Luckily, I also realized I actually work for IBM and, more importantly, IBM DevOps Services (JazzHub). So here is what I have done:

The entire Node.js app is now available as a public project on the DevOps Services site.


Once you have explored the code, you can see it running in IBM Bluemix. You can drill into the Swagger API UI (despite what the UI is telling you, you cannot yet ‘Try it’ – the actual API is missing, this is just a doc; for a real API, this part will also work).


I hope you agree that showing a running app is better than a screenshot. From now on, I will make it my standard practice to host complete demo apps in DevOps Services and run them in Bluemix for your clicking pleasure.

© Dejan Glozic, 2014

Micro-service APIs With Some Swag (part 1)

London Cries: A Man Swaggering, Paul Sandby, 1730

Every aspect of the API matters to some Client.

Jim des Rivieres, Evolving Eclipse APIs

It is fascinating that the quote above is 14 years old now. It was coined by the Benevolent Dictator of Eclipse APIs, Jim des Rivieres, in the days when we defined how Eclipse Platform APIs were to be designed and evolved. Of course, the APIs in question were Java, not the REST variety that is ruling the API economy these days. Nevertheless, the key principles have hardly changed.

Last week when I wrote about the switch to micro-services by SoundCloud, I noted that APIs are predominantly a public-facing concern in monolithic applications. There is no arms-length relationship between providers and consumers of functional units, enabling a low-ceremony evolution of the internal interfaces. They are the ‘authorized personnel only’ rooms in a fancy restaurant – as long as the dining room is spotless, we will ignore the fact that the gourmet meals are prepared by a cute rat that sounds a lot like Patton Oswalt.

Put another way, APIs are not necessary in order to get the monolithic application up and running. They are important the moment you decide to share your data with third-party developers, write a mobile app, or enable partner integrations. Therefore, monolithic applications predominantly deal with public API.

Things are much different for a micro-service based distributed system. Before any thought is put in how the general public will interact with the system, micro-services need to figure out how they will interact with each other.

In the blog post about Node.js clustering, I pointed out that Node is inherently single-threaded, and clustering is required just to stretch to all the cores of a single server, never mind load balancing across multiple VMs. This ‘feature’ essentially makes clustering an early consideration, and switching from vertical to horizontal scaling (across multiple machines) mostly a configuration issue. Presumably your instances have already been written to share-nothing and do not really care where they are physically running.
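
As a reminder, the Node.js cluster module makes the vertical part of that scaling almost boilerplate. A minimal sketch (the port and the handler are arbitrary):

// Minimal Node.js clustering sketch: fork one worker per CPU core,
// each worker runs the same share-nothing HTTP server.
var cluster = require('cluster');
var http = require('http');
var os = require('os');

if (cluster.isMaster) {
  os.cpus().forEach(function () {
    cluster.fork();
  });
} else {
  http.createServer(function (req, res) {
    res.end('Handled by worker ' + cluster.worker.id + '\n');
  }).listen(3000);
}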

Micro-service APIs are very similar in that regard. They force developers to start with a clean API for each service, and since a complex system is often built by several teams working in parallel, it would turn into total chaos without clean contracts between the services. In micro-service systems, APIs are foundational.

Internal APIs – an oxymoron?

In the previous post, I put forward a few rules of writing a micro-service based distributed system that concern APIs. Here they are again:

  • Rule 3: APIs should be the only way micro-services talk to each other and the outside world.
  • Rule 4: Internal APIs should be documented and otherwise written as if they will be exposed to the open Internet at any point.
  • Rule 5: Public APIs are a subset of internal APIs with stricter visibility rules, rate limiting and separate authentication.

The aforementioned Jim des Rivieres used to say that “there is no such thing as an internal API”. Interfaces are either firm contracts exhibiting all the qualities of APIs or they can change at any time without warning. There is no mushy middle ground. I tend to agree with him when it comes to monolithic systems, where ‘internal’ refers to ‘written for the system’s internal use only’. However, in distributed systems ‘internal’ refers to traffic between services, or between systems behind the firewall. It has more to do with ‘things we say to the members of our own family’, as opposed to ‘things we say to the outside world’.

In this context, ‘internal APIs’ is a legitimate thing because ‘internal’ refers to the visibility rules, not the quality of the API contract. Rule #4 above explicitly states that – there is nothing different about internal APIs except visibility.

Presenting a unified API front

If APIs are the only way micro-services should communicate with each other and the outside world, the consumers need to be presented with a cleanly documented contract. Documenting the APIs cannot be an afterthought – it needs to be built with the micro-service, sometimes even before the documented endpoints actually work.

The fact that our distributed system is composed of micro-services is a great feature for us and for our ability to quickly evolve and deploy the system with little or no downtime. However, API consumers couldn’t care less about it – they want one place to go to see all the APIs.

There are multiple ways of skinning that particular cat, but we have decided to do as follows:

  1. Proxy all the APIs to the common path (e.g. https://example.com/api) – a minimal proxy sketch follows this list.
  2. Expose the API version in the URL (I know, I know, we can yell at each other until the cows come home about how that is great or stupid, but many popular APIs are doing it and so are we). Thus the common path gets a version (e.g. https://example.com/api/v1).
  3. Reserve a segment after the version for each micro-service that exposes APIs (e.g. /projects, /users etc.).
  4. Provide the API specification using a popular Open Source API doc solution.
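
To illustrate points 1–3, the proxy in front of the micro-services does not need to be anything fancy. A hypothetical sketch using Express and node-http-proxy could look like this (the service locations and the routing convention are made up for the example; in this sketch each service also serves its API under the same /api/v1 prefix, so paths pass through unchanged):

// gateway.js - proxies /api/v1/<segment>/... to the micro-service owning that segment
var express = require('express');
var httpProxy = require('http-proxy');

var app = express();
var proxy = httpProxy.createProxyServer();

// Made-up service locations for the example
var owners = {
  projects: 'http://localhost:3000',
  users: 'http://localhost:3001'
};

function route(req, res) {
  var target = owners[req.params.service];
  if (!target) {
    return res.status(404).json({ error: 'unknown API segment' });
  }
  proxy.web(req, res, { target: target });
}

app.all('/api/v1/:service', route);
app.all('/api/v1/:service/*', route);

app.listen(8080);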

On the last point (the API doc solution), we looked around and considered several alternatives, finally settling on Swagger by Wordnik. It is a popular solution, with a vibrant community, a fairly well defined API spec, a reusable live API UI that can be included in our UI, and a path forward towards version 2.0 that promises to address currently missing features (the current version is 1.2).

A micro-service based system using Swagger to define APIs could look like this:


Each micro-service that provides APIs will make a Swagger API doc resource available, describing all the endpoints, verbs, parameters and request/response bodies. A documentation micro-service can then render these in two ways – using the Swagger Live UI and rendering static docs.

The Swagger Live UI is available as an Open Source project and allows users to not only read the rendered documentation, but enter values and try it out in place. To see it in action, try out the Pet Store sample.

The UI is all client side, which makes it stack-agnostic and fit for being served by a multitude of platforms, but if you are aggregating your definitions like we do, you need to get around the browser’s Same-Origin limitation. You can either proxy the API definitions or use CORS. In our case, it helps that we proxy all the services to a single external URL root, which is on the same domain as the doc UI – problem solved.

I can stop now while I am ahead – this being the part 1 of a multi-part article. In the next installment, I will walk you through an example of two micro-services – one providing API for Projects, another for Users. We will spec out the API, document the spec using Swagger, write a Node.js app to serve the UI from these definitions, and also render an alternative static version of the API doc.

See you next week, off to write some API micro-services.

© Dejan Glozic, 2014

 

SoundCloud is Reading My Mind

Marvelous feats in mind reading, The U.S. Printing Co., Russell-Morgan Print, 1900

“Bad artists copy. Good artists steal.”

– Pablo Picasso

It was bound to happen. In the ultra-connected world, things are bound to feed off of each other, eventually erasing differences, equalizing any differential in electric potentials between any two points. No wonder the weirdest animals can be found on islands (I am looking at you, Australia). On the internet, there are no islands, just a constant primordial soup bubbling with ideas.

The refactoring of monolithic applications into distributed systems based on micro-services is slowly becoming ‘a tale as old as time’. They all follow a certain path which kind of makes sense when you think about it. We are all impatient, reading the first few Google search and Stack Overflow results ‘above the fold’, and it is no coincidence that the results start resembling majority rule, with more popular choices edging out further and further ahead with every new case of reuse.

Luke Wroblewski of Mobile First fame once said that ‘two apps do the same thing and suddenly it’s a pattern’. I tend to believe that people researching the jump into micro-services read more than two search results, but once you see certain choices appearing in, say, three or four stories ‘from the trenches’, you become reasonably convinced to at least try them yourself.

If you were so kind as to read my past blog posts, you know some of the key points of my journey:

  1. Break down a large monolithic application (Java or RoR) into a number of small and nimble micro-services.
  2. Use REST APIs as the only way these micro-services talk to each other.
  3. Use a message broker (namely, RabbitMQ) to apply the event collaboration pattern and avoid annoying inter-service polling for state changes.
  4. Link MQ events and REST into what I call REST/MQTT mirroring to notify about resource changes.

Then the SoundCloud blog post about their own move to micro-services came along.

As I was reading the blog post, it got me giddy at the realization we are all converging on the emerging model for universal micro-service architecture. Solving their own unique SoundCloud problems (good problems to have, if I may say – coping with millions of users falls into such a category), SoundCloud developers came to very similar realizations as many of us taking a similar journey. I will let you read the post for yourself, and then try to extract some common points.

Stop the monolith growth

Large monolithic systems cannot be refactored at once. This simple realization about technical debt actually has two sub-aspects: the size of the system at the moment it is considered for a rewrite, and the new debt being added because ‘we need these new features yesterday’. As with real world (financial) debt, the first order of business is to ‘stop the bleeding’ – you want to stop new debt from accruing before attempting to make it smaller.

At the beginning of this journey you need to ‘draw the line’ and stop adding new features to the monolith. This rule is simple:

Rule 1: Every new feature added to the system will from now on be written as a micro-service.

This ensures that precious resources of the team are not spent on making the monolith bigger and the finish line farther and farther on the horizon.

Of course, a lot of the team’s activity involves reworking the existing features based on validated learning. Hence, a new rule is needed to limit this drain on resources to critical fixes only:

Rule 2: Every existing feature that requires significant rework will be removed and rewritten as a micro-service.

This rule is somewhat less clear-cut because it leaves some room for the interpretation of ‘significant rework’. In practice, it is fairly easy to convince yourself to rewrite a feature this way because micro-service stacks tend to be more fun, require fewer files and fewer lines of code, and are more suitable for Web apps today. For example, we don’t need too much persuasion to rewrite a servlet/JSP service in the old application as a Node.js/Dust.js micro-service whenever we can. If anything, we need to practice restraint and not fabricate excuses to rewrite features that only need touch-ups.

Micro-services as BBQ. Mmmmm, BBQ…

An important corollary of this rule is to have a plan of action ahead of time. Before doing any work, have a ‘cuts of beef’ map of the monolith with areas that naturally lend themselves to being rewritten as micro-services. When the time comes for a significant rework of one of them, you can just act along that map.

As is the norm these days, ‘there’s a pattern for that’, and as SoundCloud guys noticed, the cuts are along what is known as bounded context.

Center around APIs

As you can read at length on the API evangelist’s blog, we are transforming into an API economy, and APIs are becoming a central part of your system, rather than something you tack on after the fact. If you could get by with internal monolith services in the early days, micro-services will force you to accept APIs as the only way you communicate both inside your system and with the outside world. As SoundCloud developers realized, the days of integration around databases are over – APIs are the only contact points that tie the system together.

Rule 3: APIs should be the only way micro-services talk to each other and the outside world.

With monolithic systems, APIs are normally not used internally, so the first APIs to be created are outward facing – for third-party developers and partners. A micro-service based system normally starts with inter-service APIs. These APIs are normally more powerful since they assume a level of trust that comes from sitting behind a firewall. They can use proprietary authentication protocols, have no rate limiting and expose the entire functionality of the system. An important rule is that they should in no way be second-class compared to what you would expose to external users:

Rule 4: Internal APIs should be documented and otherwise written as if they will be exposed to the open Internet at any point.

Once you have the internal APIs designed this way, deciding which subset to expose as the public API stops being a purely technical decision. Your external APIs look like the internal ones, with the exception of stricter visibility rules (who can see what), rate limiting (with the possibility of a rate-unlimited paid tier), and an authentication mechanism that may differ from what is used internally.

Rule 5: Public APIs are a subset of internal APIs with stricter visibility rules, rate limiting and separate authentication.

SoundCloud developers went the other way (public API first) and realized that they could not build their entire system with the limitations in place for the public APIs, and had to resort to more powerful internal APIs. The delicate balance between making public APIs useful without giving away the farm is a decision every business needs to make in the API economy. Micro-services simply encourage you to start from internal and work towards public.

Messaging

If there was a section in the SoundCloud blog post that made me jump with joy, it was the one where they discussed how they arrived at using RabbitMQ for messaging between micro-services, considering how I have written about that in every second post for the last three months. In their own words:

Soon enough, we realized that there was a big problem with this model; as our microservices needed to react to user activity. The push-notifications system, for example, needed to know whenever a track had received a new comment so that it could inform the artist about it. At our scale, polling was not an option. We needed to create a better model.

 

We were already using AMQP in general and RabbitMQ in specific — In a Rails application you often need a way to dispatch slow jobs to a worker process to avoid hogging the concurrency-weak Ruby interpreter. Sebastian Ohm and Tomás Senart presented the details of how we use AMQP, but over several iterations we developed a model called Semantic Events, where changes in the domain objects result in a message being dispatched to a broker and consumed by whichever microservice finds the message interesting.

I don’t need to say much about this – read my REST/MQTT mirroring post that describes the details of what SoundCloud guys call ‘changes in the domain objects result in a message’. I would like to indulge in a feeling that ‘great minds think alike’, but more modestly (and realistically), it is just common sense and RabbitMQ is a nice, fully featured and reliable open source polyglot broker. No shocking coincidence – it is seen in many installations of this kind. Let’s make a rule about it:

Rule 6: Use a message broker to stay in sync with changes in domain models managed by micro-services and avoid polling.

All together now

Let’s pull all the rules together. As we speak, teams around the world are suffering under the weight of large, unwieldy monolithic applications that are a poor fit for cloud deployment. They are intrigued by micro-services but afraid to take the plunge. These rules will make the process more manageable and allow you to arrive at a better system that is easier to grow, can be deployed many times a day, and is more reactive to events, load, failure and users:

  1. Every new feature added to the system will from now on be written as a micro-service.
  2. Every existing feature that requires significant rework will be removed and rewritten as a micro-service.
  3. APIs should be the only way micro-services talk to each other and the outside world.
  4. Internal APIs should be documented and otherwise written as if they will be exposed to the open Internet at any point.
  5. Public APIs are a subset of internal APIs with stricter visibility rules, rate limiting and separate authentication.
  6. Use a message broker to stay in sync with changes in domain models managed by micro-services and avoid polling.

This is a great time to build micro-service based systems, and collective wisdom on the best practices is converging as more systems are coming online. I will address the topic of APIs in more detail in one of the future posts. Stay tuned, and keep reading my mind!

© Dejan Glozic, 2014

REST and MQTT: Yin and Yang of Micro-Service APIs

Yin and yang stones

It seemed that the worst was over – I hadn’t heard a single new portmanteau of celebrity names in a while (if you exclude ‘Shamy’, which is the super-couple name of Sheldon and Amy from The Big Bang Theory, but being a plot device, I don’t think it counts). Then, while researching this blog post, I stumbled upon project QEST, a mashup of MQTT and REST. Et tu, Matteo Collina?

What Matteo did in project QEST is attempt to bridge the world of apps speaking REST and the world of devices speaking MQTT with one bilingual broker. I find the idea intriguing and useful in the context of the IoT world. However, what I am trying to achieve with this post is to address the marriage of these two protocols in the context of micro-service-based distributed systems. In a sense, we are re-purposing a protocol that was not primarily created for this, but which exhibits enough flexibility and simplicity to fit right in.

You keep saying that

I think I have written about the usefulness of message brokers in micro-service systems often enough to reasonably expect it to be axiomatic by now. From the point of view of service-to-service interaction, REST poses a problem when services depend on being up to date with data they don’t own and manage. Being up to date requires polling, which quickly adds up in a system with enough interconnected services. As Martin Fowler has pointed out in the article on the event collaboration pattern, reversing the data flow has the benefit of reacting to data changes, rather than unceasingly asking lest you miss a change.

However, the problem with this data flow reversal, when implemented literally, is that the onus of storing the data is put on the event recipients. Storing the data in event subscribers allows them to be self-sufficient and resilient – they can operate even if the link to the event publisher is temporarily severed. However, with every second of the link breakage they operate on potentially more and more stale data. It is a case of ‘pick your poison’ – with apps using the request-response collaboration pattern, a broken link means that no collaboration is happening at all, which may or may not be preferable to acting on outdated information.

As we are gaining more experience with micro-service-based systems, and with the pragmatic assumption that the message broker can fail, we are finding event collaboration on its own insufficient. However, augmenting REST with messaging results in a very powerful combination – two halves of one complete picture. This is how this dynamic duo works:

  1. The service with a REST API will field requests from client services as expected. This establishes the baseline state.
  2. All the services will simultaneously connect to a message broker.
  3. The API service will fire messages notifying about data changes (essentially for all the verbs that can cause a change – in most cases POST, PUT, PATCH and DELETE).
  4. Clients interested in receiving data updates will react to these changes according to their functionality.
  5. In cases where having the correct data is critical, client services will forgo the built-up baseline + changes state and make a new REST call to establish a new baseline before counting on it. (A sketch of a client service following these steps is shown right after this list.)
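
Here is a hypothetical sketch of a client service following these steps, using the request and mqtt Node.js modules and the message shapes described later in this post (the hosts, topic and properties are made up for illustration):

// client-service.js - keeps an up-to-date cache of people using REST + MQTT
var request = require('request');
var mqtt = require('mqtt');

var people = {}; // baseline cache, keyed by person id

// Step 1: establish the baseline state over REST
request({ url: 'http://api.example.com/people', json: true }, function (err, res, body) {
  if (err) throw err;
  body.forEach(function (person) { people[person.id] = person; });
});

// Step 2: connect to the broker and subscribe to changes for all people
var broker = mqtt.connect('mqtt://broker.example.com');
broker.subscribe('people/+');

// Steps 3 and 4: react to change notifications published by the API service
broker.on('message', function (topic, payload) {
  var id = topic.split('/')[1];
  var msg = JSON.parse(payload.toString());

  if (msg.event === 'deleted') {
    delete people[id];
  } else if (msg.changes && people[id]) {
    Object.keys(msg.changes).forEach(function (key) {
      people[id][key] = msg.changes[key];
    });
  } else if (msg.state) {
    people[id] = msg.state;
  }
});

Step 5 would simply repeat the initial GET whenever the service decides its cache can no longer be trusted, for example after a broker reconnect.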

How is this different from a pure implementation of Event Collaboration pattern?

  1. Messages are used to augment, not replace, REST. This is in contrast to, say, the Twitter streaming API, where you need to make a choice (you will either use REST or stream the tweets using an HTTP connection that you keep open).
  2. While message brokers are reliable and there are ways to further increase this durability (delivery guarantees, durable queues, quality of service etc.), REST is still counted on to establish a ‘clean slate’. Of course, REST can fail too, but if it does, you have no data, as opposed to old and therefore incorrect data.
  3. Client services are not required to store data. For this to work, they still need to track the baseline data they obtained through the REST call and be able to correlate messages to this baseline. For example, if a client service rendered a Web page from the data obtained from a REST API, it should be able to detect that a message it received will affect this web page and use something like Web Sockets to update the page accordingly.

OK, but what is the actual contract?

Notice how I have mentioned the word ‘API’ multiple times, while I keep talking about ‘messaging’ in a non-committal way. And yet, there is no ‘generic’ API – by definition it requires a clear contract for the way client services can interact with the API service. If we are to extend the REST Yin with the messaging Yang, messaging has to be a true companion and become part of the API contract.

This is where MQTT comes in. As an OASIS standard, it is vendor-neutral in the same way as REST. While the protocol spec itself is detailed and intricate, most of the experience of using the protocol is ‘publishers publish messages into topics and subscribers subscribe to said topics’. That’s it.

A very useful characteristic of the MQTT topic structure is that it can contain delimiters (‘/’), which opens up the possibility of syncing up REST URLs and topics. This prompted some developers, such as Matteo, to go for full parity (essentially using the REST URL as a topic). I don’t think we need to go that far – as long as the segments that matter match, we don’t need to have the same root. I don’t think that the entire URL makes sense as a topic other than symbolically, unless you are writing a ‘superbroker’ – a server that is both a broker and a REST server (and a floor wax and a dessert topping). Or an MQTT-REST bridge. Our approach is purely that of API mirroring – a convention that still expects services to connect to an MQTT broker of their choice.

REST/MQTT API in action

So how does our approach look in practice? Essentially, you start with a normal REST API and add MQTT messages for REST endpoints that result in a state change (POST/PUT/PATCH/DELETE).

For example, let’s say we have an API service responsible for serving people profiles. The REST endpoints may look something like this:

GET /people – this returns an array of JSON objects, one per person

GET /people/:id – this returns a single JSON object of a person with the provided id, something like:

{
  "id": "johndoe",
  "name": "John Doe",
  "email": "jdoe@example.com"
}

PATCH /people/:id – this updates select fields of the person (say, name and email – we don’t support changing the id). The sequence diagram of using such an API service may look like this:

MQTT-REST-sequence

The sequence starts with client service B making an HTTP GET request to fetch the resource for John Doe. The API service will return JSON for the requested person as expected. After that, another service (client A) issues a PATCH request to update John Doe’s email address. The API service will execute the request, return the updated JSON for John Doe in the response, then turn around and publish a message to notify subscribers that ‘/people/johndoe’ has changed. This message is delivered to both clients that are subscribed to the ‘people/+’ topic (i.e. changes to all people resources). This allows service B to react to this change.
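
A condensed sketch of the API service side of this sequence could look like this (Express, body-parser and mqtt.js are assumed; persistence and error handling are omitted, and the published message uses the ‘modified’ shape defined in the next section):

// people-service.js - REST endpoint that mirrors its state change to MQTT
var express = require('express');
var bodyParser = require('body-parser');
var mqtt = require('mqtt');

var app = express();
app.use(bodyParser.json());

var broker = mqtt.connect('mqtt://broker.example.com');
var people = {
  johndoe: { id: 'johndoe', name: 'John Doe', email: 'jdoe@example.com' }
};

app.patch('/people/:id', function (req, res) {
  var person = people[req.params.id];
  if (!person) return res.status(404).end();

  // Apply the changes and answer the REST request first
  Object.keys(req.body).forEach(function (key) { person[key] = req.body[key]; });
  res.json(person);

  // ... then notify subscribers of 'people/<id>' about the change
  broker.publish('people/' + person.id, JSON.stringify({
    event: 'modified',
    changes: req.body
  }));
});

app.listen(3000);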

Topics and message bodies

Since MQTT is now part of the formal API contract, we must document it for each REST endpoint that causes state change. There is no hard and fast rule on how to do this, but we are using the following conventions:

POST endpoints publish a message into the matching MQTT topic with the following shape:

{
  "event": "created",
  "state": { /* JSON returned in the POST response body */ }
}

PUT and PATCH endpoints use the following shape:

{
  "event": "modified",
  "changes": { "email": "johndoe@example.com" }
}

The shape above is useful when only a few properties have changed. If the entire object has been replaced, an alternative would be:


{
  "event": "modified",
  "state": { /* JSON returned in the PUT response body */ }
}

Finally, a message published upon a DELETE endpoint looks like this:

{
  "event": "deleted"
}

Handling i18n

If the API service is returning JSON with translatable strings, it is customary to honor the ‘Accept-Language’ HTTP header if present and return the strings in the appropriate locale. Alternatively, a ‘lang’ query parameter can be used, either on its own or as an override of the header. This all seems straightforward.

Things get complicated when you reverse the flow. An API service publishing a message cannot know in advance which languages will be needed by the subscribers. We don’t have a fully satisfactory answer for this, but our current thinking is to borrow from JSON-LD and include multiple versions of translatable strings in the message body, the way it is done in the Activity Streams 2.0 draft:

{
  "object": {
    "type": "article",
    "displayName": {
      "en": "A basic example",
      "fr": "Un exemple basique"
    }
  }
}

Conclusion

While others have attempted to create a formal bridge between the REST and MQTT worlds, when building a system using micro-services we are content with achieving REST/MQTT API mirroring through convention. We find the two protocols to be great companions, packing a mighty one-two punch that maintains API testability and a clear contract, while making the system more dynamic and providing for looser coupling and more sustainable future growth.

© Dejan Glozic, 2014