Socket.io: Mind the Gap

Wikimedia Commons, ‘Mind the Gap’, 2008, Clicsouris

Welcome to our regular edition of ‘Socket.io version 1.0 watch’ or ‘Making sure Guillermo Rauch is busy working on Socket.io 1.0 instead of whatever he does to pay the rent that does nothing for me’. I am happy to inform you that Socket.io 1.0 is now available, with the new logo and everything. Nice job!

With that piece of good news, back to our regular programming. First, a flashback. When I was working on my doctoral studies in London, England, one of the most memorable bits of trivia was a dramatic voice on the London Underground PA system warning me to ‘Mind the Gap’. Since then I seldom purchase my clothes in The Gap, choosing its more upmarket sibling Banana Republic. J.Crew is fine too. A few years ago a friend went to London to study and she emailed me that passengers are still reminded about the dangers of The Gap.

We have recently experienced a curious problem in our usage of WebSockets – our own gap to mind, as it were. It involves a topology I have already written about. You will most likely hit it too, so here it goes:

  1. A back-end system uses a message queue to pass messages about state changes that affect the UI
  2. A micro-service serves a Web page containing a Socket.io client that turns around and establishes a connection with the server once the page has been loaded
  3. In the time gap between the page being served and the client calling back to establish a WebSockets connection, new messages arrive that are related to the content on the page.
  4. By the time the WebSockets connection has been established, any number of messages will have been missed – a message gap of sorts.
WebSockets gap: in the period of time from the HTTP GET response to the establishment of the WebSockets connection, msg1 and msg2 were missed. The client will receive messages starting from msg3.

The gap may or may not be a problem for you depending on how you are using the message broker to pass messages around micro-services. As I have already written in the post about REST/MQTT mirroring, we are using MQTT to augment the REST API. This augmentation mirrors the CRUD verbs that result in state change (CUD). The devil is in the details here, and the approach taken will decide whether the ‘message gap’ is going to affect you or not.

When deciding what to publish to the subscribers using MQ, we can take two approaches:

  1. Assume subscribers have made a REST call to establish the baseline state, and only send deltas. The subscribers will work well as long as they took the baseline and didn’t miss any of the deltas for whatever reason. This is similar to showing a movie on a cable channel in a particular time slot – if you miss it, you miss it.
  2. Don’t assume subscribers have the baseline state. Instead, assume they may have been down or not connected. Send a complete state of the resource alongside the message envelope. This approach is similar to breaking news being repeated many times during the day on a news channel. If you are just joining, you will be up to date soon.

The advantage of the first approach is the size of the message payloads. There is no telling how big JSON resources can be (a problem recently addressed by Tim Bray in his fat JSON blog post). Imagine we are tracking a build resource and it is sending us updates on the progress (20%, 50%, 70%). Do we really want to receive the entire Build resource JSON alongside this message?

On the other hand, the second approach is consistent with the recommendation for PUT and PATCH REST responses. We know that the newly created resource is returned in the response body for POST requests (alongside the Location header). However, it is considered a good practice to do the same in the responses to PUT and PATCH requests. If somebody moves the progress bar of a build by using PATCH to update the ‘progress’ property, the entire build resource will thus be returned in the response body. The service fielding this request can just take that JSON string and also attach it to the message under the ‘state’ property, as we are already doing for POST requests.

We haven’t made up our minds yet. Sending around entire resources in each message strikes us as wasteful. This message will be copied into each queue of the subscribers listening to it, and if it is durable, will also be persisted. That’s a lot of bytes to move around while using a protocol whose main selling point is that it is light on resources. Imagine pushing these messages to a native mobile client over the air. Casually attaching entire JSON resources to messages is not something you want to do in these situations.

In the end, we solved the problem without changing our ‘baseline + deltas’ approach. We tapped into the fact that messages have unique identifiers attached to them as part of the envelope. Each service that is handling clients via WebSockets keeps a little buffer of messages published by the message broker. When we send the page to the client, we also send the ID of the last known message embedded in HTML as data. When the WebSockets connection is established, the client will communicate (emit) this message ID to the server, and the server will check the buffer for messages that have arrived since then. If there are any, it will send those messages immediately, allowing the client to catch up – to ‘bridge the gap’. After the client has caught up, the message traffic resumes as usual.

As a bonus, this approach works for cases where the client drops the WebSockets connection. When the connection is re-established, the client can use the same approach to catch up on the messages it has missed.

The fix: the service sends the ‘message marker’ (last message id). Client echoes the marker when connecting with WebSockets. Detecting the hole in message sequence, the service immediately sends the missing messages allowing the client to catch up.
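To make the catch-up logic concrete, here is a minimal sketch of the server-side buffer in Python. It is not our production code: the `ReplayBuffer` class, its method names, the `id` field and the capacity are all assumptions made for illustration. The only things taken from the description above are that messages carry a unique ID in the envelope and that the buffer is small and bounded.

```python
from collections import deque
from typing import Optional


class ReplayBuffer:
    """Bounded buffer of recent broker messages, oldest first (hypothetical)."""

    def __init__(self, capacity: int = 100) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._messages: deque = deque(maxlen=capacity)
        self._overflowed = False

    def append(self, message: dict) -> None:
        # Remember whether anything has ever been pushed out of the buffer.
        if len(self._messages) == self._messages.maxlen:
            self._overflowed = True
        self._messages.append(message)

    def last_id(self) -> Optional[str]:
        """Marker to embed in the served page (None if nothing was seen yet)."""
        return self._messages[-1]["id"] if self._messages else None

    def since(self, marker: Optional[str]) -> Optional[list]:
        """Return the messages that arrived after `marker`.

        None means the gap cannot be bridged from the buffer any more,
        so the client has to reload the page to get a fresh baseline.
        """
        messages = list(self._messages)
        if marker is None:
            # The page was served before any message was buffered.
            return None if self._overflowed else messages
        for index, message in enumerate(messages):
            if message["id"] == marker:
                return messages[index + 1:]
        return None


# Usage: the marker goes out with the page, two messages arrive in the gap.
buffer = ReplayBuffer(capacity=3)
buffer.append({"id": "m0", "body": "build at 20%"})
marker = buffer.last_id()
buffer.append({"id": "m1", "body": "build at 50%"})
buffer.append({"id": "m2", "body": "build at 70%"})
missed = buffer.since(marker)  # what the client receives right after connecting
```

The same `since` call serves a reconnecting client, as long as it remembers the ID of the last message it processed.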

As you can see, we are still learning and evolving our REST/MQTT mirroring technique, and we will most likely encounter more face-palm moments like this. The solution is not perfect – in an extreme edge case, the WebSockets connection can take so long that the service message buffer fills up and old messages start dropping off. A solution in those cases is to refresh the browser.

We are also still intrigued with sending the state in all messages – there is something reassuring about it, and the fact that the similarity to PATCH/PUT behavior only reinforces the mirroring aspect is great. Perhaps our resources are not that large, and we are needlessly fretting over the message sizes. On the other hand, when making a REST call, callers can use ‘fields’ and ‘embed’ to control the size of the response. Since we don’t know what any potential subscriber will need, we have no choice but to send the entire resource. We need to study that approach more.

That’s it from me this week. Live long, prosper and mind the gap.

© Dejan Glozic, 2014


SoundCloud is Reading My Mind

Marvelous feats in mind reading, The U.S. Printing Co., Russell-Morgan Print, Cincinnati & New York, 1900

“Bad artists copy. Good artists steal.”

– Pablo Picasso

It was bound to happen. In the ultra-connected world, things are bound to feed off of each other, eventually erasing differences, equalizing any differential in electric potentials between any two points. No wonder the weirdest animals can be found on islands (I am looking at you, Australia). On the internet, there are no islands, just a constant primordial soup bubbling with ideas.

The refactoring of monolithic applications into distributed systems based on micro-services is slowly becoming ‘a tale as old as time’. They all follow a certain path which kind of makes sense when you think about it. We are all impatient, reading the first few Google search and Stack Overflow results ‘above the fold’, and it is no coincidence that the results start resembling majority rule, with more popular choices edging out further and further ahead with every new case of reuse.

Luke Wroblewski of Mobile First fame once said that ‘two apps do the same thing and suddenly it’s a pattern’. I tend to believe that people researching the jump into micro-services read more than two search results, but once you see certain choices appearing in, say, three or four stories ‘from the trenches’, you become reasonably convinced to at least try them yourself.

If you were so kind as to read my past blog posts, you know some of the key points of my journey:

  1. Break down a large monolithic application (Java or RoR) into a number of small and nimble micro-services
  2. Use REST API as the only way these micro-services talk to each other
  3. Use a message broker (namely, RabbitMQ) to apply the event collaboration pattern and avoid annoying inter-service polling for state changes
  4. Link MQ events and REST into what I call REST/MQTT mirroring to notify about resource changes

Then this came along:

As I was reading the blog post, it got me giddy at the realization we are all converging on the emerging model for universal micro-service architecture. Solving their own unique SoundCloud problems (good problems to have, if I may say – coping with millions of users falls into such a category), SoundCloud developers came to very similar realizations as many of us taking a similar journey. I will let you read the post for yourself, and then try to extract some common points.

Stop the monolith growth

Large monolithic systems cannot be refactored at once. This simple realization about technical debt actually has two sub-aspects: the size of the system at the moment it is considered for a rewrite, and the new debt being added because ‘we need these new features yesterday’. As with real world (financial) debt, the first order of business is to ‘stop the bleeding’ – you want to stop new debt from accruing before attempting to make it smaller.

At the beginning of this journey you need to ‘draw the line’ and stop adding new features to the monolith. This rule is simple:

Rule 1: Every new feature added to the system will from now on be written as a micro-service.

This ensures that precious resources of the team are not spent on making the monolith bigger and the finish line farther and farther on the horizon.

Of course, a lot of the team’s activity involves reworking the existing features based on validated learning. Hence, a new rule is needed to limit this drain on resources to critical fixes only:

Rule 2: Every existing feature that requires significant rework will be removed and rewritten as a micro-service.

This rule is somewhat less clear-cut because it leaves some room for the interpretation of ‘significant rework’. In practice, it is fairly easy to convince yourself to rewrite it this way because micro-service stacks tend to be more fun, require fewer files, fewer lines of code and are more suitable for Web apps today. For example, we don’t need too much persuasion to rewrite a servlet/JSP service in the old application as a Node.js/Dust.js micro-service whenever we can. If anything, we need to practice restraint and not fabricate excuses to rewrite features that only need touch-ups.

Micro-services as BBQ. Mmmmm, BBQ…

An important corollary of this rule is to have a plan of action ahead of time. Before doing any work, have a ‘cut of beef’ map of the monolith with areas that naturally lend themselves to be rewritten as micro-services. When the time comes for a significant rework in one of them, you can just act along that map.

As is the norm these days, ‘there’s a pattern for that’, and as the SoundCloud guys noticed, the cuts are along what is known as bounded context.

Center around APIs

As you can read at length on the API evangelist’s blog, we are transforming into an API economy, and APIs are becoming a central part of your system, rather than something you tack on after the fact. If you could get by with internal monolith services in the early days, micro-services will force you to accept APIs as the only way you communicate both inside your system and with the outside world. As SoundCloud developers realized, the days of integration around databases are over – APIs are the only contact points that tie the system together.

Rule 3: APIs should be the only way micro-services talk to each other and the outside world.

With monolithic systems, APIs are normally not used internally, so the first APIs to be created are outward facing – for third party developers and partners. A micro-service based system normally starts with inter-service APIs. These APIs are normally more powerful since they assume a level of trust that comes from sitting behind a firewall. They can use proprietary authentication protocols, have no rate limiting and expose the entire functionality of the system. An important rule is that they should in no way be second-class compared to what you would expose to the external users:

Rule 4: Internal APIs should be documented and otherwise written as if they will be exposed to the open Internet at any point.

Once you have the internal APIs designed this way, deciding which subset to expose as a public API stops being a technical decision. Your external APIs look like the internal ones with the exception of stricter visibility rules (who can see what), rate limiting (with the possibility of a rate-unlimited paid tier), and an authentication mechanism that may differ from what is used internally.

Rule 5: Public APIs are a subset of internal APIs with stricter visibility rules, rate limiting and separate authentication.

SoundCloud developers went the other way (public API first) and realized that they could not build their entire system with the limitations in place for the public APIs, and had to resort to more powerful internal APIs. The delicate balance between making public APIs useful without giving away the farm is a decision every business needs to make in the API economy. Micro-services simply encourage you to start from internal and work towards public.

Messaging

If there was a section in the SoundCloud blog post that made me jump with joy, it was the one where they discussed how they arrived at using RabbitMQ for messaging between micro-services, considering that I have been writing about that in every second post for the last three months. In their own words:

Soon enough, we realized that there was a big problem with this model; as our microservices needed to react to user activity. The push-notifications system, for example, needed to know whenever a track had received a new comment so that it could inform the artist about it. At our scale, polling was not an option. We needed to create a better model.

 

We were already using AMQP in general and RabbitMQ in specific — In a Rails application you often need a way to dispatch slow jobs to a worker process to avoid hogging the concurrency-weak Ruby interpreter. Sebastian Ohm and Tomás Senart presented the details of how we use AMQP, but over several iterations we developed a model called Semantic Events, where changes in the domain objects result in a message being dispatched to a broker and consumed by whichever microservice finds the message interesting.

I don’t need to say much about this – read my REST/MQTT mirroring post that describes the details of what the SoundCloud guys call ‘changes in the domain objects result in a message’. I would like to indulge in a feeling that ‘great minds think alike’, but more modestly (and realistically), it is just common sense and RabbitMQ is a nice, fully featured and reliable open source polyglot broker. No shocking coincidence – it is seen in many installations of this kind. Let’s make a rule about it:

Rule 6: Use a message broker to stay in sync with changes in domain models managed by micro-services and avoid polling.

All together now

Let’s pull all the rules together. As we speak, teams around the world are suffering under the weight of large unwieldy monolithic applications that are ill-suited for cloud deployment. They are intrigued by micro-services but afraid to take the plunge. These rules will make the process more manageable and allow you to arrive at a better system that is easier to grow, deploy many times a day, and more reactive to events, load, failure and users:

  1. Every new feature added to the system will from now on be written as a micro-service.
  2. Every existing feature that requires significant rework will be removed and rewritten as a micro-service.
  3. APIs should be the only way micro-services talk to each other and the outside world.
  4. Internal APIs should be documented and otherwise written as if they will be exposed to the open Internet at any point.
  5. Public APIs are a subset of internal APIs with stricter visibility rules, rate limiting and separate authentication.
  6. Use a message broker to stay in sync with changes in domain models managed by micro-services and avoid polling.

This is a great time to build micro-service based systems, and collective wisdom on the best practices is converging as more systems are coming online. I will address the topic of APIs in more detail in one of the future posts. Stay tuned, and keep reading my mind!

© Dejan Glozic, 2014

REST and MQTT: Yin and Yang of Micro-Service APIs


It seemed that the worst was over – I haven’t heard a single new portmanteau of celebrity names in a while (if you exclude ‘Shamy’ which is a super-couple name of Sheldon and Amy from The Big Bang Theory but being a plot device, I don’t think it counts). Then when I researched for this blog post I stumbled upon project QEST, a mashup of MQTT and REST. Et tu, Matteo Collina?

What Matteo did in the project QEST is an attempt to bridge the world of apps speaking REST and the world of devices speaking MQTT with one bilingual broker. I find the idea intriguing and useful in the context of the IoT world. However, what I am trying to achieve with this post is to address the marriage of these two protocols in the context of micro-service-based distributed systems. In a sense, we are re-purposing a protocol not primarily created for this but that exhibits enough flexibility and simplicity to fit right in.

You keep saying that

I think I have written about the usefulness of message brokers in micro-service systems often enough to reasonably expect it to be axiomatic by now. From the point of view of service to service interaction, REST poses a problem when services depend on being up to date with data they don’t own and manage. Being up to date requires polling, which quickly adds up in a system with enough interconnected services. As Martin Fowler has pointed out in the article on the event collaboration pattern, reversing the data flow has the benefit of reacting to data changes, rather than unceasingly asking lest you miss a change.

However, the problem with this data flow reversal when implemented literally is that the onus of storing the data is put on the event recipients. Storing the data in event subscribers allows them to be self-sufficient and resilient – they can operate even if the link to the event publisher is temporarily severed. However, with every second of the link breakage they operate on potentially more and more stale data. It is a case of ‘pick your poison’ – with apps using the request-response collaboration pattern, a broken link will mean that no collaboration is happening, which may or may not be preferred to acting on outdated information.

As we are gaining more experience with micro-service-based systems, and with the pragmatic assumption that a message broker can fail, we are finding event collaboration on its own insufficient. However, augmenting REST with messaging results in a very powerful combination – two halves of one complete picture. This is how this dynamic duo works:

  1. The service with a REST API will field requests by client services as expected. This establishes the baseline state.
  2. All the services will simultaneously connect to a message broker.
  3. API service will fire messages notifying about data changes (essentially for all the verbs that can cause the change, in most cases POST, PUT, PATCH and DELETE).
  4. Clients interested in receiving data updates will react to these changes according to their functionality.
  5. In cases where having the correct data is critical, client services will forgo the built-up baseline + changes state and make a new REST call to establish a new baseline before counting on it.
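The five steps above can be sketched in a few lines of Python. Everything in this sketch is a stand-in made up for illustration: `InMemoryBroker` replaces a real broker and only supports exact topic names, `PeopleApi` replaces an HTTP service (its methods play the role of REST endpoints), and the message shape with `event` and `changes` properties is an assumption.

```python
from collections import defaultdict
from typing import Callable


class InMemoryBroker:
    """Stand-in for a real message broker: exact-topic publish/subscribe."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: dict) -> None:
        for callback in self._subscribers[topic]:
            callback(message)


class PeopleApi:
    """Stand-in for the API service that owns the data."""

    def __init__(self, broker: InMemoryBroker) -> None:
        self._broker = broker
        self._people = {"johndoe": {"id": "johndoe", "name": "John Doe",
                                    "email": "jdoe@example.com"}}

    def get(self, person_id: str) -> dict:  # plays GET /people/:id
        return dict(self._people[person_id])

    def patch(self, person_id: str, changes: dict) -> dict:  # plays PATCH
        self._people[person_id].update(changes)
        # Step 3: notify subscribers about the state change.
        self._broker.publish(f"people/{person_id}",
                             {"event": "modified", "changes": changes})
        return dict(self._people[person_id])


class ClientService:
    """A client that tracks one person as baseline + changes."""

    def __init__(self, api: PeopleApi, broker: InMemoryBroker,
                 person_id: str) -> None:
        self._api = api
        self._person_id = person_id
        self.state = api.get(person_id)  # step 1: REST call sets the baseline
        broker.subscribe(f"people/{person_id}", self._on_message)  # step 2

    def _on_message(self, message: dict) -> None:  # step 4: react to changes
        if message["event"] == "modified":
            self.state.update(message["changes"])

    def rebaseline(self) -> dict:  # step 5: fresh REST call when it matters
        self.state = self._api.get(self._person_id)
        return self.state


broker = InMemoryBroker()
api = PeopleApi(broker)
client_b = ClientService(api, broker, "johndoe")
api.patch("johndoe", {"email": "johndoe@example.com"})  # issued by another client
```

After the PATCH, `client_b.state` already carries the new email without a second REST call, and `rebaseline()` remains available for the cases where the client cannot afford to trust the accumulated state.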

How is this different from a pure implementation of Event Collaboration pattern?

  1. Messages are used to augment, not replace REST. This is in contrast to, say, Twitter streaming API where you need to make a choice (you will either use REST or stream the tweets using an HTTP connection that you keep open).
  2. While message brokers are reliable and there are ways to further increase this reliability (delivery guarantees, durable queues, quality of service etc.), REST is still counted on to establish a ‘clean slate’. Of course, REST can fail too, but if it does, you have no data, as opposed to old and therefore incorrect data.
  3. Client services are not required to store data. For this to work, they still need to track the baseline data they obtained through the REST call and be able to correlate messages to this baseline. For example, if a client service rendered a Web page from the data obtained from a REST API, it should be able to detect that a message it received will affect this web page and use something like Web Sockets to update the page accordingly.

OK, but what is the actual contract?

Notice how I have mentioned the word ‘API’ multiple times, while I keep talking about ‘messaging’ in a non-committal way. And yet, there is no ‘generic’ API – by definition it requires a clear contract for the way client services can interact with the API service. If we are to extend REST Yin with the messaging Yang, it has to be a true companion and become part of the API contract.

This is where MQTT comes in. As an OASIS standard, it is vendor-neutral in the same way as REST. While the protocol spec itself is detailed and intricate, most of the experience of using the protocol is ‘publishers publish messages into topics and subscribers subscribe to said topics’. That’s it.

A very useful characteristic of MQTT topic structure is that it can contain delimiters (‘/’), which opens up a possibility to sync up REST URLs and topics. This prompted some developers such as Matteo to go for full parity (essentially using the REST URL as a topic). I don’t think we need to go that far – as long as the segments that matter match, we don’t need to have the same root. I don’t think that the entire URL makes sense as a topic other than symbolically, unless you are writing a ‘superbroker’ – a server that is both a broker and a REST server (and a floor wax and a dessert topping). Or an MQTT-REST bridge. Our approach is purely that of API mirroring – a convention that still expects services to connect to an MQTT broker of their choice.
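As a rough illustration of how path segments and topics can line up, here is a small Python sketch. The helper names `topic_for` and `topic_matches` are made up for this example, and dropping the leading slash is just one possible convention. The matcher follows the MQTT wildcard rules (‘+’ for a single level, ‘#’ for everything below a level) but ignores special cases such as topics starting with ‘$’; a real broker does this matching for you.

```python
def topic_for(rest_path: str) -> str:
    """Mirror a REST path as a topic by dropping the surrounding slashes."""
    return rest_path.strip("/")


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Check a topic against a filter that may use '+' and '#' wildcards."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: valid only as the last level, and it
            # matches the parent level plus anything below it.
            return index == len(filter_levels) - 1
        if index >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[index]:
            return False
    # Without '#', the filter and the topic must have the same depth.
    return len(filter_levels) == len(topic_levels)


topic = topic_for("/people/johndoe")
single_level = topic_matches("people/+", topic)
too_deep = topic_matches("people/+", "people/johndoe/photos")
multi_level = topic_matches("people/#", "people/johndoe/photos")
```

A subscriber interested in every person resource would use the `people/+` filter, while `people/#` would also pick up anything nested below a person.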

REST/MQTT API in action

So how does our approach look in practice? Essentially, you start with a normal REST API and add MQTT messages for REST endpoints that result in a state change (POST/PUT/PATCH/DELETE).

For example, let’s say we have an API service responsible for serving people profiles. The REST endpoints may look something like this:

GET /people – this returns an array of JSON objects, one per person

GET /people/:id – this returns a single JSON object of a person with the provided id, something like:

{
  "id": "johndoe",
  "name": "John Doe",
  "email": "jdoe@example.com"
}

PATCH /people/:id – this updates select fields of the person (say, name and email – we don’t support changing the id). The sequence diagram of using such an API service may look like this:

[Figure: MQTT/REST sequence diagram]

The sequence starts with the client service B making an HTTP GET request to fetch a resource for John Doe. API service will return JSON for the requested person as expected. After that, another service (client A) issues a PATCH request to update John Doe’s email address. API service will execute the request, return updated JSON for John Doe in the response, then turn around and publish a message to notify subscribers that ‘/people/johndoe’ has changed. This message is delivered to both clients that are subscribed to ‘people/+’ topics (i.e. changes to all people resources). This allows service B to react to this change.

Topics and message bodies

Since MQTT is now part of the formal API contract, we must document it for each REST endpoint that causes state change. There is no hard and fast rule on how to do this, but we are using the following conventions:

POST endpoints publish a message into the matching MQTT topic with the following shape:

{
  "event": "created",
  "state": { /* JSON returned in the POST response body */ }
}

PUT and PATCH endpoints use the following shape:

{
  "event": "modified",
  "changes": { "email": "johndoe@example.com" }
}

The shape above is useful when only a few properties have changed. If the entire object has been replaced, an alternative would be:


{
  "event": "modified",
  "state": { /* JSON returned in the PUT response body */ }
}

Finally, a message published upon a DELETE endpoint looks like this:

{
  "event": "deleted"
}
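One way to keep these conventions in a single place is a small helper that the API service calls after handling a request. The sketch below is in Python and the function name `build_message` and its parameters are assumptions; only the message shapes come from the conventions above.

```python
import json
from typing import Optional


def build_message(method: str, response_body: Optional[dict] = None,
                  changes: Optional[dict] = None) -> dict:
    """Build the message to publish for a state-changing REST request."""
    method = method.upper()
    if method == "POST":
        return {"event": "created", "state": response_body}
    if method in ("PUT", "PATCH"):
        if changes is not None:
            # Only a few properties changed: send the delta.
            return {"event": "modified", "changes": changes}
        # The whole object was replaced: send the full state.
        return {"event": "modified", "state": response_body}
    if method == "DELETE":
        return {"event": "deleted"}
    raise ValueError(f"{method} does not change state, nothing to publish")


message = build_message("PATCH", changes={"email": "johndoe@example.com"})
payload = json.dumps(message)  # the string handed to the broker client
```

Raising an error for GET is a design choice of this sketch: it makes accidental publishing from read-only endpoints fail loudly rather than silently.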

Handling i18n

If the API service is returning JSON with translatable strings, it is customary to honor the ‘Accept-Language’ HTTP header if present and return the string in the appropriate locale. Alternatively, a ‘lang’ query parameter can be used either on its own or as an override of the header. This all seems straightforward.

Things get complicated when you reverse the flow. An API service publishing a message cannot know in advance which languages will be needed by the subscribers. We don’t have a fully satisfactory answer for this, but our current thinking is to borrow from JSON-LD and include multiple versions of translatable strings in the message body, in the way it is done in the Activity Streams 2.0 draft:

{
  "object": {
    "type": "article",
    "displayName": {
      "en": "A basic example",
      "fr": "Un exemple basique"
    }
  }
}
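On the receiving end, each subscriber then picks the string it needs. A minimal Python sketch of that selection could look like the following; the function name `pick_translation`, the fallback order and the ‘en’ default are assumptions, not part of any spec.

```python
def pick_translation(language_map: dict, preferred: list,
                     default: str = "en") -> str:
    """Pick the best string from a map such as {"en": "...", "fr": "..."}."""
    for tag in preferred:
        if tag in language_map:
            return language_map[tag]
        # Fall back from a regional tag to its primary language: fr-CA -> fr.
        primary = tag.split("-")[0]
        if primary in language_map:
            return language_map[primary]
    if default in language_map:
        return language_map[default]
    # Last resort: whatever translation is available, or an empty string.
    return next(iter(language_map.values()), "")


display_name = {"en": "A basic example", "fr": "Un exemple basique"}
for_french_canadian = pick_translation(display_name, ["fr-CA", "en"])
for_german = pick_translation(display_name, ["de"])
```

The `preferred` list would typically come from whatever the subscriber knows about its end user, for example the languages parsed from the ‘Accept-Language’ header of the page request it is serving.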

Conclusion

While others have attempted to create a formal bridge between the REST and MQTT worlds, when building a system using micro-services we are content with achieving REST/MQTT API mirroring through convention. We find the two protocols to be great companions, packing a mighty one-two punch that maintains API testability and a clear contract while making the system more dynamic, and providing for looser coupling and more sustainable future growth.

© Dejan Glozic, 2014