The Web of Hooks


It’s a new year, which means it is time to update the copyright at the bottom. I have been busy lately and didn’t want to write fluffy blogs just to fill the space. But this time I actually have real content, so here goes.

If you followed my blog in the past, you know I was very bullish on message brokers in the context of microservice architecture. I claimed that REST APIs alone are not sufficient to sustain a successful microservice system. The event collaboration pattern is necessary to ensure a scalable and robust system, where microservices that own a resource lifecycle don’t need to be burdened with executing all the code driven by that lifecycle. You can read more about it in my article about REST/MQ mirroring.

Not so fast

There is a fly in this particular ointment, however. While HTTP REST is well understood, messaging protocols are numerous. You may know from my writing that I liked MQTT for a while due to its simplicity and wide support. However, MQTT is actually not a great protocol for gluing your microservice system together. It is designed to be lightweight and run on the smallest of IoT devices, and it lacks some critical features, such as anycast support and manual acknowledgement of messages.

Another popular protocol, AMQP, suffered a schism of sorts. The version supported in the popular RabbitMQ broker (0.9.1) is a very different protocol from the actual open standard, AMQP 1.0, which is not as widely supported. This is a pity, because I really like AMQP 1.0. It is like MQTT but with anycast and manual acknowledgement – perfect as microservice system glue, yet still very simple to work with from within clients.

And of course, there is Apache Kafka, a very powerful but odd choice given its origins in high-throughput distributed log aggregation. Kafka is unapologetically Java-centric, has a proprietary protocol, and the bolt-on REST APIs for connecting other client languages are still fairly low level. For example, where AMQP 1.0 guarantees quality of service and requires implementations to keep a buffer of messages that are re-delivered to a client that crashed, Kafka simply allows you to maintain a pointer into the queue, and it is your job to work up the queue and catch up after restarting. You pay for performance by working at a level that is fairly low for general-purpose application messaging.

Authentication pains

Choosing the messaging solution and protocol is only one part of the problem. If you want your system to be extensible and allow third-party integrations, you need to make it fairly easy for new clients to connect. On the other hand, you need to secure the clients, because you don’t want a rogue app to eavesdrop on the events in the system without authorization (otherwise you end up with egg on your face). Maintaining a large list of clients connected to the same topics can also be a scalability issue for some brokers.

For all these reasons, it has been widely acknowledged that message brokers are not a good match for external integrations. A cursory scan of popular cloud applications with large ecosystems points at a more client-friendly alternative – WebHooks.

WebHooks to the rescue

You would think that if something is so popular and has a catchy name, there is an actual well-written protocol for it. Wrong! WebHooks is the least common denominator you could imagine. In a nutshell, this is all there is:

  • You publish a list of valid events for which you will notify clients.
  • You provide an API (and/or UI) for clients to register URLs for one or more of those events.
  • When an event happens, you execute an HTTP POST on the URLs registered for that event type.

That’s all. Granted, message brokers have topics you can publish and subscribe to, and the actual messages they pass around are free-form, so this is not very different. But absent from WebHooks are things such as anycast, Quality of Service, manual acknowledgement etc.

I don’t intend to go into the details of what various implementations of WebHooks in apps such as GitHub, Slack etc. provide because thankfully Giuliano Iacobelli already wrote such an article. My interest here is to apply this knowledge to a microservice system we are building and try to anticipate pros and cons of going with WebHooks.

What it would take

The first thing that comes to mind is that in order to support WebHooks, we would need to write a new WebHook service. Its role would be to accept registrations, and store URL and event type mappings for subsequent invocation. Right there, my first thought is about the difference between external and internal clients. External clients would most likely use the UI to register a URL of their integration. This is how you register your script in GitHub so that it runs on every commit, for example.

However, with internal clients we would have a funny problem: every time I restart a microservice instance, it would need to register itself somewhere during startup. That would make a POST endpoint a nonstarter, because I don’t want to keep creating new registrations. Instead, a PUT with a client ID would work better, where an existing registration for the same ID would simply be updated if already there.

Other than that, the service would offer a POST for a new message into the provided event type, which would be delivered to all registered URLs for that type. Obviously it would need to guard against 404s, 502s and URLs that take too long to return a response, giving up on them after a set timeout.
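To make this a bit more concrete, here is a minimal sketch of such a service in Node.js/Express. Everything in it – route names, field names, the five-second timeout – is an assumption for illustration, not a prescription:

var express = require('express');
var bodyParser = require('body-parser');
var axios = require('axios');

var app = express();
app.use(bodyParser.json());

// registrations: eventType -> { clientId -> url }
var registrations = {};

// Idempotent registration: a restarting client can PUT the same ID repeatedly
app.put('/webhooks/:clientId', function (req, res) {
  var forEvent = registrations[req.body.eventType] ||
      (registrations[req.body.eventType] = {});
  forEvent[req.params.clientId] = req.body.url;
  res.sendStatus(204);
});

// Fan an event out to all registered URLs, giving up on slow or broken ones
function deliver(eventType, payload) {
  var forEvent = registrations[eventType] || {};
  Object.keys(forEvent).forEach(function (clientId) {
    axios.post(forEvent[clientId], payload, { timeout: 5000 })
      .catch(function (err) {
        console.warn('Giving up on ' + forEvent[clientId] + ': ' + err.message);
      });
  });
}

app.listen(3000);

In a real service, the registrations would of course live in a database rather than in memory.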

The best of both worlds

The set timeout brings back the topic of quality of service, implying that WebHooks are great for external integration but not that great as reliable glue for a microservice system. Why don’t we marry the two, then? We could continue to use the message broker for reliable delivery of internal messages, and hook it up to a WebHook service that would notify external integrations without requiring them to support our particular protocol, or getting too much access into the sensitive innards. Hooking up a WebHook service to a message broker has the added benefit of buffering the service itself, so that it can be restarted and updated without interruptions or missed events.

[Diagram: a microservice system where a WebHook service listens to message broker topics and notifies registered external integrations over HTTP]

In the diagram above, our microservice system has the normal architecture, with a common routing proxy providing a single-domain entry into the microservices. The microservices use normal message broker clients to publish to topics. A subset of these topics, deemed suitable for external integrations, is also listened to by the WebHook service; for each of those messages, it reaches into the stored list of registrations and calls HTTP POST on the registered URLs. If the WebHook service crashes, a reputable message broker will maintain a buffer of messages and re-deliver them upon restart. For performance reasons, the WebHook service can choose to keep a subset of registrations in an in-memory cache, depending on how frequently they are used.
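Continuing the earlier sketch, the broker side of the WebHook service could look roughly like this with the node-amqp module (the queue and topic names are made up, and ‘deliver’ is the fan-out function from the previous sketch):

var amqp = require('amqp');

var connection = amqp.createConnection({ host: 'mq.internal' });

// The externally visible subset of topics (hypothetical names)
var EXTERNAL_TOPICS = ['user.created', 'project.updated'];

connection.on('ready', function () {
  // A durable, shared queue buffers events while the WebHook service is down
  connection.queue('webhook-service', { durable: true }, function (q) {
    EXTERNAL_TOPICS.forEach(function (topic) {
      q.bind('amq.topic', topic);
    });
    q.subscribe(function (message, headers, deliveryInfo) {
      // Fan the event out to registered URLs via HTTP POST
      deliver(deliveryInfo.routingKey, message);
    });
  });
});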

Discussion

Obviously registering a URL with an HTTP PUT is much easier to implement, and providing a single POST endpoint to handle the event lowers the barrier of entry for external integrations. In fact, hooking up code to react to a single POST could very well be done using serverless architecture.

Are we losing something in the process? Inserting another service into the flow will add a bit of a delay, but external notifications are normally for events that are not happening many times a second, so the tiny delay is a more than acceptable tradeoff. In addition, if the client providing the WebHook URL is itself load-balanced, this delivery is effectively hardcoded to anycast (the event notification will only hit one of the instances in the cluster).

Finally, this creates two classes of clients – the ‘inner circle’ and the external, segregated clients. The inner circle is hooked up directly to the message broker, while the external clients go through the WebHook service. Apart from being an acceptable price to pay for easier integration, this is actually useful: it lets us expose only a subset of events externally – some highly sensitive internal events may only be available to ‘trusted’ clients subscribing to message broker topics and holding internal credentials.

Since the WebHook service will normally not keep retrying to deliver an event to an unresponsive URL, it is possible to miss an event. If this is a problem, the external system will need to fashion a ‘belt and suspenders’ fortification, where the event-driven approach is augmented with a periodic REST API call to ‘compare notes’ and ensure the baseline it is working against is up to date.

© Dejan Glozic, 2017


Angular.js 2.0, Index Investing and Micro-Services

Beuckelaer: Girl with a basket of eggs, Wikimedia Commons.

Now here is somebody with all her eggs in one basket, literally. I used it to illustrate what index investing tries to avoid. I thought of index investing while reading the bitter and often hilarious reactions to the announced changes in Angular.js 2.0 on Reddit. I also thought about my experience with Google services. Watch me tie all these things together in one magic feat.

First off, let me be the first to acknowledge that righteous indignation about changes to a free product or service is always a bit rich. “I used this for my benefit for months and now they changed it – I demand they fix it and continue to invest real money so that I can continue using it for free”. Right.

That thing off the table, here is my amusing experience with Google services. A while ago I amassed a number of feeds I wanted to keep up with every morning over breakfast. I created a nice multi-page dashboard using iGoogle. It worked, and it was even responsive – it loaded fast and worked well on my iPhone.

Then one day Google pulled the plug on it. After venting my, you guessed it, righteous indignation for a while, I looked for a replacement and found that Google Reader can be used for that. So I moved all my feeds to it. You guess what happened next – they pulled the plug on it too, and I had to move my feeds to Feedly, where they remain to this day.

Apart from feeling like the first of the three little piggies, forced to change its address often due to a certain wolf, it taught me how Google feels about its free services. While it has a number of exciting and often groundbreaking products in the air at any point, you had better steer clear if long-term stability is important to you. Google engineers are not sentimental about their software and change their minds at an alarming frequency, which moves things forward but also leaves a lot of victims in their wake.

The moment I learned about Angular.js and how it was bestowed on the world and maintained by Google, my first thought was ‘uh, oh’. It had Google fingerprints all over it:

  1. A vertically integrated, opinionated framework that attempts to solve all your needs.
  2. It does not play well with other well-loved and popular libraries.
  3. A lot of the approaches need getting used to and don’t look like anything you have seen before.
  4. As a result, the learning curve is steep, and once you have climbed it, you feel personally invested to a degree that is not healthy.

And now with the announcement of Angular.js 2.0, we have the final shoe to drop – Google’s famous impatience with continuity and careful evolution. There were many people on Reddit who evangelized for Angular 1.x in their companies, and now feel betrayed. Others are contemplating switching to Knockout, Backbone, or leaving Web development altogether.

Index Investing

To change the pace a bit, let’s look at index investing. It is an investment technique that openly gives up on picking stock market winners and losers. Through bitter experience, some people discovered that their stock picks were worse than if they had let a blindfolded monkey choose their investments by throwing darts. Instead, they decided to invest in a basket of investments in each category, using low-fee instruments such as ETFs (Exchange Traded Funds). All empirical evidence suggests most people can’t pick winners if their life depended on it, and even those who can cannot sustain that track record over any length of time. Full disclosure – I moved all my investments to index funds, and my results are way better than in my dart-throwing years.

Index investing is all about diversification, asset allocation and risk containment. It applies to Web development more than you think.

The trouble with frameworks

I was picking on Angular.js, but I didn’t need to go that far outside my own company. We at IBM have written a hot mess of Web UIs using the Dojo framework. Truth be told, a few years ago it was a pretty decent option and had some solutions that mattered to enterprises (i18n, important widgets such as sorting tables and trees, and support for all the god-awful IE browsers known to man). The problem is that once you write all that code on top of it, you are stuck – you are forever bound to it. Dojo was a basket we put all our eggs in, like that girl above.

In the investment analogy, we bought one stock for all our money. Any investor will tell you that such a strategy is crazy – way too much alpha for a good night’s sleep. At the first opportunity to reflect on our strategy and devise something saner, we decided to use stable, standards-based protocols for integration, and confine stack choices to individual services. While we can change our minds on the implementation of one service, the other services can continue to work because the integration protocols are stable.

We also learned the hard way that frameworks tend to generate additional work – the source of accidental complexity. If you find yourself spending a nontrivial amount of time ‘feeding the framework’, i.e. writing code not because it makes sense for your project but because the framework needs it done a certain special way, you are a victim of accidental complexity.

As a result, we also developed a strong preference for toolkits over frameworks. This allows us to maintain control and have a better chance of avoiding nasty surprises such as Angular.js 2.0.

Micro-services and risk minimization

Our current love affair with micro-services has several reasons, many of which I have written about in previous posts. However, one of those reasons is closely related to the subject of this post: risk control. As in investing, our ability to pick the right framework has a dismal track record. Therefore, with our switch to micro-services, we focused first on the way they communicate with each other. We invested in stable REST APIs, and in message brokers that pass messages around using open protocols such as MQTT and AMQP 1.0. Due partially to the glacial pace of protocol standardization, the danger of them changing overnight is much lower.

Our approach to individual micro-service implementation is then to confine risk to the service boundary. If the service is small enough, picking the wrong framework (if you even need a framework) will doom only that one service. A small service can be rewritten if need be. The entire system cannot, without a world of pain.

Essentially our approach is to officially declare that we will not assemble a committee to choose one framework to rule them all. We will apply the following mitigation rules instead:

  1. Use the simplest approach you can get away with.
  2. If you can get away with server-side generated content, do it.
  3. If you can get away with server-side content + jQuery + Bootstrap, do it.
  4. If you need a bit of MV* magic, try Backbone combined with isomorphic templates (e.g. Dust.js partials that are reused on both server and client).
  5. If you must use Angular.js 1.3, do it, but you are on the hook to keep up with Google, and must have a contingency rewriting plan.
  6. We will NOT base any of the integration code on any of the frameworks du jour. Instead, we will use REST, AMQP/MQTT, JSON, HTML5, CSS3 and vanilla JS.

Be pessimistic

Someone once said that we are all writing legacy code every day, so we should strive to make it the best legacy code we can muster. Angular 1.3 turned from shiny to legacy in a flash (even if in ultra slow motion, since 2.0 will only arrive in early 2016). Our approach may be pessimistic, but it will help us sleep better in the years to come, and will make those who come after us curse us a bit less. Micro-services help in this regard because they confine the risk, the way oil tankers break up the cargo space into compartments. If you ensure that you can change your mind about the implementation of each service, the risk and importance of choosing the right framework diminishes.

The right question should not be “should I use Backbone.js, Angular.js, Ember.js or something else”. The question should be: will I be able to recover when the ADD-suffering maintainers of my framework of choice inevitably lose interest?

Right now, a starry-eyed college dropout is writing the next shiny framework to take the world by storm. With the micro-service approach, you will be able to give it a shot without betting the farm on it. You are welcome.

© Dejan Glozic, 2014

HA All The Things


I hate HA (High Availability). Today everything has to be highly available. All of a sudden, SA (Standard Availability) isn’t cutting it any more. Case in point: I used to listen to music on my way to work. Not any more – my morning meeting schedule intrudes into my ride, forcing me to participate in meetings while driving, Bluetooth and all. My 8-speaker, surround-sound Acura ELS system hates me – built for high-resolution multichannel reproduction, it is reduced to ‘Hi, who just joined?’ in glorious mono telephony. But I digress.

You know that I have written many articles on micro-services, because they are our ongoing concern as we slowly evolve our topology away from monolithic systems and towards micro-services. I have already written about my thoughts on how to scale and provide HA for Node.js services. We have also solved our problem of handling messaging in a cluster using AMQP worker queues.

However, we are not done with HA. The message broker itself needs to be HA, and we only have one node. We are currently using RabbitMQ, and so far it has been rock solid, but we know that in a real-world system it is not a matter of ‘if’ but ‘when’ it will suffer a problem, taking down all the messaging capabilities of the system with it. Or we will mess around with the firewall rules and block access to it by accident. Hey, contractors rupture gas pipes and power cables by accident all the time. Don’t judge.

Luckily, RabbitMQ can be clustered. The RabbitMQ documentation is fairly extensive on clustering and HA. In short, you need to:

  1. Stand up multiple RabbitMQ instances (nodes)
  2. Make sure all the instances use the same Erlang cookie, which allows them to talk to each other (yes, RabbitMQ is written in Erlang; you learn this on the first day, when you need to install the Erlang environment before you install Rabbit)
  3. Cluster the nodes by running rabbitmqctl join_cluster --ram rabbit@<firstnode> on the second server (see the sketch after this list)
  4. Start the nodes and connect to any of them
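As promised above, the joining sequence on the second node might look like this (the hostnames are hypothetical; note that the app must be stopped before joining):

rabbitmqctl stop_app
rabbitmqctl join_cluster --ram rabbit@rabbit1
rabbitmqctl start_app
rabbitmqctl cluster_status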

RabbitMQ has an interesting feature in that nodes in the cluster can join in RAM mode or in disc mode. RAM nodes will replicate state only in memory, while disc nodes will also write it to disc. While in theory it is enough to have only one of the nodes in the cluster use disc mode, the performance gain of RAM mode is not worth the risk (the gain is restricted to operations on queues and exchanges, not posting messages, anyway).

Not so fast

OK, so we cluster the nodes and we are done, right? Not really. Here is the problem: if we configure the clients to connect to the first node and that node goes down, messaging is still lost. Why? Because the RabbitMQ guys chose not to implement the load-balancing part of clustering. The problem is that clients communicate with the broker using the TCP protocol, and the Swiss army knives of proxying/caching/balancing/floor waxing, such as Apache or Nginx, only reverse-proxy HTTP/S.

After I wrote that, I Googled just in case and found an Nginx TCP proxy module on GitHub. Perhaps you can get away with just Nginx if you use it already. If you use Apache, I could not find a TCP proxy module for it. If it exists, let me know.

What I DID find is that a more frequently used solution for this kind of problem is HAProxy. This super solid and widely used proxy can be configured for Layer 4 (transport) proxying, and works flawlessly with TCP. It is fairly easy to configure, too: for TCP, you will need to configure the ‘defaults’, ‘frontend’ and ‘backend’ sections, or join both and just configure the ‘listen’ section (works great for TCP proxies).

I don’t want to go into the details of configuring HAProxy for TCP – there are good blog posts on that topic. Suffice it to say that you can configure a virtual broker address that all the clients can connect to as usual, and it will proxy to all the MQ nodes in the cluster. It is customary to add the ‘check’ instruction to the configuration so that HAProxy verifies that nodes are alive before sending traffic to them. If one of the brokers goes down, all the message traffic will be routed to the surviving nodes.
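For illustration, a minimal ‘listen’ section could look something like this (the addresses and names are made up):

listen rabbitmq-cluster
    bind *:5672
    mode tcp
    balance roundrobin
    server rabbit1 10.0.0.11:5672 check
    server rabbit2 10.0.0.12:5672 check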

Do I really need HAProxy?

If you truly want to HA all the things, you now need to worry that you have made HAProxy a single point of failure. I told you, it never ends. The usual suggestion is to set up two instances, one primary and one as a backup for fail-over.

Can we get away with something simpler? It depends on how you define ‘simpler’. The vast majority of systems RabbitMQ runs on are some variant of Linux, and it appears there is something called LVS (Linux Virtual Server). LVS seems perfect for our needs, being a low-level Layer 4 switch – it just passes TCP packets to the servers it is load-balancing. Except that in section 2.15 of the documentation I found this:

This is not a utility where you run ../configure && make && make check && make install, put a few values in a *.conf file and you’re done. LVS rearranges the way IP works so that a router and server (here called director and realserver), reply to a client’s IP packets as if they were one machine. You will spend many days, weeks, months figuring out how it works. LVS is a lifestyle, not a utility.

OK, so maybe not as perfect a fit as I thought. I don’t think I am ready for an LVS lifestyle.

How about no proxy at all?

Wouldn’t it be nice if we didn’t need the proxy at all? It turns out, we can pull that off, but it really depends on the protocol and client you are using.

It turns out not all clients for all languages are the same. If you are using AMQP, you are in luck. The standard Java client provided by RabbitMQ can accept an array of server addresses, going through the list when connecting or reconnecting until one responds. This means that in the event of a node failure, the client will reconnect to another node.

We are using AMQP for our worker queue with Node.js, not Java, but the Node.js module we are using supports a similar feature. It can accept an array for the ‘host’ property (same port, user and password for all of them, though). It will work with normal clustered installations, but the bummer is that you cannot install two instances on localhost to try the failure recovery out – you will need to use remote servers.
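Here is a minimal sketch of that feature with the node-amqp module (the hostnames and credentials are made up):

var amqp = require('amqp');

// The 'host' property accepts an array; the client walks the list
// when connecting or reconnecting until one of the nodes responds.
var connection = amqp.createConnection({
  host: ['rabbit1.example.com', 'rabbit2.example.com'],
  port: 5672,
  login: 'worker',
  password: 'secret'
});

connection.on('ready', function () {
  console.log('Connected to one of the cluster nodes');
});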

On the MQTT side, the Eclipse Paho Java client supports multiple server URLs as well. Unfortunately, our standard Node.js MQTT module currently supports only one server. I was assured code contributions will not be turned away.

This solution is fairly attractive because it does not add any more moving parts to install and configure. The downside is that the clients become fully aware of all the broker nodes – we cannot just transparently add another node as we could in the case of the TCP load balancer. All the clients must add it to their list of nodes to connect to for this addition to work. In effect, our code becomes aware of our infrastructure choices more than it should be.

All this may be unnecessary for you if you use AWS, since a quick Google search claims that AWS Elastic Load Balancing can serve as a TCP proxy. Not a solution for us IBMers, of course, but it may work for you.

Give me PaaS or give me death

This is getting pretty tiring – I wish we did all this in a PaaS like our own Bluemix, so that it is all taken care of. IaaS gives you freedom that can at times be very useful and allows powerful customizations, but at other times it makes you wish you were out of the infrastructure business altogether.

I told you I hate HA. Now if you’ll excuse me, I need to join another call.

© Dejan Glozic, 2014

Node.js Apps and Periodic Tasks


When working on a distributed system of any size, sooner or later you will hit a problem and proclaim ‘well, this is a first’. My second proclamation in such situations is ‘this is a nice topic for the blog’. True to form, I am doing it again, this time with the issue of running periodic tasks, and the twist that clustering and high availability efforts add to the mix.

First, to frame the problem: a primary pattern you will surely encounter in a Web application is Request/Response. It is a road well traveled. Any ‘Hello, World’ web app is waving you a hello in response to your request.

Now add clustering to the mix. You want to ensure that no matter what is happening to the underlying hardware, or how many people hunger for your ‘hello’, you will be able to deliver. You add more instances of your app, and they ‘divide and conquer’ the incoming requests. No cry for a warm reply is left unanswered.

Then you decide that you want to tell a more complex message to the world, because that’s the kind of person you are: complex and multifaceted. You don’t want to be reduced to a boring slogan. You store a long and growing list of replies in a database. Because you are busy and have no time for standing up databases, you use one hosted by somebody else, already set up for high availability. Then each of your clustered nodes talks to the same database. You set the ‘message of the day’ marker, and every node fetches it. Thousands of people receive the same message.

Because we are writing our system in Node.js, there are several ways to do this, and I have already written about it. Of course, a real system is not an exercise in measuring HWPS (Hello World Per Second). We want to perform complex tasks, serve a multitude of pages, provide APIs and be flexible and enable parallel development by multiple teams. We use micro-services to do all this, and life is good.

I have also written about the need to use messaging in a micro-service system to bring down the inter-service chatter. When we added clustering into the mix, we discovered that we needed to pay special attention to ensure task dispatching similar to what Node.js clustering or proxy load balancing provides. We found our solution in the round-robin dispatching provided by worker queues.

Timers are something else

Then we hit timers. As long as the information flow in a distributed system is driven by user events, clustering works well, because dispatching policies (most often round-robin) are implemented by both the Node.js cluster module and the proxy load balancer. However, there is a distinct class of tasks in a distributed system that is not user-driven: periodic tasks.

Periodic tasks are tasks that run on a timer, outside of any external stimulus. There are many reasons why you would want to do this, but most periodic tasks service databases. In a FIFO of a limited size, they delete old entries, collapse duplicates, extract data for analysis, report it to other services etc.

For periodic tasks, there are two key problems to solve:

  1. Something needs to count the time and initiate triggers
  2. Tasks need to be written to execute when initiated by these triggers

The simplest way to trigger the tasks is known to every Unix admin – cron. You set up a somewhat quirky cron table, and tasks are executed according to the schedule.
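For example, a crontab like this (the commands are hypothetical) would prune a database FIFO every 15 minutes and produce a nightly report:

# m    h  dom mon dow  command
*/15   *  *   *   *    /usr/local/bin/myapp prune-fifo
0      3  *   *   *    /usr/local/bin/myapp nightly-report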

The actual job to execute needs to be provided as a command-line task, which means the app that normally accesses the database needs to provide an additional CLI entry point sharing most of the code. This is important in order to keep with factor XII of the 12 factors, which insists that one-off tasks share the same code and config as the long-running processes.

[Embedded tweet by TJ, referenced later in this post]

There are two problems with cron in the context of the cloud:

  1. If the machine running cron jobs malfunctions, all the periodic tasks will stop
  2. If you are running your system on a PaaS, you don’t have access to the OS in order to set up cron

The first problem is not a huge issue since these jobs run only periodically and normally provide online status when done – it is relatively easy for an admin to notice when they stop. For high availability and failover, Google has experimented with a tool called rcron for setting up cron over a cluster of machines.

Node cron

The second problem is more serious – in a PaaS, you will need to rely on a solution that involves your apps. This means we will need to set up a small app just to run an alternative to cron that is PaaS-friendly. As usual, there are several options, but the node-cron library seems fairly popular and has graduated past version 1.0. If you run it in an app backed by supervisor or PM2, it will keep running and executing tasks.

You can execute tasks in the same app where node-cron is running, provided these tasks have enough async calls themselves to allow the event queue to execute other callbacks in the app. However, if the tasks are CPU-intensive, they will block the event queue and should be extracted out.

A good way of solving this problem is to hook up the app running node-cron to a message broker such as RabbitMQ (which we already use for other MQ needs in our micro-service system anyway). The only thing the node-cron app will do is publish task requests to predefined topics. The workers listening to these topics should do the actual work:

[Diagram: a node-cron app publishing task requests to broker topics, with workers consuming them]
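A sketch of the scheduling side, using node-cron together with the node-amqp module (the exchange, topic and schedule are made up for illustration):

var cron = require('node-cron');
var amqp = require('amqp');

var connection = amqp.createConnection({ host: 'localhost' });

connection.on('ready', function () {
  connection.exchange('tasks', { type: 'topic' }, function (exchange) {
    // Every night at 3am, ask one of the workers to prune the FIFO
    cron.schedule('0 3 * * *', function () {
      exchange.publish('tasks.prune-fifo', { requestedAt: Date.now() });
    });
  });
});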

The problem with this approach is that a new task request can arrive while a worker has not finished running the previous task. Care should be taken to avoid workers stepping over each other.

Interestingly enough, a hint at this approach can be found in the aforementioned 12 factors, in the section on concurrency. You will notice a ‘clock’ app in the picture, indicating an app whose job is to ‘poke’ other apps at periodic intervals.

There can be only one

A ‘headless’ version of this approach can be achieved by running multiple apps in a cluster and letting them individually keep track of periodic tasks by calling ‘setTimeout’. Since these apps share nothing, they will run according to the local server clock, which may or may not be in sync with the other servers. All the apps may attempt to execute the same task (since they are clones of each other). In order to prevent duplication, each app should attempt to write a ‘lock’ record in the database before starting. To avoid a deadlock, apps should wait a random amount of time before retrying.

Obviously, if the lock is already there, the other apps should fail to create their own. Therefore, only one app will win in securing the lock before executing the task. However, the lock should be set to expire within a small multiple of the time normally required to finish the task, in order to avoid orphaned locks due to crashed workers. If the worker has not crashed but is just taking longer than usual, it should renew the lock to prevent it from expiring.

The advantage of this approach is that we will only schedule the next task once the current one has finished, avoiding the problem that the worker queue approach has.

Note that in this approach, we are not achieving scalability, just high availability. Of the several running apps, at least one app will succeed in securing the lock and executing the task. The presence of other apps ensures execution but does not increase scalability.

I have conveniently omitted many details about writing and removing the lock, retries etc.
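For illustration only, here is roughly what the locking could look like. Every name in it (the ‘db’ client, the TTL, the delays) is an assumption standing in for whatever shared store you use:

// Run 'task' only if we can create the lock record first.
// 'db' stands for any shared store with atomic, unique-key inserts.
function runExclusively(db, taskName, ttlMs, task) {
  var lock = { _id: 'lock:' + taskName, expires: Date.now() + ttlMs };
  db.insert(lock, function (err) {
    if (err) return; // lock exists - another instance won, do nothing
    task(function done() {
      db.remove(lock._id); // release so a future run can acquire it
    });
  });
}

// Each clone schedules the task; only one will acquire the lock.
setTimeout(function () {
  runExclusively(db, 'prune-fifo', 5 * 60 * 1000, function (done) {
    // ... perform the periodic work, then:
    done();
  });
}, Math.random() * 10000); // random delay to avoid a thundering herd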

Phew…

I guarantee you that once you start dealing with periodic tasks, you will be surprised by the complexity of executing them in the cloud. A mix of cloud, clustering and high availability makes running periodic tasks a fairly non-trivial problem. The limitations of PaaS environments compound this complexity.

If you visit TJ’s tweet above, you will find a dozen people offering alternatives in the replies (most of them being variations on *ron). The plethora of different solutions is a dead giveaway that this is a thorny problem. It is not fully solved today (at least not in the context of the cloud and micro-service systems), hence so many alternatives. If you use something that works well for you, do share in the ‘Reply’ section.

© Dejan Glozic, 2014

SoundCloud is Reading My Mind

Marvelous feats in mind reading, The U.S. Printing Co., Russell-Morgan Print, Cincinnati & New York, 1900

“Bad artists copy. Good artists steal.”

– Pablo Picasso

It was bound to happen. In the ultra-connected world, things are bound to feed off of each other, eventually erasing differences, equalizing any differential in electric potentials between any two points. No wonder the weirdest animals can be found on islands (I am looking at you, Australia). On the internet, there are no islands, just a constant primordial soup bubbling with ideas.

The refactoring of monolithic applications into distributed systems based on micro-services is slowly becoming ‘a tale as old as time’. They all follow a certain path, which kind of makes sense when you think about it. We are all impatient, reading only the first few Google search and Stack Overflow results ‘above the fold’, and it is no coincidence that the results start resembling majority rule, with more popular choices edging further and further ahead with every new case of reuse.

Luke Wroblewski of Mobile First fame once said that ‘two apps do the same thing and suddenly it’s a pattern’. I tend to believe that people researching the jump into micro-services read more than two search results, but once you see certain choices appearing in, say, three or four stories ‘from the trenches’, you become reasonably convinced to at least try them yourself.

If you were so kind as to read my past blog posts, you know some of the key points of my journey:

  1. Break down a large monolithic application (Java or RoR) into a number of small and nimble micro-services.
  2. Use REST APIs as the only way these micro-services talk to each other.
  3. Use a message broker (namely, RabbitMQ) to apply the event collaboration pattern and avoid annoying inter-service polling for state changes.
  4. Link MQ events and REST into what I call REST/MQTT mirroring to notify about resource changes.

Then SoundCloud’s blog post about their journey from a monolithic application to micro-services came along.

As I was reading the blog post, it got me giddy at the realization that we are all converging on an emerging model for universal micro-service architecture. Solving their own unique SoundCloud problems (good problems to have, if I may say so – coping with millions of users falls into that category), SoundCloud developers came to very similar realizations as many of us on a similar journey. I will let you read the post for yourself, and then try to extract some common points.

Stop the monolith growth

Large monolithic systems cannot be refactored all at once. This simple realization about technical debt actually has two sub-aspects: the size of the system at the moment it is considered for a rewrite, and the new debt being added because ‘we need these new features yesterday’. As with real-world (financial) debt, the first order of business is to ‘stop the bleeding’ – you want to stop new debt from accruing before attempting to make it smaller.

At the beginning of this journey you need to ‘draw the line’ and stop adding new features to the monolith. This rule is simple:

Rule 1: Every new feature added to the system will from now on be written as a micro-service.

This ensures that precious resources of the team are not spent on making the monolith bigger and the finish line farther and farther on the horizon.

Of course, a lot of the team’s activity involves reworking the existing features based on validated learning. Hence, a new rule is needed to limit this drain on resources to critical fixes only:

Rule 2: Every existing feature that requires significant rework will be removed and rewritten as a micro-service.

This rule is somewhat less clear-cut, because it leaves some room for the interpretation of ‘significant rework’. In practice, it is fairly easy to convince yourself to rewrite a feature this way, because micro-service stacks tend to be more fun, require fewer files and fewer lines of code, and are more suitable for Web apps today. For example, we don’t need much persuasion to rewrite a servlet/JSP service in the old application as a Node.js/Dust.js micro-service whenever we can. If anything, we need to practice restraint and not fabricate excuses to rewrite features that only need touch-ups.

Micro-services as BBQ. Mmmmm, BBQ…

An important corollary of this rule is to have a plan of action ahead of time. Before doing any work, have a ‘cuts of beef’ map of the monolith, with areas that naturally lend themselves to be rewritten as micro-services. When the time comes for a significant rework of one of them, you can just act along that map.

As is the norm these days, ‘there’s a pattern for that’, and as SoundCloud guys noticed, the cuts are along what is known as bounded context.

Center around APIs

As you can read at length on the API evangelist’s blog, we are transforming into an API economy, and APIs are becoming a central part of your system, rather than something you tack on after the fact. If you could get by with internal monolith services in the early days, micro-services will force you to accept APIs as the only way you communicate both inside your system and with the outside world. As SoundCloud developers realized, the days of integration around databases are over – APIs are the only contact points that tie the system together.

Rule 3: APIs should be the only way micro-services talk to each other and the outside world.

With monolithic systems, APIs are normally not used internally, so the first APIs to be created are outward-facing – for third-party developers and partners. A micro-service based system normally starts with inter-service APIs. These APIs are normally more powerful, since they assume a level of trust that comes from sitting behind a firewall. They can use proprietary authentication protocols, have no rate limiting, and expose the entire functionality of the system. An important rule is that they should in no way be second-class compared to what you would expose to external users:

Rule 4: Internal APIs should be documented and otherwise written as if they will be exposed to the open Internet at any point.

Once you have the internal APIs designed this way, deciding which subset to expose as the public API stops being a technical decision. Your external APIs look like the internal ones, with the exception of stricter visibility rules (who can see what), rate limiting (with the possibility of a rate-unlimited paid tier), and an authentication mechanism that may differ from what is used internally.

Rule 5: Public APIs are a subset of internal APIs with stricter visibility rules, rate limiting and separate authentication.

SoundCloud developers went the other way (public API first) and realized that they could not build their entire system with the limitations in place for the public APIs, and had to resort to more powerful internal APIs. The delicate balance between making public APIs useful without giving away the farm is a decision every business needs to make in the API economy. Micro-services simply encourage you to start from internal APIs and work towards public ones.

Messaging

If there was a section of the SoundCloud blog post that made me jump with joy, it was the section where they discussed how they arrived at using RabbitMQ for messaging between micro-services, considering that I have written about that in every second post for the last three months. In their own words:

Soon enough, we realized that there was a big problem with this model; as our microservices needed to react to user activity. The push-notifications system, for example, needed to know whenever a track had received a new comment so that it could inform the artist about it. At our scale, polling was not an option. We needed to create a better model.


We were already using AMQP in general and RabbitMQ in specific — In a Rails application you often need a way to dispatch slow jobs to a worker process to avoid hogging the concurrency-weak Ruby interpreter. Sebastian Ohm and Tomás Senart presented the details of how we use AMQP, but over several iterations we developed a model called Semantic Events, where changes in the domain objects result in a message being dispatched to a broker and consumed by whichever microservice finds the message interesting.

I don’t need to say much about this – read my REST/MQTT mirroring post, which describes the details of what the SoundCloud guys call ‘changes in the domain objects result in a message’. I would like to indulge in a feeling that ‘great minds think alike’, but more modestly (and realistically), it is just common sense, and RabbitMQ is a nice, fully featured and reliable open-source polyglot broker. No shocking coincidence – it is seen in many installations of this kind. Let’s make a rule about it:

Rule 6: Use a message broker to stay in sync with changes in domain models managed by micro-services and avoid polling.

All together now

Let’s pull all the rules together. As we speak, teams around the world are suffering under the weight of large, unwieldy monolithic applications that are ill-fitted for cloud deployment. They are intrigued by micro-services but afraid to take the plunge. These rules will make the process more manageable and help you arrive at a better system – one that is easier to grow, can be deployed many times a day, and is more reactive to events, load, failure and users:

  1. Every new feature added to the system will from now on be written as a micro-service.
  2. Every existing feature that requires significant rework will be removed and rewritten as a micro-service.
  3. APIs should be the only way micro-services talk to each other and the outside world.
  4. Internal APIs should be documented and otherwise written as if they will be exposed to the open Internet at any point.
  5. Public APIs are a subset of internal APIs with stricter visibility rules, rate limiting and separate authentication.
  6. Use a message broker to stay in sync with changes in domain models managed by micro-services and avoid polling.

This is a great time to build micro-service based systems, and collective wisdom on the best practices is converging as more systems are coming online. I will address the topic of APIs in more detail in one of the future posts. Stay tuned, and keep reading my mind!

© Dejan Glozic, 2014

Micro-Service Fun – Node.js + Messaging + Clustering Combo

Micro-Services in Silly Walk, Monty Python

A:    I told you, I’m not allowed to argue unless you’ve paid.
M:   I just paid!
A:   No you didn’t.
M:   I DID!
A:   No you didn’t.
M:  Look, I don’t want to argue about that.
A:  Well, you didn’t pay.
M:  Aha. If I didn’t pay, why are you arguing? I Got you!
A:   No you haven’t.
M:  Yes I have. If you’re arguing, I must have paid.
A:   Not necessarily. I could be arguing in my spare time.


Monty Python, ‘The Argument Clinic’

And now for something completely different. I decided to get off the soap box I kept climbing on recently and give you some useful code for a change. I cannot stray too far from the topic of micro-services (because that is our reality now), or Node.js (because that is the technology with which we implement that reality). But I can zero in on a very practical problem we unearthed this week.

First, a refresher. When creating a complex distributed system using micro-services, one of the key problems to solve is to provide for inter-service communication. Micro-services normally provide REST APIs to get the baseline state, but the system is in constant flux and it does not take long until the state you received from the REST API is out of date. Of course, you can keep refreshing your copy by polling but this is not scalable and adds a lot of stress to the system, because a lot of these repeated requests do not result in new state (I can imagine an annoyed version of Scarlett Johansson in ‘Her’, replying with ‘No, you still have no new messages. Stop asking.’).

A better pattern is to reverse the flow of information and let the service that owns the data tell you when there are changes. This is where messaging comes in – modern micro-service based systems are hard to imagine without a message broker of some kind.

Another important topic for a modern system is scalability. In a world where you may need to quickly ramp up your ability to handle demand (lucky you!), the ability to vertically or horizontally scale your micro-services at a moment’s notice is crucial.

Vertical and Horizontal Scalability

An important Node.js quirk explainer: people migrating from Java may not understand the ‘vertical scalability’ part of the equation. Due to the auto-magical handling of thread pools in a JEE container, increasing the real or virtual machine specs requires no effort on the software side. For example, if you add more CPU cores to your VM, the JEE container will spread out to take advantage of them. If you add more memory, you may need to tweak the JVM settings, but otherwise it will just work. Of course, at some point you will need to resort to multiple VMs (horizontal scalability), at which point you may discover that your JEE app is actually not written with clustering in mind. Bummer.

In the Node.js land, adding more cores will help you squat unless you make a very concrete effort to fork more processes. In practice, this is not hard to do with utilities such as PM2 – it may be as easy as running the following command:

pm2 start app.js -i max

Notice, however, that for Node.js, vertical and horizontal scalability is the same regarding the way you write your code. You need to cluster just to take advantage of all the CPU cores on the same machine, never mind load balancing multiple separate servers or VMs.

I actually LOVE this characteristic of Node.js – it forces you to think about clustering from the get go, discouraging you from holding onto data between requests, forcing you to store any state in a shared DB where it can be accessed by all the running instances. This makes the switch from vertical to horizontal scalability a non-event for you, which is a good thing to discover when you need to scale out in a hurry. Nothing new here, just basic share-nothing goodness (see 12factors for a good explainer).

However, there is one important difference between launching multiple Node.js processes using PM2 or the Node ‘cluster’ module, and load-balancing multiple Node servers using something like Nginx. With load balancing using a proxy, we have a standalone server binding to a port on a machine, and balancing and URL proxying are done at the same time. You will write something like this:

http {
    upstream myapp1 {
        server srv1.example.com;
        server srv2.example.com;
        server srv3.example.com;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myapp1;
        }
    }
}

If you try to launch multiple Node servers on a single machine like this, all except the first one will fail because they cannot bind to the same port. However, if you use Node’s ‘cluster’ module (or use PM2, which uses the same module under the covers), a little bit of white magic happens – the master process has a bit of code that enables socket sharing between the workers using a policy (either OS-defined, or ’round-robin’ as of Node 0.12). This is very similar to what Nginx does for your Node instances running on separate servers, with a few more load-balancing options (round-robin, least connected, IP-hash, weight-directed).
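For completeness, here is a minimal sketch of the ‘cluster’ module in action – one worker per CPU core, all sharing the same port:

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // The master forks the workers and shares the listening socket with them
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  // Each worker binds to the same port without a conflict
  http.createServer(function (req, res) {
    res.end('Hello from worker ' + process.pid + '\n');
  }).listen(80);
}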

Now Add Messaging to the Mix

So far, so good. Now, as is usual in life, the fun starts when you combine the two concepts I talked about (messaging and clustering).

To make things more concrete, let’s take a real-world example (one we had to solve ourselves this week). We were writing an activity stream micro-service. Its job is to collect activities expressed using the Activity Streams 2.0 draft spec and store them in Cloudant DB, so that they can later be retrieved as an activity stream. This service does one thing, and does it well – it aggregates activities that can originate anywhere in the system – any micro-service can fire an activity by publishing to a dedicated MQTT topic.

On the surface, it sounds clear-cut – we will use the well-behaved mqtt module as the MQTT client, RabbitMQ as our polyglot message broker, and Node.js for our activity micro-service. This is not the first time we have used this kind of system.

However, things become murky when clustering is added to the mix. This is what happens: MQTT is a pub/sub protocol. In order to allow each subscriber to read messages from the queue at its own pace, RabbitMQ implements the protocol by hooking up a separate queue for each Node instance in the cluster.

[Diagram: RabbitMQ MQTT pub/sub – a separate queue per clustered Node instance, so every instance receives each message]

This is not what we want. Each instance will receive a ‘new activity’ message and attempt to write it to the DB, requiring contention avoidance. Even if the DB can prevent all but one node from succeeding in writing the activity record, this is wasteful because all the nodes are attempting the same task.

The problem here is that the ‘white magic’ the cluster module uses to handle http/https server requests does not extend to the mqtt module.

Our initial thoughts on solving this problem ran like this: if we move the message client to the master instance, it can react to incoming messages and pass them on to the forked workers in some kind of ’round-robin’ fashion. It seemed plausible, but had a moderate ‘ick’ factor, because implementing our own load balancing seemed like fixing a solved problem. In addition, it would prevent us from using PM2 (because we would have to be in control of forking the workers), and if we used multiple VMs and Nginx load balancing, we would be back to square one.

Fortunately, we realized that RabbitMQ can already handle this if we partially give up the pretense and acknowledge we are running AMQP under the MQTT abstraction. The way RabbitMQ works for pub/sub topologies is that publishers post to ‘topic’ exchanges, which are bound to queues using routing keys (in fact, there is a direct mapping between AMQP routing keys and MQTT topics – it is trivial to map back and forth).

The problem we were having by using an MQTT client on the consumer side was that each cluster instance received its own queue. By dropping down to an AMQP client and making all the instances bind to the same queue, we let RabbitMQ essentially load-balance the clients using a ’round-robin’ policy. In fact, this way of working is listed in the RabbitMQ documentation as the work queue pattern, which is exactly what we want.

[Diagram: all cluster instances bound to a single shared queue, with RabbitMQ round-robining messages between them]

OK, Show Us Some Code Now

Just for variety, I will publish MQTT messages to the topic using the Eclipse Paho Java client. Publishing using clients in Node.js or Ruby would be almost identical, modulo syntax differences:

public static final String TOPIC = "activities";

try {
    // Connect to the broker's MQTT port (1883 by default)
    MqttClient client = new MqttClient("tcp://" + host + ":1883",
                                       "mqtt-client1");
    client.connect();
    MqttMessage message = new MqttMessage(messageText.getBytes());
    client.publish(TOPIC, message);
    System.out.println(" [x] Sent to MQTT topic '"
                       + TOPIC + "': '" + message + "'");
    client.disconnect();
} catch (MqttException e) {
    e.printStackTrace();
}

The client above will publish to the ‘activities’ topic. What we now need to do on the receiving end is set up a single queue and bind it to the default AMQP topic exchange (“amq.topic”) using the matching routing key (again, ‘activities’). The name of the queue does not matter, as long as all the Node workers use the same one (and they will, by virtue of being clones of each other).

var amqp = require('amqp');

var connection = amqp.createConnection({ host: 'localhost' });

// Wait for connection to become established.
connection.on('ready', function () {
  // Use the default 'amq.topic' exchange
  connection.queue('worker-queue', { durable: true }, function (q) {
    // Routing key identical to the MQTT topic
    q.bind('activities');

    // Receive messages
    q.subscribe(function (message, headers, deliveryInfo, messageObject) {
      // Print messages to stdout
      console.log('Node AMQP(' + process.pid + '): received topic "' +
          deliveryInfo.routingKey +
          '", message: "' +
          message.data.toString() + '"');
    });
  });
});

Implementation detail: the RabbitMQ MQTT plug-in uses the default topic exchange, “amq.topic”. If we set up a different exchange for the MQTT traffic, we will need to explicitly name that exchange when binding the queue to it.

Any Downsides?

There are scenarios in which it is actually beneficial for all the workers to receive an identical message. When the workers are managing Socket.io chat rooms with clients, a message may affect multiple clients, so all the workers need to receive it and decide whether it applies to them. The single-queue topology used here is limited to cases where workers are used strictly for scalability and any single worker can do the job.

From a more philosophical point of view, by resorting to AMQP we have broken through the abstraction layer and removed the option to swap in another MQTT broker in the future. We looked around and noticed that other people have had to do the same in this situation. MQTT is a pub/sub protocol, and many pub/sub protocols have a ‘unicast’ feature that delivers the message to only one of the subscribers using some kind of selection policy (which would work splendidly in our case). Unfortunately, there is no ‘unicast’ in MQTT right now.

Nevertheless, by retaining the ability for front ends and devices to publish in MQTT, we preserved the nice features of the MQTT protocol (simple, lightweight, can run on the smallest of devices). We continue to express the messaging side of our APIs using MQTT. At the same time, we were able to tap into the ‘unicast’ behavior by using the RabbitMQ features.

That seems like a good compromise to me. Here is hoping that unicast will become part of the MQTT protocol at some point in the future. Then we can go back to pretending we are running a ‘mystery MQTT broker’ rather than RabbitMQ for our messaging needs.

Nudge, nudge. Wink wink. Say no more.

© Dejan Glozic, 2014

The Queue Is the Message


The title of this post is a paraphrase of Marshall McLuhan’s famous ‘The medium is the message’, meant to imply that the medium that carries the message also embeds itself into the message, creating a symbiotic relationship with it. Of course, as I write this, I half-expect the ghost of Mr. McLuhan to appear and say that I know nothing of his work, and that the fact that I am allowed to write a blog on anything is amazing.

Message queues belong to a class of enterprise middleware that I managed to ignore for a long time. This is not the first time I am writing about holes in my understanding of enterprise architecture. In the post on databases, I similarly explained how one can go through life without ever writing a single SQL statement and still manage to persist data. Message queues are even worse. It is reasonable to expect the need to persist data, but the need to mediate between systems used to be the purview of system integrators, not application developers.

Don’t get me wrong, the company I work for has had a commercial MQ product for years, so I heard plenty about it in passing, and it seemed to be a big deal when connecting big box A to an even bigger box B. In contrast, developers of desktop applications have the luxury of passing events between parts of the application in-process (just add a listener and you are done). For monolithic Web applications, the situation is not very different. It is no wonder Stack Overflow is full of puzzled developers asking why they would need a message queue and what good it would bring to their projects.

In the previously mentioned post on databases, I echoed the thought of Martin Fowler and Pramod Sadalage that databases (and by extension, DBAs) are losing the role of the system integrators. In the olden days, applications accessed data by executing SQL statements, making database schema the de facto API, and database design a very big deal that required careful planning. Today, REST services are APIs, and storage is relegated to the service implementation detail.

In the modern architecture, particularly in the cloud, there is a very strong movement away from monolithic applications to a federation of smaller collaborating apps. These apps are free to store the data as they see fit, as long as they expose it through the API contract. The corollary is data fragmentation – the totality of the system’s data is scattered across a number of databases hooked up to the service apps.

It is true that at any point, we can get the current state of the data by performing an API call on these services. However, once we know the current state and render the system for the user, what happens when there is a change? Modern systems have a lot of moving parts. Some of the changes are brought about by the apps themselves, some of them come from users interacting with the system through the browser or the mobile clients. Without a message broker circulating messages between the federated apps, they will become more and more out of sync until the next full API call. Of course, apps can poll for data in an attempt to stay in sync, but such a topology would look very complex and would not scale, particularly for ‘popular’ apps whose data is ‘sought after’ (typically common data that provides the glue for the system, such as ‘users’, ‘projects’, ‘tasks’ etc.).

Message queues come in many shapes and sizes, and can organize the flow of messages in different ways, depending on the intended use (RabbitMQ Getting Started document offers a fairly useful breakdown of these flows). Of course, if you are as new to message queues as I am, you may suffer a case of tl;dr here, so I will cut straight to the topology that is going to help us here: publish/subscribe. In the post on Web Sockets and Socket.io, I covered the part of the system that pushes messages from the Node.js app to JavaScript running in the browser. Message queues will help us push messages between apps running on the server, leaving Socket.io to handle ‘the last mile’. In the cloud, we will of course set up the message queue as a service (MQaaS 🙂) and pass its coordinates to apps as an ‘attached resource’ expressing a backing service, to use 12-factor lingo here.
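As a minimal sketch of the ‘attached resource’ idea (the MQ_URL variable name is my own invention), the app would read the broker coordinates from the environment rather than hardcoding them:

var mqtt = require('mqtt');

// 12-factor style: the broker location comes from the environment,
// so the same code can attach to a different broker in each deployment.
var client = mqtt.connect(process.env.MQ_URL || 'mqtt://localhost');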

The publish/subscribe pattern is very attractive for us because it cuts down on unnecessary linkages and network traffic between apps. Instead of apps annoying each other with frequent ‘are we there yet’ REST calls, they can sit idle until we ARE there, at which point a message is published to all the interested (subscribed) parties. Note that messages themselves normally do not carry a lot of data – a message may say ‘user John Doe added’, but the apps may still need to make a REST call to the ‘users’ app to fetch the ‘John Doe’ resource and do something useful with it.
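Here is a minimal sketch of the pattern using the Node.js mqtt client (the topic name is made up); in a real system the publisher and subscriber would of course live in different apps:

var mqtt = require('mqtt');

// Publisher: fire the event and move on.
var publisher = mqtt.connect('mqtt://localhost');
publisher.on('connect', function () {
  publisher.publish('users/added', JSON.stringify({ id: 'john.doe' }));
});

// Subscriber: react to the event, fetching the full resource over REST if needed.
var subscriber = mqtt.connect('mqtt://localhost');
subscriber.on('connect', function () {
  subscriber.subscribe('users/added');
});
subscriber.on('message', function (topic, message) {
  console.log(topic + ': ' + message.toString());
});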

Another important benefit is the asynchronous nature of the coupling between publishers and subscribers. The only thing publishers care about is firing a message – they don’t care what happens next. Message brokers are responsible for delivering the message to each and every subscriber. At any point in time, a subscriber can be inaccessible (busy or down). Even when they are up, there can be periods of mismatch between the publishers’ ability to produce and the subscribers’ ability to consume messages. Message brokers will hold onto the messages until the subscriber is actually able to consume them, acting as a relief valve of sorts. How reliable the brokers are in this endeavour depends on something called ‘Quality of Service’. Transient messages can be lost, but important messages must be delivered ‘at least once’, or with an even stronger guarantee of ‘exactly once’ (albeit with a performance penalty). This may sound boring now, but it will matter to you once your career depends on all the messages being accounted for.
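In MQTT, this guarantee is a number attached to each message: QoS 0 is ‘at most once’, QoS 1 is ‘at least once’, and QoS 2 is ‘exactly once’. A quick sketch (topic names and payload are invented):

var mqtt = require('mqtt');
var client = mqtt.connect('mqtt://localhost');

client.on('connect', function () {
  // QoS 0: fire and forget - fine for transient telemetry.
  client.publish('stats/heartbeat', 'ok', { qos: 0 });

  // QoS 2: exactly once - the extra handshake round trips buy you
  // no duplicates and no losses.
  client.publish('builds/finished', JSON.stringify({ build: 42 }), { qos: 2 });
});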

Finally, a very important advantage of using message queues in your project is feature growth. What starts as a simple app can easily grow into a monster under a barrage of new features. Adam Bloom from Pivotal wrote a very nice blog post on scaling an Instagram-like app without crushing it with its own weight. He used an example of a number of things such an app would want to do on an image upload: resize the image, notify friends, add points to the user, tweet the image etc. You can add these as functions in the main app, growing it very quickly and making development teams step on each other’s toes. Or you can insert a message broker, make the image app add the image and fire the ‘image added’ message to the subscribers. Then you can create a ‘resizer app’, a ‘notifier app’, a ‘points app’ and a ‘tweeter app’, and make each of them subscribe to the ‘image’ topic in the message broker. In the future, you can add a new feature by adding another app and subscribing to the same topic. Incidentally, the Groupon team decided to do something similar when they moved from a monolithic RoR app to a collection of smaller Node.js apps.
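A sketch of that fan-out with the Node.js mqtt client (app and topic names are made up) – note that the image app neither knows nor cares how many subscribers there are:

var mqtt = require('mqtt');

// Image app: publish once, let the broker fan the event out.
var imageApp = mqtt.connect('mqtt://localhost');
imageApp.on('connect', function () {
  imageApp.publish('images/added', JSON.stringify({ id: 42, url: '/images/42.jpg' }));
});

// Resizer app: one of many independent subscribers - 'notifier',
// 'points' and 'tweeter' apps would subscribe the same way.
var resizerApp = mqtt.connect('mqtt://localhost');
resizerApp.on('connect', function () {
  resizerApp.subscribe('images/added');
});
resizerApp.on('message', function (topic, message) {
  var image = JSON.parse(message.toString());
  console.log('Resizing image ' + image.id);
});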

All right, you say, you convinced me, I will give message queues a try. At this point the enthusiasm fizzles because navigating the message queue choices is far from trivial. In fact, there are two decisions to be made: which message broker and which protocol.

And here we are, looping right back to the beginning and Marshall McLuhan (and you thought I forgot to bring that tangent back). For message queues, we can say that to an extent the broker IS the protocol. Choosing a protocol (the way our apps will interact with the broker) affects your choice of the broker itself. There are several protocols and many brokers to choose from, and this is not an article to help you do that. However, for me the real decision flow was around two important requirements: will the broker scale (instead of becoming the system’s bottleneck), and can I extend the reach of the broker to mobile devices. An extra requirement (a JavaScript client I can use in Node.js) was a given, considering most of our apps will be written using Node.

The mobile connectivity requirement was easy to satisfy – all roads pointed to MQTT as the protocol to use when talking to devices with limited resources. Your broker must be able to speak MQTT in order to push messages to mobile devices. Facebook, among others, is using the libmosquitto client in their native iOS app as well as the Messenger app. There is a range of ways to use MQTT in Android. And if you are interested in the Internet of Things, it is an easy choice.

All right, now the brokers. How about picking something Open Source, with an attractive license with no strings attached, and with the ability to cluster itself to handle a barrage of messages? And something that is easy to install as a service? I haven’t done extensive research here, but we need to start somewhere and get some experience, so RabbitMQ seems like a good choice for now. It supports multiple protocols (AMQP, MQTT, STOMP), is Open Source, has clients in many languages, and has built-in clustering support. In fact, if publish/subscribe is the only pattern you need, readers are advised to steer clear of the AMQP protocol (native to RabbitMQ) because there is a version schism right now. The version of the protocol that everybody supports (0.91) is not what was put forward as the official v1.0 standard (a more significant change than the version numbers would indicate, and one that few brokers or clients actually support). It should not even matter – RabbitMQ should be commended for its flexibility and the ‘polyglot messaging’ approach, so as long as we are using clients that speak correct MQTT, we could swap the broker in the future and nothing should break. Technically, the Open Source Mosquitto broker could work too, but it seems much more modest and not exactly Web-scale.

Notice how I mentioned ‘topics’ a couple of paragraphs above. In the ‘publish/subscribe’ world, topics are very important because they segregate message flow. Publishers send messages addressed to topics, and subscribers, well, subscribe to them. MQTT has a set of rules for how topics can be organized, with hierarchy for subtopics and wildcards for subscribing to a number of subtopics. It is hard to overstate this: structuring topic namespaces is one of the most important tasks for your integration architecture. Don’t approach it lightly, because topics will be your API as much as your REST services are.
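In MQTT, topic levels are separated by ‘/’, ‘+’ matches exactly one level, and ‘#’ matches any number of trailing levels. A sketch with invented topic names:

var mqtt = require('mqtt');
var client = mqtt.connect('mqtt://localhost');

client.on('connect', function () {
  // The hierarchy does the segregation: one topic per resource change.
  client.publish('projects/42/tasks/7/updated', '{"title":"Ship it"}');

  // '+' matches exactly one level: updates for any task in project 42.
  client.subscribe('projects/42/tasks/+/updated');

  // '#' matches the rest: everything that happens in project 42.
  client.subscribe('projects/42/#');
});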

Note that pub/sub organized around topics is an MQTT simplification of a more complex area. RabbitMQ supports a number of ways messages are routed, called ‘exchanges’, and the topic-based exchange is just one of the available types (others are ‘direct’, ‘fanout’ and ‘headers’). Sticking with topics makes things simultaneously easier and more flexible from the point of view of future integrations.

As for the payload of messages flowing through the broker, the answer is easy – there is no reason to deviate from JSON as the de facto exchange format of the Internet. In fact, I will be even more specific: if you ever intend to turn the events flowing between your apps into an activity stream, you may as well use the Activity Streams JSON format for your message body. Our experience is that activities can easily be converted into events by cherry-picking the data you need. The opposite is not necessarily true: if you want to make your system social, you will be wise to plan ahead and pass enough information around to be able to create a tweet or a Facebook update from it.
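For illustration, here is what a message body might look like in the Activity Streams 1.0 JSON format (the actor and object are invented):

{
  "published": "2014-01-27T10:15:00Z",
  "actor": {
    "objectType": "person",
    "id": "urn:example:person:john.doe",
    "displayName": "John Doe"
  },
  "verb": "post",
  "object": {
    "objectType": "image",
    "id": "urn:example:image:42",
    "url": "http://example.org/images/42.jpg"
  }
}

A subscriber that only needs an event can cherry-pick the ‘verb’ and the object ‘id’, while a future activity stream feature can render the whole thing.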

OK, so we made some choices: our medium will be RabbitMQ, and our message will be expressed using the MQTT protocol (but in a pinch, an AMQP v0.91 client can participate in the system without problems). With Node.js and Java clients both readily available, we will be able to pass messages around in a system composed of Node.js and Java apps. In the next ‘down and dirty’ post, I will modify our example app from last week to run ‘fake’ builds in a Java app and pass MQTT messages to the Node.js app, which will in turn push the data to the browser using Socket.io.
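The Node.js side of that pipeline might look roughly like this (the topic and event names are placeholders, assuming the Java app publishes to ‘builds/…’ topics):

var http = require('http');
var mqtt = require('mqtt');

var server = http.createServer().listen(3000);
var io = require('socket.io').listen(server);

// Bridge: everything arriving on the 'builds/#' topics is pushed
// to all connected browsers over Socket.io - 'the last mile'.
var client = mqtt.connect('mqtt://localhost');
client.on('connect', function () {
  client.subscribe('builds/#');
});
client.on('message', function (topic, message) {
  io.sockets.emit('build', JSON.parse(message.toString()));
});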

That’s a whole lot of messaging. Our ‘Messenger Boy’ from the picture above will get very tired.

© Dejan Glozic, 2014