The title of this post is a paraphrase of Marshall McLuhan’s famous ‘The medium is the message’, meant to imply that the medium that carries the message also embeds itself into the message, creating a symbiotic relationship with it. Of course, as I write this, I half-expect the ghost of Mr. McLuhan to appear and say that I know nothing of his work and that the fact that I am allowed to write a blog on anything is amazing.
Message queues belong to a class of enterprise middleware that I managed to ignore for a long time. This is not the first time I am writing about holes in my understanding of enterprise architecture. In the post on databases, I similarly explained how one can go through life without ever writing a single SQL statement and still manage to persist data. Message queues are even worse. It is reasonable to expect the need to persist data, but the need to mediate between systems used to be the purview of system integrators, not application developers.
Don’t get me wrong, the company I work for has had a commercial MQ product for years, so I heard plenty about it in passing, and it seemed to be a big deal when connecting big box A to an even bigger box B. In contrast, developers of desktop applications have the luxury of passing events between parts of the application in-process (just add a listener and you are done). For monolithic Web applications, the situation is not very different. It is no wonder Stack Overflow is full of puzzled developers asking why they would need a message queue and what good it will bring to their projects.
In the previously mentioned post on databases, I echoed the thought of Martin Fowler and Pramod Sadalage that databases (and by extension, DBAs) are losing the role of the system integrators. In the olden days, applications accessed data by executing SQL statements, making database schema the de facto API, and database design a very big deal that required careful planning. Today, REST services are APIs, and storage is relegated to the service implementation detail.
In modern architecture, particularly in the cloud, there is a very strong movement away from monolithic applications to a federation of smaller collaborating apps. These apps are free to store the data as they see fit, as long as they expose it through the API contract. The corollary is data fragmentation – the totality of the system’s data is scattered across a number of databases hooked up to the service apps.
It is true that at any point, we can get the current state of the data by performing an API call on these services. However, once we know the current state and render the system for the user, what happens when there is a change? Modern systems have a lot of moving parts. Some of the changes are brought about by the apps themselves, some of them come from users interacting with the system through the browser or the mobile clients. Without a message broker circulating messages between the federated apps, they will become more and more out of sync until the next full API call. Of course, apps can poll for data in an attempt to stay in sync, but such a topology would look very complex and would not scale, particularly for ‘popular’ apps whose data is ‘sought after’ (typically common data that provides the glue for the system, such as ‘users’, ‘projects’, ‘tasks’ etc.).
The publish/subscribe pattern is very attractive for us because it cuts down on unnecessary linkages and network traffic between apps. Instead of apps annoying each other with frequent ‘are we there yet’ REST calls, they can sit idle until we ARE there, at which point a message is published to all the interested (subscribed) parties. Note that messages themselves normally do not carry a lot of data – a REST call may still be needed (it may say ‘user ‘John Doe’ added’, but the apps may still need to make a REST call to the ‘users’ app to fetch ‘John Doe’ resource and do something useful with it).
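To make the ‘thin message, then fetch’ idea concrete, here is a minimal in-process sketch in Python. It is not a real broker: `InMemoryBroker`, the `users/added` topic, the `USERS_SERVICE` dictionary and `fetch_user` are all hypothetical names invented for illustration, and the dictionary lookup stands in for the REST call a real subscriber would make to the ‘users’ app.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a message broker (hypothetical, in-process only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher does not know who (if anyone) is listening.
        for callback in self._subscribers[topic]:
            callback(message)


# Pretend data owned by the 'users' app. A real subscriber would issue a
# REST call here instead of reading a dictionary (assumption for the sketch).
USERS_SERVICE = {"42": {"id": "42", "name": "John Doe"}}


def fetch_user(user_id):
    return USERS_SERVICE[user_id]


welcomed = []


def on_user_added(message):
    # The message only says what happened; details are fetched on demand.
    user = fetch_user(message["id"])
    welcomed.append(user["name"])


broker = InMemoryBroker()
broker.subscribe("users/added", on_user_added)
broker.publish("users/added", {"event": "added", "id": "42"})
print(welcomed)
```

The subscriber sits idle until the message arrives, and the message itself carries just enough (an event name and an id) for the subscriber to decide whether a follow-up fetch is worth making.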
Another important benefit is the asynchronous nature of the coupling between publishers and subscribers. The only thing publishers care about is firing a message – they don’t care what happens next. Message brokers are responsible for delivering the message to each and every subscriber. At any point in time, a subscriber can be inaccessible (busy or down). Even if they are up, there can be periods of mismatch between the publishers’ ability to provide and subscribers’ ability to consume messages. Message brokers will hold onto the messages until the subscriber is actually able to consume them, acting as a relief valve of sorts. How reliable the brokers are in this endeavour depends on something called ‘Quality of Service’. Transient messages can be lost, but important messages must be delivered ‘at least once’, or with an even stronger guarantee of ‘exactly once’ (albeit with a performance penalty). This may sound boring now but will matter to you once your career depends on all the messages being accounted for.
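The ‘relief valve’ behaviour can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular broker is implemented: `BufferingBroker` and `Subscriber` are hypothetical names, and an acknowledgement is modelled simply as the handler returning `True`.

```python
from collections import deque


class BufferingBroker:
    """Toy broker that keeps a queue of pending messages per subscriber."""

    def __init__(self):
        self._queues = {}  # subscriber name -> deque of pending messages

    def register(self, name):
        self._queues[name] = deque()

    def publish(self, message):
        # Fire and forget from the publisher's point of view.
        for queue in self._queues.values():
            queue.append(message)

    def deliver(self, name, handler):
        """Hand over pending messages; drop each one only after an ack."""
        queue = self._queues[name]
        delivered = 0
        while queue:
            if not handler(queue[0]):
                break  # no ack: keep the message and retry later
            queue.popleft()
            delivered += 1
        return delivered


class Subscriber:
    def __init__(self):
        self.up = False
        self.received = []

    def handle(self, message):
        if not self.up:
            return False  # busy or down: nothing is acknowledged
        self.received.append(message)
        return True


broker = BufferingBroker()
notifier = Subscriber()
broker.register("notifier")
broker.publish("build 1 started")
broker.publish("build 1 finished")

first_attempt = broker.deliver("notifier", notifier.handle)   # subscriber down
notifier.up = True
second_attempt = broker.deliver("notifier", notifier.handle)  # subscriber back
print(first_attempt, second_attempt, notifier.received)
```

Because a message is removed only after it is acknowledged, a subscriber that processes a message but fails before acknowledging it would see that message again on the next attempt. That is the essence of ‘at least once’: nothing is lost, but subscribers have to tolerate duplicates.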
Finally, a very important advantage of using message queues in your project is feature growth. What starts as a simple app can easily grow into a monster under a barrage of new features. Adam Bloom from Pivotal wrote a very nice blog post on scaling an Instagram-like app without crushing it with its own weight. He used an example of a number of things such an app would want to do on an image upload: resize the image, notify friends, add points to the user, tweet the image etc. You can add these as functions in the main app, growing it very quickly and making development teams step on each other’s toes. Or you can insert a message broker and make the image app add the image and fire the ‘image added’ message to the subscribers. Then you can create a ‘resizer app’, ‘notifier app’, ‘points app’ and ‘tweeter app’, and make each of them subscribe to the ‘image’ topic in the message broker. In the future you can add a new feature by adding another app and subscribing to the same topic. Incidentally, the Groupon team decided to do something similar when they moved from a monolithic RoR app to a collection of smaller Node.js apps.
All right, you say, you convinced me, I will give message queues a try. At this point the enthusiasm fizzles because navigating the message queue choices is far from trivial. In fact, there are two decisions to be made: which message broker and which protocol.
The mobile connectivity requirement was easy to satisfy – all roads pointed to MQTT as the protocol to use when talking to devices with limited resources. Your broker must be able to speak MQTT in order to push messages to mobile devices. Facebook, among others, is using the libmosquitto client in their native iOS app as well as the Messenger app. There is a range of ways to use MQTT in Android. And if you are interested in the Internet of Things, it is an easy choice.
All right, now the brokers. How about picking something Open Source, with an attractive license with no strings attached, and with the ability to cluster itself to handle a barrage of messages? And something that is easy to install as a service? I haven’t done extensive research here, but we need to start somewhere and get some experience, so RabbitMQ seems like a good choice for now. It supports multiple protocols (AMQP, MQTT, STOMP), is Open Source, has clients in many languages, and has built-in clustering support. In fact, if publish/subscribe is the only pattern you need, you are advised to steer clear of the AMQP protocol (native to RabbitMQ) because there is a version schism right now. The version of the protocol that everybody supports (0-9-1) is not what was put forward as the official 1.0 standard (a more significant change than the version numbers would indicate, and one that few brokers or clients actually support). It should not even matter – RabbitMQ should be commended for its flexibility and the ‘polyglot messaging’ approach, so as long as we are using clients that speak correct MQTT, we could swap the broker in the future and nothing should break. Technically, the Open Source Mosquitto broker could work too, but it seems much more modest and not exactly Web-scale.
Notice how I mentioned ‘topics’ a couple of paragraphs above. In the ‘publish/subscribe’ world, topics are very important because they segregate message flow. Publishers send messages addressed to topics, and subscribers, well, subscribe to them. MQTT has a set of rules for how topics can be organized, with a hierarchy for subtopics, and wildcards for subscribing to a number of subtopics. It is hard to overstate this: structuring topic namespaces is one of the most important tasks for your integration architecture. Don’t approach it lightly, because topics will be your API as much as your REST services are.
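For a feel of how MQTT topic hierarchies and wildcards behave, here is a small Python sketch of subscription matching. In MQTT, levels are separated by `/`, `+` matches exactly one level, and `#` matches the parent level and everything below it (and has to be the last level of the filter). The function name `topic_matches` and the `builds/...` topics are made up for illustration, and the sketch ignores finer points of the specification such as topics starting with `$`.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT-style subscription filter matches a topic."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last filter level.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # filter is deeper than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level does not match
    # No '#' seen: both sides must have the same depth.
    return len(filter_levels) == len(topic_levels)


print(topic_matches("builds/+/status", "builds/42/status"))
print(topic_matches("builds/#", "builds/42/status"))
print(topic_matches("builds/+", "builds/42/status"))
```

A subscriber interested in the status of every build would use `builds/+/status`, while a logging app could grab everything under `builds/#`. This is why the shape of the namespace deserves care: what subscribers can conveniently ask for is determined by how the levels are laid out.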
Note that pub/sub organized around topics is an MQTT simplification of a more complex area. RabbitMQ routes messages through ‘exchanges’, and the topic-based exchange is just one of the available types (the others are ‘direct’, ‘fanout’ and ‘headers’). Sticking with topics makes things simultaneously easier and more flexible from the point of view of future integrations.
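The difference between the exchange types is easier to see side by side. The Python sketch below models only the routing decision for ‘fanout’, ‘direct’ and ‘topic’ exchanges as described by AMQP 0-9-1 (routing keys are dot-separated words, `*` matches exactly one word, `#` matches zero or more). It is a simplified model, not RabbitMQ code: the `route` function, the queue names and the bindings are hypothetical, and the ‘headers’ type is left out.

```python
def _match(pattern_words, key_words):
    """Recursive matcher for AMQP-style topic patterns."""
    if not pattern_words:
        return not key_words
    if pattern_words[0] == "#":
        # '#' may swallow zero or more words of the routing key.
        return any(_match(pattern_words[1:], key_words[i:])
                   for i in range(len(key_words) + 1))
    if not key_words:
        return False
    if pattern_words[0] == "*" or pattern_words[0] == key_words[0]:
        return _match(pattern_words[1:], key_words[1:])
    return False


def amqp_topic_match(pattern, routing_key):
    return _match(pattern.split("."), routing_key.split("."))


def route(exchange_type, bindings, routing_key):
    """Return the queues that receive a message.

    bindings is a list of (queue_name, binding_key) pairs.
    """
    if exchange_type == "fanout":
        return [q for q, _ in bindings]  # routing key is ignored
    if exchange_type == "direct":
        return [q for q, key in bindings if key == routing_key]
    if exchange_type == "topic":
        return [q for q, key in bindings if amqp_topic_match(key, routing_key)]
    raise ValueError("unsupported exchange type: " + exchange_type)


bindings = [("resizer", "image.added"), ("audit", "image.*"), ("firehose", "#")]
print(route("direct", bindings, "image.added"))
print(route("topic", bindings, "image.deleted"))
print(route("fanout", bindings, "user.added"))
```

With the same set of bindings, a direct exchange only honours exact keys, a fanout exchange ignores keys altogether, and a topic exchange covers both cases plus everything in between, which is part of why sticking with topics leaves the most room for later integrations.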
As for the payload of messages flowing through the broker, the answer is easy – no reason to deviate from JSON as the de facto exchange format of the Internet. In fact, I will be even more specific: if you ever intend to turn the events flowing between your apps into an activity stream, you may as well use the Activity Stream JSON format for your message body. Our experience is that activities can easily be converted into events by cherry-picking the data you need. The opposite is not necessarily true: if you want to make your system social, you will be wise to plan ahead and pass enough information around to be able to create a tweet, or a Facebook update from it.
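As a rough illustration of the ‘activity in, event out’ direction, the Python sketch below builds a message body shaped like an Activity Streams 1.0 JSON activity (`actor`, `verb`, `object`, `target`, `published`) and cherry-picks a compact event from it. All identifiers, URLs and the `to_event` helper are invented for the example, and the shape of the resulting event is an assumption rather than any standard.

```python
import json

# An 'image posted' activity; the ids and URL are made-up placeholders.
activity = {
    "published": "2014-02-10T15:04:55Z",
    "actor": {
        "objectType": "person",
        "id": "urn:example:users:jdoe",
        "displayName": "John Doe",
    },
    "verb": "post",
    "object": {
        "objectType": "image",
        "id": "urn:example:images:1234",
        "url": "http://example.org/images/1234",
    },
    "target": {
        "objectType": "collection",
        "id": "urn:example:albums:vacation",
        "displayName": "Vacation",
    },
}


def to_event(activity):
    """Cherry-pick the few fields a subscriber needs to react to a change."""
    return {
        "type": activity["object"]["objectType"],
        "action": activity["verb"],
        "id": activity["object"]["id"],
    }


payload = json.dumps(activity)        # what would travel through the broker
event = to_event(json.loads(payload)) # what a subscriber might reduce it to
print(event)
```

Going the other way is not possible: the compact event no longer knows who the actor was or where the image was posted, so there is not enough left to compose a tweet or a status update from it.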
OK, so we made some choices: our medium will be RabbitMQ, and our message will be expressed using the MQTT protocol (but in a pinch, an AMQP 0-9-1 client can participate in the system without problems). With Node.js and Java clients both readily available, we will be able to pass messages around in a system composed of Node.js and Java apps. In the next ‘down and dirty’ post, I will modify our example app from last week to run ‘fake’ builds in a Java app and pass MQTT messages to the Node.js app, which will in turn push the data to the browser using Socket.io.
That’s a whole lot of messaging. Our ‘Messenger Boy’ from the picture above will get very tired.
© Dejan Glozic, 2014