The Web of Hooks


It’s a new year, which means it is time to update the copyright at the bottom. I have been busy lately and I didn’t want to write fluffy posts just to fill the space. But this time I actually have real content, so here goes.

If you followed my blog in the past, you know I was very bullish on message brokers in the context of microservice architecture. I claimed that REST APIs alone are not sufficient to sustain a successful microservice system. The event collaboration pattern is necessary to ensure a scalable and robust system, in which microservices that own a resource’s lifecycle are not burdened with executing all the code driven by that lifecycle. You can read more about it in my article about REST/MQ mirroring.

Not so fast

There is a fly in this particular ointment, however. While HTTP REST is well understood, messaging protocols are numerous. You may know from my writing that I liked MQTT for a while due to its simplicity and wide support. However, MQTT is actually not a great protocol for gluing your microservice system together. It is designed to be lightweight and to run on the smallest of IoT devices, and it lacks some critical features, such as anycast support and manual acknowledgement of messages.

Another popular protocol, AMQP, suffered a schism of sorts. The version supported by the popular RabbitMQ broker (0.9.1) is a very different protocol from the actual open standard, AMQP 1.0, which is not as widely supported. This is a pity, because I really like AMQP 1.0. It is like MQTT but with anycast and manual acknowledgement – perfect as a microservice system glue, yet still very simple to work with from client code.

And of course, there is Apache Kafka, a very powerful but odd choice given its origins in high-throughput distributed log aggregation. Kafka is unapologetically Java-centric, uses a proprietary protocol, and the bolt-on REST APIs for connecting other client languages are still fairly low-level. For example, where AMQP 1.0 guarantees quality of service and requires implementations to keep a buffer of messages for re-delivery to a client that crashed, Kafka simply lets you maintain a pointer into the queue – it is your job to work your way up the queue and catch up after restarting. You pay for performance by working at a level that is fairly low for general-purpose application messaging.

Authentication pains

Choosing the messaging solution and protocol is only one part of the problem. If you want your system to be extensible and to allow third-party integrations, you need to make it fairly easy for new clients to connect. On the other hand, you need to secure the clients, because you don’t want a rogue app to eavesdrop on the events in the system without authorization (otherwise you end up with egg on your face). Maintaining a large list of clients connected to the same topics can also be a scalability issue for some brokers.

For all these reasons, it has been widely acknowledged that message brokers are not a good match for external integrations. A cursory scan of popular cloud applications with large ecosystems points at a more client-friendly alternative – WebHooks.

WebHooks to the rescue

You would think that if something is so popular and has a catchy name, there is an actual well-written protocol for it. Wrong! WebHooks are the lowest common denominator you could imagine. In a nutshell, this is all there is:

  • You publish a list of valid events for which you will notify clients
  • You provide an API (and/or UI) for clients to register URLs for one or more of those events
  • When an event happens, you execute HTTP POST on the URLs registered for that event type.

That’s all. Granted, message brokers have topics you can publish and subscribe to, and the actual messages they pass around are free-form, so this is not very different. But absent from WebHooks are things such as anycast, Quality of Service, manual acknowledgement etc.
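To make the mechanics concrete, here is what the receiving end could look like – a minimal sketch assuming Node.js and Express, with a made-up endpoint path:

```js
// A WebHook receiver is just an HTTP endpoint - a minimal sketch
// assuming Node.js and Express; the path is made up.
const express = require('express');
const app = express();
app.use(express.json());

// The provider executes HTTP POST on the URL we registered.
app.post('/hooks/build-finished', (req, res) => {
  console.log('build finished:', req.body);
  res.sendStatus(200); // acknowledge quickly, do the real work asynchronously
});

app.listen(3000);
```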

I don’t intend to go into the details of what various implementations of WebHooks in apps such as GitHub, Slack etc. provide because thankfully Giuliano Iacobelli already wrote such an article. My interest here is to apply this knowledge to a microservice system we are building and try to anticipate pros and cons of going with WebHooks.

What it would take

The first thing that comes to mind is that in order to support WebHooks, we would need to write a new WebHook service. Its role would be to accept registrations, and to store URL and event type mappings for subsequent invocation. Right away, my thoughts turn to the difference between external and internal clients. External clients would most likely use the UI to register a URL of their integration. This is how you register your script in GitHub so that it runs on every commit, for example.

However, with internal clients we would have a funny problem: every time I restart a microservice instance, it would need to register itself somewhere during startup. That would make a POST endpoint a nonstarter, because I don’t want to keep creating new registrations. Instead, a PUT with a client ID would work better: an existing registration for the same ID would simply be updated if already there.
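Here is a minimal sketch of such a registration endpoint, again assuming Express, with an in-memory store standing in for real persistence:

```js
const express = require('express');
const app = express();
app.use(express.json());

// In-memory for illustration; a real service would persist this.
const registrations = new Map(); // clientId -> { url, eventTypes }

// PUT semantics: create or update in place, so a microservice can
// safely re-register on every restart without piling up duplicates.
app.put('/registrations/:clientId', (req, res) => {
  const existed = registrations.has(req.params.clientId);
  registrations.set(req.params.clientId, {
    url: req.body.url,
    eventTypes: req.body.eventTypes,
  });
  res.sendStatus(existed ? 200 : 201);
});

app.listen(3000);
```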

Other than that, the service would offer a POST for publishing a new message of the provided event type, to be delivered to all the URLs registered for that type. Obviously, it would need to guard against 404s, 502s and URLs that take too long to return a response, giving up on them after a set timeout.
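Sketched below with the built-in fetch and AbortSignal of a recent Node.js runtime (18 or later), reusing the registrations store from the previous snippet; the timeout value is arbitrary:

```js
// Fan out an event to every registered URL, giving up on slow or
// broken endpoints after a set timeout. A sketch; `registrations`
// is the map from the previous snippet.
async function deliver(eventType, payload) {
  const targets = [...registrations.values()]
    .filter((r) => r.eventTypes.includes(eventType));

  await Promise.allSettled(targets.map(async ({ url }) => {
    try {
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ type: eventType, payload }),
        signal: AbortSignal.timeout(5000), // don't wait forever
      });
      if (!res.ok) console.warn(`delivery to ${url} failed: ${res.status}`);
    } catch (err) {
      console.warn(`delivery to ${url} failed: ${err.message}`);
    }
  }));
}
```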

The best of both worlds

The set timeout brings back the topic of quality of service, implying that WebHooks are great for external integrations but not that great as the reliable glue of a microservice system. Why don’t we marry the two, then? We could continue to use the message broker for reliable delivery of internal messages, and hook it up to a WebHook service that would notify external integrations without requiring them to support our particular protocol, or getting too much access to the sensitive innards. Hooking up a WebHook service to a message broker would have the added benefit of buffering for the service itself, so that it can be restarted and updated without interruptions or missed events.

[Diagram: webhooks]

In the diagram above, our microservice system has the normal architecture, with a common routing proxy providing a single-domain entry point into the microservices. The microservices use normal message broker clients to publish to topics. A subset of these topics deemed suitable for external integrations is also listened to by the WebHook service; for each of those messages, it reaches into the stored list of registrations and calls HTTP POST on the registered URLs. If the WebHook service crashes, a reputable message broker will maintain a buffer of messages and re-deliver them upon restart. For performance reasons, the WebHook service can choose to keep a subset of registrations in an in-memory cache, depending on how frequently they are used.
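The broker-facing side of such a WebHook service could look like this – a sketch using the mqtt.js client, where the ‘public/#’ namespace and the client ID are my own assumptions:

```js
// A sketch of the WebHook service's broker side using mqtt.js. With
// QoS 1 and a persistent session (clean: false), the broker buffers
// messages while the service is down and re-delivers them on restart.
const mqtt = require('mqtt');

const client = mqtt.connect(process.env.BROKER_URL, {
  clientId: 'webhook-service', // stable ID so the session survives restarts
  clean: false,
});

client.on('connect', () => {
  // Subscribe only to the subset of topics deemed suitable for
  // external integrations; the namespace here is illustrative.
  client.subscribe('public/#', { qos: 1 });
});

client.on('message', (topic, message) => {
  // Fan out to registered URLs, e.g. with the deliver() sketch above.
  deliver(topic, JSON.parse(message.toString()));
});
```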

Discussion

Obviously, registering a URL with an HTTP PUT is much easier than implementing a message broker client, and providing a single POST endpoint to handle the events lowers the barrier of entry for external integrations. In fact, hooking up code to react to a single POST could very well be done using a serverless architecture.

Are we losing something in the process? Inserting another service into the flow will add a bit of a delay, but external notifications are normally for events that do not happen many times a second, so the tiny delay is a more than acceptable tradeoff. In addition, if the client providing the WebHook URL is itself load-balanced, this delivery will effectively be hardcoded to anycast (the event notification will only hit one of the instances in the cluster).

Finally, this creates two classes of clients – the ‘inner circle’ and the external, segregated clients. The inner circle is hooked up directly to the message broker, while the external clients go through the service. Apart from being an acceptable price to pay for easier integration, this segregation makes it possible to expose only a subset of events externally – some highly sensitive internal events may only be available to ‘trusted’ clients subscribing to message broker topics and holding internal credentials.

Since the WebHook service will normally not keep retrying to deliver an event to an unresponsive URL, it is possible to miss an event. If this is a problem, the external system would need to fashion a ‘belt and suspenders’ fortification, where the event-driven approach is augmented with a periodic REST API call to ‘compare notes’ and ensure that the baseline it is working against is up to date.
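On the integration side, that fortification could be as simple as this sketch, where the API URL, the interval and the handler are all illustrative:

```js
// 'Belt and suspenders': react to WebHook POSTs as they arrive, but
// periodically compare notes with the REST API in case one was missed.
// A sketch with made-up names, assuming a Node runtime with fetch.
const handleEvent = (e) => console.log('reconciled event', e); // stand-in
const FIFTEEN_MINUTES = 15 * 60 * 1000;
let lastSeen = new Date(0).toISOString();

setInterval(async () => {
  const res = await fetch(`https://api.example.com/events?since=${lastSeen}`);
  const missed = await res.json();
  missed.forEach(handleEvent); // same handler the WebHook endpoint uses
  lastSeen = new Date().toISOString();
}, FIFTEEN_MINUTES);
```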

© Dejan Glozic, 2017

The Queue Is the Message

[Image: Messenger Boy game]

The title of this post is a paraphrase of Marshall McLuhan’s famous ‘The medium is the message’, meant to imply that the medium that carries the message also embeds itself into the message, creating a symbiotic relationship with it. Of course, as I write this, I half-expect the ghost of Mr. McLuhan to appear and say that I know nothing of his work and that the fact that I am allowed to write a blog on anything is amazing.

Message queues belong to a class of enterprise middleware that I managed to ignore for a long time. This is not the first time I am writing about holes in my understanding of enterprise architecture. In the post on databases, I similarly explained how one can go through life without ever writing a single SQL statement and still manage to persist data. Message queues are even worse. It is reasonable to expect the need to persist data, but the need to mediate between systems used to be the purview of system integrators, not application developers.

Don’t get me wrong, the company I work for has had a commercial MQ product for years, so I heard plenty about it in passing, and it seemed to be a big deal when connecting big box A to an even bigger box B. In contrast, developers of desktop applications have the luxury of passing events between parts of the application in-process (just add a listener and you are done). For monolithic Web applications, the situation is not very different. It is no wonder Stack Overflow is full of puzzled developers asking why they would need a message queue and what good it will bring to their projects.

In the previously mentioned post on databases, I echoed the thought of Martin Fowler and Pramod Sadalage that databases (and by extension, DBAs) are losing the role of the system integrators. In the olden days, applications accessed data by executing SQL statements, making database schema the de facto API, and database design a very big deal that required careful planning. Today, REST services are APIs, and storage is relegated to the service implementation detail.

In the modern architecture, particularly in the cloud, there is a very strong movement away from monolithic applications to a federation of smaller collaborating apps. These apps are free to store the data as they see fit, as long as they expose it through the API contract. The corollary is data fragmentation – the totality of the system’s data is scattered across a number of databases hooked up to the service apps.

It is true that at any point, we can get the current state of the data by performing an API call on these services. However, once we know the current state and render the system for the user, what happens when there is a change? Modern systems have a lot of moving parts. Some of the changes are brought about by the apps themselves, some of them come from users interacting with the system through the browser or the mobile clients. Without a message broker circulating messages between the federated apps, they will become more and more out of sync until the next full API call. Of course, apps can poll for data in an attempt to stay in sync, but such a topology would look very complex and would not scale, particularly for ‘popular’ apps whose data is ‘sought after’ (typically common data that provides the glue for the system, such as ‘users’, ‘projects’, ‘tasks’ etc.).

Message queues come in many shapes and sizes, and can organize the flow of messages in different ways, depending on the intended use (the RabbitMQ Getting Started document offers a fairly useful breakdown of these flows). Of course, if you are as new to message queues as I am, you may suffer a case of tl;dr here, so I will cut straight to the topology that is going to help us: publish/subscribe. In the post on Web Sockets and Socket.io, I covered the part of the system that pushes messages from the Node.js app to JavaScript running in the browser. Message queues will help us push messages between apps running on the server, leaving Socket.io to handle ‘the last mile’. In the cloud, we will of course set up the message queue as a service (MQaaS 🙂) and pass its coordinates to apps as an ‘attached resource’ expressing a backing service, to use 12-factor lingo here.
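Expressed in code, the ‘attached resource’ part could be as simple as this sketch, assuming the mqtt.js client and an environment variable name of our own choosing:

```js
// The broker as a 12-factor 'attached resource' - a sketch assuming
// the mqtt.js client; the environment variable name is ours.
const mqtt = require('mqtt');

// Coordinates come from the environment, not the code, so the same
// app can be pointed at any broker instance (local, staging, MQaaS).
const client = mqtt.connect(process.env.MQ_URL || 'mqtt://localhost:1883');
```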

The publish/subscribe pattern is very attractive for us because it cuts down on unnecessary linkages and network traffic between apps. Instead of apps annoying each other with frequent ‘are we there yet’ REST calls, they can sit idle until we ARE there, at which point a message is published to all the interested (subscribed) parties. Note that messages themselves normally do not carry a lot of data – a REST call may still be needed (it may say ‘user ‘John Doe’ added’, but the apps may still need to make a REST call to the ‘users’ app to fetch ‘John Doe’ resource and do something useful with it).
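Here is that ‘thin message, follow-up REST call’ pattern as a sketch, assuming mqtt.js, a modern Node runtime with built-in fetch, and made-up topic and service names:

```js
const mqtt = require('mqtt');
const client = mqtt.connect('mqtt://broker:1883');

client.subscribe('users/added');
client.on('message', async (topic, message) => {
  // the event itself carries very little data
  const { id } = JSON.parse(message.toString());
  // fetch the full resource from the 'users' app that owns it
  const res = await fetch(`https://users.example.com/api/users/${id}`);
  const user = await res.json();
  console.log('do something useful with', user); // e.g. 'John Doe'
});
```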

Another important benefit is the asynchronous nature of the coupling between publishers and subscribers. The only thing publishers care about is firing a message – they don’t care what happens next. Message brokers are responsible for delivering the message to each and every subscriber. At any point in time, a subscriber can be inaccessible (busy or down). Even if they are up, there can be periods of mismatch between the publishers’ ability to produce and the subscribers’ ability to consume messages. Message brokers will hold onto the messages until such time as the subscriber is actually able to consume them, acting as a relief valve of sorts. How reliable the brokers are in this endeavour depends on something called ‘Quality of Service’. Transient messages can be lost, but important messages must be delivered ‘at least once’, or with an even stronger guarantee of ‘exactly once’ (albeit with a performance penalty). This may sound boring now but will matter to you once your career depends on all the messages being accounted for.
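In mqtt.js, Quality of Service is just a publish option – a sketch, with QoS 0 meaning ‘at most once’, QoS 1 ‘at least once’ and QoS 2 ‘exactly once’:

```js
// QoS expressed as a publish option - a sketch with mqtt.js. QoS 0 is
// 'at most once' (transient, may be lost), QoS 1 'at least once',
// QoS 2 'exactly once' (with the performance penalty mentioned above).
const mqtt = require('mqtt');
const client = mqtt.connect('mqtt://broker:1883');

client.publish('builds/finished', JSON.stringify({ buildId: 42 }), { qos: 2 });
```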

Finally, a very important advantage of using message queues in your project is feature growth. What starts as a simple app can easily grow into a monster under a barrage of new features. Adam Bloom from Pivotal wrote a very nice blog post on scaling an Instagram-like app without crushing it with its own weight. He used an example of a number of things such an app would want to do on an image upload: resize the image, notify friends, add points to the user, tweet the image etc. You can add these as functions in the main app, growing it very quickly and making development teams step on each other’s toes. Or you can insert a message broker, make the image app add the image and fire the ‘image added’ message to the subscribers. Then you can create a ‘resizer app’, ‘notifier app’, ‘points app’ and ‘tweeter app’, and make each of them subscribe to the ‘image’ topic in the message broker. In the future you can add a new feature by adding another app and subscribing to the same topic. Incidentally, the Groupon team decided to do something similar when they moved from a monolithic RoR app to a collection of smaller Node.js apps.
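Here is that fan-out as a sketch with mqtt.js, both apps squeezed into one snippet for brevity; the broker address and topic names are made up:

```js
const mqtt = require('mqtt');

// image app: add the image, fire the message, move on
const imageApp = mqtt.connect('mqtt://broker:1883');
imageApp.publish('images/added',
  JSON.stringify({ imageId: 42, userId: 7 }), { qos: 1 });

// resizer app: an independent process in reality, shown here for brevity
const resizer = mqtt.connect('mqtt://broker:1883');
resizer.subscribe('images/added', { qos: 1 });
resizer.on('message', (topic, message) => {
  const { imageId } = JSON.parse(message.toString());
  console.log('resizing image', imageId);
  // the notifier, points and tweeter apps are just more subscribers
  // to the same topic - adding them never touches this code
});
```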

All right, you say, you convinced me, I will give message queues a try. At this point the enthusiasm fizzles because navigating the message queue choices is far from trivial. In fact, there are two decisions to be made: which message broker and which protocol.

And here we are, looping right back to the beginning and Marshall McLuhan (and you thought I forgot to bring that tangent back). For message queues, we can say that, to an extent, the broker IS the protocol. Choosing a protocol (the way our apps will interact with the broker) affects your choice of the broker itself. There are several protocols and many brokers to choose from, and this is not an article to help you do that. However, for me the real decision flow was around two important requirements: will the broker scale (instead of becoming the system’s bottleneck), and can I extend the reach of the broker to mobile devices. An extra requirement (a JavaScript client I can use in Node.js) was a given, considering most of our apps will be written using Node.

The mobile connectivity requirement was easy to satisfy – all roads pointed to MQTT as the protocol to use when talking to devices with limited resources. Your broker must be able to speak MQTT in order to push messages to mobile devices. Facebook, among others, is using the libmosquitto client in their native iOS app as well as the Messenger app. There is a range of ways to use MQTT in Android. And if you are interested in the Internet of Things, it is an easy choice.

All right, now the brokers. How about picking something Open Source, with an attractive no-strings-attached license, and with the ability to cluster itself to handle a barrage of messages? And something that is easy to install as a service? I haven’t done extensive research here, but we need to start somewhere and get some experience, so RabbitMQ seems like a good choice for now. It supports multiple protocols (AMQP, MQTT, STOMP), is Open Source, has clients in many languages, and has built-in clustering support. In fact, if publish/subscribe is the only pattern you need, readers are advised to steer clear of the AMQP protocol (native to RabbitMQ), because there is a version schism right now. The version of the protocol that everybody supports (0.9.1) is not what was put forward as the official v1.0 standard (a more significant change than the version numbers would indicate, and one that few brokers or clients actually support). It should not even matter – RabbitMQ should be commended for its flexibility and the ‘polyglot messaging’ approach, so as long as we are using clients that speak correct MQTT, we could swap the broker in the future and nothing should break. Technically, the Open Source Mosquitto broker could work too, but it seems much more modest and not exactly Web-scale.

Notice how I mentioned ‘topics’ a couple of paragraphs above. In the publish/subscribe world, topics are very important because they segregate message flow. Publishers send messages addressed to topics, and subscribers, well, subscribe to them. MQTT has a set of rules for how topics can be organized, with hierarchies for subtopics, and wildcards for subscribing to a number of subtopics at once. It is hard to overstate this: structuring topic namespaces is one of the most important tasks in your integration architecture. Don’t approach it lightly, because topics will be your API as much as your REST services are.
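For illustration, here is a possible (made-up) namespace sketched with mqtt.js, using ‘+’ to match a single level and ‘#’ to match everything below:

```js
const mqtt = require('mqtt');
const client = mqtt.connect('mqtt://broker:1883');

// '+' matches exactly one level, '#' matches the whole subtree
client.subscribe('projects/+/tasks/created'); // task creation in any project
client.subscribe('projects/alpha/#');         // everything about one project

client.publish('projects/alpha/tasks/created', JSON.stringify({ taskId: 13 }));
```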

Note that pub/sub organized around topics is an MQTT simplification of a more complex area. RabbitMQ supports a number of ways to route messages, called ‘exchanges’, and the topic-based exchange is just one of the available types (the others are ‘direct’, ‘fanout’ and ‘headers’). Sticking with topics makes things simultaneously easier and more flexible from the point of view of future integrations.
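For completeness, here is the same idea sketched on the RabbitMQ side with amqplib and a topic exchange; all the names are illustrative, and routing keys use ‘*’ and ‘#’ as wildcards:

```js
// A topic exchange with amqplib - a sketch; exchange, queue and
// routing key names are made up. '*' matches one word, '#' zero or more.
const amqp = require('amqplib');

(async () => {
  const conn = await amqp.connect('amqp://broker');
  const ch = await conn.createChannel();
  await ch.assertExchange('events', 'topic', { durable: true });

  // an anonymous, exclusive queue bound to a routing key pattern
  const { queue } = await ch.assertQueue('', { exclusive: true });
  await ch.bindQueue(queue, 'events', 'projects.*.tasks.created');

  ch.consume(queue, (msg) => {
    console.log(msg.fields.routingKey, msg.content.toString());
    ch.ack(msg); // manual acknowledgement
  });
})().catch(console.error);
```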

As for the payload of the messages flowing through the broker, the answer is easy – there is no reason to deviate from JSON as the de facto exchange format of the Internet. In fact, I will be even more specific: if you ever intend to turn the events flowing between your apps into an activity stream, you may as well use the Activity Streams JSON format for your message body. Our experience is that activities can easily be converted into events by cherry-picking the data you need. The opposite is not necessarily true: if you want to make your system social, you will be wise to plan ahead and pass enough information around to be able to create a tweet or a Facebook update from it.
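For example, a message body shaped as an Activity Streams 1.0 activity could look like this sketch (all IDs made up), published like any other message:

```js
// An Activity Streams 1.0 activity as a message body - a sketch with
// made-up IDs; the same payload can later feed a social stream.
const activity = {
  published: '2014-02-20T12:34:56Z',
  actor: { objectType: 'person', id: 'users/15', displayName: 'John Doe' },
  verb: 'add',
  object: {
    objectType: 'image',
    id: 'images/42',
    url: 'https://images.example.com/42.png',
  },
};

client.publish('images/added', JSON.stringify(activity), { qos: 1 });
```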

OK, so we made some choices: our medium will be RabbitMQ, and our messages will be expressed using the MQTT protocol (though in a pinch, an AMQP 0.9.1 client can participate in the system without problems). With Node.js and Java clients both readily available, we will be able to pass messages around in a system composed of Node.js and Java apps. In the next ‘down and dirty’ post, I will modify our example app from last week to run ‘fake’ builds in a Java app, and pass MQTT messages to the Node.js app, which will in turn push the data to the browser using Socket.io.

That’s a whole lot of messaging. Our ‘Messenger Boy’ from the picture above will get very tired.

© Dejan Glozic, 2014