Pushy Node.js II: The Mullet Architecture

[Image: marvel-mullets]

The post title unintentionally sounds like a movie franchise (e.g. Star Trek II: The Wrath of Khan), but in my defense I DID promise I would return to the topic of pushing events to the browser using Socket.io and Node.js in the post about message queues. The idea was to take the example I wrote about in Pushy Node.js and modify it so that the server messages are received from another service using a message queue.

I got the idea for the sequel title from an awesome tweet I saw the other day, proposing the ‘mullet architecture’.

Alas, nothing can be hidden under the Google administration – a quick search yields an article from 2010 on mullet architecture (an actual one, with houses, not software). There is even such a thing as a ‘reverse mullet’ – party in the front and business in the back. Frank Lloyd Wright aside, I did mean ‘software architecture’ in this particular post. I wanted to mimic the situation we face these days, where a Node.js front end typically communicates with legacy services in the back that nobody dares to touch. As a result, this post will mix Node.js, Socket.io, the MQTT protocol and a Java app (posing as the mullet’s backside). I am all for accuracy in advertising.

To refresh your memory, the example app in the previous article simulated a hypothetical system in which builds are running on the server and, as they go through their execution, Web Sockets drive the Bootstrap progress bar in the browser. To keep the original sample palatable, we faked the build by simply using timeouts. We will continue with the fakery, but remove one layer by moving the fake build into a dedicated Java app. This Java app (if you squint, you can imagine it is in fact a Jenkins server running builds) will react to POST requests to start or stop the build, and will publish the build progress using the message broker. Our Node.js app will subscribe to the ‘builds’ topic and pass the information about the build progress to the client using Socket.io.

This all provides the ‘how’, but not the ‘why’. Recall that the power of message queues is in avoiding polling (our Node.js app does not need to constantly nag the Java app “are we there yet?”), and also in providing for flexible architecture. Imagine, for example, that in the future we want to capture trend data on average build times and success/failure rates. We could easily write another service that subscribes to the ‘builds’ topic, stores the data in a DB, and also provides a nice data visualization. I can easily see a nice Node.js app storing data into a NoSQL database such as MongoDB or CouchDB, and a nice page rendered using a Dust.js template and D3.js for data visualization. Why, the app is practically writing itself!

The key point here is that you can easily add this feature tomorrow by simply adding another subscriber to the ‘builds’ message topic – you don’t need to modify any of the existing apps. This low-friction way of growing the system by adding micro-services is very powerful, and was also reinforced during the recent NodeDay in the presentation by Richard Rodger, founder of nearForm.
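
To make this concrete, here is a minimal sketch of what such a stand-alone trend-collecting service could look like, assuming the mqtt and mongodb npm modules; the database, collection and message fields mirror the build example below, but the names are purely illustrative:

// build-stats.js – a hypothetical new micro-service that records finished builds
var mqtt = require('mqtt')
   ,MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/buildstats', function (err, db) {
	if (err) throw err;

	var client = mqtt.createClient(1883, 'localhost');
	client.subscribe('builds');

	client.on('message', function (topic, message) {
		var state = JSON.parse(message);
		// record only finished builds for later trend analysis
		if (!state.running) {
			db.collection('builds').insert({
				errors: state.errors,
				finished: new Date()
			}, function (err) {
				if (err) console.error(err);
			});
		}
	});
});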

All right then, we know why and we know how, so let’s get busy. First, we will install RabbitMQ as our message broker. I have chosen RabbitMQ because it is open source, supports multiple protocols, scales well and is available as a service in PaaSes such as CloudFoundry or Heroku. As for the protocol, I will use MQTT, a lightweight pub/sub protocol particularly popular for anything M2M. If you want to play with MQTT but don’t want to install RabbitMQ, a lightweight open source MQTT broker called Mosquitto is also available, although I don’t know how well it can hold up in production.

For our Java app, both AMQP and MQTT clients are readily available. RabbitMQ provides the AMQP client, while the Eclipse Paho project provides the MQTT client. For this example, I will use the latter. When you have time, read the article REST is for sleeping, MQTT is for mobile by J Spee.

Let’s look at the diagram of what the control flow in the original example app looked like:

[Diagram: pushy-node – control flow of the original example]

A Node.js app handles the initial GET request from the browser by rendering the page with a button and a progress bar. A button click issues a jQuery Ajax POST request to the app, initiating a fake build. As the build goes through its paces, the app uses Socket.io and Web Sockets to push the progress to the browser, moving the progress bar.
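
For reference, the browser side of that flow is only a few lines – this is a rough sketch rather than the verbatim code from the original post, and the element ids, classes and URL are illustrative:

// client-side sketch: start the build and move the progress bar on 'build' events
var socket = io.connect();

socket.on('build', function (state) {
	var bar = $('#build-progress .bar');
	bar.width(state.progress + '%');
	if (state.errors)
		bar.addClass('bar-danger');
});

$('#build-button').click(function () {
	$.post('/', { action: 'start' });
});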

Now let’s take a look at what the new topology will look like:

[Diagram: pushy-node-mq – the new topology with a message broker]

The flow starts as in the original example, but the Node.js app now just proxies the build start/stop command to the Java app listening on a different port. The Java app kicks off a fake build on a worker thread. As the build is running, it periodically publishes messages about its progress to the message broker on a predetermined topic using the MQTT client. The Node.js app, registered as a subscriber to the same topic, receives the messages and passes them on to the browser using Web Sockets.

In order to save space, I will not post the code snippets that didn’t change – check them out in the original article. The first change here is how we react to the POST request from the browser in the Node.js app. Instead of kicking off or stopping the fake build ourselves, we just proxy the call to the Java app:

var http = require('http');

exports.post = function(req, res) {
	var options = {
		hostname: 'localhost',
		port: 9080,
		path: '/builds/build',
		method: 'POST',
		headers: { 'Content-Type': 'application/json' }
	};
	// proxy the start/stop command to the Java build service
	var rreq = http.request(options, function (rres) {
		console.log('Build service response status: ' + rres.statusCode);
		res.send(rres.statusCode);
	});
	rreq.write(JSON.stringify(req.body));
	rreq.end();
};

On the Java end, we will write a JEE servlet to handle the POST request:

package fake.java.build.service;

import javax.servlet.ServletInputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.wink.json4j.JSON;
import org.apache.wink.json4j.JSONException;
import org.apache.wink.json4j.JSONObject;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.MqttPersistenceException;

public class BuildService extends HttpServlet {
	private static final String TOPIC = "builds";
	private static FakeBuild build;

	class FakeBuild implements Runnable {
		boolean running;
		boolean errors;

		public void run() {
			MqttClient client;
			try {
				client = new MqttClient("tcp://localhost:1883", "pushynodeII");
				client.connect();
			} catch (MqttException e) {
				e.printStackTrace();
				return;
			}
			running = true;
			errors = false;

			for (int i = 0; i <= 100; i += 10) {
				if (i == 70)
					errors = true;
				if (i == 100)
					running = false;
				// notify state
				publishState(client, running, i, errors);
				if (running == false)
					break;
				try {
					Thread.sleep(1000);
				} catch (InterruptedException e) {
					e.printStackTrace();
				}
			}
			try {
				client.disconnect();
			} catch (MqttException e) {
				e.printStackTrace();
			}
		}

		private void publishState(MqttClient client, boolean running,
				int progress, boolean errors) {
			try {
				JSONObject body = new JSONObject();
				body.put("running", running);
				body.put("progress";, progress);
				body.put("errors", errors);
				MqttMessage message = new MqttMessage(body.toString()
						.getBytes());
				try {
					client.publish(TOPIC, message);
				} catch (MqttPersistenceException e) {
					e.printStackTrace();
				} catch (MqttException e) {
					e.printStackTrace();
				}

			} catch (JSONException e) {
				e.printStackTrace();
			}
		}
	}

	protected void doPost(HttpServletRequest req, HttpServletResponse res) {
		try {
			ServletInputStream bodyStream = req.getInputStream();

			JSONObject body = (JSONObject) JSON.parse(bodyStream);
			bodyStream.close();
			doToggleBuild("start";.equals(body.get("action")));
		} catch (Exception e) {
			e.printStackTrace();
			res.setStatus(500);
		}
	}

	private void doToggleBuild(boolean start) {
		if (start) {
			build = new FakeBuild();
			Thread worker = new Thread(build);
			worker.start();
		} else {
			if (build!=null)
				build.running = false;
		}
	}
}

In the code above, a ‘start’ command kicks off the build on a worker thread, which sleeps for 1 second between steps (as in the JavaScript case) and then sends an MQTT message to update the subscribers on its status. In case of a ‘stop’ command, we just flip the ‘running’ flag to false and let the build stop and exit naturally, publishing the status one last time. We will pass the build state in the message body as JSON (duh).
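
For clarity, a single message published on the ‘builds’ topic by the code above is just a small JSON document, for example:

{ "running": true, "progress": 70, "errors": true }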

Once the Java app is up and running and messages are flowing, it is time to add an MQTT message subscriber to the Node.js app. To that end, I installed the mqtt module using npm. All we need to do to start listening is create a client in app.js, subscribe to the topic and register a callback function to handle incoming messages:

var mqtt = require('mqtt')
   ,io = require('socket.io');

....

    var server = http.createServer(app);
    io = io.listen(server);

    var mqttClient = mqtt.createClient(1883, 'localhost');
    mqttClient.subscribe('builds');

    mqttClient.on('message', function (topic, message) {
       io.sockets.emit('build', JSON.parse(message));
    });

The code above will create a client for localhost and the default TCP port 1883. In a real-world situation, you will most likely supply these as environment or config variables. The client listens on the topic and simply passes the body of the message on, as a JavaScript object, to all the currently opened sockets (clients). One important meta-comment highlighting the difference in approaches between Node and Java: because Node is asynchronous, I was able to create a client and bind it to a TCP port, then proceed to bind the app to another port for HTTP requests. In a JEE app, we would need to handle incoming messages in a separate thread, and possibly manage a thread pool if one thread cannot handle all the incoming message traffic. As if you needed more reasons to switch to Node.js.
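
As an illustration, in a cloud deployment the snippet above could pick up the broker coordinates from the environment instead of hard-coding them (the variable names here are made up):

var mqttHost = process.env.MQTT_HOST || 'localhost';
var mqttPort = parseInt(process.env.MQTT_PORT, 10) || 1883;

var mqttClient = mqtt.createClient(mqttPort, mqttHost);
mqttClient.subscribe('builds');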

In terms of behavior, this code in action looks the same in the browser as the previous example, but we have made an important step towards an architecture that will work well in today’s mixed-language systems and has room for future growth. What I particularly like about it is that the Java and Node.js apps can pass the messages in an implementation-agnostic way, using JSON. Note that the Java app in this example is not as far-fetched as it may seem: you can make the example even more real by using a Jenkins server. Jenkins has a REST API that our Node app can call to start and stop build jobs, and once a job is underway, we can install an MQTT notifier plug-in to let us know about the status. The plug-in is simple and uses the same Paho client I used here, so if you don’t like what it does, it is trivial to roll your own.
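
If you wanted to go the Jenkins route, kicking off a job from the Node.js app is a plain HTTP call against the Jenkins REST API – a rough sketch, assuming a local Jenkins on port 8080 and a made-up job name (a real server may also require authentication or a CSRF crumb):

var http = require('http');

function startJenkinsJob(jobName) {
	var options = {
		hostname: 'localhost',
		port: 8080,
		path: '/job/' + jobName + '/build',
		method: 'POST'
	};
	var req = http.request(options, function (res) {
		console.log('Jenkins responded with status ' + res.statusCode);
	});
	req.end();
}

startJenkinsJob('pushy-node-build');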

The intent of this post was to demonstrate an architecture you will most likely work with these days, even if you are a total Node-head. Apart from legacy back-end services, even some newly written ones may need to be in some other language (Java, C++, Go). Nobody in their right mind would write a video transcoding app in JavaScript (if that is even possible). Using message brokers will help you arrive at an architecture that looks like you actually meant it that way, rather than one you fell backwards into (what I call ‘accidental architecture’). Even if it is a mullet, wear it like you mean(t) it!

© Dejan Glozic, 2014


The Queue Is the Message

[Image: Messenger_Boy_Game_01 – a messenger boy]

The title of this post is a paraphrase of Marshall McLuhan’s famous ‘The medium is the message‘, meant to imply that the medium that carries the message also embeds itself into the message, creating a symbiotic relationship with it. Of course, as I write this, I half-expect the ghost of Mr. McLuhan to appear and say that I know nothing of his work and that the fact that I am allowed to write a blog on anything is amazing.

Message queues belong to a class of enterprise middleware that I managed to ignore for a long time. This is not the first time I am writing about holes in my understanding of enterprise architecture. In the post on databases, I similarly explained how one can go through life without ever writing a single SQL statement and still manage to persist data. Message queues are even worse. It is reasonable to expect the need to persist data, but the need to mediate between systems used to be the purview of system integrators, not application developers.

Don’t get me wrong, the company I work for has had a commercial MQ product for years, so I heard plenty about it in passing, and it seemed to be a big deal when connecting big box A to an even bigger box B. In contrast, developers of desktop applications have the luxury of passing events between parts of the application in-process (just add a listener and you are done). For monolithic Web applications, the situation is not very different. It is no wonder Stack Overflow is full of puzzled developers asking why they would need a message queue and what good it will bring to their projects.

In the previously mentioned post on databases, I echoed the thought of Martin Fowler and Pramod Sadalage that databases (and by extension, DBAs) are losing the role of the system integrators. In the olden days, applications accessed data by executing SQL statements, making database schema the de facto API, and database design a very big deal that required careful planning. Today, REST services are APIs, and storage is relegated to the service implementation detail.

In modern architectures, particularly in the cloud, there is a very strong movement away from monolithic applications to a federation of smaller collaborating apps. These apps are free to store the data as they see fit, as long as they expose it through the API contract. The corollary is data fragmentation – the totality of the system’s data is scattered across a number of databases hooked up to the service apps.

It is true that at any point, we can get the current state of the data by performing an API call on these services. However, once we know the current state and render the system for the user, what happens when there is a change? Modern systems have a lot of moving parts. Some of the changes are brought about by the apps themselves, some of them come from users interacting with the system through the browser or the mobile clients. Without a message broker circulating messages between the federated apps, they will become more and more out of sync until the next full API call. Of course, apps can poll for data in an attempt to stay in sync, but such a topology would look very complex and would not scale, particularly for ‘popular’ apps whose data is ‘sought after’ (typically common data that provides the glue for the system, such as ‘users’, ‘projects’, ‘tasks’ etc.).

Message queues come in many shapes and sizes, and can organize the flow of messages in different ways, depending on the intended use (the RabbitMQ Getting Started document offers a fairly useful breakdown of these flows). Of course, if you are as new to message queues as I am, you may suffer a case of tl;dr here, so I will cut straight to the topology that is going to help us: publish/subscribe. In the post on Web Sockets and Socket.io, I covered the part of the system that pushes messages from the Node.js app to JavaScript running in the browser. Message queues will help us push messages between apps running on the server, leaving Socket.io to handle ‘the last mile’. In the cloud, we will of course set up the message queue as a service (MQaaS 🙂) and pass its coordinates to the apps as an ‘attached resource’ backing service, to use 12-factor lingo.

The publish/subscribe pattern is very attractive for us because it cuts down on unnecessary linkages and network traffic between apps. Instead of apps annoying each other with frequent ‘are we there yet’ REST calls, they can sit idle until we ARE there, at which point a message is published to all the interested (subscribed) parties. Note that messages themselves normally do not carry a lot of data – a REST call may still be needed (a message may say ‘user John Doe added’, but the apps may still need to make a REST call to the ‘users’ app to fetch the ‘John Doe’ resource and do something useful with it).

Another important benefit is the asynchronous nature of the coupling between publishers and subscribers. The only thing publishers care about is firing a message – they don’t care what happens next. Message brokers are responsible for delivering the message to each and every subscriber. At any point in time, a subscriber can be inaccessible (busy or down). Even if they are up, there can be periods of mismatch between the publishers’ ability to produce and the subscribers’ ability to consume messages. Message brokers will hold onto the messages until such time as the subscriber is actually able to consume them, acting as a relief valve of sorts. How reliable the brokers are in this endeavour depends on something called ‘Quality of Service’. Transient messages can be lost, but important messages must be delivered ‘at least once’, or with an even stronger guarantee of ‘exactly once’ (albeit with a performance penalty). This may sound boring now but will matter to you once your career depends on all the messages being accounted for.
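
To make the ‘Quality of Service’ bit less abstract, here is roughly how it is requested with the Node.js mqtt module used elsewhere in this post (the topic and payload are invented for illustration):

var mqtt = require('mqtt');
var client = mqtt.createClient(1883, 'localhost');

// QoS 0: at most once, QoS 1: at least once, QoS 2: exactly once
client.publish('builds', JSON.stringify({ progress: 100 }), { qos: 1 });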

Finally, a very important advantage of using message queues in your project is feature growth. What starts as a simple app can easily grow into a monster under a barrage of new features. Adam Bloom from Pivotal wrote a very nice blog post on scaling an Instagram-like app without crushing it with its own weight. He used an example of a number of things such an app would want to do on an image upload: resize the image, notify friends, add points to the user, tweet the image etc. You can add these as functions in the main app, growing it very quickly and making development teams step on each other’s toes. Or you can insert a message broker, make the image app add the image and fire the ‘image added’ message to the subscribers. Then you can create a ‘resizer app’, ‘notifier app’, ‘points app’ and ‘tweeter app’, and make each of them subscribe to the ‘image’ topic in the message broker. In the future you can add a new feature by adding another app subscribed to the same topic. Incidentally, the Groupon team decided to do something similar when they moved from a monolithic RoR app to a collection of smaller Node.js apps.
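
As a sketch of that shape – the app and topic names are invented, and the mqtt module is the same one used in the build example earlier – the image app and one of its subscribers could be as small as this:

// image app: announce that an image was added
var mqtt = require('mqtt');
var publisher = mqtt.createClient(1883, 'localhost');

function onImageUploaded(imageId) {
	publisher.publish('images', JSON.stringify({ event: 'added', id: imageId }));
}

// resizer app (a separate process): react to the same topic
var subscriber = mqtt.createClient(1883, 'localhost');
subscriber.subscribe('images');
subscriber.on('message', function (topic, message) {
	var event = JSON.parse(message);
	if (event.event === 'added')
		console.log('resizing image ' + event.id);
});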

All right, you say, you convinced me, I will give message queues a try. At this point the enthusiasm fizzles because navigating the message queue choices is far from trivial. In fact, there are two decisions to be made: which message broker and which protocol.

And here we are, looping right back to the beginning and Marshall McLuhan (and you thought I forgot to bring that tangent back). For message queues, we can say that to an extent the broker IS the protocol. Choosing a protocol (the way our apps will interact with the broker) affects your choice of the broker itself. There are several protocols and many brokers to choose from, and this is not an article to help you pick. However, for me the real decision flow was around two important requirements: will the broker scale (instead of becoming the system’s bottleneck), and can I extend the reach of the broker to mobile devices. An extra requirement (a JavaScript client I can use in Node.js) was a given, considering most of our apps will be written using Node.

The mobile connectivity requirement was easy to satisfy – all roads pointed to MQTT as the protocol to use when talking to devices with limited resources. Your broker must be able to speak MQTT in order to push messages to mobile devices. Facebook, among others, is using the libmosquitto client in their native iOS app as well as the Messenger app. There is a range of ways to use MQTT in Android. And if you are interested in the Internet of Things, it is an easy choice.

All right, now the brokers. How about picking something open source, with an attractive no-strings-attached license, and with the ability to cluster itself to handle a barrage of messages? And something that is easy to install as a service? I haven’t done extensive research here, but we need to start somewhere and get some experience, so RabbitMQ seems like a good choice for now. It supports multiple protocols (AMQP, MQTT, STOMP), is open source, has clients in many languages, and has built-in clustering support. In fact, if publish/subscribe is the only pattern you need, readers are advised to steer clear of the AMQP protocol (native to RabbitMQ) because there is a version schism right now. The version of the protocol that everybody supports (0-9-1) is not what was put forward as the official v1.0 standard (a more significant change than the version numbers would indicate, and one that few brokers or clients actually support). It should not even matter – RabbitMQ should be commended for its flexibility and ‘polyglot messaging’ approach, so as long as we are using clients that speak correct MQTT, we could swap the broker in the future and nothing should break. Technically, the open source Mosquitto broker could work too, but it seems much more modest and not exactly Web-scale.

Notice how I mentioned ‘topics’ a couple of paragraphs above. In the ‘publish/subscribe’ world, topics are very important because they segregate message flow. Publishers send messages addressed to topics, and subscribers, well, subscribe to them. MQTT has a set of rules for how topics can be organized, with hierarchies for subtopics and wildcards for subscribing to a number of subtopics. It is hard to overstate this: structuring topic namespaces is one of the most important tasks of your integration architecture. Don’t approach it lightly, because topics will be your API as much as your REST services are.
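
For example, MQTT separates topic levels with ‘/’ and offers ‘+’ (exactly one level) and ‘#’ (any number of levels) as subscription wildcards, so a build-related namespace could be sliced like this (the names are invented for illustration):

var mqtt = require('mqtt');
var client = mqtt.createClient(1883, 'localhost');

// a publisher can address a specific job...
client.publish('builds/pushy-node/job-42', JSON.stringify({ progress: 50 }));

// ...while subscribers pick the granularity they care about
client.subscribe('builds/pushy-node/+');   // all jobs of one project
client.subscribe('builds/#');              // everything build-related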

Note that pub/sub organized around topics is an MQTT simplification of a more complex area. RabbitMQ supports a number of ways messages can be routed, called ‘exchanges’, and the topic-based exchange is just one of the available types (others are ‘direct’, ‘fanout’ and ‘headers’). Sticking with topics makes things simultaneously easier and more flexible from the point of view of future integrations.

As for the payload of messages flowing through the broker, the answer is easy – there is no reason to deviate from JSON as the de facto exchange format of the Internet. In fact, I will be even more specific: if you ever intend to turn the events flowing between your apps into an activity stream, you may as well use the Activity Streams JSON format for your message body. Our experience is that activities can easily be converted into events by cherry-picking the data you need. The opposite is not necessarily true: if you want to make your system social, you would be wise to plan ahead and pass enough information around to be able to create a tweet or a Facebook update from it.
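
For illustration, an activity announcing the image upload from the example above could look roughly like this when expressed in the Activity Streams JSON format (all values are made up):

{
  "published": "2014-02-21T12:34:56Z",
  "actor": { "objectType": "person", "id": "john", "displayName": "John Doe" },
  "verb": "post",
  "object": { "objectType": "image", "id": "images/42", "displayName": "Sunset" }
}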

OK, so we made some choices: our medium will be RabbitMQ, and our message will be expressed using the MQTT protocol (but in a pinch, an AMQP 0-9-1 client can participate in the system without problems). With Node.js and Java clients both readily available, we will be able to pass messages around in a system composed of Node.js and Java apps. In the next ‘down and dirty’ post, I will modify our example app from last week to run ‘fake’ builds in a Java app, and pass MQTT messages to the Node.js app, which will in turn push the data to the browser using Socket.io.

That’s a whole lot of messaging. Our ‘Messenger Boy’ from the picture above will get very tired.

© Dejan Glozic, 2014