The Queue Is the Message

[Image: ‘Messenger Boy’ game]

The title of this post is a paraphrase of Marshall McLuhan’s famous ‘The medium is the message’, meant to imply that the medium that carries the message also embeds itself into the message, creating a symbiotic relationship with it. Of course, as I write this, I half-expect the ghost of Mr. McLuhan to appear and say that I know nothing of his work and that the fact that I am allowed to write a blog on anything is amazing.

Message queues belong to a class of enterprise middleware that I managed to ignore for a long time. This is not the first time I am writing about holes in my understanding of enterprise architecture. In the post on databases, I similarly explained how one can go through life without ever writing a single SQL statement and still manage to persist data. Message queues are even worse. It is reasonable to expect the need to persist data, but the need to mediate between systems used to be the purview of system integrators, not application developers.

Don’t get me wrong, the company I work for had a commercial MQ product for years, so I heard plenty about it in passing, and it seemed to be a big deal when connecting big box A to an even bigger box B. In contrast, developers of desktop applications have the luxury of passing events between parts of the application in-process (just add a listener and you are done). For monolithic Web applications, the situation is not very different. It is no wonder Stack Overflow is full of puzzled developers asking why they would need a message queue and what good it would bring to their projects.

In the previously mentioned post on databases, I echoed the thought of Martin Fowler and Pramod Sadalage that databases (and by extension, DBAs) are losing their role as system integrators. In the olden days, applications accessed data by executing SQL statements, making the database schema the de facto API, and database design a very big deal that required careful planning. Today, REST services are the APIs, and storage is relegated to a service implementation detail.

In modern architectures, particularly in the cloud, there is a very strong movement away from monolithic applications to a federation of smaller collaborating apps. These apps are free to store the data as they see fit, as long as they expose it through the API contract. The corollary is data fragmentation – the totality of the system’s data is scattered across a number of databases hooked up to the service apps.

It is true that at any point, we can get the current state of the data by performing an API call on these services. However, once we know the current state and render the system for the user, what happens when there is a change? Modern systems have a lot of moving parts. Some of the changes are brought about by the apps themselves, some of them come from users interacting with the system through the browser or the mobile clients. Without a message broker circulating messages between the federated apps, they will become more and more out of sync until the next full API call. Of course, apps can poll for data in an attempt to stay in sync, but such a topology would look very complex and would not scale, particularly for ‘popular’ apps whose data is ‘sought after’ (typically common data that provides the glue for the system, such as ‘users’, ‘projects’, ‘tasks’ etc.).

Message queues come in many shapes and sizes, and can organize the flow of messages in different ways, depending on the intended use (the RabbitMQ Getting Started document offers a fairly useful breakdown of these flows). Of course, if you are as new to message queues as I am, you may suffer a case of tl;dr here, so I will cut straight to the topology that is going to help us here: publish/subscribe. In the post on Web Sockets and Socket.io, I covered the part of the system that pushes messages from the Node.js app to JavaScript running in the browser. Message queues will help us push messages between apps running on the server, leaving Socket.io to handle ‘the last mile’. In the cloud, we will of course set up the message queue as a service (MQaaS 🙂) and pass its coordinates to apps as an ‘attached resource’ expressing a backing service, to use 12-factor lingo here.

The publish/subscribe pattern is very attractive for us because it cuts down on unnecessary linkages and network traffic between apps. Instead of apps annoying each other with frequent ‘are we there yet’ REST calls, they can sit idle until we ARE there, at which point a message is published to all the interested (subscribed) parties. Note that messages themselves normally do not carry a lot of data – a message may say ‘user John Doe added’, but the apps may still need to make a REST call to the ‘users’ app to fetch the ‘John Doe’ resource and do something useful with it.
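To make this concrete, here is a minimal sketch of the publisher side using the MQTT.js client for Node.js (‘npm install mqtt’). The broker URL and topic name are invented for illustration:

var mqtt = require('mqtt');

// the broker URL would normally come from the environment,
// as an 'attached resource' in 12-factor terms
var client = mqtt.connect('mqtt://broker.example.com');

client.on('connect', function () {
   // fire and forget - the payload is a lightweight notification,
   // not the full 'John Doe' resource
   client.publish('myapp/users', JSON.stringify({ verb: 'add', id: 'john.doe' }));
});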

Another important benefit is the asynchronous nature of the coupling between publishers and subscribers. The only thing publishers care about is firing a message – they don’t care what happens next. Message brokers are responsible for delivering the message to each and every subscriber. At any point in time, a subscriber can be inaccessible (busy or down). Even if they are up, there can be periods of mismatch between the publishers’ ability to produce and the subscribers’ ability to consume messages. Message brokers will hold onto the messages until such time as the subscribers are actually able to consume them, acting as a relief valve of sorts. How reliable the brokers are in this endeavour depends on something called ‘Quality of Service’. Transient messages can be lost, but important messages must be delivered ‘at least once’, or with an even stronger guarantee of ‘exactly once’ (albeit with a performance penalty). This may sound boring now but will matter to you once your career depends on all the messages being accounted for.
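In MQTT, Quality of Service is specified per publish and per subscription. Continuing the sketch above (the topic name is again made up), the qos option is all it takes to pick the delivery guarantee:

// QoS 0 = 'at most once' (transient, can be lost)
// QoS 1 = 'at least once' (may be delivered more than once)
// QoS 2 = 'exactly once' (strongest guarantee, biggest overhead)
var payload = JSON.stringify({ verb: 'update', id: 'build-42' });
client.publish('myapp/builds', payload, { qos: 2 });
client.subscribe('myapp/builds', { qos: 1 });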

Finally, a very important advantage of using message queues in your project is accommodating feature growth. What starts as a simple app can easily grow into a monster under a barrage of new features. Adam Bloom from Pivotal wrote a very nice blog post on scaling an Instagram-like app without crushing it with its own weight. He used an example of a number of things such an app would want to do on an image upload: resize the image, notify friends, add points to the user, tweet the image etc. You can add these as functions in the main app, growing it very quickly and making development teams step on each other’s toes. Or you can insert a message broker, make the image app add the image and fire the ‘image added’ message to the subscribers. Then you can create a ‘resizer app’, ‘notifier app’, ‘points app’ and ‘tweeter app’, and make each of them subscribe to the ‘image’ topic in the message broker. In the future you can add a new feature by adding another app and subscribing to the same topic. Incidentally, the Groupon team decided to do something similar when they moved from a monolithic RoR app to a collection of smaller Node.js apps.
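The ‘resizer app’ in that scenario could then be little more than a subscriber loop. A sketch, again with MQTT.js (the topic name and the resize() function are hypothetical):

var mqtt = require('mqtt');
var client = mqtt.connect('mqtt://broker.example.com');

client.on('connect', function () {
   client.subscribe('myapp/images');
});

client.on('message', function (topic, message) {
   var image = JSON.parse(message);
   // resize() is a stand-in for the real work; the 'notifier',
   // 'points' and 'tweeter' apps would run the same boilerplate
   // with their own handlers, never touching the main image app
   resize(image.url);
});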

All right, you say, you convinced me, I will give message queues a try. At this point the enthusiasm fizzles because navigating the message queue choices is far from trivial. In fact, there are two decisions to be made: which message broker and which protocol.

And here we are, looping right back to the beginning and Marshall McLuhan (and you thought I forgot to bring that tangent back). For message queues, we can say that to an extent the broker IS the protocol. Choosing a protocol (the way our apps will interact with the broker) affects your choice of the broker itself. There are several protocols and many brokers to choose from, and this is not an article to help you do that. However, for me the real decision flow was around two important requirements: will the broker scale (instead of becoming the system’s bottleneck), and can I extend the reach of the broker to mobile devices. An extra requirement (a JavaScript client I can use in Node.js) was a given, considering most of our apps will be written using Node.

The mobile connectivity requirement was easy to satisfy – all roads pointed to MQTT as the protocol to use when talking to devices with limited resources. Your broker must be able to speak MQTT in order to push messages to mobile devices. Facebook, among others, is using the libmosquitto client in their native iOS app as well as the Messenger app. There is a range of ways to use MQTT in Android. And if you are interested in the Internet of Things, it is an easy choice.

All right, now the brokers. How about picking something Open Source, with an attractive license with no strings attached, and with the ability to cluster itself to handle a barrage of messages? And something that is easy to install as a service? I haven’t done extensive research here, but we need to start somewhere and get some experience, so RabbitMQ seems like a good choice for now. It supports multiple protocols (AMQP, MQTT, STOMP), is Open Source, has clients in many languages, and has built-in clustering support. In fact, if publish/subscribe is the only pattern you need, readers are advised to steer clear of the AMQP protocol (native to RabbitMQ) because there is a version schism right now. The version of the protocol that everybody supports (0.9.1) is not what was put forward as the official v1.0 standard (a more significant change than the version numbers would indicate, and one that few brokers or clients actually support). It should not even matter – RabbitMQ should be commended for its flexibility and the ‘polyglot messaging’ approach, so as long as we are using clients that speak correct MQTT, we could swap the broker in the future and nothing should break. Technically, the Open Source Mosquitto broker could work too, but it seems much more modest and not exactly Web-scale.

Notice how I mentioned ‘topics’ a couple of paragraphs above. In the ‘publish/subscribe’ world, topics are very important because they segregate message flow. Publishers send messages addressed to topics, and subscribers, well, subscribe to them. MQTT has a set of rules for how topics can be organized, with hierarchies for subtopics and wildcards for subscribing to a number of subtopics at once. It is hard to overstate this: structuring topic namespaces is one of the most important tasks for your integration architecture. Don’t approach it lightly, because topics will be your API as much as your REST services are.
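For illustration, here is what wildcard subscriptions look like with the MQTT.js client from the earlier sketches (topic names invented): ‘+’ matches exactly one topic level, while ‘#’ matches the rest of the tree.

client.subscribe('myapp/projects/+/tasks'); // tasks of any single project
client.subscribe('myapp/users/#');          // everything under 'users'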

Note that pub/sub organized around topics is an MQTT simplification of a more complex area. RabbitMQ supports a number of ways messages can be routed, called ‘exchanges’, and the topic-based exchange is just one of the available types (others are ‘direct’, ‘fanout’ and ‘headers’). Sticking with topics makes things simultaneously easier and more flexible from the point of view of future integrations.
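Should we ever need that extra routing flexibility, the AMQP route would look roughly like this with the ‘amqplib’ Node client (a sketch against AMQP 0.9.1; the exchange name and routing key are made up):

var amqp = require('amqplib');

amqp.connect('amqp://broker.example.com').then(function (conn) {
   return conn.createChannel();
}).then(function (ch) {
   // declare a topic exchange, then publish with a routing key -
   // subscribers bind queues to it with patterns such as 'users.*'
   return ch.assertExchange('myapp', 'topic', { durable: false }).then(function () {
      ch.publish('myapp', 'users.add', new Buffer(JSON.stringify({ id: 'john.doe' })));
   });
});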

As for the payload of messages flowing through the broker, the answer is easy – there is no reason to deviate from JSON as the de facto exchange format of the Internet. In fact, I will be even more specific: if you ever intend to turn the events flowing between your apps into an activity stream, you may as well use the Activity Streams JSON format for your message body. Our experience is that activities can easily be converted into events by cherry-picking the data you need. The opposite is not necessarily true: if you want to make your system social, you will be wise to plan ahead and pass enough information around to be able to create a tweet or a Facebook update from it.
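For example, an ‘image added’ event expressed as an Activity Streams message body might look something like this (the field values are of course invented):

{
   "verb": "post",
   "published": "2014-02-21T10:15:00Z",
   "actor": {
      "objectType": "person",
      "id": "john.doe",
      "displayName": "John Doe"
   },
   "object": {
      "objectType": "image",
      "id": "12345",
      "url": "http://example.com/images/12345"
   }
}

A subscriber that only cares about the event can cherry-pick ‘verb’ and ‘object.id’; a future activity stream feature gets the actor and the display names for free.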

OK, so we made some choices: our medium will be RabbitMQ, and our message will be expressed using the MQTT protocol (but in a pinch, an AMQP v0.9.1 client can participate in the system without problems). With Node.js and Java clients both readily available, we will be able to pass messages around in a system composed of Node.js and Java apps. In the next ‘down and dirty’ post, I will modify our example app from last week to run ‘fake’ builds in a Java app and pass MQTT messages to the Node.js app, which will in turn push the data to the browser using Socket.io.

That’s a whole lot of messaging. Our ‘Messenger Boy’ from the picture above will get very tired.

© Dejan Glozic, 2014

Pushy Node.js

[Illustration: ‘Push-fact’, Michael N. Erickson, 2011]

Last week I hoped to blog-shame Guillermo Rauch into releasing Socket.io v1.0 for my own convenience. Alas, it didn’t work (gotta work on my SEO), but I see a lot of traffic from @rauchg on the corresponding GitHub project, so my spirits are high. Meanwhile, I realized that for my own dabbling, v0.9.1 is pretty darn good on its own. Good enough, in fact, to socketize our example Node.js app in time for this post.

Why would I need Socket.io in the first place? Because of mobile phones. In a straightforward web server implementation, requests always originate from the client. The client pulls, the server obliges and sends markup and other resources back, and then returns to listening on the port, awaiting further requests (a Node.js server does the exact same thing – this is not just ‘old tech’). With Ajax, the nature of requests is different but not the direction. To add liveliness, XHR calls are made to fetch data and update portions of the page without a full refresh, but those XHR calls again originate in the client. Even when it looks as if the server is pushing, it is all a sham – a technique called ‘long polling’ where the client opens a connection and the server never closes it, choosing instead to push data nuggets to the client over the connection the client originally initiated. Finally, with Web Sockets it is possible to have a true server push, where the server sends data whenever it is good and ready. Enough of clients acting like bored kids on a family trip (“are we there yet? are we there yet?”).

OK, but why mobile phones? Because they conditioned us to expect push notifications. We are so used to our phones telling us when there is something new to observe that we are almost offended when we need to explicitly refresh an app to get new content. It is no surprise that developers now want that kind of lively experience on the desktop as well. It is also a prerequisite for a mobile Web app hoping to convince customers that it is just as good as a native counterpart.

I decided to add a new page to the example app I have kept evolving since the first post on Node.js – too lazy to create a new app. It is also a good way to go beyond ‘Hello, World’ because the examples on the Socket.io home page are all in app.js. Since I am using express.js and have a few pages with their corresponding controllers and views, I decided to move most of the socket action to a dedicated page. This is a realistic scenario – not all of your pages will need server push. This is all assuming you are not writing a Single Page App (SPA between friends), at which point all bets are off.

The Socket.io home page examples definitely fit the ‘Hello, World’ approach, which means that you get the endorphin kick from getting the code to work, but you immediately need to make changes for a real-world app. In my case, I was using Require.js, and the page where the client code needs to go is namespaced for jQuery. Here is what I needed to do in the shared Dust.js partial:

   requirejs.config({
      shim: {
         'socketio': {
             exports: 'io'
         }
      },
      paths: {
         socketio: '../socket.io/socket.io'
      },
      baseUrl: "/js"
   });

Now we are ready to write some server push code. I have decided to create a mockup of something we are currently working on in the context of JazzHub – a build happening somewhere on the server that our page is watching. The page is simple – we want a button to start the build, a progress bar to watch it working, and a failure in the build at some point along the way just to mix it up.

We will start by using NPM to fetch the socket.io module and hooking it up in app.js. Socket.io is designed to coexist peacefully with express.js, and to piggy-back on the HTTP server that the app is starting:

var express = require('express')
, routes = require('./routes')
, dust = require('dustjs-linkedin')
, helpers = require('dustjs-helpers')
, cons = require('consolidate')
, user = require('./routes/user')
, simple = require('./routes/simple')
, widgets = require('./routes/widgets')
, http = require('http')
, sockets = require('./routes/sockets')
, io = require('socket.io')
, path = require('path');

We have required another controller for the new page (‘./routes/sockets’) as well as the library itself (‘socket.io’). We can now hook it up to the server:

var server = http.createServer(app);
sockets.io = io.listen(server);

In the last line we have passed the Socket.io root object to the new page’s controller so that we can access it there.

The new page needs a view, and we will again use Dust.js template and Bootstrap for our button and progress bar:

{>layout/}
{<head}
   <script src="/socket.io/socket.io.js"></script>
{/head}
{<content}
	<h2>Web Sockets</h2>
	<p>
		This page demonstrates the use of Socket.io to push data from the Node.js server.
	</p>
	<p><button type="button" class="btn btn-primary" id="playButton" data-state="start">
		<span class="glyphicon glyphicon-play" id="playButtonIcon"></span></button>
	</p>
	<div class="progress" style="width: 50%">
       <div id="progress" class="progress-bar" role="progressbar" aria-valuenow="100" aria-valuemin="0" aria-valuemax="100" style="width: 100%;">
          <span class="sr-only">100% Complete</span>
       </div>
    </div>

	<p>This page is served by server {pid}</p>
{/content}
{<script}
	<script src="/js/sockets/sockets-page.js"></script>
{/script}

Here is something that took me a while to figure out, and judging by the Stack Overflow questions, it is puzzling to many a developer. If you take a look at how we are referencing the client side portion of the Socket.io library, it makes no sense:

<script src="/socket.io/socket.io.js"></script>

All we did was install Socket.io using NPM, and while the library contains the client side portion as well, we didn’t put it in ‘/public’ where our styles and other static client-side files are. Nevertheless, our server was finding and serving this file to the client somehow. It wasn’t until I looked at the server side console that I noticed this line in the sea of Socket.io debug chatter:

debug: served static content /socket.io.js

Apparently, Socket.io is not only handling requests from its client side code, it is assisting Express in finding and serving the client side code to begin with. A bit magical for my taste but OK.

You may have noticed that I don’t have JavaScript inlined in the Dust template for the page. I did this for cleanliness, but also because curly braces in JavaScript code need to be escaped in Dust.js (because curly braces are special characters), making JavaScript exceedingly ugly. Mental note to talk to Dust.js guys about finding a better way to handle inlined JavaScript.
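To illustrate the point, here is what even a trivial inlined function would look like, with every literal brace spelled out via Dust’s {~lb} and {~rb} special tags (a made-up snippet):

<script>
   function update(data) {~lb}
      $("#progress").css("width", data.progress + "%");
   {~rb}
</script>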

The content of the referenced file ‘sockets-page.js’ is here:

require(["jquery", "socketio"], function($, io) {
    // connect to socket
    var socket = io.connect('http://localhost'); // this needs to change in the real code
    socket.on('build', function (build) {
    	if (build.progress==0)
           _resetProgress();
    	else {
           $("#progress").attr("aria-valuenow", ""+build.progress)
           .css("width", build.progress+"%");
   	   if (build.errors) {
              $("#progress").removeClass("progress-bar-success")
              .addClass("progress-bar-danger");
           }
        }
    	var state = (build.running)?"stop":"start";
    	if ($("#playButton").data("state")!=state) {
           if (state=="stop") {
              $("#playButtonIcon").removeClass("glyphicon-play")
             .addClass("glyphicon-stop");
           } else {
       	      $("#playButtonIcon").removeClass("glyphicon-stop")
             .addClass("glyphicon-play");
           }
           $("#playButton").data("state", state);
       }
    });

    // bind event listeners
    $("#playButton").on("click", _handleButtonClick);

    // private function
    function _handleButtonClick(evt) {
       var state = $("#playButton").data("state");
       $.post("sockets", { action: state });
    }

    function _resetProgress() {
       // clear any error state and rewind the bar to zero
       $("#progress").removeClass("progress-bar-danger")
       .addClass("progress-bar-success")
       .attr("aria-valuenow", "0")
       .css("width", "0%");
    }
});

The code above does the following: it registers a listener for the dual-purpose button we placed on the page. Its initial function is to start the build. Once the build is in progress, a subsequent click will stop it (we change the icon glyph to reflect this). We handle the button click by POST-ing to the same controller that handles the GET request that renders the page, and passing the action in the request body (it is exceedingly easy to do this in jQuery, and equally easy to access it on the other end in Express).

In order to handle both GET and POST, we will register our controller in app.js thusly:

	app.get('/sockets', sockets.get);
	app.post('/sockets', sockets.post);

If you recall, we shimmed Socket.io so that we can use it with Require.js. In the client code above (sockets-page.js) we are requiring both jQuery and Socket.io. The moment this code runs, it will establish a handshake with the Socket.io code on the server, and once it does, messages can start flowing in both directions. We will define one custom message ‘build’ and pass a JavaScript object containing the build status (running/not running), percentage done (0-100) and whether there are errors. This information will in turn affect how we render the Bootstrap button and progress bar.

Meanwhile on the server, the router for the page contains the other half of the code. We will create fake build activity. In the real world, this information would arrive from another app where the build is actually running. In fact, it is common that some kind of message broker is used for app-to-app messaging on the server (a topic for a future post). For now, we will fake the build by making it last 10 seconds, with progress sent to the client every second:

exports.get = function(req, res) {
  res.render('sockets', { title: 'Web Sockets', active: 'sockets', pid: process.pid });
};

var build = {
   running: false,
   progress: 0,
   errors: false
};

var _lastTimeout;

exports.post = function(req, res) {
   var action = req.body.action;

   if (action==="stop") {
	   // stop the build.
	   build.running = false;
	   if (_lastTimeout)
		   clearTimeout(_lastTimeout);
	   _pushEvent("build");
   }
   else if (action==="start") {
       // reset the build, start from 0
       build.running = true;
       build.errors=false;
       build.progress = 0;
       _pushEvent("build");
       _lastTimeout = setTimeout(_buildWork, 1000);
   }
};

function _buildWork() {
	build.progress += 10;
	if (build.progress==70)
		build.errors=true;
	if (build.progress < 100) {
		_pushEvent("build");
		_lastTimeout = setTimeout(_buildWork, 1000);
	}
	else {
		build.running = false;
		_pushEvent("build");
	}
}

function _pushEvent(event) {
	exports.io.sockets.emit(event, build);
}

The code above should be fairly easy to read – we are faking the build by setting a timeout of 1000ms (our 10% ‘ticks’). We advance the ‘build.progress’ property and ’emit’ a message to all the active sockets (if you recall, we are using the ‘io’ object we attached in app.js). Any number of clients looking at this page will see the build in progress and will be able to start it and stop it.

When we start the server and navigate to the newly added ‘Sockets’ page, we can see the progress bar and the button, as expected. Pressing the button starts the build, and the page is updated as the build progresses, turning from green to red at the 70% mark:

[Screenshot: the Sockets page with the button and the progress bar]

You can observe the whole dance in action in this animated GIF.

Time for the post-demo discussion. Readers following this blog may remember my concerns about Node.js that kept me on the fence for a while. Node.js and a JavaScript templating library such as Dust.js offer a very fast cycle of experimentation and exploration that Bill Scott from PayPal, among others, has found instrumental for the process of Lean UX. However, it is hard to make such a tectonic shift for that sole reason. For me, adding server push to the mix is what tipped the scales in Node.js’ favor. It is hard to match the efficiency and scale possible this way, and alternative technologies that consume a process or a thread per request will have a very hard time matching the number of simultaneous connections possible with Node. Not to mention how easy and enjoyable the whole coding experience is, if you care about the state of mind of your developers.

Of course, this is not a ground-breaking revelation – the fact that Node.js is particularly suitable for DIRT-y apps was the major driving force for its explosive growth. Nevertheless, I will repeat it here in case you missed all the other mentions. If you are a JEE developer considering moving from servlets and JSPs to Node, a combination of Node.js, express.js and one of the JavaScript-based templating libraries will make for a fairly painless transition. Still, you will find yourself with a nagging feeling that the new stack is not so much better as different, particularly since you will not immediately feel an improvement in scalability as you are testing your new code in isolation. Only when you start adding server push code will you find yourself in a truly new territory and will be able to justify the expense and the effort.

Now I feel bad for halfheartedly ranting against Guillermo Rauch for not shipping Socket.io v1.0 fast enough for my liking. This experiment convinced me that if you don’t do push, you will not get Node.

© Dejan Glozic, 2014

Socket.io and the Business of Open Source

Gerard van Honthorst: The Matchmaker (1625)

In one of my previous posts on the topic of risk, I mentioned studies that show that investor tolerance for stock market risk is much higher when the market is rising than when it is falling like a knife. It turns out that risk is felt as something abstract until you actually start losing real money, at which point it becomes painfully real and risk appetite-suppressing. I was reminded of this phenomenon as I was researching material for this blog post, and unintentionally encountered the risk of depending on Open Source projects.

The post was supposed to be another one in a series about our journey into full-on Node.js development. I wanted to demonstrate one of the key selling points of Node – the ability to handle a lot of I/O in a non-blocking fashion, allowing the Node server to push simultaneous messages to a large number of clients using one of the push technologies. As with everything Node, there are multiple libraries for this task but it is hard to miss that Socket.io has been immensely popular among the choices (here is a nice summary of the server push choices for Node.js).

All right then, let’s install socket.io via NPM and start coding. Not so fast – which version? And herein starts a tale full of twists and suspense. If you go to the Socket.io Web site, it is all about version 0.9. Aha, but if you google Socket.io, you will find presentations of the type “Why Socket.io 1.0”, “What is new in Socket.io 1.0”, “How Socket.io 1.0 Will Be Unbelievably Awesome”, all from around mid 2012 or so. Then in 2013 – silence. Everybody assumed it was a matter of days until 1.0 appeared, yet we are still waiting.

It looks like version 1.0 has been a long time coming, which is fine in itself – I have noticed that the Node community favors features and quality over schedules (all these comments equally apply to Node version 0.12). However, there is precious little about the upcoming version on the project’s GitHub home page. This has caused a lot of discussion in the project itself, as well as a panicky Google Groups thread titled ‘Is Socket.io Dying?’

Well, it appears that rumors of Socket.io’s death have been greatly exaggerated – Mr. @rauchg himself appeared to assuage palpitating followers and assure them that all test suites for v1.0 are passing and that the good news is coming very soon. This was in December 2013. We are in February 2014 and still waiting. The lower level Engine.io module is supposed to power Socket.io, and apparently that’s where most of the action is these days.

The frustrating wait for a long anticipated product is something I have experienced recently with Apple Logic Pro X. Apple’s replacement for Logic Pro 9 was several years in coming, and the rumor mill was crazy. People were latching onto every whisper on the InterWeb and every Mac rumor they could find, publicly renouncing Apple, threatening to switch to Cubase or Pro Tools. The parallels are almost uncanny: uncertainty about the next version, tight-lipped product owners leaving a lot of room for speculation, FUD, rumors of near death, reassurances by the product owners that they are ‘hard at work on a new version’.

In the end the new version did arrive, it was unbelievably awesome, and everybody was too busy playing with it and loving Apple again to remember how crazy they were only a short while ago. Most likely this will be the Socket.io story as well – @rauchg and the team will be forgiven as soon as NPM starts serving v1.0 modules.

What did we learn from this, if anything? Google Groups comments contained reminders that “Socket.io is Open Source, after all”, that “we should be grateful for free software” and that “they don’t owe you anything, man” etc. All true, but I take issue with “free software”. Open Source software is not very different from paid software – there is always a reason why somebody writes it, and the fact that I am not asked to pay for the software upfront does not mean that a business model does not exist and that a transaction is not happening. It is just not as obvious as with payware.

Open Source is free, but it is not a hobby for most people. In the old days, releasing code into Open Source was a way to undercut a competitor by commoditizing a function or a product. More recently, contributing to Open Source became a way to grow your personal brand and be highly employable in the social economy. One of these days somebody will make the equivalent of the Klout number that measures your footprint in Open Source projects (you can call it ‘Codex’, but if you do, I want some stock options if you build a startup around this idea). Therefore, while Guillermo Rauch wrote Socket.io ‘for free’, social advertisers can calculate to a dollar the value of the tagline “Creator of socket.io” in his Twitter profile. Otherwise, how on Earth could Snapchat, which makes zero revenue, be valued at $3B by Facebook, and then turn down that offer? Not seeing the monetization path right away does not mean it does not exist.

Apart from individual contributors and their public brand, whole companies build up their street cred by contributing to Open Source. In a recent article by ReadWrite (thanks for passing along, @cra!), companies such as Google, Twitter and LinkedIn were open about it. In fact, LinkedIn went as far as to claim that ‘Open Source is part of their recruiting strategy’ because their engineering blog and the code that is discussed there is essentially a replacement for the HR lady (unless you consider Veena Basavaraj a member of LinkedIn HR – it definitely seemed so when I researched their fork of Dust.js: I felt the urge to work for LinkedIn, and I wasn’t even looking).

In a sense, GitHub is becoming the software industry equivalent of the matchmaker lady in the painting above, matching companies on the lookout for talent with developers putting their best lines of code forward. Your ‘free’ GitHub project may say more about you than your LinkedIn resume. For some developers, Open Source GitHub projects ARE their resumes, cutting the glib ‘team player’ and ‘fast learner’ claptrap of traditional resumes that nobody reads and going straight to the code, where the rubber meets the road. It is all great that you work well in response to a challenge, but why don’t you fix those 150 issues that are gathering dust in your project? Will you be similarly neglectful if we hire you?

In that light, I don’t think it is too much to ask to get Socket.io v1.0 out the door already, particularly after so much PR about it. Every time we tweet about it, use it, write articles and blog posts about it, or talk to our friends about it, we build up the Socket.io team’s reputation that they can take all the way to the bank, so I think the transaction is fair and we are even. Along the same line (and to tie back to my reference to risk tolerance, in this case the risk of building production code on top of Open Source software), I don’t think we can just wave our hand and say ‘it comes with the territory of using software you didn’t pay for’. As I tried to prove here, I DID pay for it and continue to pay, just in a more convoluted and delayed transaction.

Less pontificating, more coding. Next week I will go back from this tangent to write about pushing events from a Node.js server using whichever Socket.io version is available at that time. Maybe this blog post will code-shame them into releasing v1.0.

Meanwhile, and talking about the risk of depending on Open Source code, I am not sure what to do with node-optimist, besides recommending against using modules with pirates on their home pages.

© Dejan Glozic, 2014

In The Beginning, Node Created os.cpus().length Workers

[Image: Julius Schnorr von Carolsfeld, ‘Bibel in Bildern’ (1860)]

As reported in the previous two posts, we are boldly going where many have already gone before – into the brave new world of Node.js development. Node has this wonderful aura that makes you feel unique, even though the fact that you can find answers to your Node.js questions in the first two pages of Google search results should tell you something. This reminds me of Stuff White People Like, where the question “Why do white people love Apple products so much?” is answered thusly:

Apple products tell the world you are creative and unique. They are an exclusive product line only used by every white college student, designer, writer, English teacher, and hipster on the planet.

And so it is with Node.js, where I fear that if I hesitate too much, somebody else will write an app in Node.js and totally deploy it to production, beating me to it.

Jesting aside, Node.js definitely has the ‘new shoes’ feel compared to stuff that has been around much longer. Now that we have graduated from ‘Hello, World’ and want to do some serious work in it, there is a list of best practices we need to quickly absorb. The topic of this post is fitting the square peg of Node.js’ single-threaded nature into the multi-core hole.

For many skeptics, the fact that Node.js is single-threaded is a non-starter. Any crusty Java server can seamlessly spread to all the cores on the machine, making Node look primitive in comparison. In reality, this is not so clear-cut – it all depends on how I/O intensive your code is. If it is mostly I/O bound, those extra cores will not help you that much, while the non-blocking nature of Node.js will be a great advantage. However, if your code needs to do a little bit of sequential work (and even mostly I/O bound code has a bit of blocking work here and there), you will definitely benefit from doubling up. There is also that pesky problem of uncaught exceptions – they can terminate your server process. How do you address these problems? As always, it depends.
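As a stopgap for the uncaught exceptions, you can at least log the error on the way down and exit cleanly (a minimal sketch; the cluster-based respawning shown below is the more robust answer):

process.on('uncaughtException', function (err) {
   console.error('Uncaught exception, shutting down: ' + err.stack);
   process.exit(1);
});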

A simple use case (and one that multi-threading server proponents like to use as a Node.js anti-pattern) is a Node app installed on a multi-core machine (i.e. all machines these days). In a naive implementation, the app will use only one core for all the incoming requests, under-utilizing the hardware. In theory, a Java or a RoR server app will scale better by spreading over all the CPUs.

Of course, this being year 5 of Node.js’ existence, using all the cores is entirely possible using the core ‘cluster’ module (pun not intended). Starting from the example from two posts ago, all we need to do is bring in the ‘cluster’ module and fork the workers:

var cluster = require('cluster');

if (cluster.isMaster) {
   var numCPUs = require('os').cpus().length;
   //Fork the workers, one per CPU
   for (var i=0; i< numCPUs; i++) {
      cluster.fork();
   }
   cluster.on('exit', function(deadWorker, code, signal) {
      // The worker is dead. He's a stiff, bereft of life,
      // he rests in peace.
      // He's pushing up the daisies. He expired and went
      // to meet his maker.
      // He's bleedin' demised. This worker is no more.
      // He's an ex-worker.

      // Oh, look, a shiny new one.
      // Norwegian Blue - beautiful plumage.
      var worker = cluster.fork();

      var newPID = worker.process.pid;
      var oldPID = deadWorker.process.pid;

      console.log('worker '+oldPID+' died.');
      console.log('worker '+newPID+' born.');
   });
}
else {
   // The normal part of our app: set up Express, routes and the server.
}

The code above does two things – it forks the workers (one per CPU core), and it replaces a dead worker with a spiffy new one. This illustrates the ethos of disposability described in the 12 factors. An app that can quickly be started and stopped can also be replaced without a hitch if it crashes. Of course, you can analyze logs and try to figure out why a worker crashed, but you can do it on your own time, while the app continues to handle requests.

It can help to modify the server creation call by printing out the process ID (‘process’ is an implicitly defined global variable – no need to require a module for it):

http.createServer(app).listen(app.get('port'), function() {
   console.log('Express server '+process.pid+
                ' listening on port ' + app.get('port'));
});

The sweet thing is that even though we are using multiple processes, they are all bound to the same port (3000 in this case). This is done by virtue of the master process being the only one actually bound to that port, and a bit of white Node magic.

We can now modify our controller to pass the PID to the simple page and render it using Dust:

exports.simple = function(req, res) {
  res.render('simple', { title: 'Simple',
                         active: 'simple',
                         pid: process.pid });
};

This line in the simple.dust file will render the process ID on the page:


This page is served by server pid={pid}.

When I try this code on my quad-core ThinkPad laptop running Windows 7, I get 8 workers (os.cpus().length counts logical cores, and hyperthreading turns my four physical cores into eight):

Express server 7668 listening on port 3000
Express server 8428 listening on port 3000
Express server 8580 listening on port 3000
Express server 9764 listening on port 3000
Express server 7284 listening on port 3000
Express server 5412 listening on port 3000
Express server 6304 listening on port 3000
Express server 8316 listening on port 3000

If you reload the browser fast enough when rendering the page, you can see different process IDs reported on the page.

This sounds easy enough, as most things in Node.js do. But as usual, real life is a tad messier. After testing the clustering on various machines and platforms, the Node.js team noticed that some machines tend to favor only a couple of workers from the entire pool. It is a sad fact of life that for college assignments, a couple of nerds end up doing all the work while the slackers party. But few of us want to tolerate such behavior when it comes to responding to our Web traffic.

As a result, starting from the upcoming Node version 0.12, workers will be assigned in a ‘round-robin’ fashion. This policy will be the default on most platforms (although you can defeat it by adding these lines before creating workers):

    // Set this before calling other cluster functions.
    cluster.schedulingPolicy = cluster.SCHED_NONE;

You can read more about it in this StrongLoop blog post.
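For completeness, the opposite nudge should work as well – either in code via the SCHED_RR constant, or from the outside via an environment variable (this is my reading of the docs, so treat it as an assumption):

    // opt into round-robin explicitly...
    cluster.schedulingPolicy = cluster.SCHED_RR;

    // ...or without touching the code:
    // NODE_CLUSTER_SCHED_POLICY="rr" node app.js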

An interesting twist to clustering is when you deploy this app to the Cloud, using IaaS such as SoftLayer, Amazon EC2 or anything based on VMware. Since you can provision VMs with a desired number of virtual cores, you have two dimensions to scale your Node application:

  1. You can ramp up the number of virtual cores allocated for your app. Your code as described above will stretch to create more workers and take advantage of this increase, but all the child processes will still be using shared RAM and virtual file system. If a rogue worker fills up the file system writing logs like mad, it will spoil the party for all. This approach is good if you have some CPU bottlenecks in your app.
  2. You can add more VMs, fully isolating your app instances. This approach will give you more RAM and disk space. For JVM-based apps, this would definitely matter because JVMs are RAM-intensive. However, Node apps are much more frugal when it comes to resources, so you may not need as many full VMs for Node.

Between the two approaches, ramping up cores is definitely the cheaper option, and should be attempted first – it may be all you need. Of course, if you deploy your app to a PaaS like CloudFoundry or Heroku, all bets are off. It is possible that the code I have listed above is not even needed if you intend to host your app on a PaaS, because the platform will provide this behaviour out of the box. However, in some configurations this code will still be useful.

Example: Heroku gives you a single CPU dyno (virtualized unit of server power) with 512MB RAM for free. If you stay on one instance but pick a 2-core dyno with 1GB RAM (I know, still peanuts), that will cost you $34.50 at the time of writing (don’t quote me on the numbers, check them directly at the Heroku pricing page). Using two single core dynos will cost you the same. Between the two, JVM would probably benefit from the 2x dyno (with more RAM), while a single threaded Node app would benefit from two single core instances. However, our code gives you the freedom to use one 2X dyno and still use both cores. I don’t know if availability is the responsibility of the PaaS or yourself – drop me a line if you know the details.

It goes without saying that workers are separate processes, sharing nothing (SN). In reality, the workers will probably share storage via the attached resource, and storage itself can be clustered (or sharded) for horizontal scaling. It is debatable if sharing storage (even as attached resources) disqualifies this architecture from being called ‘SN’, but ignoring storage for now, your worker should be written to not cache anything in memory that cannot be easily recreated from a data source outside the worker itself. This includes auth or session data – you should rely on authentication schemes where the client sends you some kind of a token you can exchange for the user data with an external authentication authority. This makes your worker not unlike Dory from Pixar’s ‘Finding Nemo’, suffering from short term memory loss and introducing itself for each request. The flip side is that a new worker spawned after a worker death can be ready for duty, missing nothing from the previous interactions with the client.

In a sense, using clustering from the start builds character – you can never leave clustering as an afterthought, as something you will add later when your site becomes wildly popular and you need to scale. You may discover that you are caching too much in memory and need to devise schemes to share that information between nodes. It is better to get used to SN mindset before you start writing clever code that will bite you later.

Of course, this being Node, there is always more than one way to skin any particular cat. There is a history of clustering with Node, and also of keeping Node alive (an uncaught exception can terminate your process, which is a bummer if only one process is serving all your traffic). In the olden days (i.e. a couple of years ago), people had good experience with forever. It is simple and comes with a friendly license (MIT). Note though that forever only keeps your app alive; it does not cluster it. More recently, PM2 emerged as a more sophisticated solution, adding clustering and monitoring to the mix. Unfortunately, PM2 comes with an AGPL license, which makes it much harder to ship with your commercial product (which means little if you are just having fun, but actually matters if you are a company of any size with actual paying customers installing your product on premise). Of course, if your whole business is hosted and you are not shipping anything to customers, you should be fine.

What I like about the ‘cluster’ module is that it is part of the Node.js core library. We will likely add our own monitoring or look for ‘add-on’ monitoring that plays nicely with this module, rather than use a complete replacement like PM2. Regardless of what we do about monitoring, the clustering boilerplate will be a normal part of all our Node.js apps from now on.

© Dejan Glozic, 2014