Micro-Service Fun – Node.js + Messaging + Clustering Combo

Micro-Services in Silly Walk, Monty Python

A: I told you, I’m not allowed to argue unless you’ve paid.
M: I just paid!
A: No you didn’t.
M: I DID!
A: No you didn’t.
M: Look, I don’t want to argue about that.
A: Well, you didn’t pay.
M: Aha. If I didn’t pay, why are you arguing? I got you!
A: No you haven’t.
M: Yes I have. If you’re arguing, I must have paid.
A: Not necessarily. I could be arguing in my spare time.


Monty Python, ‘The Argument Clinic’

And now for something completely different. I decided to get off the soap box I kept climbing recently and give you some useful code for a change. I cannot stray too far from the topic of micro-services (because that is our reality now), or Node.js (because it is the technology with which we implement that reality). But I can zero in on a very practical problem we unearthed this week.

First, a refresher. When creating a complex distributed system using micro-services, one of the key problems to solve is inter-service communication. Micro-services normally provide REST APIs to get the baseline state, but the system is in constant flux and it does not take long until the state you received from the REST API is out of date. Of course, you can keep refreshing your copy by polling, but this is not scalable and adds a lot of stress to the system, because many of these repeated requests do not result in new state (I can imagine an annoyed version of Scarlett Johansson in ‘Her’, replying with ‘No, you still have no new messages. Stop asking.’).

A better pattern is to reverse the flow of information and let the service that owns the data tell you when there are changes. This is where messaging comes in – modern micro-service based systems are hard to imagine without a message broker of some kind.

Another important concern in a modern system is scalability. In a world where you may need to quickly ramp up your ability to handle the demand (lucky you!), the ability to vertically or horizontally scale your micro-services at a moment’s notice is crucial.

Vertical and Horizontal Scalability

An important Node.js quirk explainer: people migrating from Java may not understand the ‘vertical scalability’ part of the equation. Due to the auto-magical handling of thread pools in a JEE container, increasing the real or virtual machine specs requires no effort on the software side. For example, if you add more CPU cores to your VM, the JEE container will spread out to take advantage of them. If you add more memory, you may need to tweak the JVM settings, but otherwise it will just work. Of course, at some point you will need to resort to multiple VMs (horizontal scalability), at which point you may discover that your JEE app is actually not written with clustering in mind. Bummer.

In Node.js land, adding more cores will buy you squat unless you make a very concrete effort to fork more processes. In practice, this is not hard to do with utilities such as PM2 – it may be as easy as running the following command:

pm2 start app.js -i max

Notice, however, that for Node.js, vertical and horizontal scalability are the same as far as your code is concerned. You need to cluster just to take advantage of all the CPU cores on the same machine, never mind load balancing across multiple separate servers or VMs.

I actually LOVE this characteristic of Node.js – it forces you to think about clustering from the get-go, discouraging you from holding onto data between requests and forcing you to store any state in a shared DB where it can be accessed by all the running instances. This makes the switch from vertical to horizontal scalability a non-event for you, which is a good thing to discover when you need to scale out in a hurry. Nothing new here, just basic share-nothing goodness (see The Twelve-Factor App for a good explainer).

However, there is one important difference between launching multiple Node.js processes using PM2 or the Node ‘cluster’ module, and load-balancing multiple Node servers using something like Nginx. With a load-balancing proxy, we have standalone servers each binding to a port on its own machine, and balancing and URL proxying are done at the same time. You will write something like this:

http {
    upstream myapp1 {
        server srv1.example.com;
        server srv2.example.com;
        server srv3.example.com;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myapp1;
        }
    }
}

If you try to launch multiple Node servers on a single machine like this, all except the first one will fail because they cannot bind to the same port. However, if you use Node’s ‘cluster’ module (or PM2, which uses the same module under the covers), a little bit of white magic happens – the master process has a bit of code that enables socket sharing between the workers using a policy (either OS-defined, or ‘round-robin’ as of Node 0.12). This is very similar to what Nginx does for your Node instances running on separate servers, with a few more load balancing options (round-robin, least connected, IP-hash, weight-directed).
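
For the curious, here is roughly what that looks like in code – a minimal sketch of the ‘cluster’ module at work, not our actual service:

var cluster = require('cluster');
var http = require('http');
var os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core; the master owns the listening socket
  // and hands incoming connections to the workers.
  os.cpus().forEach(function () {
    cluster.fork();
  });
} else {
  // Every worker 'listens' on the same port – the cluster module makes
  // this work instead of failing with EADDRINUSE.
  http.createServer(function (req, res) {
    res.end('Served by worker ' + process.pid + '\n');
  }).listen(8080);
}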

Now Add Messaging to the Mix

So far, so good. Now, as is usual in life, the fun starts when you combine the two concepts I talked about (messaging and clustering).

To make things more concrete, let’s take a real world example (one we had to solve ourselves this week). We were writing an activity stream micro-service. Its job is to collect activities expressed using the Activity Streams 2.0 draft spec and store them in Cloudant, so that they can later be retrieved as an activity stream. This service does one thing, and does it well – it aggregates activities that can originate anywhere in the system – any micro-service can fire an activity by publishing to a dedicated MQTT topic.

On the surface, it sounds clear cut – we will use the well-behaved mqtt module as an MQTT client, RabbitMQ as our polyglot message broker, and Node.js for our activity micro-service. This is not the first time we have built this kind of system.

However, things become murky when clustering is added to the mix. This is what happens: MQTT is a pub/sub protocol, and in order to allow each subscriber to read messages from the queue at its own pace, RabbitMQ implements the protocol by hooking up a separate queue for each Node instance in the cluster.

[Figure: MQTT pub/sub with clustering – RabbitMQ hooks up a separate queue for each Node instance]

This is not what we want. Each instance will receive a ‘new activity’ message and attempt to write it to the DB, requiring contention avoidance. Even if the DB can prevent all but one node from succeeding in writing the activity record, this is wasteful because all the nodes are attempting the same task.

The problem here is that the ‘white magic’ the cluster module uses to handle http/https server requests does not extend to the mqtt module.

Our initial thoughts on solving this problem went like this: if we move the message client to the master instance, it can react to incoming messages and pass them on to the forked workers in some kind of ‘round-robin’ fashion. It seemed plausible, but had a moderate ‘ick’ factor because implementing our own load balancing seemed like fixing a solved problem. In addition, it would prevent us from using PM2 (because we would have to be in control of forking the workers), and if we used multiple VMs and Nginx load balancing, we would be back to square one.

Fortunately, we realized that RabbitMQ can already handle this if we partially give up the pretense and acknowledge we are running AMQP under the MQTT abstraction. The way RabbitMQ works for pub/sub topologies is that publishers post to ‘topic’ exchanges, which are bound to queues using routing keys (in fact, there is a direct mapping between AMQP routing keys and MQTT topics – it is trivial to map back and forth).

The problem we were having by using an MQTT client on the consumer side was that each cluster instance received its own queue. By dropping down to an AMQP client and making all the instances bind to the same queue, we let RabbitMQ essentially load balance the clients using a ‘round-robin’ policy. In fact, this way of working is listed in the RabbitMQ documentation as a work queue, which is exactly what we want.

[Figure: AMQP work queue – all Node instances bind to the same queue and RabbitMQ round-robins messages between them]

OK, Show Us Some Code Now

Just for variety, I will publish MQTT messages to the topic using the Eclipse Paho Java client. Publishing using clients in Node.js or Ruby will be almost identical, modulo syntax differences:

import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public static final String TOPIC = "activities";
MqttClient client;

try {
    client = new MqttClient("tcp://" + host + ":1883", "mqtt-client1");
    client.connect();
    MqttMessage message = new MqttMessage(messageText.getBytes());
    client.publish(TOPIC, message);
    System.out.println(" [x] Sent to MQTT topic '" + TOPIC + "': '" + message + "'");
    client.disconnect();
} catch (MqttException e) {
    e.printStackTrace();
}

The client above will publish to the ‘activities’ topic. What we now need to do on the receiving end is set up a single queue and bind it to the default AMQP topic exchange (“amq.topic”) using the matching routing key (again, ‘activities’). The name of the queue does not matter as long as all the Node workers use it (and they will, by virtue of being clones of each other).

var amqp = require('amqp');

var connection = amqp.createConnection({ host: 'localhost' });

// Wait for the connection to become established.
connection.on('ready', function () {
  // Use the default 'amq.topic' exchange
  connection.queue('worker-queue', { durable: true }, function (q) {
    // Routing key identical to the MQTT topic
    q.bind('activities');

    // Receive messages
    q.subscribe(function (message, headers, deliveryInfo, messageObject) {
      // Print messages to stdout
      console.log('Node AMQP(' + process.pid + '): received topic "' +
        deliveryInfo.routingKey + '", message: "' +
        message.data.toString() + '"');
    });
  });
});

Implementation detail: RabbitMQ MQTT plug-in uses the default topic exchange “amq.topic”. If we set up a different exchange for the MQTT traffic, we will need to explicitly name the exchange when binding the queue to it.
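
To illustrate that last point, here is a hedged sketch of the same consumer (the body of the ‘ready’ handler above) bound to a hypothetical custom exchange named ‘mqtt-exchange’ – the name is made up and would have to match whatever the MQTT plug-in is configured to use:

// 'mqtt-exchange' is a hypothetical exchange name, for illustration only
connection.exchange('mqtt-exchange', { type: 'topic' }, function (exchange) {
  connection.queue('worker-queue', { durable: true }, function (q) {
    // The exchange must now be named explicitly in the binding
    q.bind(exchange, 'activities');
    q.subscribe(function (message, headers, deliveryInfo) {
      // ... handle the message as before
    });
  });
});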

Any Downsides?

There are scenarios in which it is actually beneficial for all the workers to receive an identical message. When the workers are managing Socket.io chat rooms with clients, a message may affect multiple clients, so all the workers need to receive it and decide whether it applies to them. The single-queue topology used here is limited to cases where workers are used strictly for scalability and any single worker can do the job.

From a more philosophical point of view, by resorting to AMQP we have broken through the abstraction layer and removed the option to swap in another MQTT broker in the future. We looked around and noticed that other people had to do the same in this situation. MQTT is a pub/sub protocol, and many pub/sub protocols have a ‘unicast’ feature that delivers a message to only one of the subscribers using some kind of selection policy (which would work splendidly in our case). Unfortunately, there is no ‘unicast’ in MQTT right now.

Nevertheless, by retaining the ability for front ends and devices to publish in MQTT, we preserved the nice features of the MQTT protocol (simple, lightweight, runs on the smallest of devices). We continue to express the messaging side of our APIs using MQTT. At the same time, we were able to tap into ‘unicast’ behavior by using RabbitMQ features.

That seems like a good compromise to me. Here is hoping that unicast will become part of the MQTT protocol at some point in the future. Then we can go back to pretending we are running a ‘mystery MQTT broker’ rather than RabbitMQ for our messaging needs.

Nudge, nudge. Wink wink. Say no more.

© Dejan Glozic, 2014

We Don’t Have MongoDB – is CouchDB OK?

First Choice Colas, Wikimedia Commons, 2007, Erik1980

‘It can’t be helped,’ the saddlebum said. ‘What’s happened is, you’ve overloaded your analogizing faculty, thereby blowing a fuse. Accordingly, your perceptions have taken up the task of experimental normalization. This state is known as “metaphoric deformation”.’


Robert Sheckley, “Mindswap”, 1966.

One of the Sci-Fi books that left a deep impression on my young mind was Mindswap by Robert Sheckley (no, I didn’t read it in 1966, in case you were wondering – I am not that old). It introduced a number of hilarious concepts that were surprisingly applicable to the real world, and not just the crazy ones he created in the book. One of them was the aforementioned ‘metaphoric deformation’. According to Robert, it is a disease that often befalls interstellar travellers, where a mind overwhelmed by stimulation it cannot process and make sense of starts generating more bearable hallucinations.

I think about metaphoric deformation a lot these days, when the world of software development is changing at an unprecedented pace, and in multiple dimensions at once. In our development workflow, we create a set of roles for libraries, databases, languages and platforms. We understand that things have changed and are willing to swap A for B, but we are desperately trying to keep the old habits in place and just replace the players.

Case in point – the MEAN stack. The open source world was happy with the LAMP stack (Linux, Apache, MySQL, PHP). Facebook now sits on the world’s biggest mountain of PHP and stores precious big data in racks upon racks of MySQL because that’s what Mark reached for in his Harvard dorm room to create thefacebook.com (OK, I know they bolted Hack on top of PHP, but still). Now that the world has changed unrecognizably, developers are searching for another four-letter stack lest they slip into a metaphoric deformation of their own.

And yet, that’s just a crutch. Let’s analyze what MEAN stands for:

  • M is for MongoDB – the go-to NoSQL database for MySQL withdrawal; it recommends itself mostly because of an SQL-like query language that makes porting manageable.
  • E is for Express.js – a middleware framework for Node.js that allows you to create the familiar world of server-side MVC.
  • A is for Angular.js – because if MVC on the server is good, MVC on the client must be great, and the good people of Google have thought about everything so that you don’t have to (and if you do, it’s kind of hard to wrestle the control back anyway).
  • N is for Node.js – because awesome.

See what we have done? We just recreated our old world using the new tech. Very human, but you will soon realize that the old world is gone for good and we will have to do better.

Fearful for the fragile minds of their fellow citizens, the first engineers to experiment with cars simply set out to create ‘horseless carriages’. Replace a horse with an engine, keep everything else as it has always been since the dawn of time. Now, look around you – does that 10-air-bag, GPS-connected, direct-fuel-injected, microprocessor-controlled car that just whizzed by bear any resemblance to a horseless carriage?

Nowhere is this inclination to replace the parts but keep the structure more bizarre than in the current software world. I am not the first to notice – Forrest Norvel from New Relic has observed it in his presentation Beyond the MEAN Stack.

Forrest noticed that Node.js is all about modularity and freedom of choice. For every letter in the MEAN stack (except N, of course – let’s not get crazy here) there is an alternative. Instead of MongoDB, there are other NoSQL databases. Instead of Express.js, there is Hapi.js (created by Eran Hammer, a person I have deeply admired ever since he left the OAuth 2.0 Workgroup with a bang). Even the author of Express, the prolific TJ Holowaychuk, has moved on to the next generation framework Koa, which uses ES6 generators and looks very promising. Instead of Angular, there is Ember.js, or Backbone.js, or Knockout.js, or none at all (why do you think you absolutely need MVC on the client if some jQuery or even just vanilla JS can suffice?).

Adrian Rossouw dared to ask the right question – why the crazy stampede to MongoDB? There are many situations where another NoSQL database is a better fit, so why instinctively reach for the ‘MySQL of NoSQLs’, other than to keep yourself from slipping into metaphoric deformation?

Due to some legal issues (AGPL license of MongoDB server is giving our lawyers heartburn), we were slow to make that instinctive choice ourselves. Then IBM bought Cloudant, and it was an easy decision – use our very own NoSQL DB, and let somebody else manage it to boot. True master-master replication can’t hurt either.

Not so fast. In case you are not familiar, Cloudant is a hosted CouchDB fork with Apache Lucene* on top. Looking at CouchDB itself, the goal was to do the key things well, not to help developers transition to the new world by dragging in old habits. We were staring at map/reduce views, feeling the anxiety of the New, looking for ways to make dynamic queries.

Turns out you can’t. That’s what Apache Lucene was there for – if you need to create ad hoc queries based on some computed constraints, you use Apache Lucene. Map/reduce views are for predefined queries – things you can craft in advance.
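
To make the distinction concrete, here is a minimal sketch of such a predefined view – a design document with a map function (the document fields ‘userId’ and ‘published’ are hypothetical, for illustration only):

{
  "_id": "_design/activities",
  "views": {
    "by_user": {
      "map": "function (doc) { if (doc.userId) { emit([doc.userId, doc.published], null); } }"
    }
  }
}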

Paul Klicnik, a developer from our team, spent a lot of time banging his head against the Cloudant wall, looking wistfully over into MongoDB land, where he could have just formed a dynamic query in JavaScript. However, once he realized he could use Apache Lucene for that, things started looking up. We ended up with a very fast, solid implementation of a Node.js micro-service that uses the nano module to persist data in Cloudant.
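
For flavor, here is roughly what reading that view through nano looks like – the account URL, database name and keys are made up to match the sketch above:

var nano = require('nano')('https://myaccount.cloudant.com');
var db = nano.db.use('activities');

// Query the predefined 'by_user' view from the design document above
db.view('activities', 'by_user',
  { startkey: ['some-user'], endkey: ['some-user', {}] },
  function (err, body) {
    if (err) return console.error(err);
    body.rows.forEach(function (row) {
      console.log(row.key, row.id);
    });
  });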

In Paul’s own words:

My experience is that map/reduce is a little tricky to get the hang of, but it’s very powerful. Now that I’ve experienced it for myself, I can say I like it a lot. I’m actually using a regular view with map/reduce for retrieving user filters, and it works great. There is something very simple and elegant about Couch. It’s focused on being very good in certain scenarios, and not trying to be a general purpose database which can solve every problem. I also like that the API is RESTful – I know it’s a small detail, but REST is already pretty natural for us, and it makes it easy to work with.

Apparently, Paul is firmly in the new world now – using CouchDB in a way it was intended, not by trying to pretend it is an SQL database where you don’t have to create tables any more.

Of course, not everything is perfect:

Without the Lucene functionality, we would not be able to use CouchDB for the type of querying we need to do. I have mixed feelings about this. So far, I have had zero problems with Lucene and it performs great, but I’m uneasy about the fact that someone essentially shoe-horned in a major feature. I view Cloudant as a two-headed monster – one head being normal CouchDB that behaves as expected, the other a weird appendage that seems to be behaving normally, but you’re never quite sure if it’ll bite your head off. I’m being a bit dramatic, but I’m not fond of solutions where someone alters the functionality and attempts to make a product do something that it was never intended to do, ultimately compromising the system along the way. I realize that this is a purist view; maybe under the covers everything is technically sound.

I have added Paul’s second comment in case you interpreted this post as an ad for Cloudant. We are just finding our way in the new world and trying to use it as intended, and sometimes it gets a little weird. I guess that’s half the fun.

As I always say, “si fueris Rōmae, Rōmānō vīvitō mōre” (rough translation from Latin: “when in Rome, for the love of God stop eating at McDonald’s – there are like a million trattorias serving actual Italian pizza and stuff”). Stop searching for new stacks to recreate your familiar world and instead open your mind to new ways of looking at things.

Or even better – don’t create a single stack. For each element of the old stack, go through several awesome choices available today, and take a pick based on the suitability for the task at hand. Contractors don’t have one hammer, one saw and one drill set – why should you?

Oh, and one more thing – stop drinking caffeinated sugary drinks. Both of them.


*The original version of the article claimed Cloudant uses ‘Lucene Elastic Search’. This was a mistake, although Apache Lucene (definitely not Elastic, or Lucene Search) is involved, as per the company website.

© Dejan Glozic, 2014

Meanwhile, in Nodeland…

Hieronymus Bosch, “The Garden of Earthly Delights”, between 1480 and 1505.

In India, we don’t call it “Indian food”. We just call it “food”.

Rajesh Koothrappali, “The Big Bang Theory”

It was always hard for me to make choices. In the archetypical classification of ‘satisficers’ and ‘maximizers’, I definitely fall into the latter camp. I research ad nauseam, read reviews, measure carefully, take everything into account, and then finally make a move, mentally exhausted. Then I burst into a cold sweat of buyer’s remorse, or read a less than glowing review of the product I just purchased, and my happiness is diminished. It sucks to be me.

It appears it can be passed on as well. My 14-year-old daughter recently went to the mall to buy clothes (alone, a rite of passage of sorts). She returned empty-handed – ‘there was nothing nice’. I think we confirmed paternity right then and there.

Of course, this affliction carries over into my professional life. Readers who have followed me for a while can attest to my hemming and hawing about Node.js – should I stay or should I go, what if it is not ready, what if it turns out to be just a fad, what if I read a bad review after we port everything.

At least in my professional choices I appear to be more decisive than when looking at Banana Republic sweaters. After we jumped into Node.js waters in January this year, there was no turning back. It is almost April now and my mid-term report is due.

One of the beneficial things of ending the agony and giving Node a chance is that you stop worrying about HOW things are done and focus on GETTING them done. And once you do, like Rajesh above, Node.js recedes into the background and the actual problems you are trying to solve become your focus (you are not solving problems in Node, you are just, you know, solving problems). Xenophobic people who get to know ‘the other’ are often surprised that under the surface of difference there is a sea of sameness – people around the world have similar worries, hopes, fears and joys. In Node, you still have to worry about security, i18n, performance, code structure and user experience, and Node is not the only player involved – it is just one tool in your toolbox.

One of the most important roles in converting an organization of any size to Node.js is that of influencers. People like myself who push for it, look for opportunities to get the camel’s nose under the tent (to borrow from Bill Scott), find a crevice where Node can squeeze in and then expand like ice in road potholes (I should know – I keep dodging them all the time this cold March in Toronto). It is a startup of sorts, an inverted pyramid that can easily fall sideways if the founder flinches or loses hope and vision.

On the other hand, it can remind you of the locked potential of the people. There is a dark lake of passion in all of us, often trapped under a layer of dealing with day to day stress and ‘things that need to be done’. It is heart-warming to see that when you succeed in winning people over to the Node.js side, the passion bursts out and they jump in with enthusiasm and vigor that is great to experience.

Example: a member of the JazzHub team heard my pitch for Node.js and started experimenting with tutorials written in markdown. Like everything else, these tutorials are currently served using JSPs and client-side JavaScript. He wrote a small Node app that takes the tutorials, converts them to HTML using a nice little module and renders them using Dust.js. He did it in one afternoon and proudly demoed it the next time we met – it is one page of JavaScript – you don’t even need to scroll to see the entire code. Needless to say, he was hooked.
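
A rough sketch of what such a one-page app might look like – Express and the template file name are my assumptions; the markdown module and Dust.js are the pieces he actually used:

var express = require('express');
var fs = require('fs');
var markdown = require('markdown').markdown;
var dust = require('dustjs-linkedin');

var app = express();

// Compile and register a hypothetical 'tutorial' Dust template
dust.loadSource(dust.compile(fs.readFileSync('tutorial.dust', 'utf8'), 'tutorial'));

app.get('/tutorials/:name', function (req, res) {
  // Read the markdown source of the requested tutorial
  fs.readFile('tutorials/' + req.params.name + '.md', 'utf8', function (err, text) {
    if (err) {
      res.statusCode = 404;
      return res.end('No such tutorial');
    }
    // Convert markdown to HTML and render it inside the Dust template
    dust.render('tutorial', { content: markdown.toHTML(text) }, function (err, html) {
      res.end(html);
    });
  });
});

app.listen(3000);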

One of the most important jobs of an influencer is to prevent failure on stupid technicalities. People ready to yell ‘I told you Node was not ready’ are everywhere. Node is awesome, but we all know that a single uncaught exception can crash the process. A JEE container can throw exceptions like a leaky barge, filling up the logs, but it will stay up. In Node, it is your responsibility to prop Node up, a Weekend at Bernie’s of sorts. Judging by the frequency of questions about this behaviour, it is hard to stomach for people moving over from other platforms.

JEE is in fact not that different. When an uncaught exception is thrown in a simple Java program with ‘main’, it exits. When the same happens in a JEE container, the runnable running in a thread exits. The thread is reset, returned to the thread pool, and the next request is passed on to another thread. In a production system with Node.js, it is your responsibility to replicate this behaviour.

It is somewhat easier to end up with an uncaught exception in Node.js than it is in Java. Due to the asynchronous nature of Node, trying to surround a piece of code in a try/catch block is often futile. Exceptions can and will escape. This is such a tricky problem that it prompted the high priests of Node to huddle recently and release a detailed and super useful document that anybody trying to write a production quality Node.js system should read and internalize.
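
A trivial illustration of why try/catch is futile around asynchronous code – by the time the callback throws, the try block has long since exited:

try {
  setTimeout(function () {
    // Thrown on a later tick of the event loop – the catch below is no
    // longer on the stack, so this crashes the process.
    throw new Error('kaboom');
  }, 100);
} catch (e) {
  // Never reached
  console.log('caught: ' + e.message);
}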

It goes without saying then that you will never run ‘node myApp.js’ except during development. In production, you want to spread out to cover all the cores of the machine or VM you are running on. You want to restart the process when it exits (and it will, repeatedly). And finally, it would help to monitor CPU, memory and throughput of each running process. We are currently looking at PM2 for this, but as always with Node, there are other ways of addressing this need.
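
If you roll this yourself instead of using PM2, the bare-bones version of the ‘restart on exit’ part might look like this sketch (assuming your actual server lives in app.js):

var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  // Cover all the cores
  os.cpus().forEach(function () {
    cluster.fork();
  });
  // Restart workers when they exit (and they will, repeatedly)
  cluster.on('exit', function (worker, code, signal) {
    console.log('Worker ' + worker.process.pid + ' died, forking a new one');
    cluster.fork();
  });
} else {
  require('./app');
}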

There are other things we discovered while getting serious with Node. For example:

  1. WebSockets – one of the first things you will try to do after getting the CRUD down is server push – after all, Data Intensive Real Time (DIRT) is what initially catapulted Node into our consciousness. After you get the code working on your machine, inspect the entire path to the browser on your system – proxies are surprisingly cranky when it comes to WebSocket support. NGINX didn’t have it until the 1.3 line (see the sketch after this list). If you are using Apache, note that it will not let WebSocket connections through unless you install and configure mod_proxy_wstunnel. Of course, this is not Node-specific, but chances are you are most likely to hit it with Node because of its affinity for server push.
  2. DevOps – make them your friend. If you thought Continuous Integration and Continuous Deployment were a neat idea, you will find them absolutely indispensable once you start getting real with Node. A well established pattern here is a system composed of micro-services, and it is not uncommon to juggle dozens upon dozens of separate Node apps, each performing a role in the system. You will go mad trying to manage micro-services by hand. Of course, once you get your DevOps going, the ability to deliver new features into a micro-service with surgical precision will bring no end of joy, particularly if you are replacing a JEE pig that takes minutes to restart. Zero downtime is a reachable goal in a micro-service based system comprised of Node processes. In IBM Rational, we use UrbanCode, which is natural to us since we bought them recently. Do what you need to do to go automatic.
  3. Storage – Node will make you re-evaluate the rest of your system. When I started this blog, I was characteristically wishy-washy about databases. With Node.js, we noticed that we naturally lean towards JSON-based NoSQL databases. Many people use MongoDB with success due to the smooth transition from SQL, but if you don’t like its scalability characteristics or your lawyers don’t like its AGPL license, CouchDB is an easy choice. IBM recently bought Cloudant (a hosted CouchDB fork with Apache Lucene tossed in), so it is hard for us to say ‘no’ to a free, Cloud-based and professionally managed NoSQL JSON DB. Again, YMMV – use what works for you, but give JSON-based NoSQL databases a try – going JavaScript all the way down is liberating.
  4. Authentication and Single Sign-On – if there is anything that will remind you of the Ford Model T (you could order it in any colour as long as it was black), it will be authentication and SSO in your system. Nothing will uncover unnecessary complexity of the authentication schemes used in the system like trying to move it to something that is not JEE-based. The complexity is normally hidden in the JEE filters, and is uncovered as soon as you try to replicate it in Node.js. It is technology agnostic as long as you use Java. The good thing is that with a sympathetic team on the other end, this serves as a catalyst to finally simplify and clean up the protocol or switch to something saner.
  5. Legal – if you are in an enterprise of any size, you will not be allowed to just use whatever you find on the Internet (even startups find software promiscuity to have side-effects as soon as they grow enough to attract legal attention). Our legal department probably hates us – the amount of Node.js modules they need to vet is staggering, but the mountain of requests is tapering down. For example, the Markdown app I mentioned earlier in the post needed to vet only the markdown module – all the other modules were already encountered by the apps before it. At this rate, the flow will slow down to a trickle soon.
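
As promised in item 1, this is the kind of NGINX incantation (1.3.13 or newer) that lets WebSocket upgrade requests through to Node – the location path and upstream name are illustrative, in the style of the config from the first post:

location /socket.io/ {
    proxy_pass http://myapp1;
    # WebSockets require HTTP/1.1 and explicit Upgrade/Connection headers
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}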

So there you go – three months after our foray into Nodeland, we are lining up several new Node micro-services, the sky didn’t fall, we are learning new things every day, and we worry more about what we are building than how we are building it. Every once in a while we are forced to solve a problem for the first time. We agree on the best practice together, write it down for those coming after us, and move on. Where we can, we rely on the efforts of people before us, allowing suites such as Kraken.js to let us focus on what is really unique about our code rather than the boilerplate. Our lawyers hate us, but if ‘tried and true’ were the only criterion when choosing our tools and languages, we would still be writing everything in COBOL.

If only I had been this decisive when choosing my last Home Theatre receiver.

© Dejan Glozic, 2014