HA All The Things


I hate HA (High Availability). Today everything has to be highly available. All of a sudden, SA (Standard Availability) isn’t cutting it any more. Case in point: I used to listen to music on my way to work. Not any more – my morning meeting schedule intrudes into my ride, forcing me to participate in meetings while driving, Bluetooth and all. My 8-speaker surround-sound Acura ELS system hates me – built for high-resolution multichannel reproduction, it is reduced to ‘Hi, who just joined?’ in glorious mono telephony. But I digress.

You know that I have written many articles on micro-services because they are our ongoing concern as we slowly evolve our topology away from monolithic systems and towards micro-services. I have already written about my thoughts on how to scale and provide HA for Node.js services. We have also solved the problem of handling messaging in a cluster using AMQP worker queues.

However, we are not done with HA. The message broker itself needs to be highly available, and we only have one node. We are currently using RabbitMQ, and so far it has been rock solid, but we know that in a real-world system it is not a matter of ‘if’ but ‘when’ it will suffer a problem, taking all the messaging capabilities of the system down with it. Or we will mess around with the firewall rules and block access to it by accident. Hey, contractors rupture gas pipes and power cables by accident all the time. Don’t judge.

Luckily, RabbitMQ can be clustered. The RabbitMQ documentation is fairly extensive on clustering and HA. In short, you need to:

  1. Stand up multiple RabbitMQ instances (nodes)
  2. Make sure all the instances use the same Erlang cookie, which allows them to talk to each other (yes, RabbitMQ is written in Erlang; you learn this on day one, when you have to install the Erlang environment before you can install Rabbit)
  3. Cluster the nodes by running rabbitmqctl join_cluster --ram rabbit@<firstnode> on the second server (see the sketch below)
  4. Start the nodes and connect to any of them
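
Here is a minimal sketch of what that looks like on the command line, assuming two hosts named rabbit1 and rabbit2 (the host names are illustrative):

# On rabbit2, after copying the Erlang cookie from rabbit1
rabbitmqctl stop_app
rabbitmqctl join_cluster --ram rabbit@rabbit1
rabbitmqctl start_app

# From any node, verify the cluster
rabbitmqctl cluster_status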

RabbitMQ has an interesting feature in that nodes can join the cluster in RAM mode or in disc mode. RAM nodes replicate state only in memory, while disc nodes also write it to disc. While in theory it is enough to have only one node in the cluster use disc mode, the performance gain of using RAM mode is not worth the risk (the gain is restricted to declaring queues and exchanges, not to posting messages anyway).

Not so fast

OK, we cluster the nodes and we are done, right? Not really. Here is the problem: if we configure the clients to connect to the first node and that node goes down, messaging is still lost. Why? Because the RabbitMQ guys chose not to implement the load balancing part of clustering. The problem is that clients communicate with the broker using the TCP protocol, and the Swiss army knives of proxying/caching/balancing/floor waxing such as Apache or Nginx only reverse-proxy HTTP/S.

After I wrote that, I Googled just in case and found an Nginx TCP proxy module on GitHub. Perhaps you can get away with just Nginx if you use it already. If you use Apache, I could not find a TCP proxy module for it. If it exists, let me know.

What I DID find is that a more frequently used solution for this kind of problem is HAProxy. This super solid and widely used proxy can be configured for Layer 4 (transport proxy), and works flawlessly with TCP. It is fairly easy to configure too: for TCP, you will need to configure the ‘defaults’, ‘frontend’ and ‘backend’ sections, or combine the latter two and just configure the ‘listen’ section (which works great for TCP proxies).

I don’t want to go into the details of configuring HAProxy for TCP – there are good blog posts on that topic. Suffice it to say that you can configure a virtual broker address that all the clients connect to as usual, and it will proxy to all the MQ nodes in the cluster. It is customary to add the ‘check’ instruction to the configuration so that HAProxy verifies nodes are alive before sending traffic to them. If one of the brokers goes down, all the message traffic will be routed to the surviving nodes.
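
For illustration only, a minimal ‘listen’ section might look something like this (the node names, addresses and port are made up – check the HAProxy documentation for the real thing):

listen rabbitmq-cluster
    bind *:5672
    mode tcp
    balance roundrobin
    server rabbit1 rabbit1.example.com:5672 check
    server rabbit2 rabbit2.example.com:5672 check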

Do I really need HAProxy?

If you truly want to HA all the things, you now need to worry that you have made HAProxy itself a single point of failure. I told you, it never ends. The usual suggestion is to set up two instances, one primary and one backup for fail-over.

Can we get away with something simpler? It depends on how you define ‘simpler’. The vast majority of systems RabbitMQ runs on are some variant of Linux, and it appears there is something called LVS (Linux Virtual Server). LVS seems perfect for our needs, being a low-level Layer 4 switch – it just passes TCP packets to the servers it is load-balancing. Except in section 2.15 of the documentation I found this:

This is not a utility where you run ./configure && make && make check && make install, put a few values in a *.conf file and you’re done. LVS rearranges the way IP works so that a router and server (here called director and realserver) reply to a client’s IP packets as if they were one machine. You will spend many days, weeks, months figuring out how it works. LVS is a lifestyle, not a utility.

OK, so maybe not as perfect a fit as I thought. I don’t think I am ready for an LVS lifestyle.

How about no proxy at all?

Wouldn’t it be nice if we didn’t need the proxy at all? It turns out, we can pull that off, but it really depends on the protocol and client you are using.

It turns out not all clients for all languages are the same. If you are using AMQP, you are in luck. The standard Java client provided by RabbitMQ can accept a server address array, going through the list of servers when connecting or reconnecting until one responds. This means that in the event of node failure, the client will reconnect to another node.

We are using AMQP for our worker queue with Node.js, not Java, but the Node.js module we are using supports a similar feature. It can accept an array for the ‘host’ property (though the port, user and password must be the same for all hosts). It will work with normal clustered installations, but the bummer is that you cannot install two instances on localhost to try the failure recovery out – you will need to use remote servers.
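
A minimal sketch of such a connection, assuming the same ‘amqp’ module we use for the worker queue and two clustered brokers (the host names and credentials are illustrative):

var amqp = require('amqp');

// The 'host' property accepts an array; the client goes through the list
// until one of the brokers responds (port and credentials are shared).
var connection = amqp.createConnection({
  host: ['rabbit1.example.com', 'rabbit2.example.com'],
  port: 5672,
  login: 'worker',
  password: 'secret'
});

connection.on('ready', function () {
  console.log('Connected to one of the RabbitMQ nodes');
});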

On the MQTT side, the Eclipse Paho Java client supports multiple server URLs as well. Unfortunately, our standard Node.js MQTT module currently only supports one server. I was assured code contributions will not be turned away.

This solution is fairly attractive because it does not add any more moving parts to install and configure. The downside is that the clients become fully aware of all the broker nodes – we cannot just transparently add another node as we could in the case of the TCP load balancer. Every client must add the new node to its list of nodes to connect to for the addition to take effect. In effect, our code becomes more aware of our infrastructure choices than it should be.

All this may be unnecessary for you if you use AWS, since a quick Google search suggests AWS Elastic Load Balancing can serve as a TCP proxy. Not a solution for us IBMers, of course, but it may work for you.

Give me PaaS or give me death

This is getting pretty tiring – I wish we were doing all this in a PaaS like our own Bluemix so that it would all be taken care of for us. IaaS gives you freedom that can at times be very useful and allow for powerful customizations, but at other times it makes you wish to get out of the infrastructure business altogether.

I told you I hate HA. Now if you’ll excuse me, I need to join another call.

© Dejan Glozic, 2014


The Year of Blogging Dangerously


Wow, has it been a year already? I am faking surprise, of course, because WordPress has notified me well ahead of time that I need to renew my dejanglozic.com domain. So in actuality I said ‘wow, it will soon be a year of me blogging’. Nevertheless, the sentiment is genuine.

It may be worthwhile to look back at the year, if only to reaffirm how quickly things change in this industry of ours, and also to notice some about-faces, changes of direction and mind.

I started blogging with the intent to stay true to the etymological sense of the word ‘blog’ (Web log). As a weekly diary of sorts, it was supposed to chronicle the trials and tribulations of our team as it boldly goes into the tumultuous waters of writing Web apps in the cloud. I settled on a weekly delivery, which is at times doable, at other times a nightmare. I could definitely do without the onset of panic when I realize that it is Monday and I forgot to write a new entry.

Luckily, we deal with enough issues in our daily work to produce plenty of material for the blog. In that regard, we are like a person who, after his old apartment went up in flames, just moved into a new condo and went to Ikea. If an eager clerk asks him ‘what do you need in particular’, his genuine answer must be ‘everything – curtains, rugs, a new mattress, a table, chairs, a sofa, a coffee table …’.

At least that’s how we felt – we were re-doing everything in our distributed system and we were able to re-use very little from our past lives, having boldly decided to jump ahead as far as possible and start clean.

Getting things out of the system

That does not mean that the blog actually started with a theme or a direction. In the inaugural post The Turtleneck and The Hoodie, I proudly declared that I care about both development AND design and refuse to choose. But that is not necessarily a direction to sustain a blog. It was not an issue for a while, thanks to all the ideas that were bouncing around in my head waiting to be written down. Looking back, I think it sort of worked in a general-purpose, ‘good advice’ kind of way. Posts such as Pulling Back from Extreme AJAX or A Guide to Storage for ADD Types were at least very technical and based on actual research and hands-on experience.

Some of the posts were just accumulated professional experience that I felt the need to share. Don’t Get Attached to Your Code or Dumb Code Good, Smart Code Bad were crowd pleasers, at least in the ‘yeah, it happened to me too’ way. Kind of like reading that in order to lose weight you need to eat smart and go outside. Makes a lot of sense except for the execution, which is the hard part.


Old man yells at the cloud

Funnily enough, once I had used up all the accumulated wisdom I had to pass on, some of my posts sound somewhat cranky in hindsight. I guess I disagreed with some ideas and directions I noticed, and the world ignored my disagreement and continued, unimpressed. How dare people do things I don’t approve of!

Two cranky posts that are worth highlighting are Swimming Against the Tide, in which I am cranky regarding client side MVC frameworks, and Sitting on the Node.js Fence, in which I argue with myself on pros and cons of Node.js. While my subsequent posts clearly demonstrate that I resolved the latter dilemma and went down the Node.js route hook, line and sinker, I am still not convinced that all that JavaScript required to write non-trivial Single Page Apps (SPAs) is a very good idea, particularly if you have any ambition to run them on mobile devices. But it definitely sounds funny to me now – as if I was expressing an irritated disbelief that, after publishing all the bad consequences of practicing extreme Ajax, people still keep doing it!

I heart Node.js

Of course, once our team went down the Node.js route (egged on and cajoled by me), you could not get me to shut up about it. In fact, the gateway drug to it was my focus on templating solutions, and our choice of Dust.js (LinkedIn fork). By the way, it is becoming annoying to keep adding ‘LinkedIn fork’ all the time – that’s the only version that is actively worked on anyway.

Articles from this period more or less set the standard for my subsequent posts: they are about 1500 words long, with a mix of outgoing links, a focused technical topic and illustrative embedded tweets (thanks to @cra, who taught me how not to embed tweets as images like a loser). As no story about Node.js apps is complete without Web Sockets and clustering, both were duly covered.


I know micro-services!

Of course, it was not until I attended NodeDay in February that a torrent of posts on micro-services was unleashed. The first half of 2014 was ablaze with posts and tweets about micro-services around the world anyway, which my new Internet buddy Adrian Rossouw duly documented in his Wayfinder field guide. It was at times comical to follow the food fights about who would provide the bestest definition of them all:

If you follow the micro-services tag on my blog, the list of posts is long and getting longer every week. At some point I will stop tagging posts with it, because if everything is about them, nothing is – I need to be more specific. Nevertheless, I am grateful for the whole topic – it did, after all, allow me to write the most popular post so far: Node.js and Enterprise – Why Not?


What does the future hold?

Obviously Node.js, messaging and micro-services will continue to dominate our short-term horizon, as we wrestle with them daily. I spoke about them at the recent DevCon5 in NYC and intend to do the same at the upcoming nodeconf.eu in September.

Beyond that, I can see some possible future topics (although I can’t promise anything – it is enough to keep up as it is).

  • Reactive programming – I recently presented at the first Toronto Reactive meetup, and noticed this whole area of Scala and Akka that is a completely viable alternative for implementing micro-services and scalable distributed systems that conform to the tenets of the Reactive Manifesto. I would like to probe further.
  • Go language – not only because TJ decided to go that route; having an alternative to Node.js for implementing individual micro-services is a great thing, particularly for API and back-end services (I still prefer Node.js for Web-serving apps).
  • Libchan – Docker’s new project (like Go channels over the network) currently requires Go (duh), but I am sure a Node.js version will follow.
  • Famo.us – I know, I know, I have expressed my concerns about their approach, but I did the same with Node.js and look at me now.
  • Swift – I am a registered Xcode developer and have the Swift-enabled update to it. If only I could find some time to actually create some native iOS apps. Maybe I will like Swift more than I do Objective-C.

I would like to read this post in a year and see if any of these bullets panned out (or were instead replaced with a completely different list of even newer and cooler things). In this industry, I would not be surprised.

Whatever I am writing about, I would like to thank you for your support and attention so far, and I hope to keep holding it just a little bit longer. Now if you’ll excuse me, I need to post this – I am already late this week!

© Dejan Glozic, 2014

Node.js Apps and Periodic Tasks


When working on a distributed system of any size, sooner or later you will hit a problem and proclaim ‘well, this is a first’. My second proclamation in such situations is ‘this is a nice topic for the blog’. True to form, I am doing it again, this time with the issue of running periodic tasks, and the twist that clustering and high availability efforts add to the mix.

First, to frame the problem: a primary pattern you will surely encounter in a Web application is Request/Response. It is a road well traveled. Any ‘Hello, World’ web app waves you a hello in response to your request.

Now add clustering to the mix. You want to ensure that no matter what is happening to the underlying hardware, or how many people hunger for your ‘hello’, you will be able to deliver. You add more instances of your app, and they ‘divide and conquer’ the incoming requests. No cry for a warm reply is left unanswered.

Then you decide that you want to tell a more complex message to the world, because that’s the kind of person you are: complex and multifaceted. You don’t want to be reduced to a boring slogan. You store a long and growing list of replies in a database. Because you are busy and have no time for standing up databases, you use one hosted by somebody else, already set up for high availability. Each of your clustered nodes then talks to the same database. You set the ‘message of the day’ marker, and every node fetches it. Thousands of people receive the same message.

Because we are writing our system in Node.js, there are several ways to do this, and I have already written about it. Of course, a real system is not an exercise in measuring HWPS (Hello World Per Second). We want to perform complex tasks, serve a multitude of pages, provide APIs and be flexible and enable parallel development by multiple teams. We use micro-services to do all this, and life is good.

I have also written about the need to use messaging in a micro-service system to bring down the inter-service chatter. When we added clustering into the mix, we discovered that we needed to pay special attention to ensure task dispatching similar to what Node.js clustering or proxy load balancing provides. We found our solution in the round-robin dispatching provided by worker queues.

Timers are something else

Then we hit timers. As long as the information flow in a distributed system is driven by user events, clustering works well because dispatching policies (most often round-robin) are implemented by both Node.js clustering and the proxy load balancer. However, there is a distinct class of tasks in a distributed system that is not user-driven: periodic tasks.

Periodic tasks are tasks that run on a timer, outside of any external stimulus. There are many reasons why you would want them, but most periodic tasks service databases: in a FIFO of limited size, they delete old entries, collapse duplicates, extract data for analysis, report it to other services, etc.

For periodic tasks, there are two key problems to solve:

  1. Something needs to count the time and initiate triggers
  2. Tasks need to be written to execute when initiated by these triggers

The simplest way to trigger the tasks is known by every Unix admin – cron. You set up a somewhat quirky cron table, and tasks are executed according to the schedule.

The actual job to execute needs to be provided as a command line task, which means the app that normally accesses the database needs to provide an additional CLI entry point sharing most of the code. This is important in order to keep with factor XII of the 12 factors, which insists that one-off tasks share the same code and config as the long-running processes.
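
As an illustration (the file names, paths and the task module are hypothetical), the cron entry would invoke a thin CLI script that reuses the same task code and configuration as the long-running app:

// bin/cleanup.js - one-off CLI entry point, invoked from cron as 'node bin/cleanup.js'
var db = require('../lib/db');                  // same DB configuration as the web process
var cleanup = require('../lib/tasks/cleanup');  // same task code the app itself uses

cleanup(db, function (err) {
  if (err) {
    console.error('Cleanup failed:', err);
  }
  process.exit(err ? 1 : 0);
});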

 

There are two problems with cron in the context of the cloud:

  1. If the machine running cron jobs malfunctions, all the periodic tasks will stop
  2. If you are running your system on a PaaS, you don’t have access to the OS in order to set up cron

The first problem is not a huge issue since these jobs run only periodically and normally provide online status when done – it is relatively easy for an admin to notice when they stop. For high availability and failover, Google has experimented with a tool called rcron for setting up cron over a cluster of machines.

Node cron

The second problem is more serious – in a PaaS, you will need to rely on a solution that involves your apps. This means we will need to set up a small app just to run an alternative to cron that is PaaS friendly. As usual, there are several options, but the node-cron library seems fairly popular and has graduated past version 1.0. If you run it in an app backed by supervisor or PM2, it will keep running and executing tasks.

You can execute tasks in the same app where node-cron is running, provided these tasks have enough async calls themselves to allow the event queue to execute other callbacks in the app. However, if the tasks are CPU intensive, they will block the event queue and should be extracted out.

A good way of solving this problem would be to hook up the app running node-cron to a message broker such as RabbitMQ (which we already use for other MQ needs in our micro-service system anyway). The only thing the node-cron app does is publish task requests to predefined topics. The workers listening to these topics do the actual work.
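
A minimal sketch of such a ‘clock’ app, assuming the ‘cron’ and ‘amqp’ modules; the schedule and the ‘tasks.cleanup’ routing key are made up for illustration:

var amqp = require('amqp');
var CronJob = require('cron').CronJob;

var connection = amqp.createConnection({ host: 'localhost' });

connection.on('ready', function () {
  // Publish to the default topic exchange; workers bind their queue to 'tasks.cleanup'
  connection.exchange('amq.topic', { type: 'topic', passive: true }, function (exchange) {
    // Fire a task request every night at 2 AM; the actual work happens in the workers
    new CronJob('00 00 02 * * *', function () {
      exchange.publish('tasks.cleanup', { requestedAt: new Date().toISOString() });
    }, null, true);
  });
});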


The problem with this approach is that a new task request can arrive while a worker has not finished running the previous task. Care should be taken to avoid workers stepping over each other.

Interestingly enough, a hint at this approach can be found in the aforementioned 12 factors, in the section on concurrency. You will notice a ‘clock’ app in the picture, indicating an app whose job is to ‘poke’ other apps at periodic intervals.

There can be only one

A ‘headless’ version of this approach can be achieved by running multiple apps in a cluster and letting them individually keep track of periodic tasks by calling ‘setTimeout’. Since these apps share nothing, they will run according to the local server clock, which may or may not be in sync with other servers. All the apps may attempt to execute the same task (since they are clones of each other). In order to prevent duplication, each app should attempt to write a ‘lock’ record in the database before starting. To avoid a deadlock, apps should wait a random amount of time before retrying.

Obviously, if the lock is already there, an app should fail to create its own. Therefore, only one app will win in securing the lock before executing the task. However, the lock should be set to expire after a small multiple of the time normally required to finish the task, in order to avoid orphaned locks left behind by crashed workers. If the worker has not crashed but is just taking longer than usual, it should renew the lock to prevent it from expiring.
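
Here is a rough sketch of that approach; ‘db.insertLock’, ‘db.removeLock’ and ‘runCleanupTask’ are hypothetical placeholders for whatever shared database and task you use, and the intervals are made up:

var TASK_INTERVAL = 60 * 60 * 1000;  // run the task every hour
var LOCK_TTL = 3 * 5 * 60 * 1000;    // a small multiple of the normal task duration

function schedule() {
  // Random jitter so the clones do not all race for the lock at the same instant
  var jitter = Math.floor(Math.random() * 10000);
  setTimeout(attempt, TASK_INTERVAL + jitter);
}

function attempt() {
  // The insert fails if another instance already holds an unexpired lock
  db.insertLock('nightly-cleanup', { expires: Date.now() + LOCK_TTL }, function (err) {
    if (err) {
      return schedule();   // somebody else won; try again next interval
    }
    runCleanupTask(function () {
      db.removeLock('nightly-cleanup', schedule);  // schedule the next run only when done
    });
  });
}

schedule();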

The advantage of this approach is that we will only schedule the next task once the current one has finished, avoiding the problem that the worker queue approach has.

Note that in this approach, we are not achieving scalability, just high availability. Of the several running apps, at least one app will succeed in securing the lock and executing the task. The presence of other apps ensures execution but does not increase scalability.

I have conveniently omitted many details about writing and removing the lock, retries etc.

Phew…

I guarantee you that once you start dealing with periodic tasks, you will be surprised with the complexity of executing them in the cloud. A mix of cloud, clustering and high availability makes running periodic tasks a fairly non-trivial problem. Limitations of PaaS environments compound this complexity.

If you visit TJ’s tweet above, you will find a dozen people offering alternatives in the replies (most of them variations on *ron). The plethora of different solutions is a dead giveaway that this is a thorny problem. It is not fully solved today (at least not in the context of the cloud and micro-service systems), hence so many alternatives. If you use something that works well for you, do share in the ‘Reply’ section.

© Dejan Glozic, 2014

Micro-Service Fun – Node.js + Messaging + Clustering Combo

Micro-Services in Silly Walk, Monty Python

A:    I told you, I’m not allowed to argue unless you’ve paid.
M:   I just paid!
A:   No you didn’t.
M:   I DID!
A:   No you didn’t.
M:  Look, I don’t want to argue about that.
A:  Well, you didn’t pay.
M:  Aha. If I didn’t pay, why are you arguing? I Got you!
A:   No you haven’t.
M:  Yes I have. If you’re arguing, I must have paid.
A:   Not necessarily. I could be arguing in my spare time.

 

Monty Python, ‘The Argument Clinic’

And now for something completely different. I decided to get off the soap box I kept climbing onto recently and give you some useful code for a change. I cannot stray too much from the topic of micro-services (because that is our reality now), or Node.js (because that is the technology with which we implement that reality). But I can zero in on a very practical problem we unearthed this week.

First, a refresher. When creating a complex distributed system using micro-services, one of the key problems to solve is providing for inter-service communication. Micro-services normally provide REST APIs to get the baseline state, but the system is in constant flux and it does not take long until the state you received from the REST API is out of date. Of course, you can keep refreshing your copy by polling, but this is not scalable and adds a lot of stress to the system, because many of these repeated requests do not result in new state (I can imagine an annoyed version of Scarlett Johansson in ‘Her’, replying with ‘No, you still have no new messages. Stop asking.’).

A better pattern is to reverse the flow of information and let the service that owns the data tell you when there are changes. This is where messaging comes in – modern micro-service based systems are hard to imagine without a message broker of some kind.

Another important property of a modern system is scalability. In a world where you may need to quickly ramp up your ability to handle the demand (lucky you!), the ability to vertically or horizontally scale your micro-services on a moment’s notice is crucial.

Vertical and Horizontal Scalability

An important Node.js quirk explainer: people migrating from Java may not understand the ‘vertical scalability’ part of the equation. Due to the auto-magical handling of thread pools in a JEE container, increasing the real or virtual machine specs requires no effort on the software side. For example, if you add more CPU cores to your VM, the JEE container will spread out to take advantage of them. If you add more memory, you may need to tweak the JVM settings, but otherwise it will just work. Of course, at some point you will need to resort to multiple VMs (horizontal scalability), at which point you may discover that your JEE app is actually not written with clustering in mind. Bummer.

In Node.js land, adding more cores will help you squat unless you make a very concrete effort to fork more processes. In practice, this is not hard to do with utilities such as PM2 – it may be as easy as running the following command:

pm2 start app.js -i max

Notice, however, that for Node.js, vertical and horizontal scalability are the same as far as the way you write your code is concerned. You need to cluster just to take advantage of all the CPU cores on the same machine, never mind load balancing multiple separate servers or VMs.

I actually LOVE this characteristic of Node.js – it forces you to think about clustering from the get go, discouraging you from holding onto data between requests, forcing you to store any state in a shared DB where it can be accessed by all the running instances. This makes the switch from vertical to horizontal scalability a non-event for you, which is a good thing to discover when you need to scale out in a hurry. Nothing new here, just basic share-nothing goodness (see 12factors for a good explainer).

However, there is one important difference between launching multiple Node.js processes using PM2 or the Node ‘cluster’ module, and load-balancing multiple Node servers using something like Nginx. With load balancing using a proxy, we have a standalone server binding to a port on a machine, and balancing and URL proxying are done at the same time. You will write something like this:

http {
    upstream myapp1 {
        server srv1.example.com;
        server srv2.example.com;
        server srv3.example.com;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myapp1;
        }
    }
}

If you try to launch multiple Node servers on a single machine like this, all except the first one will fail because they cannot bind to the same port. However, if you use Node’s ‘cluster’ module (or PM2, which uses the same module under the covers), a little bit of white magic happens – the master process has a bit of code that enables socket sharing between the workers using a policy (either OS-defined, or ‘round-robin’ as of Node 0.12). This is very similar to what Nginx does for your Node instances running on separate servers, with a few more load balancing options (round-robin, least connected, IP-hash, weight-directed).

Now Add Messaging to the Mix

So far, so good. Now, as it is usual in life, the fun starts when you combine the two concepts I talked about (messaging and clustering) together.

To make things more concrete, let’s take a real-world example (one we had to solve ourselves this week). We were writing an activity stream micro-service. Its job is to collect activities expressed using the Activity Streams 2.0 draft spec and store them in a Cloudant DB, so that they can later be retrieved as an activity stream. This service does one thing, and does it well – it aggregates activities that can originate anywhere in the system – any micro-service can fire an activity by publishing to a dedicated MQTT topic.

On the surface, it sounds clear cut – we will use a well-behaved mqtt module as an MQTT client, RabbitMQ for our polyglot message broker, and Node.js for our activity micro-service. This is not the first time we have used this kind of system.

However, things become murky when clustering is added to the mix. This is what happens: MQTT is a pub/sub protocol. In order to allow each subscriber to read messages from the queue at its own pace, RabbitMQ implements the protocol by hooking up a separate queue for each Node instance in the cluster.


This is not what we want. Each instance will receive a ‘new activity’ message and attempt to write it to the DB, requiring contention avoidance. Even if the DB can ensure that only one node succeeds in writing the activity record, this is wasteful because all the nodes are attempting the same task.

The problem here is that the ‘white magic’ the clustering module uses to handle http/https server requests does not extend to the mqtt module.

Our initial thoughts on solving this problem went like this: if we move the message client to the master instance, it can react to incoming messages and pass them on to the forked workers in some kind of ‘round-robin’ fashion. It seemed plausible, but had a moderate ‘ick’ factor because implementing our own load balancing seemed like fixing a solved problem. In addition, it would prevent us from using PM2 (because we would have to be in control of forking the workers), and if we used multiple VMs and Nginx load balancing, we would be back to square one.

Fortunately, we realized that RabbitMQ can already handle this if we partially give up the pretense and acknowledge we are running AMQP under the MQTT abstraction. The way RabbitMQ works for pub/sub topologies is that publishers post to ‘topic’ exchanges, which are bound to queues using routing keys (in fact, there is a direct mapping between AMQP routing keys and MQTT topics – it is trivial to map back and forth).

The problem we were having with the MQTT client on the consumer side was that each cluster instance received its own queue. By dropping down to an AMQP client and making all the instances bind to the same queue, we let RabbitMQ essentially load balance the clients using a ‘round-robin’ policy. In fact, this way of working is listed in the RabbitMQ documentation as a work queue, which is exactly what we want.


OK, Show Us Some Code Now

Just for variety, I will publish MQTT messages to the topic using the Eclipse Paho Java client. Publishing using clients in Node.js or Ruby will be almost identical, modulo syntax differences:

// Assumes the org.eclipse.paho.client.mqttv3 imports, and that 'host' and
// 'messageText' are defined elsewhere in the class.
public static final String TOPIC = "activities";
MqttClient client;

try {
    client = new MqttClient("tcp://" + host + ":1883", "mqtt-client1");
    client.connect();
    MqttMessage message = new MqttMessage(messageText.getBytes());
    client.publish(TOPIC, message);
    System.out.println(" [x] Sent to MQTT topic '" + TOPIC + "': '" + message + "'");
    client.disconnect();
} catch (MqttException e) {
    e.printStackTrace();
}

The client above will publish to the ‘activities’ topic. What we now need to do on the receiving end is set up a single queue and bind it to the default AMQP topic exchange (“amq.topic”) using the matching routing key (again, ‘activities’). The name of the queue does not matter as long as all the Node workers use the same one (and they will, by virtue of being clones of each other).

var amqp = require('amqp');

var connection = amqp.createConnection({ host: 'localhost' });

// Wait for connection to become established.
connection.on('ready', function () {
  // Use the default 'amq.topic' exchange
  connection.queue('worker-queue', { durable: true}, function (q) {
      // Route key identical to the MQTT topic
      q.bind('activities');

      // Receive messages
      q.subscribe(function (message, headers, deliveryInfo, messageObject) {
        // Print messages to stdout
        console.log('Node AMQP('+process.pid+'): received topic "'+
        		deliveryInfo.routingKey+
        		'", message: "'+
        		message.data.toString()+'"');
      });
  });
});

Implementation detail: RabbitMQ MQTT plug-in uses the default topic exchange “amq.topic”. If we set up a different exchange for the MQTT traffic, we will need to explicitly name the exchange when binding the queue to it.
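
For example, if the plug-in were reconfigured to use a hypothetical ‘mqtt-exchange’, the bind call above would have to name it explicitly:

q.bind('mqtt-exchange', 'activities');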

Any Downsides?

There are scenarios in which it is actually beneficial for all the workers to receive an identical message. When the workers are managing Socket.io chat rooms with clients, a message may affect multiple clients, so all the workers need to receive it and decide if it applies to them. The single-queue topology used here is limited to cases where workers are used strictly for scalability and any single worker can do the job.
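
For completeness, here is how the broadcast variant might look with the same module (a sketch, not something we run): each worker declares its own exclusive, server-named queue and binds it to the same routing key, so every instance receives a copy of each message.

var amqp = require('amqp');
var connection = amqp.createConnection({ host: 'localhost' });

connection.on('ready', function () {
  // An empty name makes RabbitMQ generate a unique queue for this worker;
  // 'exclusive' ties it to this connection and removes it when the worker exits.
  connection.queue('', { exclusive: true }, function (q) {
    q.bind('activities');   // same routing key, but now every worker gets a copy

    q.subscribe(function (message) {
      console.log('Worker ' + process.pid + ' received a broadcast copy');
    });
  });
});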

From a more philosophical point of view, by resorting to AMQP we have broken through the abstraction layer and removed the option to swap in another MQTT broker in the future. We looked around and noticed that other people have had to do the same in this situation. MQTT is a pub/sub protocol, and many pub/sub protocols have a ‘unicast’ feature which delivers the message to only one of the subscribers using some kind of selection policy (that would work splendidly in our case). Unfortunately, there is no ‘unicast’ in MQTT right now.

Nevertheless, by retaining the ability for the front end and devices to publish in MQTT, we preserved the nice features of the MQTT protocol (simple, lightweight, can run on the smallest of devices). We continue to express the messaging side of our APIs using MQTT. At the same time, we were able to tap into ‘unicast’ behavior by using RabbitMQ features.

That seems like a good compromise to me. Here is hoping that unicast will become part of the MQTT protocol at some point in the future. Then we can go back to pretending we are running a ‘mystery MQTT broker’ rather than RabbitMQ for our messaging needs.

Nudge, nudge. Wink wink. Say no more.

© Dejan Glozic, 2014

In The Beginning, Node Created os.cpus().length Workers


As reported in the previous two posts, we are boldly going where many have already gone before – into the brave new world of Node.js development. Node has this wonderful aura that makes you feel unique, even though the fact that you can find answers to your Node.js questions in the first two pages of Google search results should tell you something. This reminds me of Stuff White People Like, where the question ‘Why do white people love Apple products so much?’ is answered thusly:

Apple products tell the world you are creative and unique. They are an exclusive product line only used by every white college student, designer, writer, English teacher, and hipster on the planet.

And so it is with Node.js, where I fear that if I hesitate too much, somebody else will write an app in Node.js and totally deploy it to production, beating me to it.

Jesting aside, Node.js definitely has that ‘new shoes’ feel compared to stuff that has been around much longer. Now that we have graduated from ‘Hello, World’ and want to do some serious work in it, there is a list of best practices we need to quickly absorb. The topic of this post is fitting the square peg of Node.js’ single-threaded nature into the multi-core hole.

For many skeptics, the fact that Node.js is single-threaded is a non-starter. Any crusty Java server can seamlessly spread to all the cores on the machine, making Node look primitive in comparison. In reality, this is not so clear-cut – it all depends on how I/O intensive your code is. If it is mostly I/O bound, those extra cores will not help you that much, while the non-blocking nature of Node.js will be a great advantage. However, if your code needs to do a little bit of sequential work (and even mostly I/O-bound code has a bit of blocking work here and there), you will definitely benefit from doubling up. There is also that pesky problem of uncaught exceptions – they can terminate your server process. How do you address these problems? As always, it depends.

A simple use case (and one that multi-threaded server proponents like to use as a Node.js anti-pattern) is a Node app installed on a multi-core machine (i.e. all machines these days). In a naive implementation, the app will use only one core for all the incoming requests, under-utilizing the hardware. In theory, a Java or RoR server app will scale better by spreading over all the CPUs.

Of course, this being year 5 of Node.js’ existence, using all the cores is entirely possible with the core ‘cluster’ module (pun not intended). Starting from the example from two posts ago, all we need to do is bring in the ‘cluster’ module and fork the workers:

var cluster = require('cluster');

if (cluster.isMaster) {
   var numCPUs = require('os').cpus().length;
   //Fork the workers, one per CPU
   for (var i=0; i< numCPUs; i++) {
      cluster.fork();
   }
   cluster.on('exit', function(deadWorker, code, signal) {
      // The worker is dead. He's a stiff, bereft of life,
      // he rests in peace.
      // He's pushing up the daisies. He expired and went
      // to meet his maker.
      // He's bleedin' demised. This worker is no more.
      // He's an ex-worker.

      // Oh, look, a shiny new one.
      // Norwegian Blue - beautiful plumage.
      var worker = cluster.fork();

      var newPID = worker.process.pid;
      var oldPID = deadWorker.process.pid;

      console.log('worker '+oldPID+' died.');
      console.log('worker '+newPID+' born.');
   });
}
else {
   //The normal part of our app.
}

The code above does two things – it forks the workers (one per CPU core), and it replaces a dead worker with a spiffy new one. This illustrates the ethos of disposability described in the 12 factors. An app that can be quickly started and stopped can also be replaced without a hitch if it crashes. Of course, you can analyze the logs and try to figure out why a worker crashed, but you can do that on your own time, while the app continues to handle requests.

It can help to modify the server creation loop by printing out the process ID (‘process’ is a global variable implicitly defined – no need to require a module for it):

http.createServer(app).listen(app.get('port'), function() {
   console.log('Express server '+process.pid+
                ' listening on port ' + app.get('port'));
});

The sweet thing is that even though we are using multiple processes, they are all bound to the same port (3000 in this case). This works by virtue of the master process being the only one actually bound to that port, plus a bit of white Node magic.

We can now modify our controller to pass the PID to the simple page and render it using Dust:

exports.simple = function(req, res) {
  res.render('simple', { title: 'Simple',
                         active: 'simple',
                         pid: process.pid });
};

This line in the simple.dust file will render the process ID on the page:


This page is served by server pid={pid}.

When I try this code on my quad-core ThinkPad laptop running Windows 7, I get 8 workers (with hyper-threading, os.cpus().length reports eight logical cores):

Express server 7668 listening on port 3000
Express server 8428 listening on port 3000
Express server 8580 listening on port 3000
Express server 9764 listening on port 3000
Express server 7284 listening on port 3000
Express server 5412 listening on port 3000
Express server 6304 listening on port 3000
Express server 8316 listening on port 3000

If you reload the browser fast enough when rendering the page, you can see different process IDs reported on the page.

This sounds easy enough, as most things in Node.js do. But as usual, real life is a tad messier. After testing the clustering on various machines and platforms, the Node.js team noticed that some machines tend to favor only a couple of workers from the entire pool. It is a sad fact of life that in college assignments a couple of nerds end up doing all the work while the slackers party. But few of us want to tolerate such behavior when it comes to responding to our Web traffic.

As a result, starting with the upcoming Node version 0.12, workers will be assigned in a ‘round-robin’ fashion. This policy will be the default on most machines (although you can defeat it by adding this line before creating workers):

    // Set this before calling other cluster functions.
    cluster.schedulingPolicy = cluster.SCHED_NONE;

You can read more about it in this StrongLoop blog post.

An interesting twist to clustering comes when you deploy this app to the cloud, using an IaaS such as SoftLayer, Amazon EC2 or anything based on VMware. Since you can provision VMs with a desired number of virtual cores, you have two dimensions along which to scale your Node application:

  1. You can ramp up the number of virtual cores allocated for your app. Your code as described above will stretch to create more workers and take advantage of this increase, but all the child processes will still be using shared RAM and virtual file system. If a rogue worker fills up the file system writing logs like mad, it will spoil the party for all. This approach is good if you have some CPU bottlenecks in your app.
  2. You can add more VMs, fully isolating your app instances. This approach will give you more RAM and disk space. For JVM-based apps, this would definitely matter because JVMs are RAM-intensive. However, Node apps are much more frugal when it comes to resources, so you may not need as many full VMs for Node.

Between the two approaches, ramping up cores is definitely the cheaper option, and should be attempted first – it may be all you need. Of course, if you deploy your app to a PaaS like CloudFoundry or Heroku, all bets are off. It is possible that the code I have listed above is not even needed if you intend to host your app on a PaaS, because the platform will provide this behaviour out of the box. However, in some configurations this code will still be useful.

Example: Heroku gives you a single-CPU dyno (a virtualized unit of server power) with 512MB RAM for free. If you stay on one instance but pick a 2-core dyno with 1GB RAM (I know, still peanuts), that will cost you $34.50 at the time of writing (don’t quote me on the numbers, check them directly at the Heroku pricing page). Using two single-core dynos will cost you the same. Between the two, a JVM would probably benefit from the 2X dyno (with more RAM), while a single-threaded Node app would benefit from two single-core instances. However, our code gives you the freedom to use one 2X dyno and still use both cores. I don’t know if availability is the responsibility of the PaaS or yourself – drop me a line if you know the details.

It goes without saying that workers are separate processes, sharing nothing (SN). In reality, the workers will probably share storage via the attached resource, and storage itself can be clustered (or sharded) for horizontal scaling. It is debatable if sharing storage (even as attached resources) disqualifies this architecture from being called ‘SN’, but ignoring storage for now, your worker should be written to not cache anything in memory that cannot be easily recreated from a data source outside the worker itself. This includes auth or session data – you should rely on authentication schemes where the client sends you some kind of a token you can exchange for the user data with an external authentication authority. This makes your worker not unlike Dory from Pixar’s ‘Finding Nemo’, suffering from short term memory loss and introducing itself for each request. The flip side is that a new worker spawned after a worker death can be ready for duty, missing nothing from the previous interactions with the client.

In a sense, using clustering from the start builds character – you can never leave clustering as an afterthought, as something you will add later when your site becomes wildly popular and you need to scale. You may discover that you are caching too much in memory and need to devise schemes to share that information between nodes. It is better to get used to SN mindset before you start writing clever code that will bite you later.

Of course, this being Node, there is always more than one way to skin any particular cat. There is a history of clustering with Node, and also keeping Node alive (an uncaught exception can terminate your process, which is a bummer if only one process is serving all your traffic). In the olden days (i.e. couple of years ago), people had good experience with forever. It is simple and comes with a friendly license (MIT). Note though that forever only keeps your app alive, it is not clustering it. More recently, PM2 emerged as a more sophisticated solution, adding clustering and monitoring to the mix. Unfortunately, PM2 comes with an AGPL license, which makes it much harder to ship it with your commercial product (which means little if you are just having fun, but actually matters if you are a company of any size with actual paying customers installing your product on premise). Of course, if your whole business is hosted and you are not shipping anything to customers, you should be fine.

What I like about the ‘cluster’ module is that it is part of the Node.js core library. We will likely add our own monitoring, or look for ‘add-on’ monitoring that plays nicely with this module, rather than use a complete replacement like PM2. Regardless of what we do about monitoring, the clustering boilerplate will be a normal part of all our Node.js apps from now on.

© Dejan Glozic, 2014