In The Beginning, Node Created os.cpus().length Workers

[Image: Julius Schnorr von Carolsfeld, Bibel in Bildern, 1860]

As reported in the previous two posts, we are boldly going where many have already gone before – into the brave new world of Node.js development. Node has this wonderful aura that makes you feel unique, even though the fact that you can find answers to your Node.js questions in the first two pages of Google search results should tell you something. This reminds me of Stuff White People Like, where the question ‘Why do white people love Apple products so much?’ is answered thusly:

Apple products tell the world you are creative and unique. They are an exclusive product line only used by every white college student, designer, writer, English teacher, and hipster on the planet.

And so it is with Node.js, where I fear that if I hesitate too much, somebody else will write an app in Node.js and totally deploy it to production, beating me to it.

Jesting aside, Node.js definitely has the ‘new shoes’ feel compared to stuff that has been around much longer. Now that we have graduated from ‘Hello, World’ and want to do some serious work in it, there is a list of best practices we need to quickly absorb. The topic of this post is fitting the square peg of Node.js’ single-threaded nature into the multi-core hole.

For many skeptics, the fact that Node.js is single-threaded is a non-starter. Any crusty Java server can seamlessly spread to all the cores on the machine, making Node look primitive in comparison. In reality, this is not so clear-cut – it all depends on how I/O intensive your code is. If it is mostly I/O bound, those extra cores will not help you that much, while the non-blocking nature of Node.js will be a great advantage. However, if your code needs to do a little bit of sequential work (and even mostly I/O-bound code has a bit of blocking work here and there), you will definitely benefit from doubling up. There is also that pesky problem of uncaught exceptions – they can terminate your server process. How do you address these problems? As always, it depends.
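To make the ‘bit of blocking work’ point concrete, here is a minimal sketch (the number-crunching loop is entirely made up for illustration) of a single-process server that goes deaf while it computes – every other request has to wait until the loop is done:

var http = require('http');

http.createServer(function(req, res) {
   // Simulated sequential work – a tight CPU-bound loop.
   // While it runs, this lone process cannot answer anybody else.
   var sum = 0;
   for (var i = 0; i < 1e9; i++) {
      sum += i;
   }
   res.end('Computed ' + sum + '\n');
}).listen(3000);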

A simple use case (and one that multi-threading server proponents like to use as a Node.js anti-pattern) is a Node app installed on a multi-core machine (i.e. all machines these days). In a naive implementation, the app will use only one core for all the incoming requests, under-utilizing the hardware. In theory, a Java or a RoR server app will scale better by spreading over all the CPUs.

Of course, this being year 5 of Node.js’ existence, using all the cores is entirely possible with the core ‘cluster’ module (pun not intended). Starting from the example from two posts ago, all we need to do is bring in the ‘cluster’ module and fork the workers:

var cluster = require('cluster');

if (cluster.isMaster) {
   var numCPUs = require('os').cpus().length;
   //Fork the workers, one per CPU
   for (var i = 0; i < numCPUs; i++) {
      cluster.fork();
   }
   cluster.on('exit', function(deadWorker, code, signal) {
      // The worker is dead. He's a stiff, bereft of life,
      // he rests in peace.
      // He's pushing up the daisies. He expired and went
      // to meet his maker.
      // He's bleedin' demised. This worker is no more.
      // He's an ex-worker.

      // Oh, look, a shiny new one.
      // Norwegian Blue - beautiful plumage.
      var worker = cluster.fork();

      var newPID = worker.process.pid;
      var oldPID = deadWorker.process.pid;

      console.log('worker '+oldPID+' died.');
      console.log('worker '+newPID+' born.');
   });
}
else {
   //The normal part of our app – each worker executes this
   //branch and starts its own copy of the server.
}
The code above does two things – it forks the workers (one per CPU core), and it replaces a dead worker with a spiffy new one. This illustrates the ethos of disposability described in the 12 factors. An app that can quickly be started and stopped can also be replaced without a hitch if it crashes. Of course, you can analyze logs and try to figure out why a worker crashed, but you can do it on your own time, while the app continues to handle requests.
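Disposability cuts both ways – a worker should also know how to die gracefully. Here is a minimal sketch of the worker side, assuming the worker keeps a reference to its http.Server in a ‘server’ variable (and noting that signal handling is a POSIX affair – Windows is less cooperative):

// On SIGTERM, stop accepting new connections, let in-flight
// requests finish, then exit; the master's 'exit' handler
// above will fork a replacement.
process.on('SIGTERM', function() {
   server.close(function() {
      process.exit(0);
   });
});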

It can help to modify the server creation code to print out the process ID (‘process’ is an implicitly defined global variable – no need to require a module for it):

http.createServer(app).listen(app.get('port'), function() {
   console.log('Express server '+process.pid+
                ' listening on port ' + app.get('port'));
});

The sweet thing is that even though we are using multiple processes, they are all bound to the same port (3000 in this case). This works by virtue of the master process being the only one actually bound to that port, plus a bit of white Node magic.
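If you are curious what that magic looks like, here is a loose sketch of the shared-handle approach that ‘cluster’ builds on: the master owns the listening socket and ships the handle to a child over the IPC channel that child_process.fork() establishes. The ‘worker.js’ file name is hypothetical:

var fork = require('child_process').fork;
var net = require('net');

var child = fork('worker.js');   // hypothetical worker script
var server = net.createServer();
server.listen(3000, function() {
   // The second argument to send() is the handle being shared.
   child.send('server', server);
});

// ...and in worker.js:
// process.on('message', function(msg, serverHandle) {
//    if (msg === 'server') {
//       serverHandle.on('connection', function(socket) {
//          socket.end('Handled by worker ' + process.pid + '\n');
//       });
//    }
// });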

We can now modify our controller to pass the PID to the ‘simple’ page and render it using Dust:

exports.simple = function(req, res) {
  res.render('simple', { title: 'Simple',
                         active: 'simple',
                         pid: process.pid });
};

This line in the simple.dust file will render the process ID on the page:


This page is served by server pid={pid}.

When I try this code on my quad-core ThinkPad laptop running Windows 7, I get 8 workers (os.cpus() counts logical cores, and Hyper-Threading turns my four physical cores into eight logical ones):

Express server 7668 listening on port 3000
Express server 8428 listening on port 3000
Express server 8580 listening on port 3000
Express server 9764 listening on port 3000
Express server 7284 listening on port 3000
Express server 5412 listening on port 3000
Express server 6304 listening on port 3000
Express server 8316 listening on port 3000

If you reload the browser fast enough, you can see different process IDs reported on the page.
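If your trigger finger is not fast enough, a few parallel requests from a script show the same thing. A small sketch, assuming the page is mapped to a ‘/simple’ route (adjust to whatever your router actually uses):

var http = require('http');

// Fire a handful of parallel requests and fish the PID out of
// the HTML, instead of frantically reloading the browser.
for (var i = 0; i < 8; i++) {
   http.get('http://localhost:3000/simple', function(res) {
      var body = '';
      res.on('data', function(chunk) { body += chunk; });
      res.on('end', function() {
         var match = body.match(/pid=(\d+)/);
         console.log('served by ' + (match ? match[1] : '?'));
      });
   });
}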

This sounds easy enough, as most things in Node.js do. But as usual, real life is a tad messier. After testing the clustering on various machines and platforms, the Node.js team noticed that some machines tend to favor only a couple of workers from the entire pool. It is a sad fact of life that for college assignments, a couple of nerds end up doing all the work while the slackers party. But few of us want to tolerate such behavior when it comes to responding to our Web traffic.

As a result, starting from the upcoming Node version 0.12, workers will be assigned in a ‘round-robin’ fashion. This policy will be the default on most platforms (although you can defeat it by adding this line before creating workers):

    // Set this before calling other cluster functions.
    cluster.schedulingPolicy = cluster.SCHED_NONE;

You can read more about it in this StrongLoop blog post.
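Conversely, if you would rather not wait for the new default, the docs also list a round-robin constant and a matching environment variable; a short sketch:

var cluster = require('cluster');

// Be explicit about round-robin instead of relying on the default.
// The same choice can be made without code changes by setting the
// NODE_CLUSTER_SCHED_POLICY environment variable to 'rr' (or 'none').
cluster.schedulingPolicy = cluster.SCHED_RR;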

An interesting twist on clustering comes when you deploy this app to the cloud, using IaaS such as SoftLayer, Amazon EC2, or anything based on VMware. Since you can provision VMs with a desired number of virtual cores, you have two dimensions along which to scale your Node application:

  1. You can ramp up the number of virtual cores allocated to your app. Your code as described above will stretch to create more workers and take advantage of the increase, but all the child processes will still compete for the same RAM and file system. If a rogue worker fills up the file system writing logs like mad, it will spoil the party for all. This approach is good if you have some CPU bottlenecks in your app.
  2. You can add more VMs, fully isolating your app instances. This approach will give you more RAM and disk space. For JVM-based apps, this would definitely matter because JVMs are RAM-intensive. However, Node apps are much more frugal when it comes to resources, so you may not need as many full VMs for Node.

Between the two approaches, ramping up cores is definitely the cheaper option, and should be attempted first – it may be all you need. Of course, if you deploy your app to a PaaS like CloudFoundry or Heroku, all bets are off. It is possible that the code I have listed above is not even needed if you intend to host your app on a PaaS, because the platform will provide this behavior out of the box. However, in some configurations this code will still be useful.

Example: Heroku gives you a single-CPU dyno (a virtualized unit of server power) with 512MB RAM for free. If you stay on one instance but pick a 2-core dyno with 1GB RAM (I know, still peanuts), that will cost you $34.50 at the time of writing (don’t quote me on the numbers, check them directly at the Heroku pricing page). Using two single-core dynos will cost you the same. Between the two, a JVM would probably benefit from the 2X dyno (with more RAM), while a single-threaded Node app would benefit from two single-core instances. However, our code gives you the freedom to use one 2X dyno and still use both cores. I don’t know if availability is the responsibility of the PaaS or yourself – drop me a line if you know the details.
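If you do run clustered code on Heroku, one wrinkle worth knowing: sizing the pool by os.cpus().length can overshoot on shared dynos. A sketch using Heroku’s WEB_CONCURRENCY convention (the variable name is Heroku’s; other platforms may use something else):

var cluster = require('cluster');

// Prefer the platform's hint over raw core counting when present;
// fall back to logical core count for plain VMs and laptops.
var numWorkers = parseInt(process.env.WEB_CONCURRENCY, 10) ||
                 require('os').cpus().length;

for (var i = 0; i < numWorkers; i++) {
   cluster.fork();
}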

It goes without saying that workers are separate processes, sharing nothing (SN). In reality, the workers will probably share storage via an attached resource, and the storage itself can be clustered (or sharded) for horizontal scaling. It is debatable whether sharing storage (even as attached resources) disqualifies this architecture from being called ‘SN’, but ignoring storage for now, your worker should be written to not cache anything in memory that cannot be easily recreated from a data source outside the worker itself. This includes auth or session data – you should rely on authentication schemes where the client sends you some kind of token you can exchange for the user data with an external authentication authority. This makes your worker not unlike Dory from Pixar’s ‘Finding Nemo’, suffering from short-term memory loss and introducing itself for each request. The flip side is that a new worker spawned after a worker death can be ready for duty, missing nothing from the previous interactions with the client.
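For example, here is one way to push session state out of the worker and into an attached resource. The module choice (express-session backed by connect-redis) is just one possibility I happen to know, not a requirement of the architecture:

var express = require('express');
var session = require('express-session');
var RedisStore = require('connect-redis')(session);

var app = express();

// Session data lives in Redis (an attached resource), not in worker
// memory – any worker, old or freshly forked, sees the same state.
app.use(session({
   store: new RedisStore({ host: 'localhost', port: 6379 }),
   secret: 'keyboard cat',   // placeholder secret
   resave: false,
   saveUninitialized: false
}));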

In a sense, using clustering from the start builds character – you can never leave clustering as an afterthought, as something you will add later when your site becomes wildly popular and you need to scale. You may discover that you are caching too much in memory and need to devise schemes to share that information between nodes. It is better to get used to SN mindset before you start writing clever code that will bite you later.

Of course, this being Node, there is always more than one way to skin any particular cat. There is a history of clustering with Node, and also of keeping Node alive (an uncaught exception can terminate your process, which is a bummer if only one process is serving all your traffic). In the olden days (i.e. a couple of years ago), people had good experiences with forever. It is simple and comes with a friendly license (MIT). Note though that forever only keeps your app alive; it does not cluster it. More recently, PM2 emerged as a more sophisticated solution, adding clustering and monitoring to the mix. Unfortunately, PM2 comes with an AGPL license, which makes it much harder to ship with your commercial product (which means little if you are just having fun, but actually matters if you are a company of any size with actual paying customers installing your product on premise). Of course, if your whole business is hosted and you are not shipping anything to customers, you should be fine.

What I like about the ‘cluster’ module is that it is part of the Node.js core library. We will likely add our own monitoring, or look for ‘add-on’ monitoring that plays nicely with this module, rather than use a complete replacement like PM2. Regardless of what we do about monitoring, the clustering boilerplate will be a normal part of all our Node.js apps from now on.
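As a starting point for that home-grown monitoring, the ‘cluster’ module already emits lifecycle events we can hang logging (or metrics) off of – a minimal sketch, to be placed in the master next to the fork loop:

var cluster = require('cluster');

cluster.on('online', function(worker) {
   console.log('worker ' + worker.process.pid + ' is online');
});

cluster.on('listening', function(worker, address) {
   console.log('worker ' + worker.process.pid +
               ' listening on port ' + address.port);
});

cluster.on('disconnect', function(worker) {
   console.log('worker ' + worker.process.pid + ' disconnected');
});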

© Dejan Glozic, 2014


8 thoughts on “In The Beginning, Node Created os.cpus().length Workers”

  1. I could be wrong but it would seem like 1X is a 4-core dyno.

    ~/S/limitless-everglades-3734 git:master ❯❯❯ heroku run node
    Running `node` attached to terminal… up, run.5346
    > require('os').cpus().length
    4

    1. Possible – 1X says that it provides ‘1 CPU share’ but does not say how many physical cores per share. If you are right, then clustering would take full advantage of even a 1X dyno, so all is well :-).
