Sitting on the Node.js Fence

Sat_on_the_Fence_-_JM_Staniforth

I am a long standing fan of Garrison Keillor and his Prairie Home Companion. I fondly remember a Saturday in which he delivered his widely popular News From Lake Wobegon, and that contained the following conundrum: “Is ambivalence a bad thing? Well, yes and no”. It also accurately explains my feeling towards Node.js and the exploding Node community.

As I am writing this, I can hear the great late Joe Strummer singing Should I Stay or Should I Go in my head (“If I stay it will be trouble, if I go it will be double”). It is really hard to ignore the passion that Node.js has garnered from its enthusiastic supporters, and if the list of people you are following on Twitter is full of JavaScript lovers, Node.js becomes Kim Kardashian of your Twitter feed (as in, every third tweet is about it).

In case you were somehow preoccupied by, say, living life to the fullest to notice or care about Node.js, it is a server-side framework written in JavaScript sitting on a bedrock of C++, compiled and executed by Google’s Chrome V8 engine. A huge cottage industry sprung around it, and whatever you may need for your Web development, chances are there is a module that does it somewhere on GitHub. Two years ago, in a widely re-tweeted event, Node.js overtook Ruby on Rails as the most watched GitHub repository.

Recently, in a marked trend of maturation from exploration and small side projects, some serious companies started deploying Node into production. I heard everything about how this US Thanksgiving most traffic to Walmart.com was from mobile devices, handled by Node.js servers.

Then I heard from Nick Hart how PayPal switched from JEE to Node.js. A few months ago I stumbled upon an article about Node.js taking over the enterprise (like it or not, to quote the author). I am sure there are more heartwarming stories traded at a recent #NodeSummit. Still, not to be carried away, apart from mobile traffic, Node.js only serves a tiny panel for the Walmart.com web site, so more of a ‘foot in the door’ than a total rewrite for the desktop. Even earlier, LinkedIn engineering team blogged about their use of Node.js for LinkedIn Mobile.

It is very hard to cut through the noise and get to the core of this surge in popularity. When you dig deeper, you find multiple reasons, not all of them technical. That’s why discussions about Node.js sometimes sound to me like the Abott and Costello’s ‘Who’s on First’ routine. You may be discussing technology while other participants may be discussing hiring perks, or skill profile of their employees, or context switching. So lets count the ways why Node.js is popular and note how only one of them is actually technical:

  1. Node.js is written from the ground up to be asynchronous. It is optimized to handle problems where there is a lot of waiting for I/O. Processing tasks not waiting for I/O while others are queued up can increase throughput without adding processing and memory overhead of the traditional ‘one blocking thread per request’ model (or even worse, ‘one process per request’). The sweet spot is when all you need to do is lightly process the result of I/O and pass it along. If you need to spend more time to do some procedural number-crunching, you need to spawn a ‘worker’ child process, at which point you are re-implementing threading.
  2. Node.js is using JavaScript, allowing front-end developers already familiar with it to extend their reach into the server. Most new companies (i.e. not the Enterprise) have a skill pool skewed towards JavaScript developers (compared to the Enterprise world where Java, C/C++, C# etc. and similar are in the majority).
  3. New developers love Node.js and JavaScript and are drawn to it like moths to a candelabra, so having Node.js projects is a significant attraction when competing for resources in a developer-starved new IT economy.

Notice how only the first point is actually engineering-related. This is so prevalent that it can fit into a tweet, like so:

Then there is the issue of storage. There have been many libraries made available for Node.js to connect to SQL databases, but that would be like showing up at an Arcade Fire concert in a suit. Oh, wait, that actually happened:

Never mind then. Back to Node.js: why not going all the way and use something like MongoDB, storing JSON documents and using JavaScript for DB queries? So now not only do front end developers not need to chase down server-side Java guys to change the served HTML or JSON output, they don’t have to chase DBAs to change the DB models. My point is that once you add Node.js, it is highly likely you will revamp your entire stack and architecture, and that is a huge undertaking even without deadlines to spice up your life with a wallop of stress (making you park in a wrong parking spot in your building’s garage, as I did recently).

Now let’s try to apply all this to a talented team in a big corporation:

  1. The team is well versed in both Java (on the server) and JavaScript (on the client), and have no need to chase down Java developers to ‘add a link between pages’ as in the PayPal’s case. The team also knows how to write decent SQL queries and handle the model side of things.
  2. The team uses IDEs chock-full of useful tools, debuggers, incremental builders and other tooling that make for a great workflow, particularly when the going gets tough.
  3. The team needs to deliver something new on a tight schedule and is concerned about adding to the uncertainty of ‘what to build’ by also figuring out ‘how to build it’.
  4. There is a fear that things that the team has grown to expect as gravity (debugging, logging etc.) is still missing or immature.
  5. While there is a big deal made about ‘context switching’, the team does it naturally – they go from Java to JavaScript without missing a beat.
  6. There is a whole slew of complicated licensing reasons why some Open Source libraries cannot be used (a problem startups rarely face).

Mind you, this is a team of very bright developers that eat JavaScript for breakfast and feel very comfortable around jQuery, Require.js, Bootstrap etc. When we overlay the reasons to use Node.js now, the only one that still remains is handling huge number of requests without paying the RAM/CPU tax of blocking I/O, one thread per request.

As if things were not murky enough, Servlets 3.1 spec supported by JEE 7.0 now comes with asynchronous processing of requests (using NIO) and also protocol upgrade (from HTTP/S to WebSockets). These two things together mean that the team above has the option of using non-blocking I/O from the comfort of their Java environment. They also mean that they can write non-blocking I/O code that pushes data to the client using WebSockets. Both things are currently the key technical reasons to jump ship from Java to Node. I am sure Oracle added both things feeling the heat from the Node.js, but now that they are here, they may be just enough reason to not make the transition yet, considering the cost and overhead.

Update: after posting the article, I remembered that Twitter has used Netty for a while now to get the benefit of asynchronous I/O coupled by performance of JVM. Nevertheless, the preceding paragraph is for JEE developers who can now easily start playing with async I/O without changing stacks. Move along, nothing to see here.

Then there is also the ‘Facebook effect’. Social scientists have noticed the emergence of a new kind of depression caused by feeling bad about your life compared to carefully curated projections of other people’s lives on Facebook. I am yet to see a posting like “my life sucks and I did nothing interesting today” or “I think my life is passing me by” (I subsequently learned that Sam Roberts did say that in Brother Down, but he is just pretending to be depressed, so he does not count). Is it possible that I am only hearing about Node.js success stories, while failed Node.js projects are quietly shelved never to be spoken of again?

Well, not everybody is suffering in silence. We all heard about the famous Walmart memory leak that is now thankfully plugged. Or, since I already mentioned MongoDB, how about going knee-deep into BSON to recover your data after your MongoDB had a hardware failure. Or a brouhaha about the MongoDB 100GB scalability warning? Or a sane article by Felix Geisendörfer on when to use, and more importantly, when NOT to use Node.js (as Node.js core alumnus, he should know better than many). These are all stories of the front wave of adopters and the inevitable rough edges that will be filed down over time. The question is simply – should we be the one to do that or somebody can do the filing for us?

In case I sound like I am justifying reasons why we are not using Node.js yet, the situation is quite the opposite. I completely love the all-JavaScript stack and play with it constantly in my pet projects. In a hilarious twist, I am acting like a teenage Justin Bieber fan and the (younger) team lead of the aforementioned team needs to bring me down with a cold dose of reality. I don’t blame him – I like the Node.js-based stack in principle, while he would have to burn the midnight oil making it work at the enterprise scale, and debugging hard problems that are inevitable with anything non-trivial. Leaving aside my bad case of FOMO (Fear Of Missing Out), he will only be persuaded with a problem where Node.js is clearly superior, not just another way of doing the same thing.

Ultimately, it boils down to whether you are trying to solve a unique problem or just play with a new and shiny technology. There is a class of problems that Node.js is perfectly suitable for. Then there are problems where it is not a good match. Leaving your HR and skills reasons aside, the proof is in the pudding (or as a colleague of mine would say, ‘the devil is in the pudding’). There has to be a unique problem where the existing stack is struggling, and where Node.js is clearly superior, to tip the scales.

While I personally think that Node.js is a big part of an exciting future, I have no choice but to agree with Zef Hemel that it is wise to pick your battles. I have to agree with Zef that if we were to rebuild something for the third time and know exactly what we are building, Node.js would be a great way to make that project fun. However, since we are in the ‘white space’ territory, choosing tried and true (albeit somewhat boring) building blocks and focusing the uncertainty on what we want to build is a good tradeoff.

And that’s where we are now – with me sitting on the proverbial fence, on a constant lookout for that special and unique problem that will finally open the Node.js floodgates for us. When that happens, you will be the first to know.

© Dejan Glozic, 2013

A Guide To Storage for ADD Types

800px-FACTS_ubt_2

If you read my bio on the About page, you can see that I called myself an ‘architect’. I made an explicit promise to apologize for that in a dedicated post at some point. According to a 2003 article by Martin Fowler, I may never be able to work at ThoughtWorks if this IBM thing does not work out, unless Dave Rice has changed his mind in the ensuing 10 years and is not in a grumpy mood any more. Now, if you check my tag line, it says that I care 30% about design, 70% about code, and there are no percentages left for caring how things are stored. As far as I am concerned, they can be dumped in the attic, next to a painting of me getting younger.

Nevertheless, as an architect I should care about storage, and that means going beyond putting obligatory cylinder-shaped objects on my slides. How did I ever get this far without having to worry about it? Through the genius of specialization. You see, when you are participating in a development of a large, multi-component Web applications customers install on premise, through the sheer economy of scale it makes sense that I worry about UIs and somebody else worries about the storage gore. You get to interact with nice objects that know how to persist themselves. You are not lazy, you are efficient. But the world is rapidly changing, going towards the hosted deployment, and from big monolithic web applications to collections of apps running inside a warm bosom of CloudFoundry, Heroku and other cloud platforms. In this architecture, we all need to be polymaths and fret about all aspects of modern app development, soup to nuts.

On the cloud, storage is considered ‘implementation detail’, something you are directed at via attached resources. Adam Wiggins has channeled his inner L. Ron Hubbard by establishing his Church of 12 Factors (kudos for pulling it off without one reference to space ships), and using storage as an attached resource is one of the 12 commandments. What does all this mean? It means that in this particular game of specialization chicken, I lost and have to deeply care about storage for the first time. I don’t need to install and manage databases (cloud platforms have a buffet to chose from), but nobody is going to make that network request for me any more.

And would you look at that? Just when I needed to care about databases, somebody threw a big friggin’ rock into the sleepy storage pond. For many years, you could use any particular DB type, as long as it was relational, where people like to shout in their SQL commands. Then Johan Oskarsson wanted to organize an event to discuss open source distributed databases, and used a #nosql hashtag. The rest is, as they say, history, and we now have to consider the weird and wonderful world of schemaless storage as well. Particularly if we care about clustering, because everybody knows that “Mongo DB is Web Scale” (you will need to Google this yourself, those foul-mouthed bears are NSFW).

His conflicted thoughts on architects notwithstanding, I found Martin Fowler a great resource in my deranged binge of DB knowledge drinking. His introduction to NoSQL databases is great, and so is the 1 hour video from the GOTO Aarhus conference linked from that page. But my general feeling of catching up with the world was akin coming to a party 6 hours late, where most of the food is gone, some guests are not feeling well (to put it mildly) and through their personal experience I know which cocktails not to drink and why Vodka and Red Bull are not such great bedfellows.

Here is a sample of what I have learned, in no particular order. This is a big topic so consider this part #1 of a longer article. If you are like me and you like to spend your time on the ‘outside in’ aspect of your apps, this may help you pick your storage poison and be done with it. If you are an expert in DBs, feel free to point at me and laugh in contempt:

  1. Objects in the relational view mirror are harder than they appear. Objects are round, tables are square, and I don’t need to work hard to turn that into a tired phrase, don’t I? ORM can never be perfect, so there is no use arguing that your approximation is 2% closer to the asymptote than mine. Throwing more resources at it can turn so bad that Ted Neward declared it Vietnam of Computer Science in 2004. To the phrase ‘all is fair in love and war’ we can add ‘and also in ORM’ and be done with it. One popular solution is to do shallow mapping of your key properties to columns to allow for fast queries and stuff the rest of the aggregate data hierarchy into a serialized LOB. If you really, really need to query the data in the LOB later, you can build an inverted index like folks at FriendFeed did and blogged about in 2009.
  2. Don’t fake schemaless storage by using EAV (entity-attribute-value) tables. Bill Karwin was on a mission at some point to convince everybody who cared that they sucked. I can only add that if you truly need that kind of open-ended freedom, RDF triple stores seem to be a better fit. And you get to use SPARQL query language, which always makes me laugh because it reminds me of Mr. Sparkle from a Simpsons episode.
  3. Do store trees of nodes in tables. The curse of EAVs is limited to aggregate data, not to uniform trees of nodes. It is OK if some rows in well designed tables have additional columns for capturing parent-child relationships. Joe Celko seems to be a go to authority on adjacency lists so it is wise to start there, then fan out as needed.
  4. We are moving away from databases as integration platforms. DBAs rise to the level of demigods was facilitated by the fact that a well designed RDB can serve as an integration platform for many applications written to slice and dice the data in new and unexpected ways. Today, services are the portals where we go for our data, and as long as interchange formats are stable, databases can be relegated to the implementation detail. Both RDF and JSON-LD are designed to give you a level of ‘unexpected data utility’ of the web of linked data that no single database can give you.
  5. RDBs are hard to scale. The whole movement towards NoSQL databases was inspired by their affinity to data partitioning (sharding for capacity scaling and/or replication for fault tolerance). On a cloud platform, not only that your app can be clustered as demand grows, but so can an array of NoSQL DB nodes if you start producing a ton of data.
  6. NoSQL databases have no JOINs, and have dropped ACID. According to the CAP theorem, a distributed system cannot guarantee all three of: consistency, availability and partition tolerance. This leads to a system where your app is eventually consistent, but can occasionally have brief intervals of stupidity. There is anecdotal evidence of NoSQL databases loosing data in the real world deployments. It is disconcerting to view your database as an airline that can loose your luggage every once in a while. Now you may need a second database as a backup in case your first database partied too hard and cannot quite remember where all your JSON documents went.
  7. NoSQL databases are attractive to all-JS outfits. The flip side – imagine a system where you write your client side using a collection of JavaScript toolkits, use XHR to send objects as JSON to the server, which is itself written using Node.js. After preparing your data you send it to MongoDB, as JSON again, where it will be stored as a JSON document. You can then use JavaScript to apply map/reduce and fetch interesting projections on the data, or query on JSON properties, or simply fetch the entire JSON document back when you need all the data. Apart from clustering, NoSQL DBs are just a nicer environment for app developers, particularly the recently weaned ones that have no attention span for strongly typed languages.

This can give you a flavor of where my head is right now. I am just trying to be practical about it. I know that there is no free lunch, but ‘kids these days’ know how to make things new and shiny. I know an adult in me should still prefer RDBs for their battle hardened dependability and predictability, but look at Mongoose JS library and tell me you are not swayed by the simplicity (and nobody is yelling at you):

    var mongoose = require('mongoose');
    mongoose.connect('mongodb://localhost/test');

    var Cat = mongoose.model('Cat', { name: String });
    var kitty = new Cat({ name: 'Zildjian' });
    kitty.save(function (err) {
       if (err) // ...
          console.log('meow');
    });

This is as close as I have come to replicating that ‘objects that know how to persist themselves’ feeling of yore, minus the quagmire of ORM. It is hard to resist the siren call of NoSQL. But I think the biggest lesson of my forray into the wonderful world of storage is that today you can change your mind about it. As long as you truly hide your storage choices as implementation detail, you can switch later, so stop agonizing, pick one and move on to the more interesting areas, like user experience.

© Dejan Glozic, 2013