Nodeconf.eu 2014: Trip Report (Part 1)

nodeconfeu-1

Shady’s back, tell a friend! Fresh from the green grass of Ireland where I attended (and presented) at this year’s nodeconf.eu, I am now back to report on it as promised.

This year’s conference is the second instance of a format started last year by Cian Ó Maidín and friends from NearForm. The goal is to carefully curate talks across the Node.js community, ensuring quality over quantity. I had the great pleasure of attending the conference, particularly as I was one of the ‘carefully curated’ speakers this year. What adds particular flair and sparkle to the event is the location – an actual Irish castle in Waterford, a south-eastern town of Ireland and the oldest of them all.

Getting to Waterford Castle involves an itinerary straight out of the movie Planes, Trains and Automobiles – flying to Dublin, catching a bus to Waterford, then switching to a taxi that at some point crosses the river Suir on a private car ferry to reach the castle on the island.

The event started in the evening with a welcome reception that involved circus acts – a very ‘tall’ lady (see above) and a guy literally playing with fire.

nodeconfeu-2

After-dinner entertainment included a lovely young lady playing a cello and singing in a way that fuses traditional Irish music and modern sensibilities – perfect for Irish hipsters (they do exist, don’t they?). Unfortunately her name escapes me – if you know it (and have a link to her home page), please drop me a comment (Edit: @fritzvd was kind enough to point out her name is Alana Henderson – thanks!).

nodeconfeu-3

Nodeconf.eu was held in the clubhouse of the nearby Golf Club (part of the same Waterford Castle resort). For the next three days, our master of ceremonies was Mikeal, who was well known to most attendees (just look at your apps, and if you require ‘request’, you know him too).

nodeconfeu-4
Mikeal tells people to sit down so that we can start.

The conference opened with a welcome address by Cian, outlining what awaited us for the next three days and upholding the conference Code of Conduct, which was wonderfully short:

  1. No harassment of any kind is allowed
  2. Please don’t fall into the Suir river (apparently somebody did not long ago)

The technical part of the conference started with Node’s own TJ Fontaine, Node.js core lead, with his ‘State of Node’ address. TJ posited that the industry is in a state that makes this the perfect time for a JavaScript framework. He also reiterated some of the key tenets of Node.js philosophy, including non-blocking I/O and the ‘do one thing, and do it well’ ethos. Finally, he stressed that the evolution of Node.js is not about what other languages or frameworks are doing, but about what is good for Node.js and JavaScript themselves.

nodeconfeu-6
TJ Fontaine delivers the State of Node address.

NearForm’s own Richard Rodger kicked off the Micro-Services block with the accumulated experience of deploying micro-services (and in particular Node.js micro-services) in production. He highlighted some natural advantages (scalability, flexibility of deployment) but also disadvantages (added latency). From his real-world experience, he concluded that business logic should be in the services (no core monolith), that developers should resist the Tower of Babel (the temptation to use many languages and stacks), and that they should not assume they can design everything upfront (services are ‘discovered’, not designed). Nevertheless, he reiterated one of the strong suits of micro-services – the fact that you can change your mind and swap databases mid-project (or anything else).

nodeconfeu-7
NearForm’s Richard Rodger kicks off the Micro-Services block.

Clifton Cunningham focused on a very hard problem of micro-services – the fact that while multiple services are responsible for various parts of the system, pages still need to share some content. He enumerated options used in the past – client-side stitching using Ajax, a front-end server, Server Side Includes (SSI), Edge Side Includes (ESI) etc. He presented his team’s take on the problem – an open-source module called Compoxure that has a number of advanced features to deal with the problems normally faced in production – performance, latency, failures and authentication. He also addressed the problem of delivering CSS and JS for this shared content across the micro-services.

nodeconfeu-8
Clifton Cunningham on Compoxure.

Then it was time for me. My take was how my team in IBM DevOps Services decided to pursue micro-services in general, and Node.js in particular as we started evolving our system. I don’t need to go into details – this whole blog is a weekly chronicle of our journey. I added a twist from a position of doing Node.js in a large enterprise – the need to legally vet a large number of Node.js modules (causing legal to hate us) and the complexity of deploying a large number of services in a secure manner (causing OPS to hate us).

nodeconfeu-9
Dejan Glozic on Node.js micro-services at IBM (photo courtesy of Louis Faustino via SmugMug).

The last speaker in the Micro-Services block was Fred George, bringing his wealth of experiences building micro-services in a number of real-world projects. He brought forward several examples of projects using different technologies and architecture, unified in the fact that most of the time an event based (asynchronous) approach was a better fit than synchronous. Out of that experience he extracted a concept of a system where all services receive all the messages all the time, but the messages are semantically classifiable, forming ‘rapids’ (all events), ‘rivers’ (themed events) and ‘ponds’ (state/history).

nodeconfeu-10
Fred George on rapids, rivers and ponds.

After a coffee and JS cupcakes break, we switched to the ‘Production’ track, starting with Brian McCallister from Groupon walking us through the familiar experience of a company whose monolith had become so large and unwieldy that it eventually made adding new features virtually impossible. Groupon slowly and carefully migrated their two large installations (North America and Europe) to a micro-service system using Node.js, unblocking further evolution, with performance and scalability improvements tossed into the mix. This was in a sense a partial continuation of the micro-service track, considering that Groupon’s new system shares many traits with the ones discussed in the preceding talks.

nodeconfeu-11
Brian McCallister on building the I-Tier system at Groupon.

PayPal’s own Jeff Harrell zeroed in on a number of anti-patterns of implementing Node.js in real-world projects at PayPal. PayPal made a wholesale transformation to Node.js that is still ongoing, and their large team contributed a number of these anti-patterns. Among them: bringing baggage from previous projects, re-creating monolithic applications in Node.js, Googling ‘how to do X in JavaScript’, wrapping everything in promises, sloppy async code, using Node.js for everything, and ignoring the ecosystem.

PayPal’s Jeff Harrell on Node.js real world anti-patterns.

The last speaker of the day was Aman Kohli from Citi Bank, bringing us the experience of deploying Node.js to provide back-end services for mobile apps in an environment that is anything but forgiving when it comes to security and adherence to regulations and process. According to Aman, they chose Node because of the async event model, ideally suited for mobile and sensor apps; the fact that it was approved for internal usage and required fewer controls; and the good success they had using the Hapi framework for building mobile API services.

Aman Kohli on using Node.js for mobile services in Citi Bank.

At this point it was time to break for lunch. From several choices, I picked the Kraken.js workshop for my afternoon activity, where I could pick the brains of Jeff Harrell and Erik Toth from PayPal about the philosophy and plans for this open-source suite we already use in our micro-services.

Evening R&R was provided in the form of Irish whiskey tasting (not bad at all, but I still prefer Scottish single malt) and a great local folk band treating us to a mix of traditional tunes and Irish-treated covers.

Continue to part 2 of the report.

© Dejan Glozic, 2014


The Year of Blogging Dangerously

391px-Extremely_yummy_raspberry_cheesecake

Wow, has it been a year already? I am faking surprise, of course, because WordPress has notified me well ahead of time that I need to renew my dejanglozic.com domain. So in actuality I said ‘wow, will it soon be a year of me blogging’. Nevertheless, the sentiment is genuine.

It may be worthwhile to look back at the year, if only to reaffirm how quickly things change in this industry of ours, and also to notice some about-faces, changes of direction and mind.

I started blogging with the intent to stay true to the etymological sense of the word ‘blog’ (Web log). As a weekly diary of sorts, it was supposed to chronicle the trials and tribulations of our team as it boldly goes into the tumultuous waters of writing Web apps in the cloud. I settled on a weekly delivery, which is at times doable, at other times a nightmare. I could definitely do without the onset of panic when I realize that it is Monday and I forgot to write a new entry.

Luckily, we deal with enough issues in our daily work to produce ample material for the blog. In that regard, we are like a person who just moved into a new condo after his old apartment went up in flames, and is now at Ikea. If an eager clerk asks him ‘what do you need in particular’, his genuine answer must be ‘everything – curtains, rugs, a new mattress, a table, chairs, a sofa, a coffee table …’.

At least that’s how we felt – we were re-doing everything in our distributed system and we were able to re-use very little from our past lives, having boldly decided to jump ahead as far as possible and start clean.

Getting things out of the system

That does not mean that the blog actually started with a theme or a direction. In the inaugural post The Turtleneck and The Hoodie, I proudly declared that I care about both development AND design and refuse to choose. But that is not necessarily a direction that can sustain a blog. It was not an issue for a while, due to all the ideas that were bouncing around in my head waiting to be written down. Looking back, I think it sort of worked in a general-purpose, ‘good advice’ kind of way. Posts such as Pulling Back from Extreme AJAX or A Guide to Storage for ADD Types were at least very technical and based on actual research and hands-on experience.

Some of the posts were just accumulated professional experience that I felt the need to share. Don’t Get Attached to Your Code or Dumb Code Good, Smart Code Bad were crowd pleasers, at least in the ‘yeah, it happened to me too’ way. Kind of like reading that in order to lose weight you need to eat smart and go outside. Makes a lot of sense except for the execution, which is the hard part.

344px-'Be_smart..Act_dumb^_-_NARA_-_513932

Old man yells at the cloud

Funnily enough, some of my posts, after using up all the accumulated wisdom to pass on, sound somewhat cranky in hindsight. I guess I disagreed with some ideas and directions I noticed, and the world ignored my disagreement and continued, unimpressed. How dare people do things I don’t approve of!

Two cranky posts that are worth highlighting are Swimming Against the Tide, in which I am cranky regarding client side MVC frameworks, and Sitting on the Node.js Fence, in which I argue with myself on pros and cons of Node.js. While my subsequent posts clearly demonstrate that I resolved the latter dilemma and went down the Node.js route hook, line and sinker, I am still not convinced that all that JavaScript required to write non-trivial Single Page Apps (SPAs) is a very good idea, particularly if you have any ambition to run them on mobile devices. But it definitely sounds funny to me now – as if I was expressing an irritated disbelief that, after publishing all the bad consequences of practicing extreme Ajax, people still keep doing it!

I heart Node.js

Of course, once our team went down the Node.js route (egged on and cajoled by me), you could not get me to shut up about it. In fact, the gateway drug was my focus on templating solutions, and our choice of Dust.js (the LinkedIn fork). By the way, it is becoming annoying to keep adding ‘LinkedIn fork’ all the time – that’s the only version being actively worked on anyway.

Articles from this period more or less set the standard for my subsequent posts: they are about 1500 words long, have a mix of outgoing links, a focused technical topic, and illustrative embedded tweets (thanks to @cra, who taught me how not to embed tweets as images like a loser). As no story about Node.js apps is complete without Web Sockets and clustering, both were duly covered.

Schnorr_von_Carolsfeld_Bibel_in_Bildern_1860_006

I know micro-services!

Of course, it was not until I attended NodeDay in February that a torrent of posts on micro-services was unleashed. The first half of 2014 was all ablaze with posts and tweets about micro-services around the world anyway, which my new Internet buddy Adrian Rossouw duly documented in his Wayfinder field guide. It was at times comical to follow the food fights about who would provide the bestest definition of them all:

If you follow the micro-services tag on my blog, the list of posts is long and getting longer every week. At some point I will stop tagging posts with it, because if everything is about them, nothing is – I need to be more specific. Nevertheless, I am grateful for the whole topic – it did, after all, allow me to write my most popular post so far: Node.js and Enterprise – Why Not?

monty-1920-1200-wallpaper

What does the future hold?

Obviously Node.js, messaging and micro-services will continue to dominate our short-term horizon as we are wrestling with them daily. I spoke about them at the recent DevCon5 in NYC and intend to do the same at the upcoming nodeconf.eu in September.

Beyond that, I can see some possible future topics (although I can’t promise anything – it is enough to keep up as it is).

  • Reactive programming – I recently presented at the first Toronto Reactive meetup and discovered the whole area of Scala and Akka, a completely viable alternative for implementing micro-services and scalable distributed systems that conform to the tenets of the Reactive Manifesto. I would like to probe further.
  • Go language – not only because TJ decided to go that route, having an alternative to Node.js while implementing individual micro-services is a great thing, particularly for API and back-end services (I still prefer Node.js for Web serving apps).
  • Libchan – Docker’s new project (like Go channels over the network) currently requires Go (duh), but I am sure a Node.js version will follow.
  • Famo.us – I know, I know, I have expressed my concerns about their approach, but I did the same with Node.js and look at me now.
  • Swift – I am a registered Xcode developer and have the Swift-enabled update to it. If only I could find some time to actually create some native iOS apps. Maybe I will like Swift more than I do Objective-C.

I would like to read this post in a year and see if any of these bullets panned out (or were instead replaced with a completely different list of even newer and cooler things). In this industry, I would not be surprised.

Whatever I am writing about, I would like to thank you for your support and attention so far and hope to keep holding it just a little bit longer. Now if you excuse me, I need to post this – I am already late this week!

© Dejan Glozic, 2014

Node.js Apps and Periodic Tasks

397px-Kitchen_alarm_clock

When working on a distributed system of any size, sooner or later you will hit a problem and proclaim ‘well, this is a first’. My second proclamation in such situations is ‘this is a nice topic for the blog’. True to form, I do it again, this time with the issue of running periodic tasks, and the twist that clustering and high-availability efforts add to the mix.

First, to frame the problem: a primary pattern you will surely encounter in a Web application is Request/Response. It is a road well traveled. Any ‘Hello, World’ web app waves you a hello in response to your request.

Now add clustering to the mix. You want to ensure that no matter what is happening to the underlying hardware, or how many people hunger for your ‘hello’, you will be able to deliver. You add more instances of your app, and they ‘divide and conquer’ the incoming requests. No cry for a warm reply is left unanswered.

Then you decide that you want to tell a more complex message to the world, because that’s the kind of person you are: complex and multifaceted. You don’t want to be reduced to a boring slogan. You store a long and growing list of replies in a database. Because you are busy and have no time to stand up databases, you use one hosted by somebody else, already set up for high availability. Each of your clustered nodes then talks to the same database. You set the ‘message of the day’ marker, and every node fetches it. Thousands of people receive the same message.

Because we are writing our system in Node.js, there are several ways to do this, and I have already written about it. Of course, a real system is not an exercise in measuring HWPS (Hello World Per Second). We want to perform complex tasks, serve a multitude of pages, provide APIs and be flexible and enable parallel development by multiple teams. We use micro-services to do all this, and life is good.

I have also written about the need to use messaging in a micro-service system to bring down the inter-service chatter. When we added clustering into the mix, we discovered that we need to pay special attention to ensure task dispatching similar to what Node.js clustering or proxy load balancing is providing us. We found our solution in round-robin dispatching provided by worker queues.
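The round-robin dispatch we get from worker queues can be sketched with a minimal in-memory stand-in for the broker (the real system uses RabbitMQ, where `channel.prefetch(1)` gives each worker one task at a time; all names here are illustrative):

```javascript
// Minimal in-memory stand-in for a broker-backed worker queue.
// Round-robin dispatch: each published task goes to the next worker in turn,
// which is roughly what RabbitMQ does for competing consumers on one queue.
class WorkerQueue {
  constructor() {
    this.workers = [];
    this.next = 0;
  }
  register(worker) {
    this.workers.push(worker);
  }
  publish(task) {
    if (this.workers.length === 0) throw new Error('no workers registered');
    const worker = this.workers[this.next];
    this.next = (this.next + 1) % this.workers.length;
    return worker(task);
  }
}

const queue = new WorkerQueue();
const handled = { a: 0, b: 0 };
queue.register(() => { handled.a++; });
queue.register(() => { handled.b++; });

// Four tasks spread evenly across the two registered workers.
for (let i = 0; i < 4; i++) queue.publish({ id: i });
```

The point of the sketch is only the dispatch policy; a real broker adds durability, acknowledgements and redelivery on top of it.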

Timers are something else

Then we hit timers. As long as information flow in a distributed system is driven by user events, clustering works well because dispatching policies (most often round-robin) are implemented by both the Node.js clustering and proxy load balancer. However, there is a distinct class of tasks in a distributed system that is not user-driven: periodic tasks.

Periodic tasks are tasks that are done on a timer, outside of any external stimulus. There are many reasons why you would want to do this, but most periodic tasks service databases. In a FIFO of limited size, they delete old entries, collapse duplicates, extract data for analysis, report it to other services etc.

For periodic tasks, there are two key problems to solve:

  1. Something needs to count the time and initiate triggers
  2. Tasks need to be written to execute when initiated by these triggers

The simplest way to trigger the tasks is known to every Unix admin – cron. You set up a somewhat quirky cron table, and tasks are executed according to the schedule.

The actual job to execute needs to be provided as a command-line task, which means the app that normally accesses the database needs to provide an additional CLI entry point sharing most of the code. This is important in order to keep with factor XII of the 12 factors, which insists that one-off tasks share the same code and config as the long-running processes.
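A minimal sketch of that sharing, with hypothetical names (`cleanupOldEntries`, `runFromCli`) and an in-memory object standing in for the database: the task logic lives in one place, and both the web app and the cron-invoked CLI entry point call it.

```javascript
// Hypothetical shared task logic: the same cleanup function is used by the
// long-running app and by a one-off CLI entry point (12-factor, factor XII).
// 'entries' stands in for database rows; a real task would talk to the DB.
function cleanupOldEntries(entries, maxAgeMs, now) {
  return entries.filter((e) => now - e.createdAt <= maxAgeMs);
}

// CLI entry point: a cron table line like `0 3 * * * node cleanup.js`
// would end up calling this, with the same code and config as the app.
function runFromCli(db) {
  const kept = cleanupOldEntries(db.entries, 24 * 60 * 60 * 1000, Date.now());
  db.entries = kept;
  return kept.length;
}

// The web app would require() the same module instead of duplicating logic.
const db = {
  entries: [
    { createdAt: Date.now() - 1000 },              // fresh, kept
    { createdAt: Date.now() - 48 * 3600 * 1000 }   // two days old, removed
  ]
};
```

Calling `runFromCli(db)` here leaves only the fresh entry behind.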

 

There are two problems with cron in the context of the cloud:

  1. If the machine running cron jobs malfunctions, all the periodic tasks will stop
  2. If you are running your system on a PaaS, you don’t have access to the OS in order to set up cron

The first problem is not a huge issue since these jobs run only periodically and normally provide online status when done – it is relatively easy for an admin to notice when they stop. For high availability and failover, Google has experimented with a tool called rcron for setting up cron over a cluster of machines.

Node cron

The second problem is more serious – on a PaaS, you will need to rely on a solution that involves your apps. This means setting up a small app just to run a PaaS-friendly alternative to cron. As usual, there are several options, but the node-cron library seems fairly popular and has graduated past version 1.0. If you run it in an app backed by supervisor or PM2, it will keep running and executing tasks.

You can execute tasks in the same app where node-cron is running, provided these tasks have enough async calls themselves to allow the event queue to execute other callbacks in the app. However, if the tasks are CPU intensive, they will block the event queue and should be extracted out.

A good way of solving this problem is to hook up the app running node-cron to a message broker such as RabbitMQ (which we already use for other MQ needs in our micro-service system anyway). The only thing the node-cron app will do is publish task requests to predefined topics. The workers listening to these topics do the actual work:

node-cron

The problem with this approach is that a new task request can arrive while a worker has not finished running the previous task. Care should be taken to avoid workers stepping over each other.
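The ‘clock app’ pattern and the overlap guard can be sketched with an in-memory pub/sub standing in for RabbitMQ (in the real setup, node-cron schedules the publishes and amqplib talks to the broker; topic names here are illustrative):

```javascript
// In-memory pub/sub standing in for RabbitMQ topics.
const topics = {};

function subscribe(topic, worker) {
  (topics[topic] = topics[topic] || []).push(worker);
}

// The cron app only publishes task requests; it does no work itself.
function publishTaskRequest(topic, payload) {
  (topics[topic] || []).forEach((w) => w(payload));
}

// A worker guards against overlapping runs: if the previous task is still
// going when a new request arrives, the new trigger is skipped rather than
// letting two runs step over each other.
let running = false;
const completed = [];
subscribe('tasks.cleanup', (payload) => {
  if (running) return; // previous run not finished; drop this trigger
  running = true;
  completed.push(payload.id); // stand-in for the actual periodic work
  running = false;
});

// Two scheduled triggers arriving in sequence both complete.
publishTaskRequest('tasks.cleanup', { id: 1 });
publishTaskRequest('tasks.cleanup', { id: 2 });
```

In a real worker the `running` flag would need to survive async work (or be replaced by the database lock discussed below the diagram), since a naive boolean only protects synchronous handlers.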

Interestingly enough, a hint at this approach can be found in the aforementioned 12 factors, in the section on concurrency. You will notice a ‘clock’ app in the picture, indicating an app whose job is to ‘poke’ other apps at periodic intervals.

There can be only one

A ‘headless’ version of this approach can be achieved by running multiple apps in a cluster and letting them individually keep track of periodic tasks by calling ‘setTimeout’. Since these apps share nothing, they will run according to the local server clock, which may or may not be in sync with other servers. All the apps may attempt to execute the same task (since they are clones of each other). In order to prevent duplication, each app should attempt to write a ‘lock’ record in the database before starting. To avoid deadlock, apps should wait a random amount of time before retrying.

Obviously, if the lock is already there, apps should fail to create their own, so only one app will win in securing the lock before executing the task. However, the lock should be set to expire after a small multiple of the time normally required to finish the task, to avoid orphaned locks due to crashed workers. If the worker has not crashed but is just taking longer than usual, it should renew the lock to prevent it from expiring.
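The lock semantics can be sketched like this, with an in-memory Map standing in for the shared database and illustrative names and TTL values (a real implementation would also need the atomic write the database provides):

```javascript
// In-memory stand-in for the shared database holding lock records.
const locks = new Map();

// Attempt to create the lock record; fails if an unexpired lock exists.
function acquireLock(name, owner, ttlMs, now) {
  const existing = locks.get(name);
  if (existing && existing.expiresAt > now) return false; // someone else won
  locks.set(name, { owner, expiresAt: now + ttlMs });
  return true;
}

// A slow-but-alive worker renews its own lock so it does not expire mid-task.
function renewLock(name, owner, ttlMs, now) {
  const existing = locks.get(name);
  if (!existing || existing.owner !== owner) return false;
  existing.expiresAt = now + ttlMs;
  return true;
}

// Two clones race for the same periodic task: only one wins.
const won1 = acquireLock('daily-cleanup', 'app-1', 60000, 0);
const won2 = acquireLock('daily-cleanup', 'app-2', 60000, 0);
// An expired lock (crashed worker) can be taken over later.
const won3 = acquireLock('daily-cleanup', 'app-2', 60000, 120000);
```

Against a real database, the check-and-write in `acquireLock` must be a single atomic operation (a unique-key insert or conditional update), otherwise two clones can still both ‘win’.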

The advantage of this approach is that we will only schedule the next task once the current one has finished, avoiding the problem that the worker queue approach has.

Note that in this approach, we are not achieving scalability, just high availability. Of the several running apps, at least one app will succeed in securing the lock and executing the task. The presence of other apps ensures execution but does not increase scalability.

I have conveniently omitted many details about writing and removing the lock, retries etc.

Phew…

I guarantee you that once you start dealing with periodic tasks, you will be surprised with the complexity of executing them in the cloud. A mix of cloud, clustering and high availability makes running periodic tasks a fairly non-trivial problem. Limitations of PaaS environments compound this complexity.

If you visit TJ’s tweet above, you will find a dozen people offering alternatives in the replies (most of them variations of *ron). The plethora of different solutions is a dead giveaway that this is a thorny problem. It is not fully solved today (at least not in the context of the cloud and micro-service systems), hence so many alternatives. If you use something that works well for you, do share in the ‘Reply’ section.

© Dejan Glozic, 2014

Is There Life After TJ?

tjh

What is going to happen now?
Nothing. We will be sad for a while, then we will move on.

 

Mad Men, Don Draper discusses the Kennedy assassination with kids

Every once in a while an event occurs that pushes regular programming aside. If you are CNN, it happens with such annoying regularity that it completely obliterates the meaning of the phrase “breaking news”. Nevertheless, if you are at least a bit involved with the Node.js community, the news that TJ Holowaychuk turned his back on Node in favour of Go restores breaking news’ original meaning.

Node.js has had a bit of a TJ problem ever since TJ Fontaine assumed the post of Node.js lead. You had to add the last name to disambiguate, leading to a ‘no TJ’ slide at a recent NodeDay. Before Fontaine, the real slim TJ authored such a monstrous number of Node.js modules (some of which, like Express, Jade and Mocha, are wildly popular) that there is a semi-serious thread on Quora questioning whether he is a real person. Proof: total absence from the conference-industrial complex, no pictures, no videos, and an inhuman coding output that suggests an army of ghost coders. Also, an abrupt change in import declaration style in 2013.

My first encounter with TJ (metaphorically speaking) was in the summer of 2012, when we evaluated Node.js for our project. Our team lead noticed that an unusually large cluster of the modules we needed to convert us from a servlet/JSP back end was authored by a single person. There was something unnerving about moving from software written by an army of big-company developers to something written by a kid with an emo haircut and a colorful t-shirt (it looks like TJ was too busy coding to notice that the official look is now beard and plaid shirt, and that you don’t need prescription glasses to care about Warby Parker).

Unlike many people, I am also not completely gaga about his coding and documentation style. I never warmed up to Jade and Stylus – I spent many an hour tearing my hair out over some unexpected Jade behavior that turned out to be one tab or space too many. Dust.js, which we now use for our templating needs, is much less susceptible to magic in the name of DRY.

Similarly, Express documentation is wonderfully minimalistic until you need some clarification, at which point you would gladly trade it for verbose but useful. Some of the Express APIs suffer from the same problem. You can ‘use’ all kinds of object types, and while ‘set’ is a setter, ‘get’ doubles as a getter and a way to define a route for an HTTP endpoint. Not that it matters anyway – an army of developers wrote the missing manual on Stack Overflow.

All this does not diminish the importance and the imprint that TJ left on the Node.js community. The Hacker News discussion was long and involved, with many participants trying to guess the ‘real reason’ behind the switch. Zef Hemel contemplated what he considers a ‘march toward Go’. Other people with an investment in Node.js commented as well:

[tweet 485132339362529280 align=’center’]

For anyone making a transition to Node.js and feeling the cold sweat of ‘buyer’s remorse’, I am offering the following opinion. In a brazen display of self-promotion, I will quote none other than myself from one of my previous posts:

It was always hard for me to make choices. In the archetypical classification of ‘satisficers’ and ‘maximizers’, I definitely fall into the latter camp. I research ad nauseam, read reviews, measure carefully, take everything into account, and then finally make a move, mentally exhausted. Then I burst into the cold sweat of buyer’s remorse, or read a less than glowing review of the product I just purchased, and my happiness is diminished. It sucks to be me.

My point here is as follows: at any point in time, there will be many ways to solve your particular software problem. Chances are, you spent a lot of time researching before pulling the trigger on Node.js. You have made the switch, and your app or site is working well, purring like a kitten. I don’t think your app will suddenly start misbehaving just because TJ got a case of code rage. In fact, the signs were on the wall for quite some time, as his Twitter feed was all ‘Node.js errors this, Node.js errors that’ – lots of frustration about some very arcane details I didn’t understand at all.

If anything, this case teaches us that it does not pay to search for a language Messiah – a language/platform to end all languages/platforms. As has become plainly obvious, the future is polyglot, and Go is emerging as a growing alternative for writing apps for distributed systems. Even by TJ’s admission, Node.js is still a great choice for Web sites as well as API services. For example, a whole class of page-rendering template libraries is hard to replicate in Go (Mustache, Handlebars, Dust), and so are the build and test solutions (Grunt, Mocha, Jasmine).

Node.js was never positioned as a platform for solving all kinds of computational problems, and even before Go, distributed systems had apps written in stacks other than Node.js. I think attempts to present Node.js as a solution for every problem were always misguided, and no serious member of the Node.js community made such claims. As long as you use Node.js for what it is very good at – page-serving apps and API services with a lot of I/O (and not too many CPU-intensive, long-running tasks) – there is no reason for TJH’s departure to somehow make your app less ‘correct’.

If, on the other hand, you are impressionable and must check out Go right away, by all means do, but if you keep putting all your eggs in a new and shiny basket, people like TJH will continue to stress you out as they move on to another new and shiny language or platform.

If you feel a need to solve a problem that Node.js is failing to solve or is not suitable for, and Go or Scala or any other solution fit the bill, by all means go ahead and use them. Otherwise, move along, people – nothing to see here. We wish TJ all the best in the Go future, while we continue to focus on our own problems and challenges.

I guess that means a ‘Yes, of course’ to the question in the title.

© Dejan Glozic, 2014

SoundCloud is Reading My Mind

Marvelous feats in mind reading, The U.S. Printing Co., Russell-Morgan Print, Cincinnati & New York, 1900

“Bad artists copy. Good artists steal.”

– Pablo Picasso

It was bound to happen. In the ultra-connected world, things are bound to feed off of each other, eventually erasing differences, equalizing any differential in electric potentials between any two points. No wonder the weirdest animals can be found on islands (I am looking at you, Australia). On the internet, there are no islands, just a constant primordial soup bubbling with ideas.

The refactoring of monolithic applications into distributed systems based on micro-services is slowly becoming ‘a tale as old as time’. They all follow a certain path which kind of makes sense when you think about it. We are all impatient, reading the first few Google search and Stack Overflow results ‘above the fold’, and it is no coincidence that the results start resembling majority rule, with more popular choices edging out further and further ahead with every new case of reuse.

Luke Wroblewski of Mobile First fame once said that ‘two apps do the same thing and suddenly it’s a pattern’. I tend to believe that people researching the jump into micro-services read more than two search results, but once you see certain choices appearing in, say, three or four stories ‘from the trenches’, you become reasonably convinced to at least try them yourself.

If you were so kind as to read my past blog posts, you know some of the key points of my journey:

  1. Break down a large monolithic application (Java or RoR) into a number of small and nimble micro-services
  2. Use REST API as the only way these micro-services talk to each other
  3. Use a message broker (namely, RabbitMQ) to apply the event collaboration pattern and avoid annoying inter-service polling for state changes
  4. Link MQ events and REST into what I call REST/MQTT mirroring to notify about resource changes
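To make point 4 concrete, here is a minimal sketch of the producer side of REST/MQTT mirroring – the store, the publish function and the resource path are all hypothetical stand-ins, not a real implementation:

```javascript
// REST/MQTT mirroring, producer side (sketch): apply the REST change,
// then publish the same change as an event on a topic that mirrors the
// resource path. 'store' and 'publish' stand in for a real DB and broker client.
function updateResource(store, publish, path, patch) {
  var resource = store[path] || {};
  for (var key in patch) resource[key] = patch[key];
  store[path] = resource;

  // The MQTT topic mirrors the REST resource path
  var topic = path.replace(/^\//, '');
  publish(topic, JSON.stringify({ op: 'update', body: resource }));
  return resource;
}

// Usage with in-memory stand-ins:
var db = {};
var sent = [];
updateResource(db, function (topic, msg) {
  sent.push({ topic: topic, msg: msg });
}, '/tasks/42', { state: 'done' });

console.log(sent[0].topic); // → 'tasks/42'
```

Any service interested in tasks can now subscribe to the matching topic instead of polling the REST API.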

Then this came along:

As I read the blog post, I got giddy at the realization that we are all converging on an emerging model for universal micro-service architecture. Solving their own unique SoundCloud problems (good problems to have, if I may say so – coping with millions of users falls into that category), SoundCloud developers came to very similar realizations as many of us on a similar journey. I will let you read the post for yourself, and then try to extract some common points.

Stop the monolith growth

Large monolithic systems cannot be refactored at once. This simple realization about technical debt actually has two sub-aspects: the size of the system at the moment it is considered for a rewrite, and the new debt being added because ‘we need these new features yesterday’. As with real world (financial) debt, the first order of business is to ‘stop the bleeding’ – you want to stop new debt from accruing before attempting to make it smaller.

At the beginning of this journey you need to ‘draw the line’ and stop adding new features to the monolith. This rule is simple:

Rule 1: Every new feature added to the system will from now on be written as a micro-service.

This ensures that the team’s precious resources are not spent on making the monolith bigger and pushing the finish line farther and farther away.

Of course, a lot of the team’s activity involves reworking the existing features based on validated learning. Hence, a new rule is needed to limit this drain on resources to critical fixes only:

Rule 2: Every existing feature that requires significant rework will be removed and rewritten as a micro-service.

This rule is somewhat less clear-cut because it leaves some room for the interpretation of ‘significant rework’. In practice, it is fairly easy to convince yourself to rewrite a feature this way because micro-service stacks tend to be more fun, require fewer files and fewer lines of code, and are more suitable for today’s Web apps. For example, we don’t need much persuasion to rewrite a servlet/JSP service in the old application as a Node.js/Dust.js micro-service whenever we can. If anything, we need to practice restraint and not fabricate excuses to rewrite features that only need touch-ups.

US_Beef_cuts_svg
Micro-services as BBQ. Mmmmm, BBQ…

An important corollary of this rule is to have a plan of action ahead of time. Before doing any work, have a ‘cut of beef’ map of the monolith with areas that naturally lend themselves to be rewritten as micro-services. When the time comes for a significant rework in one of them, you can just act along that map.

As is the norm these days, ‘there’s a pattern for that’, and as the SoundCloud guys noticed, the cuts are along what is known as a bounded context.

Center around APIs

As you can read at length on the API Evangelist blog, we are transforming into an API economy, and APIs are becoming a central part of your system, rather than something you tack on after the fact. If you could get by with internal monolith services in the early days, micro-services will force you to accept APIs as the only way you communicate both inside your system and with the outside world. As SoundCloud developers realized, the days of integration around databases are over – APIs are the only contact points that tie the system together.

Rule 3: APIs should be the only way micro-services talk to each other and the outside world.

With monolithic systems, APIs are normally not used internally, so the first APIs to be created are outward facing – for third party developers and partners. A micro-service based system normally starts with inter-service APIs. These APIs are normally more powerful since they assume a level of trust that comes from sitting behind a firewall. They can use proprietary authentication protocols, have no rate limiting and expose the entire functionality of the system. An important rule is that they should in no way be second-class compared to what you would expose to the external users:

Rule 4: Internal APIs should be documented and otherwise written as if they will be exposed to the open Internet at any point.

Once you have the internal APIs designed this way, deciding which subset to expose as the public API stops being a purely technical decision. Your external APIs look like the internal ones, with the exception of stricter visibility rules (who can see what), rate limiting (with the possibility of a rate-unlimited paid tier), and an authentication mechanism that may differ from what is used internally.

Rule 5: Public APIs are a subset of internal APIs with stricter visibility rules, rate limiting and separate authentication.

SoundCloud developers went the other way (public API first) and realized that they could not build their entire system with the limitations in place for the public APIs, and had to resort to more powerful internal APIs. The delicate balance between making public APIs useful without giving away the farm is a decision every business needs to make in the API economy. Micro-services simply encourage you to start from internal APIs and work towards public ones.
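To illustrate Rule 5, here is a sketch (Express-style middleware; the limits and names are hypothetical) of how a public tier can wrap the same internal handlers with a rate limiter:

```javascript
// Rule 5 as middleware (sketch): the public API reuses the internal
// handlers, with a rate limiter and stricter visibility bolted on in front.
function rateLimit(maxPerMinute) {
  var windows = {}; // client id -> { window start, request count }
  return function (req, res, next) {
    var id = req.apiKey || req.ip || 'anonymous';
    var now = Date.now();
    var w = windows[id];
    if (!w || now - w.start >= 60000) {
      w = windows[id] = { start: now, count: 0 };
    }
    if (++w.count > maxPerMinute) {
      res.statusCode = 429; // Too Many Requests
      return res.end('Rate limit exceeded');
    }
    next();
  };
}

// Internal callers skip the limiter; the public route mounts it, e.g.:
//   app.use('/api/public', rateLimit(60), internalRouter);
```

A rate-unlimited paid tier then becomes a matter of handing out API keys that map to a higher `maxPerMinute`.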

Messaging

If there was a section of the SoundCloud blog post that made me jump with joy, it was the one discussing how they arrived at using RabbitMQ for messaging between micro-services – considering that I have written about exactly that in every second post over the last three months. In their own words:

Soon enough, we realized that there was a big problem with this model; as our microservices needed to react to user activity. The push-notifications system, for example, needed to know whenever a track had received a new comment so that it could inform the artist about it. At our scale, polling was not an option. We needed to create a better model.

 

We were already using AMQP in general and RabbitMQ in specific — In a Rails application you often need a way to dispatch slow jobs to a worker process to avoid hogging the concurrency-weak Ruby interpreter. Sebastian Ohm and Tomás Senart presented the details of how we use AMQP, but over several iterations we developed a model called Semantic Events, where changes in the domain objects result in a message being dispatched to a broker and consumed by whichever microservice finds the message interesting.

I don’t need to say much about this – read my REST/MQTT mirroring post that describes the details of what the SoundCloud guys call ‘changes in the domain objects result in a message’. I would like to indulge in a feeling that ‘great minds think alike’, but more modestly (and realistically), it is just common sense, and RabbitMQ is a nice, fully featured and reliable open source polyglot broker. No shocking coincidence – it is seen in many installations of this kind. Let’s make a rule about it:

Rule 6: Use a message broker to stay in sync with changes in domain models managed by micro-services and avoid polling.
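To make Rule 6 concrete, here is a sketch of the consuming side – all names here are hypothetical: a local read model is kept up to date by change events arriving from the broker, so the consumer never needs to poll:

```javascript
// Event collaboration, consumer side (sketch): instead of polling the
// owning micro-service, maintain a local copy of the state and apply
// each change event as it arrives from the broker.
function makeReadModel() {
  var cache = {};
  return {
    // Wire this up as the broker subscription callback
    onEvent: function (topic, message) {
      var event = JSON.parse(message);
      if (event.op === 'delete') {
        delete cache[topic];
      } else {
        cache[topic] = event.body; // 'create' and 'update' both replace
      }
    },
    get: function (topic) { return cache[topic]; }
  };
}

// Usage: feed it events as they would arrive from the broker
var model = makeReadModel();
model.onEvent('tasks/42', JSON.stringify({ op: 'update', body: { state: 'done' } }));
console.log(model.get('tasks/42')); // → { state: 'done' }
```

The REST API is still there for fetching the baseline state at startup; after that, events keep the copy fresh.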

All together now

Let’s pull all the rules together. As we speak, teams around the world are suffering under the weight of large, unwieldy monolithic applications that are a poor fit for cloud deployment. They are intrigued by micro-services but afraid to take the plunge. These rules will make the process more manageable and let you arrive at a better system – one that is easier to grow, can be deployed many times a day, and is more reactive to events, load, failures and users:

  1. Every new feature added to the system will from now on be written as a micro-service.
  2. Every existing feature that requires significant rework will be removed and rewritten as a micro-service.
  3. APIs should be the only way micro-services talk to each other and the outside world.
  4. Internal APIs should be documented and otherwise written as if they will be exposed to the open Internet at any point.
  5. Public APIs are a subset of internal APIs with stricter visibility rules, rate limiting and separate authentication.
  6. Use a message broker to stay in sync with changes in domain models managed by micro-services and avoid polling.

This is a great time to build micro-service based systems, and collective wisdom on the best practices is converging as more systems are coming online. I will address the topic of APIs in more detail in one of the future posts. Stay tuned, and keep reading my mind!

© Dejan Glozic, 2014

Odds and Ends

Jack Spade Odds and Ends pouch

This week and the next there will be no regular blog posts. I hate to break my routine, but I am going to IBM Innovate 2014, and since I am an IBMer myself, it is a working conference for me, requiring a lot of preparation time and paying my booth dues.

If you are coming to Innovate, don’t forget to come and hear Dan Berg and me in our presentation on the new DevOps Pipeline we built for IBM DevOps Services (powered by JazzHub). Here are the coordinates:

ICD-1810 : DevOps Services: Automated Delivery Pipeline for Codename: BlueMix
Innovation – Cloud Development
Date/Time : Wed, 04-Jun, 09:15 AM-10:15 AM
Room : Dolphin-Australia 3
Co-presenter(s):Daniel Berg, IBM

While I am here, let’s review other events where you can see me. The next one comes on July 9-10, when I will present at DevCon5 in New York City. This is my talk:

Node.js Micro-Services: The Water is Fine, Jump In!

Well, duh – what did you expect? Just check the number of micro-service posts I have made – it is only fitting for me to spread the message in person as well.

Then, on September 7-11, nodeconf.eu comes to Ireland. I don’t have the title of the talk yet, but you can expect more micro-service-y goodness.

So there you go – three opportunities to meet me and have a beer – what’s not to like?

See you again in about 10 days when I will resume the regular programming.

© Dejan Glozic, 2014

Micro-Service Fun – Node.js + Messaging + Clustering Combo

monty-1920-1200-wallpaper
Micro-Services in Silly Walk, Monty Python

A:    I told you, I’m not allowed to argue unless you’ve paid.
M:   I just paid!
A:   No you didn’t.
M:   I DID!
A:   No you didn’t.
M:  Look, I don’t want to argue about that.
A:  Well, you didn’t pay.
M:  Aha. If I didn’t pay, why are you arguing? I Got you!
A:   No you haven’t.
M:  Yes I have. If you’re arguing, I must have paid.
A:   Not necessarily. I could be arguing in my spare time.

 

Monty Python, ‘The Argument Clinic’

And now for something completely different. I decided to get off the soap box I kept climbing recently and give you some useful code for a change. I cannot stray too far from the topic of micro-services (because that is our reality now), or Node.js (because it is the technology with which we implement that reality). But I can zero in on a very practical problem we unearthed this week.

First, a refresher. When creating a complex distributed system using micro-services, one of the key problems to solve is to provide for inter-service communication. Micro-services normally provide REST APIs to get the baseline state, but the system is in constant flux and it does not take long until the state you received from the REST API is out of date. Of course, you can keep refreshing your copy by polling but this is not scalable and adds a lot of stress to the system, because a lot of these repeated requests do not result in new state (I can imagine an annoyed version of Scarlett Johansson in ‘Her’, replying with ‘No, you still have no new messages. Stop asking.’).

A better pattern is to reverse the flow of information and let the service that owns the data tell you when there are changes. This is where messaging comes in – modern micro-service based systems are hard to imagine without a message broker of some kind.

Another important property of a modern system is scalability. In a world where you may need to quickly ramp up your ability to handle the demand (lucky you!), the ability to vertically or horizontally scale your micro-services at a moment’s notice is crucial.

Vertical and Horizontal Scalability

An important Node.js quirk explainer: people migrating from Java may not understand the ‘vertical scalability’ part of the equation. Due to the auto-magical handling of thread pools in a JEE container, increasing the real or virtual machine specs requires no effort on the software side. For example, if you add more CPU cores to your VM, JEE container will spread out to take advantage of them. If you add more memory, you may need to tweak the JVM settings but otherwise it will just work. Of course, at some point you will need to resort to multiple VMs (horizontal scalability), at which point you may discover that your JEE app is actually not written with clustering in mind. Bummer.

In the Node.js land, adding more cores will help you squat unless you make a very concrete effort to fork more processes. In practice, this is not hard to do with utilities such as PM2 – it may be as easy as running the following command:

pm2 start app.js -i max

Notice, however, that for Node.js, vertical and horizontal scalability are the same as far as the way you write your code is concerned. You need to cluster just to take advantage of all the CPU cores on the same machine, never mind load balancing multiple separate servers or VMs.

I actually LOVE this characteristic of Node.js – it forces you to think about clustering from the get-go, discouraging you from holding onto data between requests and forcing you to store any state in a shared DB where it can be accessed by all the running instances. This makes the switch from vertical to horizontal scalability a non-event for you, which is a good thing to discover when you need to scale out in a hurry. Nothing new here, just basic share-nothing goodness (see The Twelve-Factor App for a good explainer).

However, there is one important difference between launching multiple Node.js processes using PM2 or the Node ‘cluster’ module, and load-balancing multiple Node servers using something like Nginx. With load balancing using a proxy, we have standalone servers, each binding to a port on its own machine, and balancing and URL proxying are done at the same time. You will write something like this:

http {
    upstream myapp1 {
        server srv1.example.com;
        server srv2.example.com;
        server srv3.example.com;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myapp1;
        }
    }
}

If you try to launch multiple Node servers on a single machine like this, all except the first one will fail because they cannot bind to the same port. However, if you use Node’s ‘cluster’ module (or use PM2, which uses the same module under the covers), a little bit of white magic happens – the master process has a bit of code that enables socket sharing between the workers using a policy (either OS-defined, or ’round-robin’ as of Node 0.12). This is very similar to what Nginx does to your Node instances running on separate servers, with a few more load balancing options (round-robin, least connected, IP-hash, weight-directed).

Now Add Messaging to the Mix

So far, so good. Now, as it is usual in life, the fun starts when you combine the two concepts I talked about (messaging and clustering) together.

To make things more concrete, let’s take a real world example (one we had to solve ourselves this week). We were writing an activity stream micro-service. Its job is to collect activities expressed using Activity Stream 2.0 draft spec, and store them in Cloudant DB, so that they can later be retrieved as an activity stream. This service does one thing, and does it well – it aggregates activities that can originate anywhere in the system – any micro-service can fire an activity by publishing into a dedicated MQTT topic.

On the surface, it sounds clear-cut – we will use the well-behaved mqtt module as an MQTT client, RabbitMQ as our polyglot message broker, and Node.js for our activity micro-service. This is not the first time we are using this kind of a system.

However, things become murky when clustering is added to the mix. This is what happens: MQTT is a pub/sub protocol, and in order to allow each subscriber to read messages from the queue at its own pace, RabbitMQ implements the protocol by hooking up a separate queue for each Node instance in the cluster.

mqtt-cluster2

This is not what we want. Each instance will receive a ‘new activity’ message and attempt to write it to the DB, requiring contention avoidance. Even if the DB can prevent all but one node from succeeding in writing the activity record, this is wasteful because all the nodes are attempting the same task.

The problem here is that the ‘white magic’ the cluster module uses to handle http/https server requests does not extend to the mqtt module.

Our initial thoughts on solving this problem went like this: if we move the message client to the master instance, it can react to incoming messages and pass them on to the forked workers in some kind of ’round-robin’ fashion. It seemed plausible, but had a moderate ‘ick’ factor because implementing our own load balancing felt like re-solving a solved problem. In addition, it would prevent us from using PM2 (because we would have to be in control of forking the workers), and if we used multiple VMs and Nginx load balancing, we would be back to square one.

Fortunately, we realized that RabbitMQ can already handle this if we partially give up the pretense and acknowledge we are running AMQP under the MQTT abstraction. The way RabbitMQ works for pub/sub topologies is that publishers post to ‘topic’ exchanges that are bound to queues using routing keys (in fact, there is a direct mapping between AMQP routing keys and MQTT topics – it is trivial to map back and forth).
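To show just how trivial the mapping is, here is a sketch of both directions (the real plug-in also escapes some characters, so treat this as an approximation):

```javascript
// How RabbitMQ's MQTT plug-in maps names (sketch): MQTT '/' becomes
// AMQP '.', the MQTT '+' wildcard becomes AMQP '*', and '#' is shared.
function mqttToAmqp(topic) {
  return topic.split('/').map(function (level) {
    return level === '+' ? '*' : level;
  }).join('.');
}

function amqpToMqtt(routingKey) {
  return routingKey.split('.').map(function (word) {
    return word === '*' ? '+' : word;
  }).join('/');
}

console.log(mqttToAmqp('activities/comments/+')); // → 'activities.comments.*'
console.log(amqpToMqtt('activities.#'));          // → 'activities/#'
```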

The problem with using an MQTT client on the consumer side was that each cluster instance received its own queue. By dropping down to an AMQP client and making all the instances bind to the same queue, we let RabbitMQ essentially load balance the clients using a ’round-robin’ policy. In fact, this way of working is listed in the RabbitMQ documentation as a work queue, which is exactly what we want.

amqp-cluster

OK, Show Us Some Code Now

Just for variety, I will publish MQTT messages to the topic using the Eclipse Paho Java client. Publishing using clients in Node.js or Ruby would be almost identical, modulo syntax differences:

public static final String TOPIC = "activities";
MqttClient client;

try {
    client = new MqttClient("tcp://" + host + ":1883", "mqtt-client1");
    client.connect();
    MqttMessage message = new MqttMessage(messageText.getBytes());
    client.publish(TOPIC, message);
    System.out.println(" [x] Sent to MQTT topic '" + TOPIC + "': '"
                       + message + "'");
    client.disconnect();
} catch (MqttException e) {
    e.printStackTrace();
}

The client above will publish to the ‘activities’ topic. What we now need to do on the receiving end is set up a single queue and bind it to the default AMQP topic exchange (“amq.topic”) using the matching routing key (again, ‘activities’). The name of the queue does not matter as long as all the Node workers are using it (and they will, by virtue of being clones of each other).

var amqp = require('amqp');

var connection = amqp.createConnection({ host: 'localhost' });

// Wait for connection to become established.
connection.on('ready', function () {
  // Use the default 'amq.topic' exchange
  connection.queue('worker-queue', { durable: true }, function (q) {
    // Routing key identical to the MQTT topic
    q.bind('activities');

    // Receive messages
    q.subscribe(function (message, headers, deliveryInfo, messageObject) {
      // Print messages to stdout
      console.log('Node AMQP(' + process.pid + '): received topic "' +
        deliveryInfo.routingKey + '", message: "' +
        message.data.toString() + '"');
    });
  });
});

Implementation detail: RabbitMQ MQTT plug-in uses the default topic exchange “amq.topic”. If we set up a different exchange for the MQTT traffic, we will need to explicitly name the exchange when binding the queue to it.

Any Downsides?

There are scenarios in which it is actually beneficial for all the workers to receive an identical message. When the workers are managing Socket.io chat rooms with clients, a message may affect multiple clients, so all the workers need to receive it and decide if it applies to them. The single-queue topology used here is limited to cases where workers are used strictly for scalability and any single worker can do the job.

From the more philosophical point of view, by resorting to AMQP we have broken through the abstraction layer and removed the option to swap in another MQTT broker in the future. We looked around and noticed that other people had to do the same in this situation. MQTT is a pub/sub protocol and many pub/sub protocols have a ‘unicast’ feature which would deliver the message to only one of the subscribers using some kind of a selection policy (that would work splendidly in our case). Unfortunately, there is no ‘unicast’ in MQTT right now.

Nevertheless, by retaining the ability to handle front ends and devices publishing in MQTT, we preserved the nice features of the MQTT protocol (simple, lightweight, can run on the smallest of devices). We continue to express the messaging side of our APIs using MQTT. At the same time, we were able to tap into the ‘unicast’ behavior by using RabbitMQ features.

That seems like a good compromise to me. Here is hoping that unicast will become part of the MQTT protocol at some point in the future. Then we can go back to pretending we are running a ‘mystery MQTT broker’ rather than RabbitMQ for our messaging needs.

Nudge, nudge. Wink wink. Say no more.

© Dejan Glozic, 2014