Nodeconf.eu 2014: Trip Report (Part 1)

nodeconfeu-1

Shady’s back, tell a friend! Fresh from the green grass of Ireland where I attended (and presented at) this year’s nodeconf.eu, I am now back to report on it as promised.

This year’s conference is the second instance of a format started last year by Cian Ó Maidín and friends from NearForm. The goal is to carefully curate talks across the Node.js community to ensure quality over quantity. I had the great pleasure of attending the conference, particularly as I was one of the ‘carefully curated’ speakers this year. What adds particular flair and sparkle to the event is the location – an actual Irish castle in Waterford, a town in the south-east and the oldest city in Ireland.

Getting to Waterford Castle feels like an excerpt from the movie Planes, Trains and Automobiles – an itinerary that involves flying to Dublin, catching a bus to Waterford, then switching to a taxi that at some point needs to cross the river Suir via a private car ferry to reach the castle on the island.

The event started in the evening with a welcome reception that involved circus acts – a very ‘tall’ lady (see above) and a guy literally playing with fire.

nodeconfeu-2

After-dinner entertainment included a lovely young lady playing a cello and singing in a way that fuses traditional Irish music and modern sensibilities – perfect for Irish hipsters (they do exist, don’t they?). Unfortunately her name escapes me – if you know it (and have a link to her home page), please drop me a comment (Edit: @fritzvd was kind enough to point out her name is Alana Henderson – thanks!).

nodeconfeu-3

Nodeconf.eu was held in the clubhouse of the nearby golf club (part of the same Waterford Castle resort). For the next three days, our master of ceremonies was Mikeal, who was well known to most attendees (just look at your apps – if you require ‘request‘, you know him too).

nodeconfeu-4
Mikeal tells people to sit down so that we can start.

The conference opened with a welcome address by Cian, outlining what awaited us over the next three days and presenting the conference Code of Conduct, which was wonderfully short:

  1. No harassment of any kind is allowed
  2. Please don’t fall into the Suir river (apparently somebody did not long ago)

The technical part of the conference started with Node’s own TJ Fontaine, Node.js core lead, with his ‘State of Node’ address. TJ posited that the industry is in a state that makes this the perfect time for a JavaScript framework. He also reiterated some of the key tenets of the Node.js philosophy, including non-blocking I/O and the ‘do one thing, and do it well’ ethos. Finally, he stressed that the evolution of Node.js is not about what other languages or frameworks are doing, but about what is good for Node.js and JavaScript themselves.

nodeconfeu-6
TJ Fontaine delivers the State of Node address.

NearForm’s own Richard Rodger kicked off the Micro-Services block with the accumulated experience of deploying micro-services (and in particular Node.js micro-services) in production. He highlighted some natural advantages (scalability, flexibility of deployment) but also disadvantages (added latency). From his real-world experience, he concluded that business logic should live in the services (no core monolith), that developers should resist the Tower of Babel (the temptation to use many languages and stacks) and the assumption that everything can be designed upfront (services are ‘discovered’, not designed). Nevertheless, he reiterated one of the strong suits of micro-services – the fact that you can change your mind and swap databases mid-project (or anything else).

nodeconfeu-7
NearForm’s Richard Rodger kicks off the Micro-Services block.

Clifton Cunningham focused on a very hard problem of micro-services – the fact that while multiple services are responsible for various parts of the system, pages still need to share some content. He enumerated options used in the past – client-side stitching using Ajax, a front-end server, Server Side Includes (SSIs), Edge Side Includes (ESI) etc. He then presented his team’s take on the problem – an open-source module called Compoxure that has a number of advanced features to deal with the problems normally faced in production – performance, latency, failures and authentication. He also addressed the problem of delivering CSS and JS for this shared content across the micro-services.

nodeconfeu-8
Clifton Cunningham on Compoxure.

Then it was time for me. My talk covered how my team in IBM DevOps Services decided to pursue micro-services in general, and Node.js in particular, as we started evolving our system. I don’t need to go into details – this whole blog is a weekly chronicle of our journey. I added a twist from the position of doing Node.js in a large enterprise – the need to legally vet a large number of Node.js modules (causing Legal to hate us) and the complexity of deploying a large number of services in a secure manner (causing Ops to hate us).

nodeconfeu-9
Dejan Glozic on Node.js micro-services at IBM (photo courtesy of Louis Faustino via SmugMug).

The last speaker in the Micro-Services block was Fred George, bringing his wealth of experience building micro-services in a number of real-world projects. He brought forward several examples of projects using different technologies and architectures, unified by the fact that most of the time an event-based (asynchronous) approach was a better fit than a synchronous one. Out of that experience he extracted the concept of a system where all services receive all the messages all the time, but the messages are semantically classifiable, forming ‘rapids’ (all events), ‘rivers’ (themed events) and ‘ponds’ (state/history).

nodeconfeu-10
Fred George on rapids, rivers and ponds.

After a coffee and JS cupcakes break, we switched to the ‘Production’ track, starting with Brian McCallister from Groupon walking us through the familiar experience of a company whose monolith had become so large and unwieldy that it eventually made adding new features virtually impossible. Groupon slowly and carefully migrated its two large installations (North America and Europe) to a micro-service system using Node.js, unblocking further evolution, with performance and scalability improvements tossed into the mix. This was in a sense a partial continuation of the micro-service track, considering that Groupon’s new system shares many traits with the ones covered in the preceding talks.

nodeconfeu-11
Brian McCallister on building the I-Tier system at Groupon.

PayPal’s own Jeff Harrell zeroed in on a number of anti-patterns of implementing Node.js in a real-world project at PayPal. PayPal made a wholesale transformation to Node.js that is still ongoing, and their large team ran into a number of these anti-patterns. Among them were: bringing baggage from previous projects, re-creating monolithic applications using Node.js, Googling ‘how to do X in JavaScript’, wrapping everything in promises, sloppy async code, using Node.js for everything, and ignoring the ecosystem.

PayPal’s Jeff Harrell on Node.js real world anti-patterns.

The last speaker of the day was Aman Kohli from Citibank, bringing us the experience of deploying Node.js to provide back-end services for mobile apps in an environment that is anything but forgiving when it comes to security, adherence to regulations and process. According to Aman, they chose Node because of the async event model, ideally suited for mobile and sensor apps, the fact that it was approved for internal usage and required fewer controls, and because of the good success they had using the hapi framework for building mobile API services.

Aman Kohli on using Node.js for mobile services at Citibank.

At this point it was time to break for lunch. From several choices, I picked the Kraken.js workshop as my afternoon activity, where I could pick the brains of Jeff Harrell and Erik Toth from PayPal on the philosophy and the plans for this open source suite we already use in our micro-services.

Evening R&R came in the form of an Irish whiskey tasting (not bad at all, but I still prefer Scottish single malt) and a great local folk band treating us to a mix of traditional tunes and Irish-treated covers.

Continue to part 2 of the report.

© Dejan Glozic, 2014

HA All The Things

HA all the things

I hate HA (High Availability). Today everything has to be highly available. All of a sudden SA (Standard Availability) isn’t cutting it any more. Case in point: I used to listen to music on my way to work. Not any more – my morning meeting schedule intrudes into my ride, forcing me to participate in meetings while driving, Bluetooth and all. My 8-speaker, surround-sound Acura ELS system hates me – built for high resolution multichannel reproduction, it is reduced to ‘Hi, who just joined?’ in glorious mono telephony. But I digress.

You know that I have written many articles on micro-services because they are our ongoing concern as we slowly evolve our topology away from monolithic systems and towards micro-services. I have already written about my thoughts on how to scale and provide HA for Node.js services. We have also solved the problem of handling messaging in a cluster using AMQP worker queues.

However, we are not done with HA. The message broker itself needs to be HA, and we only have one node. We are currently using RabbitMQ, and so far it has been rock solid, but we know that in a real-world system it is not a matter of ‘if’ but ‘when’ it will suffer a problem, taking all the messaging capabilities of the system down with it. Or we will mess around with the firewall rules and block access to it by accident. Hey, contractors rupture gas pipes and power cables by accident all the time. Don’t judge.

Luckily, RabbitMQ can be clustered. RabbitMQ documentation is fairly extensive on clustering and HA. In short, you need to do the following (a command-line sketch follows the list):

  1. Stand up multiple RabbitMQ instances (nodes)
  2. Make sure all the instances use the same Erlang cookie, which allows them to talk to each other (yes, RabbitMQ is written in Erlang; you learn this on the first day, when you need to install the Erlang environment before you can install Rabbit)
  3. Cluster the nodes by running rabbitmqctl join_cluster --ram rabbit@<firstnode> on the second server
  4. Start the nodes and connect to any of them
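For illustration, here is a minimal command-line sketch of those steps, assuming two hosts named rabbit1 and rabbit2 (the host names are made up):

# On rabbit2, once the shared Erlang cookie is in place
rabbitmqctl stop_app
rabbitmqctl join_cluster --ram rabbit@rabbit1
rabbitmqctl start_app

# Verify the cluster from either node
rabbitmqctl cluster_status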

RabbitMQ has an interesting feature in that nodes in the cluster can join in RAM mode or in disc mode. RAM nodes will replicate state only in memory, while in disc mode they will also write it to disc. While in theory it is enough to have only one of the nodes in the cluster use disc mode, the performance gain of using RAM mode is not worth the risk (the gain is restricted to declaring queues and exchanges, not publishing messages, anyway).

Not so fast

OK, we cluster the nodes and we are done, right? Not really. Here is the problem: if we configure the clients to connect to the first node and that node goes down, messaging is still lost. Why? Because the RabbitMQ team chose not to implement the load balancing part of clustering. The problem is that clients communicate with the broker using the TCP protocol, and the Swiss army knives of proxying/caching/balancing/floor waxing such as Apache or Nginx only reverse-proxy HTTP/S.

After I wrote that, I Googled just in case and found an Nginx TCP proxy module on GitHub. Perhaps you can get away with just Nginx if you use it already. If you use Apache, I could not find a TCP proxy module for it. If it exists, let me know.

What I DID find is that a more frequently used solution for this kind of problem is HAProxy. This super solid and widely used proxy can be configured for Layer 4 (transport proxying), and works flawlessly with TCP. It is fairly easy to configure too: for TCP, you will need to configure the ‘defaults’, ‘frontend’ and ‘backend’ sections, or combine the latter two and just configure the ‘listen’ section (which works great for TCP proxies).

I don’t want to go into the details of configuring HAProxy for TCP – there are good blog posts on that topic. Suffice to say that you can configure a virtual broker address that all the clients can connect to as usual, and it will proxy to all the MQ nodes in the cluster. It is customary to add the ‘check’ instruction to the configuration to ensure HAProxy will check that nodes are alive before sending traffic to them. If one of the brokers goes down, all the message traffic will be routed to the surviving nodes.
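For illustration, a minimal sketch of such a ‘listen’ section could look like this (the host names and addresses are made up; consult the HAProxy documentation for the full set of options):

listen rabbitmq-cluster
    bind *:5670
    mode tcp
    balance roundrobin
    option tcplog
    server rabbit1 10.0.0.11:5672 check inter 5000 rise 2 fall 3
    server rabbit2 10.0.0.12:5672 check inter 5000 rise 2 fall 3

The ‘check’ parameters tell HAProxy how often to probe each node and how many consecutive successes or failures move it in or out of rotation.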

Do I really need HAProxy?

If you truly want to HA all the things, you now need to worry that you have made HAProxy a single point of failure. I told you, it never ends. The usual suggestion is to set up two instances, one primary and one backup for failover.

Can we get away with something simpler? It depends on how you define ‘simpler’. The vast majority of systems RabbitMQ runs on are some variant of Linux, and it appears there is something called LVS (Linux Virtual Server). LVS seems perfect for our needs, being a low-level Layer 4 switch – it just passes TCP packets to the servers it is load balancing. Except in section 2.15 of the documentation I found this:

This is not a utility where you run ../configure && make && make check && make install, put a few values in a *.conf file and you’re done. LVS rearranges the way IP works so that a router and server (here called director and realserver), reply to a client’s IP packets as if they were one machine. You will spend many days, weeks, months figuring out how it works. LVS is a lifestyle, not a utility.

OK, so maybe not as perfect a fit as I thought. I don’t think I am ready for a LVS lifestyle.

How about no proxy at all?

Wouldn’t it be nice if we didn’t need the proxy at all? It turns out, we can pull that off, but it really depends on the protocol and client you are using.

It turns out not all clients for all languages are the same. If you are using AMQP, you are in luck. The standard Java client provided by RabbitMQ can accept a server address array, going through the list of servers when connecting or reconnecting until one responds. This means that in the event of node failure, the client will reconnect to another node.

We are using AMQP for our worker queue with Node.js, not Java, but the Node.js module we are using supports a similar feature. It can accept an array for the ‘host’ property (same port, user and password though). It will work with normal clustered installations, but the bummer is that you cannot install two instances on localhost to try the failure recovery out – you will need to use remote servers.
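As a minimal sketch (assuming the node-amqp client; the host names and credentials are made up), the connection options would look something like this:

var amqp = require('amqp');

// 'host' accepts an array; the client walks the list when connecting or
// reconnecting until one of the nodes responds (port, login and password
// are shared across all of them).
var connection = amqp.createConnection({
  host: ['rabbit1.example.com', 'rabbit2.example.com'],
  port: 5672,
  login: 'worker',
  password: 'secret'
});

connection.on('ready', function () {
  // declare exchanges and queues here as usual
  console.log('Connected to one of the cluster nodes');
});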

On the MQTT side, Eclipse Paho Java client supports multiple server URLs as well. Unfortunately, our standard Node.js MQTT module currently only supports one server. I was assured code contributions will not be turned away.

This solution is fairly attractive because it does not add any more moving parts to install and configure. The downside is that the clients become fully aware of all the broker nodes – we cannot just transparently add another node as we could in the case of the TCP load balancer. All the clients must add it to their list of nodes for the addition to take effect. In effect, our code becomes aware of our infrastructure choices more than it should be.

All this may be unnecessary for you if you use AWS, since a quick Google search suggests that AWS Elastic Load Balancing can serve as a TCP proxy. Not a solution for us IBMers of course, but it may work for you.

Give me PaaS or give me death

This is getting pretty tiring – I wish we did all this in a PaaS like our own Bluemix so that it is all taken care of. IaaS gives you the freedom that can at times be very useful and allow you to do powerful customizations, but at other times makes you wish to get out of the infrastructure business altogether.

I told you I hate HA. Now if you excuse me, I need to join another call.

© Dejan Glozic, 2014

The Year of Blogging Dangerously

391px-Extremely_yummy_raspberry_cheesecake

Wow, has it been a year already? I am faking surprise, of course, because WordPress has notified me well ahead of time that I need to renew my dejanglozic.com domain. So in actuality I said ‘wow, will it soon be a year of me blogging’. Nevertheless, the sentiment is genuine.

It may be worthwhile to look back at the year, if only to reaffirm how quickly things change in this industry of ours, and also to notice some about-faces, changes of direction and mind.

I started blogging with the intent to stay true to the etymological sense of the word ‘blog’ (Web log). As a weekly diary of sorts, it was supposed to chronicle the trials and tribulations of our team as it boldly goes into the tumultuous waters of writing Web apps in the cloud. I settled on a weekly delivery, which is at times doable, at other times a nightmare. I could definitely do without the onset of panic when I realize that it is Monday and I forgot to write a new entry.

Luckily we have enough issues we deal with daily in our work to produce enough material for the blog. In that regard, we are like a person who, after his old apartment went up in flames, just moved into a new condo and went to Ikea. If an eager clerk asks him ‘what do you need in particular’, his genuine answer must be ‘everything – curtains, rugs, a new mattress, a table, chairs, a sofa, a coffee table …’.

At least that’s how we felt – we were re-doing everything in our distributed system and we were able to re-use very little from our past lives, having boldly decided to jump ahead as far as possible and start clean.

Getting things out of the system

That does not mean that the blog actually started with a theme or a direction. In the inaugural post The Turtleneck and The Hoodie, I proudly declared that I care both about development AND the design and refuse to choose. But that is not necessarily a direction to sustain a blog. It was not an issue for a while due to all these ideas that were bouncing in my head waiting to be written down. Looking back, I think it sort of worked in a general-purpose, ‘good advice’ kind of way. Posts such as Pulling Back from Extreme AJAX or A Guide to Storage for ADD Types were at least very technical and based on actual research and hands-on experience.

Some of the posts were just accumulated professional experience that I felt the need to share. Don’t Get Attached to Your Code or Dumb Code Good, Smart Code Bad were crowd pleasers, at least in the ‘yeah, it happened to me too’ way. Kind of like reading that in order to lose weight you need to eat smart and go outside. Makes a lot of sense except for the execution, which is the hard part.

344px-'Be_smart..Act_dumb^_-_NARA_-_513932

Old man yells at the cloud

Funnily enough, some of my posts, after using up all the accumulated wisdom to pass on, sound somewhat cranky in hindsight. I guess I disagreed with some ideas and directions I noticed, and the world ignored my disagreement and continued, unimpressed. How dare people do things I don’t approve of!

Two cranky posts that are worth highlighting are Swimming Against the Tide, in which I am cranky regarding client side MVC frameworks, and Sitting on the Node.js Fence, in which I argue with myself on pros and cons of Node.js. While my subsequent posts clearly demonstrate that I resolved the latter dilemma and went down the Node.js route hook, line and sinker, I am still not convinced that all that JavaScript required to write non-trivial Single Page Apps (SPAs) is a very good idea, particularly if you have any ambition to run them on mobile devices. But it definitely sounds funny to me now – as if I was expressing an irritated disbelief that, after publishing all the bad consequences of practicing extreme Ajax, people still keep doing it!

I heart Node.js

Of course, once our team went down the Node.js route (egged on and cajoled by me), you could not get me to shut up about it. In fact, the gateway drug to it was my focus on templating solutions, and our choice of Dust.js (LinkedIn fork). By the way, it is becoming annoying to keep adding ‘LinkedIn fork’ all the time – that’s the only version that is actively worked on anyway.

Articles from this period more or less set the standard for my subsequent posts: they are about 1500 words long, have a mix of outgoing links, a focused technical topic, and illustrative embedded tweets (thanks to @cra who taught me how not to embed tweets as images like a loser). Since no story about Node.js apps is complete without WebSockets and clustering, both were duly covered.

Schnorr_von_Carolsfeld_Bibel_in_Bildern_1860_006

I know micro-services!

Of course, it was not until I went to attend NodeDay in February that a torrent of posts on micro-services was unleashed. The first half of 2014 was all ablaze with posts and tweets about micro-services around the world anyway, which my new Internet buddy Adrian Rossouw duly documented in his Wayfinder field guide. It was at times comical to follow food fights about who will provide the bestest definition of them all:

If you follow a micro-services tag for my blog, the list of posts is long and getting longer every week. At some point I will stop tagging posts with it, because if everything is about them, nothing is – I need to be more specific. Nevertheless, I am grateful for the whole topic – it did after all allow me to write the most popular post so far: Node.js and Enterprise – Why Not?

monty-1920-1200-wallpaper

What does the future hold?

Obviously Node.js, messaging and micro-services will continue to dominate our short-term horizon as we are wrestling with them daily. I spoke about them at the recent DevCon5 in NYC and intend to do the same at the upcoming nodeconf.eu in September.

Beyond that, I can see some possible future topics (although I can’t promise anything – it is enough to keep up as it is).

  • Reactive programming – I have recently presented at the first Toronto Reactive meetup, and noticed this whole area of Scala and Akka that is a completely viable alternative for implementing micro-services and scalable distributed systems that conform to the tenets of the Reactive Manifesto. I would like to probe further.
  • Go language – not only because TJ decided to go that route, having an alternative to Node.js while implementing individual micro-services is a great thing, particularly for API and back-end services (I still prefer Node.js for Web serving apps).
  • Libchan – Docker’s new project (like Go channels over the network) currently requires Go (duh) but I am sure Node.js version will follow.
  • Famo.us – I know, I know, I have expressed my concerns about their approach, but I did the same with Node.js and look at me now.
  • Swift – I am a registered Xcode developer and have the Swift-enabled update to it. If only I could find some time to actually create some native iOS apps. Maybe I will like Swift more than I do Objective-C.

I would like to read this post in a year and see if any of these bullets panned out (or were instead replaced with a completely different list of even newer and cooler things). In this industry, I would not be surprised.

Whatever I am writing about, I would like to thank you for your support and attention so far and hope to keep holding it just a little bit longer. Now if you excuse me, I need to post this – I am already late this week!

© Dejan Glozic, 2014

Node.js Apps and Periodic Tasks

397px-Kitchen_alarm_clock

When working on a distributed system of any size, sooner or later you will hit a problem and proclaim ‘well, this is a first’. My second proclamation in such situations is ‘this is a nice topic for the blog’. True to form, I do it again, this time with the issue of running periodic tasks, and the twist that clustering and high availability efforts add to the mix.

First, to frame the problem: a primary pattern you will surely encounter in a Web application is Request/Response. It is a road well traveled. Any ‘Hello, World’ web app is waving you a hello in a response to your request.

Now add clustering to the mix. You want to ensure that no matter what is happening to the underlying hardware, or how many people hunger for your ‘hello’, you will be able to deliver. You add more instances of your app, and they ‘divide and conquer’ the incoming requests. No cry for a warm reply is left unanswered.

Then you decide that you want to tell a more complex message to the world because that’s the kind of person you are: complex and multifaceted. You don’t want to be reduced to a boring slogan. You store a long and growing list of replies in a database. Because you are busy and have no time for standing up databases, you use one hosted by somebody else, already set up for high availability. Then each of your clustered nodes talk to the same database. You set the ‘message of the day’ marker, and every node fetches it. Thousands of people receive the same message.

Because we are writing our system in Node.js, there are several ways to do this, and I have already written about it. Of course, a real system is not an exercise in measuring HWPS (Hello World Per Second). We want to perform complex tasks, serve a multitude of pages, provide APIs and be flexible and enable parallel development by multiple teams. We use micro-services to do all this, and life is good.

I have also written about the need to use messaging in a micro-service system to bring down the inter-service chatter. When we added clustering into the mix, we discovered that we need to pay special attention to ensure task dispatching similar to what Node.js clustering or proxy load balancing already provides for requests. We found our solution in the round-robin dispatching provided by worker queues.

Timers are something else

Then we hit timers. As long as information flow in a distributed system is driven by user events, clustering works well because dispatching policies (most often round-robin) are implemented by both the Node.js clustering and proxy load balancer. However, there is a distinct class of tasks in a distributed system that is not user-driven: periodic tasks.

Periodic tasks are tasks that are done on a timer, outside of any external stimulus. There are many reasons why you would want to do this, but most periodic tasks service databases. In a FIFO of limited size, they delete old entries, collapse duplicates, extract data for analysis, report it to other services etc.

For periodic tasks, there are two key problems to solve:

  1. Something needs to count the time and initiate triggers
  2. Tasks need to be written to execute when initiated by these triggers

The simplest way to trigger the tasks is known by every Unix admin – cron. You set up a somewhat quirky cron table, and tasks are executed according to the schedule.

The actual job to execute needs to be provided as a command-line task, which means the app that normally accesses the database needs to provide an additional CLI entry point sharing most of the code. This is important in order to keep with factor XII of the 12 factors, which insists that one-off admin tasks share the same code and config as the long-running processes.
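As an illustration only (the paths and file names are made up), a crontab entry invoking such a CLI entry point could look like this:

# Run the database cleanup task every night at 2 AM, reusing the app's code and config
0 2 * * * cd /opt/myapp && /usr/bin/node bin/cleanup.js >> /var/log/myapp-cleanup.log 2>&1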

 

There are two problems with cron in the context of the cloud:

  1. If the machine running cron jobs malfunctions, all the periodic tasks will stop
  2. If you are running your system on a PaaS, you don’t have access to the OS in order to set up cron

The first problem is not a huge issue since these jobs run only periodically and normally provide online status when done – it is relatively easy for an admin to notice when they stop. For high availability and failover, Google has experimented with a tool called rcron for setting up cron over a cluster of machines.

Node cron

The second problem is more serious – in a PaaS, you will need to rely on a solution that involves your apps. This means we will need to set up a small app just to run an alternative to cron that is PaaS friendly. As usual, there are several options, but the node-cron library seems fairly popular and has graduated past version 1.0. If you run it in an app backed by supervisor or PM2, it will keep running and executing tasks.

You can execute tasks in the same app where node-cron is running, provided these tasks have enough async calls themselves to allow the event queue to execute other callbacks in the app. However, if the tasks are CPU intensive, they will block the event queue and should be extracted out.

A good way of solving this problem is to hook up the app running node-cron to a message broker such as RabbitMQ (which we already use for other MQ needs in our micro-service system anyway). The only thing the node-cron app will do is publish task requests to predefined topics. The workers listening to these topics will do the actual work:

node-cron
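Here is a minimal sketch of such a ‘clock’ app, assuming the ‘cron’ npm module and the node-amqp client (the exchange, topic and host names are made up):

var CronJob = require('cron').CronJob;
var amqp = require('amqp');

var connection = amqp.createConnection({ host: 'rabbit.example.com' });

connection.on('ready', function () {
  connection.exchange('periodic-tasks', { type: 'topic' }, function (exchange) {
    // Every 15 minutes, publish a task request; workers subscribed to the
    // 'tasks.prune-activity' topic perform the actual database maintenance.
    new CronJob('0 */15 * * * *', function () {
      exchange.publish('tasks.prune-activity', {
        requestedAt: new Date().toISOString()
      });
    }, null, true);
  });
});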

The problem with this approach is that a new task request can arrive while a worker has not finished running the previous task. Care should be taken to avoid workers stepping over each other.

Interestingly enough, a hint at this approach can be found in the aforementioned 12 factors, in the section on concurrency. You will notice a ‘clock’ app in the picture, indicating an app whose job is to ‘poke’ other apps at periodic intervals.

There can be only one

A ‘headless’ version of this approach can be achieved by running multiple apps in a cluster and letting them individually keep track of periodic tasks by calling ‘setTimeout’. Since these apps share nothing, they will run according to the local server clock, which may or may not be in sync with other servers. All the apps may attempt to execute the same task (since they are clones of each other). In order to prevent duplication, each app should attempt to write a ‘lock’ record in the database before starting. To avoid a deadlock, apps should wait a random amount of time before retrying.

Obviously, if the lock is already there, the other apps will fail to create their own. Therefore, only one app will win in securing the lock before executing the task. However, the lock should be set to expire in a small multiple of the time normally required to finish the task, in order to avoid orphaned locks due to crashed workers. If the worker has not crashed but is just taking longer than usual, it should renew the lock to prevent it from expiring.
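A minimal sketch of this locking idea follows; ‘locks’ stands for a hypothetical data access object backed by the shared database, whose insert fails if the key already exists:

var LOCK_TTL = 5 * 60 * 1000; // a small multiple of the normal task duration

function runExclusively(taskName, task) {
  // Wait a random amount of time so the clones don't all grab for the lock at once
  setTimeout(function () {
    locks.insert({ key: taskName, expiresAt: Date.now() + LOCK_TTL }, function (err) {
      if (err) {
        return; // another instance already holds the lock - not our turn
      }
      task(function done() {
        locks.remove(taskName, function () {}); // release the lock when finished
      });
    });
  }, Math.floor(Math.random() * 1000));
}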

The advantage of this approach is that we will only schedule the next task once the current one has finished, avoiding the problem that the worker queue approach has.

Note that in this approach, we are not achieving scalability, just high availability. Of the several running apps, at least one app will succeed in securing the lock and executing the task. The presence of other apps ensures execution but does not increase scalability.

I have conveniently omitted many details about writing and removing the lock, retries etc.

Phew…

I guarantee you that once you start dealing with periodic tasks, you will be surprised with the complexity of executing them in the cloud. A mix of cloud, clustering and high availability makes running periodic tasks a fairly non-trivial problem. Limitations of PaaS environments compound this complexity.

If you visit TJ’s tweet above, you will find a dozen people offering alternatives in the replies (most of them being variations of *ron). The plethora of different solutions is a dead giveaway that this is a thorny problem. It is not fully solved today (at least not in the context of the cloud and micro-service systems), hence so many alternatives. If you use something that works well for you, do share in the ‘Reply’ section.

© Dejan Glozic, 2014

Socket.io: Mind the Gap

Wikimedia Commons, 'Mind the Gap', 2008, Clicsouris
Wikimedia Commons, ‘Mind the Gap’, 2008, Clicsouris

Welcome to our regular edition of ‘Socket.io version 1.0 watch’ or ‘Making sure Guillermo Rauch is busy working on Socket.io 1.0 instead of whatever he does to pay the rent that does nothing for me’. I am happy to inform you that Socket.io 1.0 is now available, with the new logo and everything. Nice job!

With that piece of good news, back to our regular programming. First, a flashback. When I was working on my doctoral studies in London, England, one of the most memorable bits of trivia was a dramatic voice on the London Underground PA system warning me to ‘Mind the Gap’. Since then I seldom purchase my clothes at The Gap, choosing its more upmarket sibling Banana Republic. J.Crew is fine too. A few years ago a friend went to London to study and she emailed me that passengers are still reminded about the dangers of The Gap.

We have recently experienced a curious problem in our usage of WebSockets – our own gap to mind, as it were. It involves a topology I have already written about. You will most likely hit it too, so here it goes:

  1. A back-end system uses message queue to pass messages about state changes that affect the UI
  2. A micro-service serves a Web page containing Socket.io client that turns around and establishes a connection with the server once page has been loaded
  3. In the time gap between the moment the page is served and the moment the client calls back to establish a WebSockets connection, new messages related to the content on the page arrive.
  4. By the time the WebSockets connection has been established, any number of messages will have been missed – a message gap of sorts.
mind-the-gap
WebSockets gap: in the period of time from HTTP GET response to the establishment of the WebSockets connection, msg1 and msg2 were missed. The client will receive messages starting from the msg3.

The gap may or may not be a problem for you depending on how you are using the message broker to pass messages around micro-services. As I have already written in the post about REST/MQTT mirroring, we are using MQTT to augment the REST API. This augmentation mirrors the CRUD verbs that result in state change (CUD). The devil is in the details here, and the approach taken will decide whether the ‘message gap’ is going to affect you or not.

When deciding what to publish to the subscribers using MQ, we can take two approaches:

  1. Assume subscribers have made a REST call to establish the baseline state, and only send deltas. The subscribers will work well as long as they took the baseline and didn’t miss any of the deltas for whatever reason. This is similar to showing a movie on a cable channel in a particular time slot – if you miss it, you miss it.
  2. Don’t assume subscribers have the baseline state. Instead, assume they may have been down or not connected. Send a complete state of the resource alongside the message envelope. This approach is similar to breaking news being repeated many times during the day on a news channel. If you are just joining, you will be up to date soon.

The advantage of the first approach is smaller message payloads. There is no telling how big JSON resources can be (a problem recently addressed by Tim Bray in his fat JSON blog post). Imagine we are tracking a build resource and it is sending us updates on the progress (20%, 50%, 70%). Do we really want to receive the entire Build resource JSON alongside this message?

On the other hand, the second approach is not inconsistent with the recommendation for PUT and PATCH REST responses. We know that the newly created resource is returned in the response body for POST requests (alongside the Location header). However, it is considered good practice to do the same in the responses to PUT and PATCH requests. If somebody moves the progress bar of a build by using PATCH to update the ‘progress’ property, the entire build resource will thus be returned in the response body. The service fielding this request can just take that JSON string and also attach it to the message under the ‘state’ property, as we are already doing for POST requests.

Right now we haven’t made up our minds. Sending around entire resources in each message strikes us as wasteful. This message will be copied into each queue of the subscribers listening to it, and if it is durable, will also be persisted. That’s a lot of bytes to move around while using a protocol whose main selling point is that it is light on resources. Imagine pushing these messages to a native mobile client over the air. Casually attaching entire JSON resources to messages is not something you want to do in these situations.

In the end, we solved the problem without changing our ‘baseline + deltas’ approach. We tapped into the fact that messages have unique identifiers attached to them as part of the envelope. Each service that is handling clients via WebSockets has a little buffer of messages that are published by the message broker. When we send the page to the client, we also send the ID of the last known message embedded in the HTML as data. When the WebSockets connection is established, the client will communicate (emit) this message ID to the server, and the server will check the buffer to see if new messages have arrived since then. If so, it will send those messages immediately, allowing the client to catch up – to ‘bridge the gap’. Once the client has caught up, the message traffic resumes as usual.

As a bonus, this approach works for cases where the client drops the WebSockets connection. When connection is re-established, it can use the same approach to catch up on the messages it has missed.

The fix: the service sends the ‘message marker’ (the last message ID). The client echoes the marker when connecting with WebSockets. Detecting the hole in the message sequence, the service immediately sends the missing messages, allowing the client to catch up.
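A minimal sketch of that handshake could look like this (the event names and buffer handling are made up for illustration; the real service also needs to worry about buffer size and expiry):

// Server side: 'httpServer' is the app's existing HTTP server,
// 'buffer' is a small in-memory list of recently received MQ messages.
var io = require('socket.io')(httpServer);
var buffer = [];

io.on('connection', function (socket) {
  socket.on('catch-up', function (lastSeenId) {
    var missed = [], found = false;
    buffer.forEach(function (msg) {
      if (found) missed.push(msg);
      if (msg.id === lastSeenId) found = true;
    });
    missed.forEach(function (msg) {
      socket.emit('message', msg); // bridge the gap
    });
  });
});

// Client side: echo the marker that was embedded in the served HTML.
var socket = io();
socket.on('connect', function () {
  socket.emit('catch-up', document.body.getAttribute('data-last-msg-id'));
});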

As you can see, we are still learning and evolving our REST/MQTT mirroring technique, and we will most likely encounter more face-palm moments like this. The solution is not perfect – in an extreme edge case, the WebSockets connection can take so long that the service message buffer fills up and old messages start dropping off. A solution in those cases is to refresh the browser.

We are also still intrigued with sending the state in all messages – there is something reassuring about it, and the fact that the similarity to PATCH/PUT behavior only reinforces the mirroring aspect is great. Perhaps our resources are not that large, and we are needlessly fretting over the message sizes. On the other hand, when making a REST call, callers can use ‘fields’ and ’embed’ to control the size of the response. Since we don’t know what any potential subscriber will need, we have no choice but to send the entire resource. We need to study that approach more.

That’s it from me this week. Live long, prosper and mind the gap.

© Dejan Glozic, 2014

Is There Life After TJ?

tjh

What is going to happen now?
Nothing. We will be sad for a while, then we will move on.

 

Mad Men: Don Draper discusses the Kennedy assassination with his kids

Every once in a while an event occurs that pushes regular programming aside. If you are CNN, it happens with such annoying regularity that it completely dilutes the meaning of the phrase “breaking news”. Nevertheless, if you are at least a bit involved with the Node.js community, the news that TJ Holowaychuk turned his back on Node in favour of Go restores the phrase’s original meaning.

Node.js has had a bit of a problem with TJs ever since TJ Fontaine assumed the post of Node.js lead. You had to add the last name to disambiguate, leading to a ‘no TJ’ slide at a recent NodeDay. Before Fontaine, the real slim TJ authored such a monstrous number of Node.js modules (some of which, like Express, Jade and Mocha, are wildly popular) that there is a semi-serious thread on Quora suggesting he is not a real person. Proof: total absence from the conference-industrial complex, no pictures, no videos, and an inhuman coding output that suggests an army of ghost coders. Also, an abrupt change in import declaration style in 2013.

My first encounter with TJ (metaphorically speaking) was in the summer of 2012, when we evaluated Node.js for our project. Our team lead noticed that an unusually large cluster of the modules we needed to convert us from a servlet/JSP back end was authored by a single person. There was something unnerving about moving from software written by an army of big-company developers to something written by a kid with an Emo haircut and a colorful t-shirt (it looks like TJ was too busy coding to notice that the official look is now a beard and a plaid shirt, and he does not need prescription glasses to care about Warby Parker).

Unlike many people, I am also not completely gaga about his coding and documentation style. I never warmed up to Jade and Stylus – I spent many an hour tearing my hair out over some unexpected Jade behavior that turned out to be one tab or space too many. Dust.js, which we now use for our templating needs, is much less susceptible to magic in the name of DRY.

Similarly, Express documentation is wonderfully minimalistic until you need some clarification, at which point you would gladly trade it for verbose but useful. Some of the Express APIs suffer from the same problem. You can ‘use’ all kinds of object types, and while ‘set’ is a setter, ‘get’ defines a route for an HTTP endpoint. Not that it matters anyway – an army of developers wrote the missing manual on Stack Overflow.

All this does not diminish the importance and the imprint that TJ left on the Node.js community. The Hacker News discussion was long and involved, with many participants trying to guess the ‘real reason’ behind the switch. Zef Hemel contemplated what he considers a ‘march toward Go’. Other people with an investment in Node.js commented as well:

[tweet 485132339362529280 align=’center’]

For anyone making a transition to Node.js and feeling the cold sweat of ‘buyer’s remorse’, I am offering the following opinion. In a brazen display of self-promotion, I will quote none other than myself from one of my previous posts:

It was always hard for me to make choices. In the archetypical classification of ‘satisficers’ and ‘maximizers’, I definitely fall into the latter camp. I research ad nauseam, read reviews, measure carefully, take everything into account, and then finally make a move, mentally exhausted. Then I burst into the cold sweat of buyer’s remorse, or read a less than glowing review of the product I just purchased, and my happiness is diminished. It sucks to be me.

My point here is as follows: at any point in time, there will be many ways to solve your particular software problem. Chances are, you spent a lot of time researching before pulling the trigger on Node.js. You have made the switch, and your app or site is working well, purring like a kitten. I don’t think your app will suddenly start misbehaving just because TJ got a case of code rage. In fact, the signs were on the wall for quite some time because his Twitter feed was all ‘Node.js errors this, Node.js errors that’ – lots of frustration about some very arcane details I didn’t understand at all.

If anything, this case teaches us a lesson that it does not pay to search for a language Messiah – a language/platform to end all languages/platforms. As it became plainly obvious, the future is polyglot, and Go is appearing as a growing alternative for writing apps for distributed systems. Even by TJ’s admission, Node.js is still a great choice for Web sites, as well as API services. For example, a whole class of page rendering template libraries are hard to replicate in Go (Mustache, Handlebars, Dust), and so are the building and testing solutions (Grunt, Mocha, Jasmine).

Node.js was never positioned as a platform for solving all kinds of computational problems, and even before Go, distributed systems had apps written in stacks other than Node.js. I think attempts to render Node.js as a solution for every problem were always misguided, and no serious member of the Node.js community made such claims. As long as you use Node.js for what it is very good at – page-serving apps and API services with a lot of I/O (and not too many CPU-intensive, long-running tasks) – TJH’s departure does not somehow make your app less ‘correct’.

If, on the other hand, you are impressionable and must check out Go right away, by all means do, but if you keep putting all your eggs in a new and shiny basket, people like TJH will continue to stress you out as they move on to another new and shiny language or platform.

If you feel a need to solve a problem that Node.js is failing to solve or is not suitable for, and Go or Scala or any other solution fit the bill, by all means go ahead and use them. Otherwise, move along, people – nothing to see here. We wish TJ all the best in the Go future, while we continue to focus on our own problems and challenges.

I guess that means a ‘Yes, of course’ to the question in the title.

© Dejan Glozic, 2014

Micro-service APIs With Some Swag (part 2)

London Cries: A Man Swaggering, Paul Sandby, 1730
London Cries: A Man Swaggering, Paul Sandby, 1730

Read part 1 of the article.

Last week I delved into the problem of presenting a unified API doc for a distributed system composed of micro-services. In this second installment, we will get our hands dirty with the help of Swagger by Wordnik.

A quick recap: the problem we are trying to solve is how to document all the APIs of the system when these APIs are an aggregation of endpoints contributed by individual micro-services. To illustrate the possible solution, we will imagine a distributed system consisting of two micro-services:

  • Projects – this service provides APIs to query, update and delete projects (very simplified, of course).
  • Users – this service provides APIs to query and create user profiles. Projects API will make a reference to users when listing project members.

For simplicity, we will choose only a handful of properties and methods. Project model is defined in Swagger like this:

"Project": {
   "id": "Project",
   "description": "A single project model",
   "properties": {
      "name": {
      "type": "string",
      "description": "name of the project",
      "required": true
   },
   "description": {
      "type": "string",
      "description": "description of the project"
   },
   "avatar": {
      "type": "string",
      "description": "URL to the image representing the project avatar"
   },
   "owner": {
      "type": "string",
      "description": "unique id of the projects' owner",
      "required": true
   },
   "members": {
      "type": "array",
      "description": "array of unique ids of project members",
      "items": {
         "type": "string"
      }
   }
}

Users are provided by another service and have the following model, defined the same way:

"User": {
   "id": "User",
   "description": "A single user model",
   "properties": {
      "id": {
         "type": "string",
         "description": "unique user id",
         "required": true
      },
      "name": {
      "type": "string",
         "description": "user name",
         "required": true
      },
      "email": {
         "type": "string",
         "description": "user email",
         "required": true
      },
      "picture": {
         "type": "string",
         "description": "thumbnail picture of the user"
      }
   }
}

The key to the proposed solution lies in Swagger’s feature that allows the composite API document to be composed of APIs coming from multiple places. The entry point to the document definition will look like this:

{
    "apiVersion":"1.0",
    "swaggerVersion":"1.2",
    "apis":[
        {
            "path": "http://localhost:3000/api-docs/projects.json",
            "description":"Projects"
        },
        {
            "path": "http://localhost:3001/api-docs/users.json",
            "description":"Users"
        }
    ],
    "info":{
        "title":"30% Turtleneck, 70% Hoodie API Example",
        "description":"This API doc demonstrates how API definitions can be aggregated in a micro-service system, as demonstrated at <a href=\"http://dejanglozic.com\">http://dejanglozic.com</a>."
    }
}

Each API group with its set of endpoints can be sourced from a different URL, giving us the flexibility to have each portion of the API provided by the actual micro-service that owns it.
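As a minimal sketch (assuming Express; the paths are made up), each micro-service can simply serve its own Swagger resource listing:

var express = require('express');
var app = express();

// The Projects micro-service owns and serves its portion of the API doc
app.get('/api-docs/projects.json', function (req, res) {
  res.json(require('./api-docs/projects.json'));
});

app.listen(3000);

The aggregating document above only needs the URLs of these per-service resources, not the services’ implementation details.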

Each individual API document will list the endpoints and resources it handles. For each endpoint and verb combination, it will list parameters, request/response bodies, as well as data models and error responses. This and much more is fully documented in the Swagger 1.2 specification. Here is, for example, the definition of the endpoint that returns a single user profile:

{
  "path": "/users/{id}",
  "operations": [
    {
      "method": "GET",
      "summary": "Returns a user profile",
      "notes": "Implementation notes on GET.",
      "type": "User",
      "nickname": "getUser",
      "authorizations": {},
      "parameters": [
        {
          "name": "id",
          "description": "Unique user identifier",
          "paramType": "path",
          "type": "string"
        }
      ],
      "responseMessages": [
        {
          "code": 404,
          "message": "User with a provided id not found"
        }
      ]
    }
  ]
}

Swagger handles most of its specification through the parameter list, which is fairly clever. In addition to query parameters, they can be used to define path segments, as well as request body for POST, PUT and PATCH. In addition, request and response body schemas are linked to the model specifications further down the document.

Needless to say, I skipped over huge swaths of the specification dealing with authentication. Suffice it to say that the Swagger specification currently supports Basic Auth, API keys and OAuth 2 as authentication options.

At this point of the article, I realized that I cannot show you the actual JSON files without making the article long and unwieldy. Luckily, I also realized I actually work for IBM and more importantly, IBM DevOps Services (JazzHub). So here is what I have done:

The entire Node.js app is now available as a public project on DevOps Services site:

ids-swagger

Once you have explored the code, you can see it running in IBM Bluemix. You can drill into the Swagger API UI (despite what the UI is telling you, you cannot yet ‘Try it’ – the actual API is missing, this is just a doc; for a real API, this part will also work).

bm-swagger

I hope you agree that showing a running app is better than a screenshot. From now on, I will make it my standard practice to host complete demo apps in DevOps Services and run them in Bluemix for your clicking pleasure.

© Dejan Glozic, 2014

Micro-service APIs With Some Swag (part 1)

London Cries: A Man Swaggering, Paul Sandby, 1730
London Cries: A Man Swaggering, Paul Sandby, 1730

Every aspect of the API matters to some Client.

Jim des Rivieres, Evolving Eclipse APIs

It is fascinating that the quote above is 14 years old now. It was coined by the Benevolent Dictator of Eclipse APIs Jim des Rivieres in the days when we defined how Eclipse Platform APIs were to be designed and evolved. Of course, APIs in question were Java, not the REST variety that is ruling the API economy these days. Nevertheless, the key principles hardly changed.

Last week when I wrote about the switch to micro-services by SoundCloud, I noted that APIs are predominantly a public-facing concern in monolithic applications. There is no arms-length relationship between providers and consumers of functional units, enabling a low-ceremony evolution of the internal interfaces. They are the ‘authorized personnel only’ rooms in a fancy restaurant – as long as the dining room is spotless, we will ignore the fact that the gourmet meals are prepared by a cute rat that sounds a lot like Patton Oswalt.

Put another way, APIs are not necessary in order to get the monolithic application up and running. They are important the moment you decide to share your data with third-party developers, write a mobile app, or enable partner integrations. Therefore, monolithic applications predominantly deal with public API.

Things are much different for a micro-service based distributed system. Before any thought is put in how the general public will interact with the system, micro-services need to figure out how they will interact with each other.

In the blog post about Node.js clustering, I pointed out that Node is inherently single-threaded, and clustering is required just to stretch to all the cores of a single server, never mind load balancing across multiple VMs. This ‘feature’ essentially makes clustering an early consideration, and switching from vertical to horizontal scaling (across multiple machines) mostly a configuration issue. Presumably your instances have already been written to share-nothing and do not really care where they are physically running.

Micro-service APIs are very similar in that regard. They force developers to start with a clean API for each service, and since a complex system is often built with several teams working in parallel, it would turn into a total chaos without clean contracts between the services. In micro-service systems, APIs are foundational.

Internal APIs – an oxymoron?

In the previous post, I put forward a few rules of writing a micro-service based distributed system that concern APIs. Here they are again:

  • Rule 3: APIs should be the only way micro-services talk to each other and the outside world.
  • Rule 4: Internal APIs should be documented and otherwise written as if they will be exposed to the open Internet at any point.
  • Rule 5: Public APIs are a subset of internal APIs with stricter visibility rules, rate limiting and separate authentication.

The aforementioned Jim des Rivieres used to say that “there is no such thing as internal API”. Interfaces are either firm contracts exhibiting all the qualities of APIs or they can change at any time without warning. There is no mushy middle ground. I tend to agree with him when it comes to monolithic systems, where ‘internal’ refers to ‘written for the system’s internal use only’. However, in distributed systems ‘internal’ refers to traffic between services, or between systems behind the firewall. It has more to do with ‘things we say to the members of our own family’, presumably versus ‘things we say to the outside world’.

In this context, ‘internal APIs’ is a legitimate thing because ‘internal’ refers to the visibility rules, not the quality of the API contract. Rule #4 above explicitly states that – there is nothing different about internal APIs except visibility.

Presenting unified API front

If APIs are the only way micro-services should communicate with each other and the outside world, the consumers need to be presented with a cleanly documented contract. Documenting the APIs cannot be an afterthought – it needs to be built with the micro-service, sometimes even before the documented endpoints actually work.

The fact that our distributed system is composed of micro-services is a great feature for us and for our ability to quickly evolve and deploy the system with little or no downtime. However, API consumers couldn’t care less about it – they want one place to go to see all the APIs.

There are multiple ways of skinning that particular cat, but we have decided to do as follows:

  1. Proxy all the APIs to the common path (e.g. https://example.com/api)
  2. Expose the API version in the URL (I know, I know, we can yell at each other until the cows come home about how that is great or stupid, but many popular APIs are doing it and so are we). Thus the common path gets a version (e.g. https://example.com/api/v1)
  3. Reserve a segment after the version for each micro-service that exposes APIs (e.g. /projects, /users etc.).
  4. Provide API specification using a popular Open Source API doc solution

On the last point, we looked around and considered several alternatives, finally settling on Swagger by Wordnik. It is a popular solution, with a vibrant community, fairly well defined API spec, a reusable live API UI that can be included in our UI, and with a path forward towards version 2.0 that promises to address currently missing features (the current version is 1.2).

A micro-service based system using Swagger to define APIs could look like this:

swagger

Each micro-service that provides APIs will make a Swagger API doc resource available, describing all the endpoints, verbs, parameters and request/response bodies. A documentation micro-service can render these in two ways – using the Swagger Live UI or rendering static docs.

Swagger Live UI is available as an Open Source project and allows users to not only read the rendered documentation, but also enter values and try it out in place. To see it in action, try out the Pet Store sample.

The UI is all client side, which makes it stack-agnostic and fit for being served by a multitude of platforms, but if you are aggregating your definitions like we do, you need to work around the browser’s same-origin limitation. You can either proxy the API definitions or use CORS. In our case, it helps that we proxy all the services to the single external URL root, which is on the same domain as the doc UI – problem solved.

I can stop now while I am ahead – this being part 1 of a multi-part article. In the next installment, I will walk you through an example of two micro-services – one providing an API for Projects, the other for Users. We will spec out the API, document the spec using Swagger, write a Node.js app to serve the UI from these definitions, and also render an alternative static version of the API doc.

See you next week, off to write some API micro-services.

© Dejan Glozic, 2014

 

SoundCloud is Reading My Mind

Marvelous feats in mind reading, The U.S. Printing Co., Russell-Morgan Print, Cincinnati & New York, 1900

“Bad artists copy. Good artists steal.”

– Pablo Picasso

It was bound to happen. In the ultra-connected world, things are bound to feed off of each other, eventually erasing differences, equalizing any differential in electric potentials between any two points. No wonder the weirdest animals can be found on islands (I am looking at you, Australia). On the internet, there are no islands, just a constant primordial soup bubbling with ideas.

The refactoring of monolithic applications into distributed systems based on micro-services is slowly becoming ‘a tale as old as time’. They all follow a certain path which kind of makes sense when you think about it. We are all impatient, reading the first few Google search and Stack Overflow results ‘above the fold’, and it is no coincidence that the results start resembling majority rule, with more popular choices edging out further and further ahead with every new case of reuse.

Luke Wroblewski of Mobile First fame once said that ‘two apps do the same thing and suddenly it’s a pattern’. I tend to believe that people researching the jump into micro-services read more than two search results, but once you see certain choices appearing in, say, three or four stories ‘from the trenches’, you become reasonably convinced to at least try them yourself.

If you were so kind as to read my past blog posts, you know some of the key points of my journey:

  1. Break down a large monolithic application (Java or RoR) into a number of small and nimble micro-services
  2. Use REST API as the only way these micro-services talk to each other
  3. Use a message broker (namely, RabbitMQ) to apply the event collaboration pattern and avoid annoying inter-service polling for state changes
  4. Link MQ events and REST into what I call REST/MQTT mirroring to notify about resource changes

Then this came along:

As I was reading the blog post, I got giddy at the realization that we are all converging on an emerging model for universal micro-service architecture. Solving their own unique SoundCloud problems (good problems to have, if I may say so – coping with millions of users falls into that category), SoundCloud developers came to very similar realizations as many of us taking a similar journey. I will let you read the post for yourself, and then try to extract some common points.

Stop the monolith growth

Large monolithic systems cannot be refactored at once. This simple realization about technical debt actually has two sub-aspects: the size of the system at the moment it is considered for a rewrite, and the new debt being added because ‘we need these new features yesterday’. As with real world (financial) debt, the first order of business is to ‘stop the bleeding’ – you want to stop new debt from accruing before attempting to make it smaller.

At the beginning of this journey you need to ‘draw the line’ and stop adding new features to the monolith. This rule is simple:

Rule 1: Every new feature added to the system will from now on be written as a micro-service.

This ensures that the team’s precious resources are not spent on making the monolith bigger and pushing the finish line farther and farther over the horizon.

Of course, a lot of the team’s activity involves reworking the existing features based on validated learning. Hence, a new rule is needed to limit this drain on resources to critical fixes only:

Rule 2: Every existing feature that requires significant rework will be removed and rewritten as a micro-service.

This rule is somewhat less clear-cut because it leaves some room for the interpretation of ‘significant rework’. In practice, it is fairly easy to convince yourself to rewrite a feature this way because micro-service stacks tend to be more fun, require fewer files and fewer lines of code, and are more suitable for Web apps today. For example, we don’t need too much persuasion to rewrite a servlet/JSP service in the old application as a Node.js/Dust.js micro-service whenever we can. If anything, we need to practice restraint and not fabricate excuses to rewrite features that only need touch-ups.
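
For a taste of what such a rewrite ends up looking like, here is a minimal sketch of a page-serving micro-service using Express and dustjs-linkedin. The template and data are made up; a real service would load its templates from files and fetch data from other services’ APIs:

```js
// project-page.js - a tiny Node.js/Dust.js micro-service sketch
var express = require('express');
var dust = require('dustjs-linkedin');

var app = express();

// Inlined template to keep the sketch self-contained; real services keep .dust files on disk
var source = '<h1>{project.name}</h1><p>Owned by {project.owner}</p>';
dust.loadSource(dust.compile(source, 'projectPage'));

app.get('/projects/:id', function (req, res) {
  // Hypothetical data - in practice this would come from the Projects API
  var model = { project: { name: 'Apollo', owner: 'Dejan' } };
  dust.render('projectPage', model, function (err, html) {
    if (err) {
      return res.status(500).send(err.message);
    }
    res.send(html);
  });
});

app.listen(3004);
```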

US_Beef_cuts_svg
Micro-services as BBQ. Mmmmm, BBQ…

An important corollary of this rule is to have a plan of action ahead of time. Before doing any work, have a ‘cut of beef’ map of the monolith with areas that naturally lend themselves to being rewritten as micro-services. When the time comes for a significant rework in one of them, you can simply act on that map.

As is the norm these days, ‘there’s a pattern for that’, and as the SoundCloud guys noticed, the cuts fall along what is known as bounded contexts.

Center around APIs

As you can read at length on the API evangelist’s blog, we are transforming into an API economy, and APIs are becoming a central part of your system, rather than something you tack on after the fact. If you could get by with internal monolith services in the early days, micro-services will force you to accept APIs as the only way you communicate both inside your system and with the outside world. As SoundCloud developers realized, the days of integration around databases are over – APIs are the only contact points that tie the system together.

Rule 3: APIs should be the only way micro-services talk to each other and the outside world.

With monolithic systems, APIs are normally not used internally, so the first APIs to be created are outward facing – for third party developers and partners. A micro-service based system normally starts with inter-service APIs. These APIs are normally more powerful since they assume a level of trust that comes from sitting behind a firewall. They can use proprietary authentication protocols, have no rate limiting and expose the entire functionality of the system. An important rule is that they should in no way be second-class compared to what you would expose to the external users:

Rule 4: Internal APIs should be documented and otherwise written as if they will be exposed to the open Internet at any point.

Once you have the internal APIs designed this way, deciding which subset to expose as the public API stops being a technical decision. Your external APIs look like the internal ones, with the exception of stricter visibility rules (who can see what), rate limiting (with the possibility of a rate-unlimited paid tier), and an authentication mechanism that may differ from what is used internally.

Rule 5: Public APIs are a subset of internal APIs with stricter visibility rules, rate limiting and separate authentication.

SoundCloud developers went the other way (public API first) and realized that they could not build their entire system with the limitations in place for the public APIs, so they had to resort to more powerful internal APIs. The delicate balance of making public APIs useful without giving away the farm is a decision every business needs to make in the API economy. Micro-services simply encourage you to start from internal APIs and work towards public ones.
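
Expressed in code, Rule 5 can be as simple as mounting the same route handlers on two surfaces, with extra middleware in front of the public one. A sketch follows; the rate limiter and API key check are naive placeholders standing in for whatever real mechanisms you use:

```js
// public-vs-internal.js - a sketch of Rule 5 in Express terms
// The rate limiter and API key check are intentionally naive placeholders.
var express = require('express');
var app = express();

// Route handlers shared by both surfaces
function listProjects(req, res) {
  res.json([{ id: '1', name: 'Apollo' }]);
}
function adminStats(req, res) {
  res.json({ uptime: process.uptime() });
}

// Naive in-memory rate limiter (placeholder only)
var hits = {};
function rateLimit(req, res, next) {
  hits[req.ip] = (hits[req.ip] || 0) + 1;
  if (hits[req.ip] > 100) {
    return res.status(429).send('Rate limit exceeded');
  }
  next();
}

// Placeholder API key check standing in for your public authentication scheme
function apiKeyAuth(req, res, next) {
  if (req.get('X-Api-Key') !== 'expected-key') {
    return res.status(401).send('Missing or invalid API key');
  }
  next();
}

// Internal surface: everything is visible, trust comes from sitting behind the firewall
app.get('/internal/projects', listProjects);
app.get('/internal/admin/stats', adminStats);

// Public surface: a subset of the internal routes, behind auth and rate limiting
app.get('/api/v1/projects', apiKeyAuth, rateLimit, listProjects);
// note that /admin/stats is deliberately not exposed publicly

app.listen(3000);
```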

Messaging

If there was a section in the SoundCloud blog post that made me jump with joy, it was the one where they discussed how they arrived at using RabbitMQ for messaging between micro-services, considering that I have written about exactly that in every second post for the last three months. In their own words:

Soon enough, we realized that there was a big problem with this model; as our microservices needed to react to user activity. The push-notifications system, for example, needed to know whenever a track had received a new comment so that it could inform the artist about it. At our scale, polling was not an option. We needed to create a better model.

 

We were already using AMQP in general and RabbitMQ in specific — In a Rails application you often need a way to dispatch slow jobs to a worker process to avoid hogging the concurrency-weak Ruby interpreter. Sebastian Ohm and Tomás Senart presented the details of how we use AMQP, but over several iterations we developed a model called Semantic Events, where changes in the domain objects result in a message being dispatched to a broker and consumed by whichever microservice finds the message interesting.

I don’t need to say much about this – read my REST/MQTT mirroring post that describes the details of what the SoundCloud guys call ‘changes in the domain objects result in a message’. I would like to indulge in the feeling that ‘great minds think alike’, but more modestly (and realistically), it is just common sense, and RabbitMQ is a nice, fully featured and reliable open source polyglot broker. No shocking coincidence – it is seen in many installations of this kind. Let’s make a rule about it:

Rule 6: Use a message broker to stay in sync with changes in domain models managed by micro-services and avoid polling.
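
To make Rule 6 concrete, here is a minimal sketch of the publishing side using the amqplib client for RabbitMQ. The exchange name, routing key and payload are made up, mirroring a hypothetical REST change to a Projects resource; interested micro-services would bind their own queues to the topic exchange and react instead of polling:

```js
// publish-event.js - a sketch of publishing a 'domain object changed' event to RabbitMQ
var amqp = require('amqplib');

// Hypothetical event: a project was updated via a REST PUT /api/v1/projects/42
var event = {
  type: 'project.updated',
  id: '42',
  url: 'https://example.com/api/v1/projects/42'
};

amqp.connect('amqp://localhost')
  .then(function (conn) {
    return conn.createChannel()
      .then(function (ch) {
        var exchange = 'domain-events'; // made-up exchange name
        return ch.assertExchange(exchange, 'topic', { durable: true })
          .then(function () {
            // Consumers bind queues to routing patterns such as 'project.*' and react - no polling
            ch.publish(exchange, event.type, new Buffer(JSON.stringify(event)));
            return ch.close();
          });
      })
      .then(function () { return conn.close(); });
  })
  .catch(console.error);
```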

All together now

Let’s pull all the rules together. As we speak, teams around the world are suffering under the weight of large, unwieldy monolithic applications that are ill-suited for cloud deployment. They are intrigued by micro-services but afraid to take the plunge. These rules will make the process more manageable and allow you to arrive at a better system – one that is easier to grow, can be deployed many times a day, and is more reactive to events, load, failure and users:

  1. Every new feature added to the system will from now on be written as a micro-service.
  2. Every existing feature that requires significant rework will be removed and rewritten as a micro-service.
  3. APIs should be the only way micro-services talk to each other and the outside world.
  4. Internal APIs should be documented and otherwise written as if they will be exposed to the open Internet at any point.
  5. Public APIs are a subset of internal APIs with stricter visibility rules, rate limiting and separate authentication.
  6. Use a message broker to stay in sync with changes in domain models managed by micro-services and avoid polling.

This is a great time to build micro-service based systems, and collective wisdom on the best practices is converging as more systems are coming online. I will address the topic of APIs in more detail in one of the future posts. Stay tuned, and keep reading my mind!

© Dejan Glozic, 2014

For Once, Being Reactive is Good

5 Gum – React

Apple said Monday that it sold more than 300,000 iPads on the first day of its launch, ushering a new era of people buying things in order to find out what they are.

 

SNL Weekend Update, season 35, episode 18

All my life, I thought ‘reaction’ was a bad word. Ever since the French Revolution, being ‘reactionary’ could get you into a lot of trouble. More recently (and less detrimental to your health and limb count), being in ‘reactionary’ mode is considered merely an anti-pattern. We have all been in situations in life where we felt like we were merely reacting to changes foisted upon us, like tall grass helplessly flailing on a windy day. We all want to be the wind, not the grass.

As you could read just about everywhere, including in my own blog post, the Agile movement has been declared dead (although ‘agility’ is still fine and dandy, thank you). Being communal people in need of an idea to gather around, and not liking the traditional organized religions’ early hours, we looked for a more suitable replacement.

Not that others were not trying, even before Agile’s passing. For example, Adam Wiggins of Heroku extraction channeled his inner L. Ron Hubbard by establishing his Church of 12 Factors (kudos for pulling it off without one reference to space ships). It is chock-full of Cloud-y goodness and is actually quite good and useful. I think Adam is now beating himself up for not waiting a bit and slapping ‘micro-services’ all over the manifesto, because that is totally what the ’12 factors’ are about.

According to Adam, 12-factor apps:

  • Use declarative formats for setup automation, to minimize time and cost for new developers joining the project;
  • Have a clean contract with the underlying operating system, offering maximum portability between execution environments;
  • Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration;
  • Minimize divergence between development and production, enabling continuous deployment for maximum agility;
  • And can scale up without significant changes to tooling, architecture, or development practices.

So what went wrong? It is still a worthwhile document, and I keep revisiting it often, but it lacks the galvanizing force that true movements have. Maybe there are just too many factors (even actual religions knew to stop at 10), or maybe it is because it sounds too much like inspirational lifestyle articles such as 12 Lifestyle Factors That Make You Feel Depressed.

Then the Agile thing happened, and it was time to get cracking. It was not a long wait – behold The Reactive Manifesto.

Now, I make it sound like it all happened in a neat chronological order (something my buddy Adrian Rossouw would organize in a Wayfinder timeline), but it did not. The first version of the manifesto was published by Jonas Bonér and friends and described on the Typesafe blog in July 2013. It was uploaded to GitHub and the community was invited to help tweak the document. The current version (1.1) dates from September 2013 and is signed by thousands of believers (I meant ‘supporters’). In Jonas’ own words, the motivation for putting the manifesto forward was:

The primary motivation for this manifesto is to come up with a name for these new type of applications (similar to NOSQL, Big Data, SOA and REST) and to define a common vocabulary for describing them — both in terms of business values and technical concepts. Names matter and hopefully this will help bring the communities together and make it easier for users and vendors to talk about things and understand each other.

 

Up to now the usual way to describe this type of application has been to use a mix of technical and business buzzwords; asynchronous, non-blocking, real-time, highly-available, loosely coupled, scalable, fault-tolerant, concurrent, reactive, event-driven, push instead of pull, distributed, low latency, high throughput, etc. This does not help communication, quite the contrary, it greatly hinders it.

The four traits

At its core, The Reactive Manifesto puts forward four reactive traits that a modern distributed system should possess (notice the small number of key tenets – that’s how it’s done). Let’s see how these qualities intersect with the modern micro-service based systems I have been writing about over the last few months:

  1. Reactive to events – a modern micro-service based distributed system is asynchronous by nature, with each service sitting dormant until an event wakes it up. Lest you think we are talking REST only, a loosely coupled system using some kind of message broker is a better fit because it offers further decoupling. Publishing into a pub/sub topic does not require any knowledge of the possible consumers of the message in the way that A->B REST calls do. And of course, while the authors of the manifesto seem to be coming from a Scala background (with the Play framework also playing a part), it is easy to notice that Node.js is an even better fit. Its asynchronous nature ‘all the way down’ ensures a non-blocking way of reacting to events.
  2. Reactive to load – a corollary of a micro-service based system is the freedom to scale out each service independently of the rest of the system. The ability to instrument the nodes and cluster the hot spots while leaving the less popular services as-is is of great help in cloud environments. Cloud resources are finite and cost money. Knowing which nodes to cluster (and more importantly, where that would be overkill) is essential to arriving at a system that is reactive to load while still keeping the monthly bill reasonable.
  3. Reactive to failure – when there are many moving parts, failure is inevitable. Successful ‘born on the net’ companies with complex distributed systems not only guard against failure, they openly embrace it (who hasn’t heard about Netflix’s Chaos Monkey).
  4. Reactive to users – this part is a bit confusing. You would think that the four reactive traits are like ‘four pillars of heaven’ or ‘the four elements’ (minus Milla Jovovich). As it turns out, the previous three reactive properties are preconditions to the system being reactive to users – by providing real-time, engaging, performant user experiences. Being reactive to events, load and failure will simply increase your chances of being reactive to users in a way that will keep them from leaving in frustration.

Reactive reactions

When you research a topic and the second Google search hit after the actual topic is “Reactive Manifesto bulls**t”, you cannot resist clicking on the link. In it, Paul Chiusano argues that the Reactive Manifesto is not only not right, it’s not even wrong. It looks like the ultimate insult is being banished into binary system limbo, where you are neither right nor wrong, you are, well, nothing (it’s like the Louis CK joke that because he owes $50, he needs to raise $50 just to be broke).

Of course, there are positive reactions and people who don’t really care – the usual spectrum.

Here is what I think: I am actually positive about the intent of the Reactive Manifesto. First off, it redefines ‘reactive’ as something positive. Of course it didn’t invent anything new, but this is not the first time in history somebody came along and put a name on something we were already doing without knowing what to call it. Remember JJG and Ajax? He didn’t invent it – he just put a name on a technique that is the bedrock of any modern client-side application. How about ‘micro-services’? Many people fail to see how they are different from SOA or just plain ‘services’ or ‘distributed systems’ – a lot of haters in that camp too.

But here is what my first reaction was: coming from a former communist country, my first association was with a guy whose beard is every hipster’s dream – Karl Marx. When he and his buddy Engels came forward with the Communist Manifesto, they didn’t invent alienation, oppression or capitalism’s seedy underbelly. They just articulated them succinctly and gave millions of disgruntled workers around the world something to rally around. It didn’t turn out all that well in hindsight, but notice what I am getting at here – manifestos don’t invent new things, they put names on concepts, helping the adherents galvanize around them and form movements.

So to those who say ‘The Reactive Manifesto’ didn’t invent anything new, you are right: that was not the intention. Go read your history or Google ‘manifesto’.

The ‘What, Why, How’ trifecta

I would like to circle back to ’12 factors’ and ‘micro-services’ and claim that we now have a fairly complete set of answers to big questions we may ask while we build modern distributed systems:

  1. What are we building? A micro-service based distributed system.
  2. Why are we building it? Because we want it to be reactive to events, load, failure and users.
  3. How are we building it? Using the techniques and recommendations outlined in the 12-factors.

There you go – no need to choose one over the other – a simple ‘1 + 4 + 12’ formula that will bring you happiness and make you rich.

As for me, I am taking a leaf from the SNL book – I am going to the first Toronto Reactive Meetup. In fact, I am raising SNL by not only participating but actually speaking at it – a new era of people presenting on topics to find out what they are.

If you are in Toronto on June 24, join us – in addition to reading my musings, you can see me deliver them in full HD.

© Dejan Glozic, 2014