Social Micro-Services: Activity Streams 2.0

Awkward social encounters, Wikimedia Commons, Chrisrobertsantieau

If I had a dollar for every time somebody mentioned the word ‘social’ to me in the last couple of years, my pants would suffer, since I tend to carry all my coins in my right pocket, along with my soft credit card wallet. I carry my iPhone in my left pocket, together with cash in paper bills (paper bills will not scratch the ‘naked’ iPhone, while metal coins might). No, I don’t have Asperger’s, why do you ask?

Tapping into the opportunity to exploit user activity is now considered the norm in any system of decent complexity. Users of a system leave a social trail that can be put to good use to improve collaboration and insight.

More conveniently, adding a social dimension to a system that already provides value on its own is an order of magnitude easier proposition. Think about it: dedicated social networks such as Facebook and Twitter vitally depend on users generating primary data by posting updates, pictures, videos and otherwise exposing themselves to the advertisers. In a system where users go to accomplish a primary task (track and plan, write code, build and deploy code, monitor running apps etc.), social events happen as a side effect of user activity – they are not the sole value proposition of the system. All we need to do is capture and distribute these social events – no extra user effort required.

Another characteristic of value systems (in contrast to pure social systems) is that social activity is not limited to human users. A part of the system can kick in, perform a task and then notify interested users about it. Before we called them ‘social’, these were ‘events’ and/or ‘notifications’. We now understand that the activity of programmatic agents can easily and usefully appear in your social stream, mixed with the updates of actual users.

Social streams and micro-services

If you have followed this blog recently, you already know we now prefer to build our systems using micro-services. Extracting social streams out of such a system seems plausible for the following reasons:

  1. The overall activity of a micro-service-based system is the combination of the activities of its individual services
  2. We are already using a message broker to ensure efficient and flexible message passing between micro-services
  3. Micro-services already publish messages when their state changes, mostly as a result of user actions
  4. Having a dedicated activity stream micro-service to add social dimension to the system is in itself consistent with micro-service architecture

OK, sounds like a plan. As I have already written in my post on clustering and messaging, we need a dedicated service that aggregates social activities from all the corners of your micro-service system. Our first instinct may be to tap into existing messages already flowing around, but this may turn out not to be a good idea:

  1. Messages that are published by the micro-services tend to supplement the REST API. Not every CRUD event in every service is a social event worthy of appearing in activity feeds.
  2. There may not be enough data in the CRUD messages to build up a complete activity record.

For these reasons, it is a better practice to dedicate a separate messaging channel (or ‘topic’) for social activities and let each micro-service decide which subset of its CRUD message traffic is social-worthy and, for those events, publish a message that contains all the additional information required by the social stream.
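
To make this concrete, here is a minimal sketch of the publishing side, assuming the Node.js mqtt module and a broker running locally – the topic name ‘activities’ and the abbreviated payload are illustrative:

var mqtt = require('mqtt');

// Connect to the message broker (URL is illustrative)
var client = mqtt.connect('mqtt://localhost:1883');

client.on('connect', function () {
  // Only the CRUD events deemed social-worthy get published,
  // enriched with everything the activity stream will need
  var activity = {
    verb: 'post',
    published: new Date().toISOString(),
    actor: { objectType: 'person', id: 'urn:example:person:jane' },
    object: { objectType: 'picture', id: 'urn:example:picture:abc123/xyz' }
  };
  client.publish('activities', JSON.stringify(activity));
});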

Anatomy of an activity

What would that information be? We don’t have to guess – there is a public specification available to follow. An activity typically consists of an actor, a verb and an object, and optionally a target. In an activity that can be expressed as “Jane posted a picture to the album ‘Jane’s Vacation’”, we can see all four (Jane is the actor, ‘post’ is the verb, ‘picture’ is the object and ‘album’ is the target). Expressed using Activity Streams 2.0 draft JSON syntax, it could look like this:

{
   "verb": "post",
   "published": "2011-02-10T15:04:55Z",
   "language": "en",
   "actor": {
     "objectType": "person",
     "id": "urn:example:person:jane",
     "displayName": "Jane Doe",
     "url": "http://example.org/jane",
     "image": {
       "url": "http://example.org/jane/image.jpg",
       "mediaType": "image/jpeg",
       "width": 250,
       "height": 250
     }
   },
   "object" : {
     "objectType": "picture",
     "id": "urn:example:picture:abc123/xyz"
     "url": "http://example.org/pictures/2011/02/pic1",
     "displayName": "Jane jumping into water"
   },
   "target" : {
     "objectType": "album",
     "id": "urn:example:albums:abc123",
     "displayName": "Jane's Vacation",
     "url": "http://example.org/pictures/albums/janes_vacation/"
   }
}

Notice that the equivalent CRUD message, produced when a new picture resource is added in the Picture micro-service that manages images, would simply follow the REST POST action that was performed to add the picture:

POST /pictures/albums/janes_vacation

In the command above, the new picture that Jane added to the album was in the HTTP request body.

As you may have noticed, CRUD messages are resource-centric, while activities are actor-centric. A Web page rendering the ‘Jane’s Vacation’ album will want to refresh to include the new picture (possibly using Web Sockets), but does not care who initiated the action. This is why it is hard to ‘synthesize’ activities out of CRUD messages – it is much better for the micro-service at the source to fire a clean, well-formed activity object according to the public spec from the get-go. It is virtually impossible to synthesize an activity like the example shown above unless you are the service owning the data.

A vital part of firing a new activity is audience targeting. Let’s say that there is a micro-service that manages projects in a system. The project owner has decided to change the project description. Who should receive this activity on their personal social stream? There are two ways to implement this – user-centric and service-centric:

In a user-centric implementation, each user has a social graph of relationships. When an activity is performed by a node in her social graph, it should end up on her personal social stream. This approach looks very logical but is actually hard to implement if you are not Facebook or Twitter. I don’t think it is actually necessary in a system where social is enhancing to the primary value, rather than the value itself.

In a service-centric implementation, we assume that when an event deemed social occurs, the service has all the information it needs to determine the activity’s primary and secondary audience. It so happens that the activity stream specification has just such a feature. In our example of changing the project description, the service already knows all the members of the project, as well as all the users who are ‘subscribed’ to or ‘watching’ the project. Therefore, it should fire an activity like this:

{
   ....
   "to":  [{ "objectType": "person",
             "id": "johndoe"},
           { "objectType": "project",
             "id": "xxzqIHH_556X" }
   ],
   "cc":  [{ "objectType": "person",
             "id": "fredf"},
           { "objectType": "person",
             "id": "jasonj"}
   ]
}

In the example above, the activity is addressed to John Doe (the owner of the project) and the project’s dedicated activity stream, while “Fred F” and “Jason J”, who are ‘watching’ the project, will receive the update by virtue of being on the “cc” list. This illustrates another powerful feature of audience targeting – the ability to target object types other than ‘person’.

When such an activity arrives at the dedicated activity stream micro-service, it can simply store a copy in the social stream of each of the targets in the target audience. The publishing service has done all the work by identifying the audience – the activity stream service will simply honor the directive.
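
A sketch of that fan-out, where storeInStream is a hypothetical helper that appends a copy of the activity to the named stream in the database:

// Fan the activity out to each addressed stream.
// storeInStream() is a hypothetical helper that persists a copy
// of the activity under the given stream key.
function distribute(activity) {
  var targets = (activity.to || []).concat(activity.cc || []);
  targets.forEach(function (target) {
    // Streams are keyed by object type and id, so 'person' and
    // 'project' targets are handled uniformly
    storeInStream(target.objectType + ':' + target.id, activity);
  });
}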

Social streams can be used to mix events from various sources. For example, system-wide alerts and broadcasts can end up on personal streams as well (things like ‘maintenance restart in 5 minutes’) for awareness and audit purposes.

Similarly, activities performed by various engines can be mixed with activities performed by actual users – the activity stream specification is flexible enough to allow actors other than persons. That’s why you can have activities such as ‘Continuous Integration started build #45’ as well as ‘Build #45 failed with 45 errors’.

Filtering

Finally, the activity stream specification fits micro-services like a glove when it comes to filtering. Aggregating all the system chatter produces a fire-hose activity stream that ends up ignored due to the sheer number of entries. This is where the semantics of activity streams are superior to RSS/ATOM feeds. Each micro-service can provide a definition of the verbs and object types it intends to use in its activities. Since the core set of verbs and object types can easily become inadequate for a typical complex system, these extension definitions are vital to allow for powerful filtering based on verbs and object types, something like:

  • Hide all ‘build’ updates – filtering based on object type
  • Hide all ‘build succeeded’ updates – filtering based on object type and verb combination
  • Hide all updates like this – filtering of future occurrences of something you don’t care about
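
A sketch of what honoring such filters could look like – here each user’s hidden combinations are assumed to be stored as {verb, objectType} pairs, where a missing field matches anything:

// Hypothetical per-user filter rules: a rule can name a verb,
// an object type, or both; a missing field matches anything.
function isHidden(activity, rules) {
  return rules.some(function (rule) {
    var verbMatch = !rule.verb || rule.verb === activity.verb;
    var typeMatch = !rule.objectType ||
      rule.objectType === activity.object.objectType;
    return verbMatch && typeMatch;
  });
}

// Either rule alone implements one of the bullets above:
// hide all 'build' updates, or only 'build succeeded' ones
var rules = [
  { objectType: 'build' },
  { verb: 'succeed', objectType: 'build' }
];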

Housekeeping

A micro-service system of decent size can quickly produce a lot of activity chatter. If some of these activities target multiple users, the per-user copies add to an ever-growing database. Obviously we need to draw a line somewhere.

Again, social streams in a system where social data is not the primary value are less critical when it comes to data preservation. The assumption is that the primary data is safely stored, and if data changes need to be preserved for audit purposes, that audit trail is itself safely stored. Social streams are just an echo of these auditable changes, and do not need to be preserved in long-term storage.

You can experiment with your activity stream micro-service storage, but keeping a week’s worth of social streams may be plenty. Alternatively, you can draw the line at a number of activities, or at storage size, or a combination of all three.

Whichever method you pick, you need to run a ‘pruner’ task that deletes old activities from the database. In a distributed system based on micro-services, 12factors comes to the rescue with a recommendation for running admin tasks as one-off processes.
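
A sketch of such a one-off pruner, assuming Cloudant accessed through the nano module and a hypothetical view ‘activities/by_published’ that emits each activity’s published timestamp as the key and the document revision as the value:

var nano = require('nano')('https://myaccount.cloudant.com');
var db = nano.db.use('activities');

// Delete everything older than a week
var cutoff = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000).toISOString();

db.view('activities', 'by_published', { endkey: cutoff }, function (err, body) {
  if (err) return console.error(err);
  body.rows.forEach(function (row) {
    // The view is assumed to emit the document _rev as its value
    db.destroy(row.id, row.value, function (err) {
      if (err) console.error('Failed to prune ' + row.id);
    });
  });
});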

And there you have it. In a distributed system based around micro-services, there are already messages flying around. Opening up a social channel and collecting dedicated messages into a social stream is just an extra step that will help your users with the added insight into the activity of the system, and the actions of other users and agents they should know about.

In addition to the activity streams draft 2.0 spec, there is now a GitHub project with both client and server side implementations. It appears to be in its early days, but if you don’t want to write everything from scratch, both Java and Node.js implementations are readily available – give it a test drive and let me know what you think.

© Dejan Glozic, 2014

Micro-Service Fun – Node.js + Messaging + Clustering Combo

Micro-Services in Silly Walk, Monty Python

A: I told you, I’m not allowed to argue unless you’ve paid.
M: I just paid!
A: No you didn’t.
M: I DID!
A: No you didn’t.
M: Look, I don’t want to argue about that.
A: Well, you didn’t pay.
M: Aha. If I didn’t pay, why are you arguing? I got you!
A: No you haven’t.
M: Yes I have. If you’re arguing, I must have paid.
A: Not necessarily. I could be arguing in my spare time.

Monty Python, ‘The Argument Clinic’

And now for something completely different. I decided to get off the soap box I kept climbing recently and give you some useful code for a change. I cannot stray too much from the topic of micro-services (because that is our reality now), or Node.js (because that is the technology with which we implement that reality). But I can zero in on a very practical problem we unearthed this week.

First, a refresher. When creating a complex distributed system using micro-services, one of the key problems to solve is to provide for inter-service communication. Micro-services normally provide REST APIs to get the baseline state, but the system is in constant flux and it does not take long until the state you received from the REST API is out of date. Of course, you can keep refreshing your copy by polling but this is not scalable and adds a lot of stress to the system, because a lot of these repeated requests do not result in new state (I can imagine an annoyed version of Scarlett Johansson in ‘Her’, replying with ‘No, you still have no new messages. Stop asking.’).

A better pattern is to reverse the flow of information and let the service that owns the data tell you when there are changes. This is where messaging comes in – modern micro-service based systems are hard to imagine without a message broker of some kind.

Another important topic in a modern system is scalability. In a world where you may need to quickly ramp up your ability to handle demand (lucky you!), the ability to vertically or horizontally scale your micro-services at a moment’s notice is crucial.

Vertical and Horizontal Scalability

An important Node.js quirk explainer: people migrating from Java may not understand the ‘vertical scalability’ part of the equation. Due to the auto-magical handling of thread pools in a JEE container, increasing the real or virtual machine specs requires no effort on the software side. For example, if you add more CPU cores to your VM, JEE container will spread out to take advantage of them. If you add more memory, you may need to tweak the JVM settings but otherwise it will just work. Of course, at some point you will need to resort to multiple VMs (horizontal scalability), at which point you may discover that your JEE app is actually not written with clustering in mind. Bummer.

In the Node.js land, adding more cores will help you squat unless you make a very concrete effort to fork more processes. In practice, this is not hard to do with utilities such as PM2 – it may be as easy as running the following command:

pm2 start app.js -i max

Notice, however, that for Node.js, vertical and horizontal scalability are the same as far as the way you write your code is concerned. You need to cluster just to take advantage of all the CPU cores on the same machine, never mind load balancing multiple separate servers or VMs.

I actually LOVE this characteristic of Node.js – it forces you to think about clustering from the get go, discouraging you from holding onto data between requests, forcing you to store any state in a shared DB where it can be accessed by all the running instances. This makes the switch from vertical to horizontal scalability a non-event for you, which is a good thing to discover when you need to scale out in a hurry. Nothing new here, just basic share-nothing goodness (see 12factors for a good explainer).

However, there is one important difference between launching multiple Node.js processes using PM2 or the Node ‘cluster’ module, and load-balancing multiple Node servers using something like Nginx. With load balancing using a proxy, we have a standalone server binding to a port on a machine, and load balancing and URL proxying are done at the same time. You will write something like this:

http {
    upstream myapp1 {
        server srv1.example.com;
        server srv2.example.com;
        server srv3.example.com;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myapp1;
        }
    }
}

If you try to launch multiple Node servers on a single machine like this, all except the first one will fail because they cannot bind to the same port. However, if you use Node’s ‘cluster’ module (or PM2, which uses the same module under the covers), a little bit of white magic happens – the master process has a bit of code that enables socket sharing between the workers using a policy (either OS-defined, or ‘round-robin’ as of Node 0.12). This is very similar to what Nginx does for your Node instances running on separate servers, with a few more load balancing options (round-robin, least connected, IP-hash, weight-directed).
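
To make the white magic concrete, here is a minimal example using the ‘cluster’ module directly – every worker appears to listen on the same port, while the master quietly shares the socket among them:

var cluster = require('cluster');
var http = require('http');
var os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core
  os.cpus().forEach(function () {
    cluster.fork();
  });
} else {
  // Every worker 'binds' to the same port without conflict;
  // the master hands incoming connections out to the workers
  http.createServer(function (req, res) {
    res.end('Handled by worker ' + process.pid + '\n');
  }).listen(3000);
}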

Now Add Messaging to the Mix

So far, so good. Now, as it is usual in life, the fun starts when you combine the two concepts I talked about (messaging and clustering) together.

To make things more concrete, let’s take a real world example (one we had to solve ourselves this week). We were writing an activity stream micro-service. Its job is to collect activities expressed using Activity Stream 2.0 draft spec, and store them in Cloudant DB, so that they can later be retrieved as an activity stream. This service does one thing, and does it well – it aggregates activities that can originate anywhere in the system – any micro-service can fire an activity by publishing into a dedicated MQTT topic.

On the surface, it sounds clear cut – we will use the well-behaved mqtt module as the MQTT client, RabbitMQ as our polyglot message broker, and Node.js for our activity micro-service. This is not the first time we are using this kind of a system.

However, things become murky when clustering is added to the mix. This is what happens: MQTT is a pub/sub protocol. In order to allow each subscriber to read the messages from the queue at its own pace, RabbitMQ implements the protocol by hooking up separate queues for each Node instance in the cluster.

This is not what we want. Each instance will receive a ‘new activity’ message and attempt to write it to the DB, requiring contention avoidance. Even if the DB can prevent all but one node from succeeding in writing the activity record, this is wasteful because all the nodes are attempting the same task.

The problem here is that the ‘white magic’ the cluster module uses to handle http/https server requests does not extend to the mqtt module.

Our initial thoughts on solving this problem went like this: if we move the message client to the master instance, it can react to incoming messages and pass them on to the forked workers in some kind of ‘round-robin’ fashion. It seemed plausible, but had a moderate ‘ick’ factor because implementing our own load balancing seemed like fixing a solved problem. In addition, it would prevent us from using PM2 (because we would have to be in control of forking the workers), and if we used multiple VMs and Nginx load balancing, we would be back to square one.

Fortunately, we realized that RabbitMQ can already handle this if we partially give up the pretense and acknowledge that we are running AMQP under the MQTT abstraction. The way RabbitMQ works for pub/sub topologies is that publishers post to ‘topic’ exchanges, which are bound to queues using routing keys (in fact, there is a direct mapping between AMQP routing keys and MQTT topics – it is trivial to map back and forth).

The problem we were having with the MQTT client on the consumer side was that each cluster instance received its own queue. By dropping down to an AMQP client and making all the instances bind to the same queue, we let RabbitMQ essentially load balance the clients using a ‘round-robin’ policy. In fact, this way of working is listed in the RabbitMQ documentation as a work queue, which is exactly what we want.

OK, Show Us Some Code Now

Just for variety, I will publish MQTT messages to the topic using Eclipse PAHO Java client. Publishing using clients in Node.js or Ruby will be almost identical, modulo syntax differences:

import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public static final String TOPIC = "activities";

try {
    MqttClient client = new MqttClient("tcp://" + host + ":1883", "mqtt-client1");
    client.connect();
    MqttMessage message = new MqttMessage(messageText.getBytes());
    client.publish(TOPIC, message);
    System.out.println(" [x] Sent to MQTT topic '" + TOPIC + "': '" + message + "'");
    client.disconnect();
} catch (MqttException e) {
    e.printStackTrace();
}

The client above will publish to the ‘activities’ topic. What we now need to do on the receiving end is set up a single queue and bind it to the default AMQP topic exchange (“amq.topic”) using the matching routing key (again, ‘activities’). The name of the queue does not matter as long as all the Node workers are using it (and they will, by virtue of being clones of each other).

var amqp = require('amqp');

var connection = amqp.createConnection({ host: 'localhost' });

// Wait for connection to become established.
connection.on('ready', function () {
  // Use the default 'amq.topic' exchange
  connection.queue('worker-queue', { durable: true }, function (q) {
      // Routing key identical to the MQTT topic
      q.bind('activities');

      // Receive messages
      q.subscribe(function (message, headers, deliveryInfo, messageObject) {
        // Print messages to stdout
        console.log('Node AMQP(' + process.pid + '): received topic "' +
            deliveryInfo.routingKey + '", message: "' +
            message.data.toString() + '"');
      });
  });
});

Implementation detail: RabbitMQ MQTT plug-in uses the default topic exchange “amq.topic”. If we set up a different exchange for the MQTT traffic, we will need to explicitly name the exchange when binding the queue to it.
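
In that case, the binding would name the exchange explicitly – something like this, where ‘mqtt.topic’ is a made-up exchange name:

// Bind to a custom exchange instead of the default 'amq.topic'
q.bind('mqtt.topic', 'activities');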

Any Downsides?

There are scenarios in which it is actually beneficial for all the workers to receive an identical message. When the workers are managing Socket.io chat rooms with clients, a message may affect multiple clients, so all the workers need to receive it and decide if it applies to them. The single-queue topology used here is limited to cases where workers are used strictly for scalability and any single worker can do the job.

From the more philosophical point of view, by resorting to AMQP we have broken through the abstraction layer and removed the option to swap in another MQTT broker in the future. We looked around and noticed that other people had to do the same in this situation. MQTT is a pub/sub protocol and many pub/sub protocols have a ‘unicast’ feature which would deliver the message to only one of the subscribers using some kind of a selection policy (that would work splendidly in our case). Unfortunately, there is no ‘unicast’ in MQTT right now.

Nevertheless, by retaining the ability to handle front ends and devices publishing in MQTT, we preserved the nice features of the MQTT protocol (simple, lightweight, can run on the smallest of devices). We continue to express the messaging side of our APIs using MQTT. At the same time, we were able to tap into the ‘unicast’ behavior by using RabbitMQ features.

That seems like a good compromise to me. Here is hoping that unicast will become part of the MQTT protocol at some point in the future. Then we can go back to pretending we are running a ‘mystery MQTT broker’ rather than RabbitMQ for our messaging needs.

Nudge, nudge. Wink wink. Say no more.

© Dejan Glozic, 2014

We Don’t Have MongoDB – is CouchDB OK?

First Choice Colas, Wikimedia Commons, 2007, Erik1980

‘It can’t be helped,’ the saddlebum said. ‘What’s happened is, you’ve overloaded your analogizing faculty, thereby blowing a fuse. Accordingly, your perceptions have taken up the task of experimental normalization. This state is known as “metaphoric deformation”.’

 

Robert Sheckley, “Mind Swap”, 1966.

One of the Sci-Fi books that left a deep impression on my young mind was Mind Swap by Robert Sheckley (no, I didn’t read it in 1966, in case you were wondering – I am not that old). It introduced a number of hilarious concepts that were surprisingly applicable to the real world, and not just the crazy ones he created in the book. One of them was the aforementioned ‘metaphoric deformation’. According to Robert, it is a disease that often befalls interstellar travellers, where a mind overwhelmed by stimulation it cannot process and make sense of starts generating more bearable hallucinations.

I think about metaphoric deformation a lot these days, when the world of software development is changing at an unprecedented pace, and in multiple dimensions at once. In our development workflow, we create a set of roles for libraries, databases, languages and platforms. We understand that things have changed and are willing to swap A for B, but we are desperately trying to keep the old habits in place and just replace the players.

Case in point – the MEAN stack. The open source world was happy with the LAMP stack (Linux, Apache, MySQL, PHP). Facebook now sits on the world’s biggest mountain of PHP and stores precious big data in racks upon racks of MySQL, because that’s what Mark reached for in his Harvard dorm room to create thefacebook.com (OK, I know they bolted Hack on top of PHP, but still). Now that the world has changed unrecognizably, developers are searching for another four-letter stack lest they slip into a metaphoric deformation of their own.

And yet, that’s just a crutch. Let’s analyze what MEAN stands for:

  • M is for MongoDB – the go-to NoSQL database for MySQL withdrawal, it recommends itself mostly because of the SQL-like query language that makes porting manageable.
  • E is for Express.js – a middleware framework for Node.js that allows you to create the familiar world of server side MVC.
  • A is for Angular.js – because if MVC on the server is good, MVC on the client must be great, and the good people of Google have thought about everything so that you don’t have to (and if you do, it’s kind of hard to wrestle the control back anyway).
  • N is for Node.js – because awesome.

See what we have done? We just recreated our old world using the new tech. Very human, but you will soon realize that the old world is gone for good and we will have to do better.

Fearful for the fragile minds of their fellow citizens, the first engineers to experiment with cars simply set out to create ‘horseless carriages’. Replace a horse with an engine, keep everything else as it has always been since the dawn of time. Now, look around you – does that 10-airbag, GPS-connected, direct-fuel-injecting, microprocessor-controlled car that just whizzed by bear any resemblance to a horseless carriage to you?

Nowhere is this inclination to replace the parts but keep the structure more bizarre than in the current software world. I am not the first to notice – Forrest Norvel from New Relic has observed it in his presentation Beyond the MEAN Stack.

Forrest noticed that Node.js is all about modularity and freedom of choice. For every letter in the MEAN stack (except N, of course, let’s not get crazy here) there is an alternative. Instead of MongoDB, there are other NoSQL databases. Instead of Express.js, there is Hapi.js (created by Eran Hammer, a person I have deeply admired ever since he left the OAuth 2.0 Workgroup with a bang). Even the author of Express, the prolific TJ Holowaychuk, has moved on to the next-generation framework Koa, which uses ES6 generators and looks very promising. Instead of Angular, there is Ember.js, or Backbone.js, or Knockout.js, or none at all (why do you think you absolutely need MVC on the client if some jQuery or even just vanilla JS can suffice?).

Adrian Rossouw dared to ask the right question – why the crazy stampede to MongoDB? There are many situations where another NoSQL database is a better fit, so why instinctively reach for the ‘MySQL of NoSQLs’, other than to keep yourself from slipping into metaphoric deformation?

Due to some legal issues (AGPL license of MongoDB server is giving our lawyers heartburn), we were slow to make that instinctive choice ourselves. Then IBM bought Cloudant, and it was an easy decision – use our very own NoSQL DB, and let somebody else manage it to boot. True master-master replication can’t hurt either.

Not so fast. In case you are not familiar, Cloudant is a hosted CouchDB fork with *Apache Lucene on top. Just looking at CouchDB itself, the goal was to do the key things well, not to help developers transition to the new world by dragging in the old habits. We were staring at map/reduce views, feeling the anxiety of the New, looking for ways to make dynamic queries.

Turns out you can’t. That’s what Apache Lucene was there for – if you need to create ad hoc queries based on some computed constraints, you use Apache Lucene. Map/reduce views are for predefined queries – things you can craft in advance.
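
For illustration, a predefined CouchDB query is nothing more than a map (and optional reduce) function stored in a design document – the sketch below counts activities per actor, and the field names are made up:

{
  "_id": "_design/activities",
  "views": {
    "by_actor": {
      "map": "function (doc) { if (doc.actor) { emit(doc.actor.id, 1); } }",
      "reduce": "_count"
    }
  }
}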

Paul Klicnik, a developer from our team, spent a lot of time banging his head against the Cloudant wall, looking wistfully over into the MongoDB land where he could have just formed a dynamic query in JavaScript. However, once he realized he could use Apache Lucene for that, things started looking up. In the end, we ended up with a very fast, solid implementation of a Node.js micro-service that uses the nano module to persist data in Cloudant.
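
The persistence side of such a micro-service can be remarkably small. A sketch using nano, assuming a database named ‘activities’ and an already-parsed activity object:

var nano = require('nano')('https://myaccount.cloudant.com');
var db = nano.db.use('activities');

// Store the incoming activity as a plain JSON document
db.insert(activity, function (err, body) {
  if (err) return console.error('Failed to store activity', err);
  console.log('Stored activity ' + body.id);
});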

In Paul’s own words:

My experience is that map/reduce is a little tricky to get the hang of, but it’s very powerful. Now that I’ve experienced it for myself, I can say I like it a lot. I’m actually using a regular view with map/reduce for retrieving user filters, and it works great. There is something very simple and elegant about Couch. It’s focused on being very good in certain scenarios, and not trying to be a general purpose database which can solve every problem. I also like that the API is RESTful – I know it’s a small detail, but REST is already pretty natural for us, and it makes it easy to work with.

Apparently, Paul is firmly in the new world now – using CouchDB in a way it was intended, not by trying to pretend it is an SQL database where you don’t have to create tables any more.

Of course, not everything is perfect:

Without the Lucene functionality, we would not be able to use CouchDB for the type of querying we need to do. I have mixed feelings about this. So far, I have had zero problems with Lucene and it performs great, but I’m uneasy about the fact that someone essentially shoe-horned in a major feature. I view Cloudant as a two headed monster – one head being normal CouchDB that behaves as expected, the other a weird appendage that seems to be behaving normally but you’re never quite sure if it’ll bite your head off. I’m being a bit dramatic, but I’m not fond of solutions where someone alters the functionality and attempts to make a product do something that it was never intended to do, ultimately compromising the system on the way. I realize that this is a purist view; maybe under the covers everything is technically sound.

I have added Paul’s second comment in case you interpreted this post as an ad for Cloudant. We are just finding our way in the new world and trying to use it as intended, and sometimes it gets a little weird. I guess that’s half the fun.

As I always say, “si fueris Rōmae, Rōmānō vīvitō mōre (rough translation from Latin: “when in Rome, for the love of God stop eating at McDonald’s – there are like a million trattorias serving actual Italian pizza and stuff”). Stop searching for new stacks to recreate your familiar world and instead open your mind for the new ways of looking at things.

Or even better – don’t create a single stack. For each element of the old stack, go through several awesome choices available today, and take a pick based on the suitability for the task at hand. Contractors don’t have one hammer, one saw and one drill set – why should you?

Oh, and one more thing – stop drinking caffeinated sugary drinks. Both of them.

 

*The original version of the article claimed Cloudant uses ‘Lucene Elastic Search’. This was a mistake, although Apache Lucene (definitely not Elastic, or Lucene Search) is involved, as per the company website.

© Dejan Glozic, 2014

Micro-Services – Fad, FUD, Fo, Fum

The Triumph of Death, Pieter Bruegel, 1562

But band leader Johnny Rotten didn’t sell out in the way that you’re thinking. He didn’t start as a rebellious kid and tone it down so he could cash in. No, Rotten did the exact opposite — his punk rebelliousness was selling out.

[…] And he definitely wasn’t interested in defining an entire movement.

From 5 artistic geniuses who only became great after selling out

We all intrinsically know that internet cycle is shortening, but what is happening these days with micro-services is bordering on ridiculous. But let’s start from another cycle that is now circling the mortal coil on an ever shorter leash – the Agile movement. Let’s hear it from @pragdave (one of the founding fathers) himself:

Ignoring the typo in the tweet (Dave Thomas obviously meant ‘word’, not ‘work’), this is a familiar story that starts with the enthusiasm and fervor of a few ‘founding fathers’ (you can see them artistically blurred in comfortable pants). But before you know it (and in this case ‘before you know it’ takes a span of 13 years), the idea is distorted, commercialized, turned into professional certifications, turned into a rigid religion, and essentially morphed into an abomination barely recognizable to the said founding fathers. The Agile Manifesto defined good practices ‘in the left column’ to combat bad practices ‘in the right column’, yet Dave finds reality steadily creating ‘agile’ practices that are hard to distinguish from the very things the original movement was rising up against.

He who is bitten by a snake fears a lizard.

East African proverb

If you read so far, you are still waiting to see the connection to micro-services. There seems to be a micro-service bubble in formation, inflated by a steady stream of articles in the last couple of months. I have written recently about it, but it bears repeating.

The term itself entered my consciousness during NodeDay 2014, thanks to great presentations by Ian Livingstone and Richard Rodger (both the slides and videos are now available). It was followed by an epic 7-episode miniseries by Martin Fowler, only to be challenged to a post-off with an alternative article.

Not to be outdone, Adrian Rossouw published the first article in a series (what is it with micro-services that makes people think they write copy for HBO?), and he brought the parable of monkeys into the mix. He also expressed concerns that micro-services may not be the best solution in some contexts (client side – duh, I don’t even know how to make browsers run multiple processes even if I wanted to; but also REST APIs, with which I completely disagree – in fact, they map perfectly into a system API, where each micro-service can handle a group of API endpoints, nicely routed to by a proxy such as NGINX).

Finally, Russ Miles collapsed 13 years of Agile journey into a Janus-like post that declares a tongue-in-cheek Micro-Services Manifesto. In addition to the real-fake manifesto with signatures (something Nicolas Cage could search for in his next movie), he proposed an official mascot for the movement – the Chlamydia bacteria (not a virus, Russ, and you don’t want to contract it):

Chlamydia bacteria – a microbial mascot for micro-services

The moral of this timeline? It definitely took us way less than 13 years to get from excited to jaded – about three months, I think. We should all back off a bit and see why the micro-services came to being before declaring another ‘Triumph of Death’.

On Premise Systems – Driver Analogy

Before the cloud, most enterprise software was meant to be installed on premise, i.e. on customers’ computers, managed by their own administrators. A good analogy here is creating a car for consumers. While a modern car has hundreds of systems and moving parts, this complexity needs to be abstracted away for the average driver. Drivers just want to enter the car, turn the key and drive away (North American drivers don’t even want to be bothered to shift gears, to the horror and loathing of their European brethren).

By this analogy, monolithic on premise software is good – it is easy to install, start and stop (I said ‘easy’, not ‘fast’). When a new version arrives, it is stopped, updated and started again in a previously announced ‘maintenance window’. The car analogy holds – nobody replaces a catalytic converter while hurtling down the highway at 120 km/h (this is Canada, eh – we hurtle down the highway in the metric system). This operation requires service ‘downtime’.

Cloud Systems – Passenger Analogy

With cloud systems, complexity is not a problem because somebody else is driving. Therefore, it does not matter if you cannot do it as long as your driver knows how to operate the velocitator and deceleratrix. What you do expect is that the driver is always available – you expect no downtime, even for service and repairs. That means that the driver needs to plan for maintenance while you are in a meeting or sleeping or otherwise not needing the car. He may need to involve his twin brother to drive you in an identical car while the other one is in service. You don’t care – as far as you are concerned, the system is ‘always on’.

What does this mean for our cloud system? It means that a monolithic application is actually hurting our ‘always on’ goal. It is a bear to cluster, scale, and keep on all the time because it takes so long to build, deploy, start and stop. In addition, you need to stop everything to tweak a single word on a single page.

Enter micro-services, or whatever practitioners called them before we gave them that name. Cutting the monolith into smaller chunks is just good common sense. It adds complexity and more moving parts, but with redundant systems and controlled network latency, the overhead is worth it. Moreover, in a passenger analogy the added complexity is ‘invisible’ because it is the driver’s problem, not yours. It remains the implementation detail of a hosted system.

With micro-services, we can surgically update a feature without affecting the rest of the system. We can beef up (cluster) a feature because it is popular, without needlessly clustering all the other features that do not receive as much traffic. We can rewrite a service without committing to rewrite the entire monolith (which rarely happens). And thanks to DevOps and all the automation, we can deploy a hundred times a day, and then monitor all the services in real time, through various monitoring and analytics software.

Before we called it ‘micro-services’, Amazon called it services built by two-pizza teams. We also had SOA, although that was more about reusable services put into catalogs than about breaking down a hosted system into manageable chunks without grand aspirations of reusability.

The thing is – I don’t care what we call it. I don’t even care if we come up with a set of rules to live by (in fact, some may hurt – what if I cannot fit a micro-service into 100 lines of code; did I fail, will I be kicked out of the fraternity?). There is a need for decomposing a large hosted system into smaller bits that are easier to manage, cluster and independently evolve. We probably should not name it after a nasty STD or monkeys as suggested, but in the end, the need will survive the movement and the inevitable corruption and sell-out.

If you are writing a cloud-based app these days, and through trial and error discovered that you are better off breaking it down into several independent services with a strong contract, communicating through some kind of a pub/sub messaging system, you just backed into micro-services without even knowing.

In the end, while ‘Agile’ was declared dead by a founding father, ‘Agility’ lives on and is as needed as ever. Here is hoping that if we manage to kill the micro-service movement somehow (and in a much shorter time), the need for composable hosted systems where parts independently evolve will live on.

That’s the kind of afterlife I can believe in. That and the one by Arcade Fire.

© Dejan Glozic, 2014