Social Micro-Services: Activity Streams 2.0

Awkward social encounters, Wikimedia Commons, Chrisrobertsantieau

If I had a dollar for every time somebody mentioned the word ‘social’ to me in the last couple of years, my pants would suffer, since I tend to carry all my coins in my right pocket, along with my soft credit card wallet. I carry my iPhone in my left pocket, together with cash in paper bills (paper bills will not scratch the ‘naked’ iPhone, while metal coins might). No, I don’t have Asperger’s, why do you ask?

Tapping into the opportunity to exploit user activity is now considered the norm in any system of decent complexity. Users of a system create a social trail that can be put to good use to improve collaboration and insight.

More conveniently, adding a social dimension to a system that already provides value on its own is an order of magnitude easier proposition. Think about it: dedicated social networks such as Facebook and Twitter vitally depend on users generating primary data by posting updates, pictures and videos, and otherwise exposing themselves to advertisers. In a system where users go to accomplish a primary task (track and plan, write code, build and deploy code, monitor running apps etc.), social events happen as a side effect of user activity – they are not the sole value proposition of the system. All we need to do is capture and distribute these social events – no extra user effort required.

Another characteristic of value systems (in contrast to pure social systems) is that social activity is not limited to users. A part of the system can kick in, perform a task and then notify interested users about it. Before we called this ‘social’, these were simply ‘events’ and/or ‘notifications’. We now understand that the activity of programmatic agents can easily and usefully appear on your social stream, mixed with the updates of actual users.

Social streams and micro-services

If you have followed this blog recently, you already know we now prefer to build our systems using micro-services. Extracting social streams out of such a system seems plausible for the following reasons:

  1. The overall activity of a micro-service-based system is the combination of the activities of its individual services
  2. We are already using a message broker to ensure efficient and flexible message passing between micro-services
  3. Micro-services already publish messages when their state changes, mostly as a result of user actions
  4. A dedicated activity stream micro-service that adds a social dimension to the system is itself consistent with micro-service architecture

OK, sounds like a plan. As I have already written in my post on clustering and messaging, we need a dedicated service that aggregates social activities from every corner of your micro-service system. Our first instinct may be to tap into the messages already flowing around, but this may turn out not to be a good idea:

  1. Messages published by the micro-services tend to supplement the REST API. Not every CRUD event in every service is a social event worthy of appearing in activity feeds.
  2. There may not be enough data in the CRUD messages to build up a complete activity record.

For these reasons, it is better practice to dedicate a separate messaging channel (or ‘topic’) to social activities and let each micro-service decide which subset of its CRUD message traffic is social-worthy. For those events, the service publishes an additional message on the social channel that contains all the extra information the social stream needs.
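To make the dedicated channel concrete, here is a minimal Python sketch. The in-memory broker, the `activities` topic name and the `SOCIAL_WORTHY` rule set are hypothetical stand-ins for a real message broker and a per-service policy; the point is only that every change produces a CRUD message, while just the social-worthy subset also produces an enriched activity on its own topic.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Handler = Callable[[Message], None]

# Hypothetical topic name for the dedicated social channel.
ACTIVITY_TOPIC = "activities"

# Hypothetical per-service rule: which (resource type, CRUD action) pairs are social-worthy.
SOCIAL_WORTHY = {("picture", "create")}


class InMemoryBroker:
    """Toy stand-in for a real broker: delivers each message to a topic's subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


def publish_change(broker: InMemoryBroker, resource_type: str, action: str,
                   resource: Message, build_activity: Callable[[], Message]) -> None:
    """Always publish the CRUD message; publish an activity only for social-worthy changes."""
    broker.publish(resource_type, {"action": action, "resource": resource})
    if (resource_type, action) in SOCIAL_WORTHY:
        # The owning service builds the full activity only when it is needed.
        broker.publish(ACTIVITY_TOPIC, build_activity())


# Usage: the activity stream service listens only to the social topic.
broker = InMemoryBroker()
crud_log: List[Message] = []
activity_log: List[Message] = []
broker.subscribe("picture", crud_log.append)
broker.subscribe(ACTIVITY_TOPIC, activity_log.append)

picture = {"id": "urn:example:picture:abc123/xyz"}
publish_change(broker, "picture", "create", picture,
               lambda: {"verb": "post",
                        "actor": {"objectType": "person", "id": "urn:example:person:jane"},
                        "object": {"objectType": "picture", "id": picture["id"]}})
publish_change(broker, "picture", "update", picture,
               lambda: {"verb": "update"})  # not social-worthy: never published

print(len(crud_log), len(activity_log))  # 2 1
```

Passing the activity builder as a callable means the extra lookups needed for a full activity only happen for the events that actually reach the social channel.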

Anatomy of an activity

What would that information be? We don’t have to guess – there is a public specification to follow. An activity typically consists of an actor, a verb and an object, and optionally a target. In an activity that can be expressed as “Jane posted a picture to the album ‘Jane’s Vacation’”, we can see all four (Jane is the actor, ‘post’ is the verb, ‘picture’ is the object and ‘album’ is the target). Expressed using the Activity Streams 2.0 draft JSON syntax, it could look like this:

{
   "verb": "post",
   "published": "2011-02-10T15:04:55Z",
   "language": "en",
   "actor": {
     "objectType": "person",
     "id": "urn:example:person:jane",
     "displayName": "Jane Doe",
     "url": "http://example.org/jane",
     "image": {
       "url": "http://example.org/jane/image.jpg",
       "mediaType": "image/jpeg",
       "width": 250,
       "height": 250
     }
   },
   "object" : {
     "objectType": "picture",
     "id": "urn:example:picture:abc123/xyz",
     "url": "http://example.org/pictures/2011/02/pic1",
     "displayName": "Jane jumping into water"
   },
   "target" : {
     "objectType": "album",
     "id": "urn:example:albums:abc123",
     "displayName": "Jane's Vacation",
     "url": "http://example.org/pictures/albums/janes_vacation/"
   }
}
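A publishing service could assemble such an object with a small helper. The sketch below assumes a hypothetical `make_activity` function (not part of any Activity Streams library) that checks the fields the example relies on, a verb plus an `objectType` and `id` on the actor and object, and formats `published` as a UTC timestamp.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional

Obj = Dict[str, Any]


def make_activity(actor: Obj, verb: str, obj: Obj,
                  target: Optional[Obj] = None,
                  published: Optional[datetime] = None) -> Obj:
    """Assemble an activity shaped like the draft 2.0 JSON example (hypothetical helper)."""
    if not verb:
        raise ValueError("an activity needs a verb")
    for name, value in (("actor", actor), ("object", obj)):
        # Every object in the example carries at least an objectType and an id.
        if "objectType" not in value or "id" not in value:
            raise ValueError(f"{name} needs 'objectType' and 'id'")
    # Default to the current time; either way, render it in UTC with a 'Z' suffix.
    when = (published or datetime.now(timezone.utc)).astimezone(timezone.utc)
    activity: Obj = {
        "verb": verb,
        "published": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "actor": actor,
        "object": obj,
    }
    if target is not None:  # the target is optional
        activity["target"] = target
    return activity


jane = {"objectType": "person", "id": "urn:example:person:jane", "displayName": "Jane Doe"}
picture = {"objectType": "picture", "id": "urn:example:picture:abc123/xyz"}
album = {"objectType": "album", "id": "urn:example:albums:abc123",
         "displayName": "Jane's Vacation"}

activity = make_activity(jane, "post", picture, album,
                         published=datetime(2011, 2, 10, 15, 4, 55, tzinfo=timezone.utc))
print(json.dumps(activity, indent=2))
```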

By contrast, the equivalent CRUD message, produced when a new picture resource is added to the Picture micro-service that manages images, would simply mirror the REST POST request that added the picture:


POST /pictures/albums/janes_vacation

In the request above, the new picture that Jane added to the album travels in the HTTP request body.

As you may have noticed, CRUD messages are resource-centric, while activities are actor-centric. A Web page rendering the ‘Jane’s Vacation’ album will want to refresh to include the new picture (possibly using Web Sockets), but does not care who initiated the action. This is why it is hard to ‘synthesize’ activities out of CRUD messages – it is much better for the micro-service at the source to fire a clean, well-formed activity object according to the public spec from the get-go. It is virtually impossible to synthesize an activity like the example above unless you are the service that owns the data.
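The difference can be sketched in Python. The `ALBUMS` table and both functions are hypothetical; they only illustrate that the CRUD message mirrors the request, while the activity needs the authenticated user and the album details that only the owning service has at hand.

```python
from typing import Any, Dict

Obj = Dict[str, Any]

# Hypothetical data that only the Picture micro-service owns.
ALBUMS: Dict[str, Obj] = {
    "janes_vacation": {"id": "urn:example:albums:abc123",
                       "displayName": "Jane's Vacation",
                       "url": "http://example.org/pictures/albums/janes_vacation/"},
}


def crud_message(album_key: str, picture: Obj) -> Obj:
    """Resource-centric: mirrors the REST call, i.e. what changed, not who changed it."""
    return {"method": "POST", "path": f"/pictures/albums/{album_key}", "body": picture}


def activity_for_post(user: Obj, album_key: str, picture: Obj) -> Obj:
    """Actor-centric: needs the authenticated user and album details the service owns."""
    album = ALBUMS[album_key]
    return {
        "verb": "post",
        "actor": {"objectType": "person", "id": user["id"],
                  "displayName": user["displayName"]},
        "object": {"objectType": "picture", **picture},
        "target": {"objectType": "album", **album},
    }


jane = {"id": "urn:example:person:jane", "displayName": "Jane Doe"}
pic = {"id": "urn:example:picture:abc123/xyz", "displayName": "Jane jumping into water"}

crud = crud_message("janes_vacation", pic)
activity = activity_for_post(jane, "janes_vacation", pic)
# A consumer seeing only `crud` gets no actor and only an album slug inside a path.
print(sorted(crud), sorted(activity))
# ['body', 'method', 'path'] ['actor', 'object', 'target', 'verb']
```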

A vital part of firing a new activity is audience targeting. Let’s say that there is a micro-service that manages projects in a system. The project owner has decided to change the project description. Who should receive this activity on their personal social stream? There are two ways to implement this – user-centric and service-centric:

In a user-centric implementation, each user has a social graph of relationships. When an activity is performed by a node in her social graph, it should end up on her personal social stream. This approach looks very logical but is actually hard to implement if you are not Facebook or Twitter. I don’t think it is necessary in a system where social features enhance the primary value, rather than being the value itself.

In a service-centric implementation, we assume that when an event deemed social occurs, the service has all the information it needs to determine the activity’s primary and secondary audience. As it happens, the activity stream specification has just such a feature: audience targeting. In our example of changing the project description, the service already knows all the members of the project, as well as all the users who are ‘subscribed to’ or ‘watching’ the project. Therefore, it should fire an activity like this:

{
   ....
   "to":  [{ "objectType": "person",
             "id": "johndoe"},
           { "objectType": "project",
             "id": "xxzqIHH_556X" }
   ],
   "cc":  [{ "objectType": "person",
             "id": "fredf"},
           { "objectType": "person",
             "id": "jasonj"}
   ]
}

In the example above, the activity is addressed to John Doe (the owner of the project) and to the project’s dedicated activity stream, while “Fred F” and “Jason J”, who are ‘watching’ the project, receive the update by virtue of being on the “cc” list. This illustrates another powerful feature of audience targeting: the ability to target object types other than ‘person’.
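Here is one possible way the project service might compute that audience, assuming a hypothetical project record with `owner`, `members` and `watchers` fields. It reproduces the "to" and "cc" lists above and de-duplicates users who appear in more than one role.

```python
from typing import Any, Dict, List

Obj = Dict[str, Any]


def audience_for_project_change(project: Obj) -> Dict[str, List[Obj]]:
    """Service-centric targeting: the project service already knows who should hear about it."""
    # Primary audience: the owner plus the project's own activity stream.
    to = [{"objectType": "person", "id": project["owner"]},
          {"objectType": "project", "id": project["id"]}]
    # Secondary audience: members and watchers, each once, never duplicating the owner.
    seen = {project["owner"]}
    cc: List[Obj] = []
    for user_id in project["members"] + project["watchers"]:
        if user_id not in seen:
            seen.add(user_id)
            cc.append({"objectType": "person", "id": user_id})
    return {"to": to, "cc": cc}


project = {"id": "xxzqIHH_556X", "owner": "johndoe",
           "members": ["johndoe"], "watchers": ["fredf", "jasonj", "fredf"]}
audience = audience_for_project_change(project)
print([p["id"] for p in audience["cc"]])  # ['fredf', 'jasonj']
```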

When such an activity arrives at the dedicated activity stream micro-service, the service can simply store a copy in the social stream of each member of the target audience. The publishing service has done all the work of identifying the audience; the activity stream service simply honors the directive.
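The fan-out itself can be very simple. In this sketch, a hypothetical in-memory `ActivityStreamStore` keeps one list per target stream, keyed by `(objectType, id)`; a real service would write one database row per target instead.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple

Obj = Dict[str, Any]
StreamKey = Tuple[str, str]  # (objectType, id) of a target stream


class ActivityStreamStore:
    """Toy in-memory store holding one list of activities per target stream."""

    def __init__(self) -> None:
        self.streams: Dict[StreamKey, List[Obj]] = defaultdict(list)

    def receive(self, activity: Obj) -> int:
        """Add the activity to every distinct 'to'/'cc' target; return the number of streams."""
        seen: Set[StreamKey] = set()
        for entry in activity.get("to", []) + activity.get("cc", []):
            key = (entry["objectType"], entry["id"])
            if key in seen:
                continue  # a target listed twice still gets a single entry
            seen.add(key)
            # Shared reference here; a database would store one row per stream.
            self.streams[key].append(activity)
        return len(seen)


store = ActivityStreamStore()
written = store.receive({
    "verb": "update",
    "to": [{"objectType": "person", "id": "johndoe"},
           {"objectType": "project", "id": "xxzqIHH_556X"}],
    "cc": [{"objectType": "person", "id": "fredf"},
           {"objectType": "person", "id": "jasonj"},
           {"objectType": "person", "id": "johndoe"}],
})
print(written)  # 4
```

The store makes no decisions about who should see what; it only honors the addressing the publishing service already worked out.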

Social streams can be used to mix events from various sources. For example, system-wide alerts and broadcasts can end up on personal streams as well (things like ‘maintenance restart in 5 minutes’) for awareness and audit purposes.

Similarly, activities performed by various engines can be mixed with activities performed by actual users – the activity stream specification is flexible enough to allow actors other than persons. That’s why you can have an activity such as ‘Continuous Integration started build #45’ as well as ‘Build #45 failed with 45 errors’.
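Because actors share the same shape regardless of their type, a feed renderer does not need to treat engines specially. In the sketch below, the `service` object type for the CI engine, the `build` object type and the `PAST_TENSE` verb table are assumptions made for illustration.

```python
from typing import Any, Dict

Obj = Dict[str, Any]

# Hypothetical past-tense phrases for rendering verbs in a feed.
PAST_TENSE = {"post": "posted", "start": "started", "fail": "failed"}


def describe(activity: Obj) -> str:
    """Render 'actor verb object'; the renderer does not care what kind of actor it is."""
    actor = activity["actor"]["displayName"]
    verb = PAST_TENSE.get(activity["verb"], activity["verb"])
    return f"{actor} {verb} {activity['object']['displayName']}"


human = {"verb": "post",
         "actor": {"objectType": "person", "id": "urn:example:person:jane",
                   "displayName": "Jane Doe"},
         "object": {"objectType": "picture", "id": "urn:example:picture:abc123/xyz",
                    "displayName": "Jane jumping into water"}}

# Assumed objectType 'service' for the CI engine; 'build' is an extension object type.
machine = {"verb": "start",
           "actor": {"objectType": "service", "id": "urn:example:service:ci",
                     "displayName": "Continuous Integration"},
           "object": {"objectType": "build", "id": "urn:example:build:45",
                      "displayName": "build #45"}}

for activity in (human, machine):
    print(describe(activity))
# Jane Doe posted Jane jumping into water
# Continuous Integration started build #45
```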


Filtering

Finally, the activity stream specification fits micro-services like a glove when it comes to filtering. Aggregating all the system chatter produces a fire-hose activity stream that ends up ignored due to its sheer volume of entries. This is where the semantics of activity streams are superior to RSS/Atom feeds. Each micro-service can provide a definition of the verbs and object types it intends to use in its activities. Since the core set of verbs and object types can easily become inadequate for a typical complex system, these extension definitions are vital for powerful filtering based on verbs and object types, such as:

  • Hide all ‘build’ updates – filtering based on object type
  • Hide all ‘build succeeded’ updates – filtering based on object type and verb combination
  • Hide all updates like this – filtering out future occurrences of something you don’t care about
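The three kinds of rules above could be modelled roughly like this. `HideRule`, `hide_like_this` and the `succeed` verb are hypothetical; an object type alone hides a whole category, while an object type plus a verb hides a narrower slice.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional

Obj = Dict[str, Any]


@dataclass(frozen=True)
class HideRule:
    """Hide activities whose object type (and, optionally, verb) match."""
    object_type: str
    verb: Optional[str] = None  # None means "any verb"

    def matches(self, activity: Obj) -> bool:
        if activity["object"]["objectType"] != self.object_type:
            return False
        return self.verb is None or activity["verb"] == self.verb


def hide_like_this(activity: Obj) -> HideRule:
    """'Hide all updates like this': a rule for the same object type and verb."""
    return HideRule(activity["object"]["objectType"], activity["verb"])


def visible(stream: Iterable[Obj], rules: Iterable[HideRule]) -> List[Obj]:
    """Return the activities not matched by any of the user's hide rules."""
    rule_list = list(rules)
    return [a for a in stream if not any(r.matches(a) for r in rule_list)]


def act(verb: str, object_type: str) -> Obj:
    return {"verb": verb, "object": {"objectType": object_type}}


stream = [act("start", "build"), act("succeed", "build"),
          act("fail", "build"), act("post", "picture")]

print(len(visible(stream, [HideRule("build")])))             # 1
print(len(visible(stream, [HideRule("build", "succeed")])))  # 3
print(len(visible(stream, [hide_like_this(stream[3])])))     # 3
```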

Housekeeping

A micro-service system of a decent size can quickly produce a lot of activity chatter. If some of these activities target multiple users, the per-user copies add to an ever-growing database. Obviously we need to draw a line somewhere.

Again, social streams in a system where social data is not the primary value are less critical when it comes to data preservation. The assumption is that the primary data is safely stored, and if data changes need to be preserved for audit purposes, this audit trail is itself safely stored. Social streams are just an echo of these auditable changes, and do not need to be preserved in long term storage.

You can experiment with your activity stream micro-service storage, but keeping a week’s worth of social streams may be plenty. Alternatively, you can draw the line at a number of activities, at a storage size, or at a combination of all three.
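A combined age and count policy might look like the following sketch. The seven-day window and the 500-entry cap are placeholder values, and `published` is assumed to be already parsed into a timezone-aware `datetime`; a storage-size limit could be added along the same lines.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

Obj = Dict[str, Any]


def apply_retention(stream: List[Obj], now: datetime,
                    max_age: timedelta = timedelta(days=7),
                    max_count: int = 500) -> List[Obj]:
    """Keep activities newer than max_age, then at most the newest max_count of those."""
    cutoff = now - max_age
    recent = [a for a in stream if a["published"] >= cutoff]
    recent.sort(key=lambda a: a["published"], reverse=True)  # newest first
    return recent[:max_count]


now = datetime(2014, 6, 10, tzinfo=timezone.utc)
stream = [{"id": n, "published": now - timedelta(days=d)}
          for n, d in (("a", 10), ("b", 1), ("c", 2), ("d", 3))]
kept = apply_retention(stream, now, max_count=2)
print([a["id"] for a in kept])  # ['b', 'c']
```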

Whichever method you pick, you need to run a ‘pruner’ task that deletes old activities from the database. In a distributed system based on micro-services, the Twelve-Factor App methodology comes to the rescue with its recommendation to run admin tasks as one-off processes.
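A pruner run as a one-off process could be as small as this. The SQLite database, its default path and the `activities` table with a text `published` column are assumptions for the sake of a self-contained example; any store that supports deleting by timestamp would do.

```python
import argparse
import sqlite3
from contextlib import closing
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def prune(conn: sqlite3.Connection, keep_days: int, now: datetime) -> int:
    """Delete activities older than keep_days and return how many rows were removed."""
    cutoff = (now.astimezone(timezone.utc) - timedelta(days=keep_days)).strftime(
        "%Y-%m-%dT%H:%M:%SZ")
    # Timestamps stored as fixed-format UTC strings compare correctly as text.
    cur = conn.execute("DELETE FROM activities WHERE published < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def main(argv: Optional[List[str]] = None) -> None:
    """Entry point for a one-off admin process: prune once, then exit."""
    parser = argparse.ArgumentParser(description="Prune old activities")
    parser.add_argument("--db", default="activities.db")  # hypothetical database path
    parser.add_argument("--keep-days", type=int, default=7)
    args = parser.parse_args(argv)
    with closing(sqlite3.connect(args.db)) as conn:
        removed = prune(conn, args.keep_days, datetime.now(timezone.utc))
    print(f"pruned {removed} activities")
```

In a real script you would end the file with the usual `if __name__ == "__main__": main()` guard and have a scheduler invoke it, so the process prunes once and exits rather than living inside the long-running activity stream service.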

And there you have it. In a distributed system based around micro-services, there are already messages flying around. Opening up a social channel and collecting dedicated messages into a social stream is just one extra step, and it gives your users added insight into the activity of the system and the actions of other users and agents they should know about.

In addition to the Activity Streams 2.0 draft spec, there is now a GitHub project with both client- and server-side implementations. It appears to be in its early days, but if you don’t want to write everything from scratch, both Java and Node.js implementations are readily available – give them a test drive and let me know what you think.

© Dejan Glozic, 2014


2 thoughts on “Social Micro-Services: Activity Streams 2.0”

  1. Fantastic article. I love how you’ve separated out activities from CRUD messages – something that is lost on many. I also like the delineation of user-centric and service-centric implementations, an important aspect of implementation once you start building collaboration into software.

    At Collabinate (http://www.collabinate.com), we’ve implemented an API to handle the heavy lifting of managing all of the data and relationships, which makes it super simple for people to implement what you have described here. We handle the more difficult user-centric case via our graph database back end, and we also allow the actor-centric streams to be associated with resource-centric entities, so that you don’t have to think of your activity system as something separate.

    I would love to get your feedback on our API – let me know if you’re available to chat.

    1. Hey, Jack – thanks for the kind words.

      I can take a look, with the usual caveat that I am blogging my personal views and am not at liberty to speak on behalf of IBM, or about our own implementation details around activity streams.
