Plan for the Delivery Aftershocks

[Image: March 11 2010 aftershocks]

My home country is not itself an earthquake-prone region, but we did get jolted every once in a while by an echo of a truly damaging quake in the neighborhood. People who have experienced earthquakes know that after the main event, a series of progressively smaller tremors is normal, indicating the plates settling into a new stable state. They are called ‘aftershocks’ and even though they are not nearly as damaging as the real deal, they can rattle frail nerves.

As a team leader in various incarnations, I established The Rule of Aftershocks as it is applied to software integration. It works with such a casual certainty that each time we had a snafu caused by a big code delivery, my team would shrug their collective shoulders and say ‘yup, aftershocks’. This is how it normally plays out:

  1. You work on a big, sweeping feature that touches a lot of files. It is very exciting, and it is going to be great, that is, when you finally finish it.
  2. Weeks pass by and you are working like mad. Your teammates are working too, delivering code changes around you into the repository. You are trying to keep up, frequently merging their code into your changes.
  3. The code starts to burn a hole in your hard disk, begging you to release it already. You test and test and test, trying to leave no stone unturned.
  4. Finally you deliver all 800 pounds of it. It immediately breaks the integration build because you forgot that it manages dependencies differently from the test builds you were running. You fix that (#1).
  5. Sanity test of the integration build fails because the database and/or the server software is slightly different from the one you used. IT SHOULD NOT MATTER, you say, these are all APIs, but somehow it still fails. You find out what the problem is (grumble, grumble) and fix it (#2).
  6. The build is now deployed and people are starting to use it. They discover all kinds of glitches only real-life use can uncover. You are fixing like mad, trying to stay ahead of the bug reports as they pour in. (#3+)
  7. After you fix all the obvious bugs, you get to the bottom of the barrel. People report mysterious problems, hard to diagnose and reproduce, that seem to happen only every second Friday if it’s a full Moon and you had a tuna sandwich for lunch. (#4)
  8. You forgo social life, family, natural light and even personal hygiene (if you work from home) trying to fix these maddening bugs. Eventually you do, after two milestones/sprints/whatever-you-call-iterations.

In the scenario above, your initial delivery of the code bomb counts as Event Zero, and I counted at least four aftershocks. Here is the maddening thing: it is really, really hard, if not impossible, to completely avoid them. No amount of testing and re-testing can spare you from them, it only affects their number and concentration. At some point your focus should be on minimizing their number, and ensuring they all occur early while the iron is still hot.

OK, so aftershocks are like death and taxes: if you can’t avoid them, why bother? Well, you should because they make you look bad as a developer or a team leader, and because you CAN do something about them. You simply need to gauge the size of the code you are about to release into the wild and leave an aftershock buffer in your plan. If somebody on your team is delivering a big code bomb, leave one iteration for aftershock management. If you expect an epic code bomb to drop, leave two iterations. And woe unto you if you allow a Fat Bastard sized code delivery on the last Friday of the last coding iteration. Aftershocks cannot be completely avoided, but they can be managed and planned for. A prudent team lead front-loads big deliveries, accepting aftershocks as a price of progress, knowing that chasing the zero-aftershock chimera leads to an overly conservative team. You don’t want to become so afraid of breaking anything that it leads to the heat death of the project.

As a side note, I would say that epic code bombs are themselves a problem – very few features require working in such large batches. Therefore, I would amend The Rule of Aftershocks to be: for a big code drop, plan a one-iteration aftershock buffer, and simply don’t allow code drops that require more. This compromise strikes a nice balance between making progress and causing the people at the receiving end of your bugs to hate you with a passion.

© Dejan Glozic, 2013

Don’t Get Attached To Your Code

[Image: the Pyramids of Giza]

Many years ago when I moved to Canada, my father-in-law came to visit. He was showing interest in what I did for a living and I tried to explain to the best of my abilities. I failed miserably, leaving him befuddled that people are actually paying me money for lining up the bytes ‘just so’. For the longest time, software development, or ‘anything computers’, was a black art for most people, simultaneously feared and ridiculed (except when they needed advice on what computer to buy on Boxing Day or how to find lost files that they saved ‘in Word’). While those same people could not perform brain surgery or represent somebody in a trial, at least they understood the key ‘value propositions’ of those professions. What developers did every day was a mystery.

This all changed when people started carrying millions of lines of code running in their pockets – NOW they know what we do (sort of). But do we?

Sometimes when the light of the Indian summer sun hits my office window at a particular angle, I find myself caught in a ‘what is our contribution to humanity’ stream of thought. However cool what we work on is at the moment, its very nature is ephemeral. Teenage girls will not cry to our code surrounded by lit candles. Tourists will not make goofy pictures with our code precariously leaning in the background. And our code will not be the last to survive After Humans, giving the pyramids a run for their money (eat that, pharaohs!). No matter how important our code seems to us, an object of a lasting value it is not.

Ours is not the only profession where the fruits of thy labor are of a fleeting nature. Bakers used to wake up at 2am to produce beautiful bread that had to be eaten by the same evening lest it turn into a hard object you can bludgeon somebody to death with (I am talking artisan bread here, not the mutant Ninja variety that is sold in plastic bags nowadays). But at least they spent only a few hours on their creation. What about the wine makers? They toil year round, harvest the grapes, ferment them, let the wine sit in wooden casks for years, bottle it with meticulous attention to detail. To what end? As Stereophile’s Michael Fremer used to say, no matter how expensive the wine, in the end you are left with memories and urine, and then only memories.

Developers invest a lot of time crafting their code. It is the ultimate expression of their intellect, and if they are not careful, even their souls and their very creative essence. I say ‘if they are not careful’ because code, like bread or wine, has an expiration date, and getting attached to an artifact of a fleeting nature is not wise and can lead to heartbreak. There are many ways a piece of code can end up on the chopping block: change in requirements, target environment, new OS or browser version that makes your code obsolete, refactoring, performance improvements, ‘what were we thinking’ moments, you name it. Or you can get assigned to a new task and somebody else (the horror!) ends up owning it.

Why do we invest so much personal value in code? It may be the effort required to craft it, or the sacrifices needed along the way (I wish I had a dollar for every perfect day I observed through the window of my office while writing the latest absolutely awesome installment of the future legacy code). Some people go as far as to invest a lot of meaning in the actual syntax and how all the statements and punctuation are lined up (the best way to turn such a developer into a ball of rage is to run their code through an automatic formatter). We can also write code with an intention to impress, which is a sure sign it will be too smart for its own good.

Another common reason for clinging to code is that it represents our self-worth and importance. If I give up code I own now, what will I do the whole day? Typically this is an illusion, similar to what Jerry Seinfeld was told as a kid (‘don’t eat cookies before lunch, you will ruin your appetite’). As a grown-up, Jerry now understands that even if you ruin that particular appetite, a perfectly good appetite is just around the corner – there is no danger of running out of appetites. Or problems for which new code needs to be written.

We should learn from those before us who engaged in professions that by their very nature do not produce long-lived objects (even though you could argue that the Cobol software still running in banks and airline reservations is pushing the meaning of the word ‘fleeting’). We should focus on the positive effects of our code: how many lives it improved, how much time it saved its users, how much faster it made other developers for a while. A long gone bottle of wine that started a romance that blossomed into a lifelong marriage is worth its weight in gold. Good code can inspire, generate many more ideas, be a stepping stone to even greater heights. Even bad code can be a learning experience, at least as in ‘we should not do that again’.

So there you have it. Focus on the transcendental value of your code – what it means to your users and how it makes their life better, at least for a moment, and cherish that value. While physical manifestations of your code may succumb to the vagaries of the fast-moving industry (phone app development, anyone?), nobody can take away the memories and the learning that your code brought you.

And if you are still yearning for something physical to create, maybe you can take up painting. Or you can build a pyramid in your back yard. Even if it fails to become the world’s 8th wonder, you can still use it as a tool shed.

© Dejan Glozic, 2013

RESS to the Rescue

It seems that every couple of years we feel a collective urge to give a technique a catchy acronym in order to speed up conversations about UI design. For the last couple of years, we have grown accustomed to throwing around the term Responsive Design casually, probably because it rolls off the tongue more easily than “we need to make the UI, like, re-jiggle itself on each thingy”. Although I like saying ‘re-jiggle’. Re-swizzle is a close second.

As we were working on the tech preview of the Jazz Platform Home app, we naturally wanted it to behave on phones and tablets. And it did. The header in particular is a breathing, living thing, going through a number of metamorphoses until a desktop caterpillar turns into a mobile butterfly:

[Image: the header transforming through several responsive states]

It all works very well, but a (butter)fly in the ointment is that we need to send the markup for both the caterpillar and the butterfly on each request. We do CSS media query transformations to make them do what we want, but this is clearly suboptimal. I am sure that with some CSS shenanigans we could pull it off with one set of HTML tags, but at some point it just becomes too hard and error-prone. Clearly it is one thing to make the desktop/tablet header drop elements and adjust with screen size, and another to switch to a completely new element (off-screen navigation) while keeping the former rotting in display:none hell.

This is not the only example where we had to use this technique. The next example has the section switcher in the Project Details page go from vertical set of tabs to a horizontal icon bar:

[Image: the section switcher as a vertical set of tabs and as a horizontal icon bar]

Again, both markups are sent to the client, and switched in and out by CSS as needed. This may seem minor, but in a more complex page you may end up sending too much stuff to all clients. We need to involve the server, using some kind of hybrid technique.

As JJG before him, Luke Wroblewski did not actually invent this technique but simply put a name on it. RESS stands for REsponsive design + Server Side and first appeared in Luke’s article in 2011. The technique calls for involving the server in responsive design so that only what is needed is sent to the client. It also goes by the name ‘adaptive design’. The server is not meant to replace but rather complement responsive design. You can view the server as the more coarse-grained player in this duo, sending ‘mobile’ or ‘desktop’ stuff to the client, while responsive design takes care of adapting to the device in more steps or decision points. It needs to, because what does ‘mobile’ even mean today? There is a continuum of screen sizes now, from phones to phablets to mini-tablets to regular tablets to small laptop screens to desktops – what is ‘mobile’ in that context?

It seems like RESS would help us in our two examples above by allowing us to not send the desktop header to the phone and off-screen navigation to the desktop. However, the server has limited means to perform device detection. At the very bottom, there is the HTTP header ‘user-agent’. Agent sniffing is a never-ending task because of new devices that keep coming online. There has been a lot of virtual ink spilled on algorithms that will correctly detect what you are running. This is an error-prone activity relying on pattern matching against lists that go stale if not updated constantly. There have been attempts to subcontract this job, but the solutions are either stack-specific or fee-based (such as WURFL or detectmobilebrowsers.mobi), or a combination of these approaches. I don’t see how any of these solutions make sense for most of the web sites that want to go the RESS route. Simply judging by the number of available solutions, you can see that this is not a solved problem – much like hair loss treatments.
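To make the shape of the problem concrete, here is a minimal sketch of user-agent classification. The pattern lists are deliberately tiny and illustrative, which is exactly the weakness: real lists are huge and go stale the moment a new device ships.

```javascript
// Naive server-side device classification from the User-Agent header.
// The pattern lists are short stand-ins; real ones (WURFL and friends)
// are enormous and need constant updates.
const TABLET_PATTERNS = /iPad|Tablet/i;
const MOBILE_PATTERNS = /Mobile|Android|iPhone|iPod|BlackBerry|Windows Phone/i;

function classifyUserAgent(ua) {
  if (!ua) return 'unknown';                     // no header: give up gracefully
  if (TABLET_PATTERNS.test(ua)) return 'tablet'; // tablets first: iPad UAs also say 'Mobile'
  if (MOBILE_PATTERNS.test(ua)) return 'mobile';
  return 'desktop';
}
```

Note the ordering: an iPad user-agent also contains the word ‘Mobile’, so the tablet check has to run first, which is a taste of how brittle this whole business gets.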

Another approach we can use (or combine with user agent detection) is to send the viewport size to the server in a cookie. This arms the server with additional means to make a correct decision: while, say, an iPad is reported as ‘mobile’, it has enough resolution to be sent the full desktop version of the page, and the viewport size would reveal that. The problem with this technique is that your site needs to be visited at least once to give JavaScript a chance to set the cookie, which means the optimization only kicks in on repeat visits.
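The cookie technique boils down to very little code; in this sketch the ‘viewport’ cookie name and the WxH value format are made up for illustration:

```javascript
// Client side: build the cookie value that records the viewport size.
// The 'viewport' name and the WxH format are invented for this sketch.
function viewportCookie(width, height) {
  return 'viewport=' + width + 'x' + height + '; path=/';
}
// In the browser you would run:
//   document.cookie = viewportCookie(window.innerWidth, window.innerHeight);

// Server side: parse the viewport back out of the incoming Cookie header.
// Returns null on a first visit, when the cookie has not been set yet.
function parseViewport(cookieHeader) {
  const match = /(?:^|;\s*)viewport=(\d+)x(\d+)/.exec(cookieHeader || '');
  return match ? { width: Number(match[1]), height: Number(match[2]) } : null;
}
```

The null return is the important part: the server must be prepared to do without the cookie on the very first request.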

Considering the imprecise nature of device detection, you should consider RESS a one-two approach – the server tries its best to detect the device and send the appropriate content, then responsive design on the client kicks in. A prudent approach is to err on the safe side on the server and give up on device detection at the first sign of trouble, letting the client do its thing (when, say, the user-agent has never been seen before, or cookies are disabled, preventing viewport size transmission). With this approach, we should ensure that client-side responsive design works well on its own, and consider the server side an optimization, a bonus feature that kicks in for the most straightforward cases (i.e. the most well known phones and tablets) and/or on repeat visits.
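The ‘err on the safe side’ decision could be combined like this; the 768px breakpoint and the category names are my illustrative assumptions, not a spec:

```javascript
// Coarse server-side decision: trust the viewport cookie when present,
// fall back to user-agent classification, and default to the desktop page
// at the first sign of trouble (client-side responsive design covers the rest).
// The 768px breakpoint and the category names are illustrative assumptions.
function chooseVariant(uaClass, viewport) {
  if (viewport) {
    // A wide viewport gets desktop markup even on a 'mobile' UA (e.g. an iPad).
    return viewport.width >= 768 ? 'desktop' : 'mobile';
  }
  if (uaClass === 'mobile') return 'mobile';
  return 'desktop'; // 'tablet', 'desktop' and 'unknown' all get the safe default
}
```

Defaulting unknowns to the full desktop page means a wrong guess costs some bandwidth, never a broken experience.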

To use a vision analogy, the most you should expect your server to do in a RESS-enabled site is to tell you that the vehicle on the road is a car and not a truck, and that it is black(ish). As long as you don’t expect it to tell you that it is a 2012 BMW i3 with leather seats and the technology package, you will be OK.

© Dejan Glozic, 2013

The Gryphon Dilemma

[Image: a gryphon]

In my introductory post The Turtleneck and the Hoodie I kind of lied when I said I had stopped doing everything I did in my youth. In fact, I am playing music, recording and producing more than I have in a while. I realized I can do things in the comfort of my home that I could have only dreamed of in my youth. My gateway drug was Apple GarageBand, but I recently graduated to the real deal – Logic Pro X. As I was happily mixing a song in its beautifully redesigned user interface, I needed some elaborate delay, so I reached for the Delay Designer plug-in. What popped up required some time to get used to:

[Image: the Delay Designer plug-in]

This plug-in (and a few more in Logic Pro) clearly marches to a different drummer. My hunch is that it is a carry-over from the previous version, probably sub-contracted, and the authors of the plug-in didn’t get around to updating it to the latest L&F. Nevertheless, they shipped it this way because it is very powerful and does its primary task well, albeit in its own quirky way.

This experience reminded me of a dilemma we face today in any distributed system composed of a number of moving parts (let’s call them ‘apps’). A number of apps running on a cloud platform can serve Web pages, and you may want to hook them up together in a distributed ‘site of sites’. Clearly the loose nature of this system is great from the point of view of flexibility. You can individually evolve each app as long as the contract that glues them together is upheld. One app can be stable and move slowly, while you rev the other one like mad. This whole architecture works great for non-visual services. The problem arises when you try to pull any kind of coherent end user experience out of this unruly bunch.

A ‘web site’ is an illusion, inasmuch as ‘movies’ are ‘moving’ – they are really a collection of still images switched at 24fps or faster. Web browsers are always showing one page at a time. If a user clicks on an external link, the browser will unceremoniously dump all of the page’s belongings on the curb and load a new one. If it wasn’t for the user session and content caching, browsers would be like Alzheimer’s patients, having a first impression of the same page over and over. What binds pages together are the common areas that these pages share with their kin, making them a part of the whole.

In order to ensure this illusion, web pages have always shared common areas that navigationally bind them to other pages. For the illusion to be complete, these common areas need to be identical from page to page (modulo selection highlights). Browsers have become so good at detecting shared content on pages they are switching in and out, that you can only spot a flash or flicker if the page is slow or there is another kind of anomaly. Normally the common areas are included using the View part of the MVC framework – including page fragments is 101 of the view templates. Most of the time it appears as if only the unique part of the page is actually changing.
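The inclusion mechanism amounts to simple template composition; here is a toy sketch in which the markup strings are stand-ins for real view templates:

```javascript
// Toy view-template composition: the shared common areas are included
// into every page, so they come out identical from page to page.
// The markup here is a stand-in for real templates/partials.
const commonHeader = '<header><nav><a href="/">Home</a> <a href="/projects">Projects</a></nav></header>';
const commonFooter = '<footer>&copy; Example Site</footer>';

function renderPage(uniqueContent) {
  // Only the <main> part differs between pages.
  return commonHeader + '<main>' + uniqueContent + '</main>' + commonFooter;
}
```

Because every page is composed from the same fragments, navigation preserves the illusion of one continuous site: only the unique part appears to change.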

Now, imagine what happens when you attempt to build a distributed system of apps where some of the apps provide pages and others supply common areas. When all the apps are version 1.0, all is well – everything fits together and it is impossible to tell that your pages are really put together like words on ransom notes. After a while, the nature of independently moving parts takes over. We have two situations to contend with:

  1. An app that supplies common areas is upgraded to v2.0 while the other ones stay at v1.0
  2. An app that provides some of the pages is upgraded to v2.0 while common areas stay at v1.0

[Image: Evolution of composite pages with parts evolving at a different pace.]

These are just two sides of the same coin – in both cases, you have a potential for end-results that turn into what I call ‘a Gryphon UX’ – a user experience where it is obvious different parts of the page have diverged.

Of course, this is not a new situation. Operating system UIs go through these changes all the time with more or less controversy (hello, Windows 8 and iOS7). When that happens, all the clients using their services get the free face lift, willy-nilly. However, since native apps (either desktop or mobile) normally use native widgets, there are times when even an unassisted upgrade turns out without a glitch (your app just looks more current), and in real world cases, you only need to do some minor tweaking to make it fit the new L&F.

On the Web, however, site design runs much deeper, affecting everything on each page. A full-scale site redesign is a sweeping undertaking that is seldom attempted without full coordination of components. Evolving only parts of a page is plainly obvious and results in a page that is not only put together like a ransom note but actually looks like one.

There is a way out of this conundrum (sort of). In a situation where a common area can change on you at any time, app developers can sacrifice inter-page consistency for intra-page consistency. There is no question that a common set of links is what makes a site, but these links can be shared as data, not as finished page fragments. If apps agree on a navigational data interchange format, they can render the common areas themselves and ensure gryphons do not visit them. This is like reducing your embassy status to a consulate – clearly a downturn in relationships.
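A sketch of what sharing navigation as data rather than markup might look like; the JSON shape here is an invented example of such an interchange contract, not an actual format:

```javascript
// Navigation shared as data, not as a finished page fragment.
// The JSON shape is an invented example of an interchange contract.
const navData = {
  links: [
    { label: 'Home',     href: '/' },
    { label: 'Projects', href: '/projects' },
    { label: 'Reports',  href: '/reports' }
  ]
};

// Each app renders the shared data in its own current look and feel,
// keeping every page internally consistent (no gryphons on this page).
function renderNav(data, cssClass) {
  const items = data.links
    .map(function (l) { return '<li><a href="' + l.href + '">' + l.label + '</a></li>'; })
    .join('');
  return '<ul class="' + cssClass + '">' + items + '</ul>';
}
```

A v1.0 app and a v2.0 app would both call something like renderNav with the same data but their own styling, trading inter-page sameness for intra-page coherence.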

Let’s apply this to the scenario above. With version evolution, individual pages will maintain their consistency, but as users navigate between them, common areas will change – the illusion will be broken and it will be very obvious (and jarring) that each page is on its own. In effect, what was before a federation of pages is now more like a confederation, a looser union bound by common navigation but not by a common user experience (a ‘page ring’ of sorts).

[Image: Evolution of composite pages where each page is fully controlled by its provider.]

It appears that this is not so much a way out of the problem as a ‘pick your poison’ situation. I already warned you that the Gryphon dilemma is more of a philosophical problem than a technical one. I would say that in all likelihood, apps that coordinate and work closely together (possibly written by the same team) will opt to share common areas fully. Apps that are more remote will prefer to maintain their own consistency at the expense of the inter-page experience.

I also think it all depends on how active the app authors are. In a world of continuous development, perhaps short periods of Gryphon UX can be tolerated knowing that a new stable state is only a few deploys away. Apps that have not been visited for a while may prefer to not be turned into mythological creatures without their own consent.

And to think that Sheldon Cooper wanted to actually clone a gryphon – he could have just written a distributed Web site with his three friends and let nature take its course.

© Dejan Glozic, 2013

Dumb Code Good, Smart Code Bad

[Image: ‘Be smart… act dumb’ poster (NARA)]

I owe the idea for my Orwellian title to the well known engineer Robert W. Lucky, who came into my life through the ritual of cracking open a brand new and shiny IEEE Spectrum magazine and immediately going for his column “Lucky Strikes” (this awesome name acquired a whole new meaning after I started watching Mad Men). This was at a time when magazines were on paper. There is no such thing as a ‘new blog smell’.

Anyway, I remember an issue in which Robert tried to imagine the unnamed engineer who came home to his wife one night and quipped: “Honey, I had a great idea today! Instead of storing years as four digits, we will store only the last two. Imagine the megabytes of disk space we will save. Memory is very expensive these days, you know.” Fast forward to the Y2K nail biting, and that smart engineer is nowhere to be found. Perhaps his wife divorced him for wreaking such havoc on humanity, including the characters from Office Space.

Throughout my career in the software industry, I got to see ideas that seemed clever at the time turn out to be not so clever in hindsight. This just reinforced what Robert Lucky tried to convey, until I crystallized my ‘dumb code, smart code’ law. Before counting the ways why smart code is bad, let me be clear that by ‘dumb code’ I don’t mean ‘bag of hammers’ dumb. Just ordinary, by the book, consistent, easy to read, following the good practices, as simple as possible code. The kind that makes hipster hackers feel sick and ironic at the same time.

Software developers like to think of themselves as smart (actually smarter than most people, which may explain why they are prone to losing their lunch money). Ever since all the sorting algorithms have been invented and CSc departments banned any new submissions, a future software developer cannot go through university without writing a compiler or three. And the new languages they invent need to be quirky and different (I guess that’s why Scala has no semicolons – you just ‘sense’ the end of a statement by vibe).

When software developers join companies, they carry over their taste for indie code practices into the production code they start writing. Let me count the ways their code ends up causing headaches for everybody:

  1. Using code to impress. Developers sometimes feel they need to prove themselves, and code seems to be a great way to show ‘them’ what they are capable of. This means passing every opportunity to use the simplest solution that does the job.
  2. Local solution for a global problem. Without control over the project as a whole, developers tend to try to fix a problem locally. Local solutions only address that one instance, create inconsistencies, and will be a burden at some point in the future when the fix of the global problem is attempted.
  3. Nobody understands your code. Clever code is by definition unusual, needs some time to digest, and is often incomprehensible to everybody, including the author, after a month (at most). Since code lives forever, this particular corner will be avoided at all costs by the poor developers assigned to maintain it; it will be worked around and eventually yanked in frustration.
  4. Smart code is hard to optimize. Straightforward code responds well to automated optimization and refactoring. Compilers are more likely to automatically speed up code that does nothing strange or crazy. As said in (2), it is easy to make a sweep through the entire project if all the code that needs to be visited is easy to understand.
  5. Smart code is buggy. As Seinfeld claimed, sometimes the road less traveled is less traveled for a reason. Smart code is attempting something novel and unusual, and as such there are always some rough edges to smooth out, necessitating frequent revisions. Novel and unusual algorithms and approaches typically look great during a coffee-fueled all-nighter, but often require a lot of tuning to work well in production.
  6. Smart code gains are ephemeral. Smart code is often smarter than it needs to be because there was a problem with some browser or OS version. Chances are that code will outlive its purpose soon after the next browser/OS update. I am not saying you should never write this kind of code (search all occurrences of ‘IE is a steaming pile’ in Dojo), but it is prudent to clearly mark it and make it easily defeatable when it outlives its purpose.
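To make the contrast concrete, here is an illustrative pair: both functions are correct, but the first optimizes for cleverness and the second for the poor soul maintaining it a year from now:

```javascript
// 'Smart': a bit-twiddling one-liner the reader has to stop and decode.
function isPowerOfTwoSmart(n) {
  return n > 0 && (n & (n - 1)) === 0;
}

// 'Dumb': the same check, written so it reads like the definition.
function isPowerOfTwoDumb(n) {
  if (n < 1) return false;
  while (n % 2 === 0) {
    n = n / 2;
  }
  return n === 1;
}
```

The bit trick is a well-known idiom and arguably fine here; the point is that every such shortcut asks the next reader to prove it to themselves, and most clever code is far less famous than this one.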

Lest this blog sound like a criticism of ‘them’ whereas ‘we’ are different, I am writing it from the position of somebody who has been there, done that and bought an ironic T-shirt. I am trying to convey a hard-earned realization caused by writing smart code, feeling smug about it, forgetting how it works, hitting problem after problem later and eventually yanking it with a sigh of relief when a new version of a library or browser made it unnecessary. The reason more seasoned developers are less likely to do it is that they have had enough time to see the entire cycle, not just the initial buzz that smart code brings. In situations where writing smart code is inevitable, realize that you are fixing a temporary problem, cordon off the code, mark it clearly and keep an eye out for the earliest opportunity to dump it without mercy (which requires that you avoid becoming personally attached to that code, otherwise you will feel like you put your favorite puppy to sleep).

You could say that smart code is a drug – it brings euphoria when you write it, but you pay the dire price down the road. Therefore: kids, say NO to smart code!

© Dejan Glozic, 2013