Making Gourmet Pizza in the Cloud

A while ago I had lunch with my wife in a Toronto midtown restaurant called “Grazie”. As I was perusing the menu, a small rectangle at the bottom attracted my attention. It read:

We strongly discourage substitutions. The various ingredients have been selected to complement each other. Substitutions will undermine the desired effect of the dish. Substitutions will slow down our service.

When I read it first, I filed it under ‘opinionated chef’ (as if there is any other kind) and also remembered another chef named Poppie from a Seinfeld episode, who opined on this topic thusly:

POPPIE: No, no. You can’t put cucumbers on a pizza.

KRAMER: Well, why not? I like cucumbers.

POPPIE: That’s not a pizza. It’ll taste terrible.

KRAMER: But that’s the idea, you make your own pie.

POPPIE: Yes, but we cannot give the people the right to choose any topping they want! Now on this issue there can be no debate!

Fast forward to 2013. This whole topic came back to my volatile memory as I was ruminating about what the cloud platform model means for web app developers. In the years before the cloud, we would carefully consider our storage solution, our server and client technology stack and other elements with an eye on how the application is going to be deployed. You wanted to minimize support and maintenance cost, administration headaches and also wanted to achieve economy of scale. A large web application is composed of components, and the economy of scale can be achieved if one architectural style is followed throughout, one database type, one web framework that supports components etc.

In this architecture, it is just not feasible to allow freedom for components to have a say in this matter. Maintaining three different databases or forcing three Web frameworks to cooperate is just not practical, even though these choices may be perfect for the components in question. The ‘good enough’ fit was a good compromise for consistency and ease of support and maintenance.

In contrast, a modern web application is now becoming a loose collection of smaller apps running in a cloud platform and communicating via agreed upon protocols. One of the key selling points of the cloud platform is that it is easy to write and publish an app and have it running within minutes. I am sure early successes are important for learning, and many apps will be tactical and time-boxed – up for a couple of months, then scrapped. But a much more important aspect of the new architecture to me is the opportunity to carefully curate the components of your federated application, the way an opinionated chef selects ingredients for a gourmet pizza.

Let me explain. A modern cloud platform such as EC2, Heroku or CloudFoundry provides for you a veritable buffet of databases (MySQL, Postgres, MongoDB, SQL Server), caching solutions (Memcache, Redis), message queues (RabbitMQ), language environments (Java, Scala, Python, Ruby, JavaScript), frameworks (Django, Spring, Node/Express, Play, Rails). You can then package a carefully curated collection of client-side JavaScript frameworks to include in our app. You have the freedom to choose, and not only in one way:

  1. You have the freedom to select the right database, server and client side that is the best fit for the problem that a single app is trying to solve. You can write computationally intensive app in Java or Scala, write front end app with a lot of I/O and back-and-forth chatter in Node.js, and another in Django or RubyOnRails because you simply like it and it makes you more productive. You don’t have to use whatever has been chosen by the entire team even though it does not fit your exact needs.
  2. You then have the freedom to compose your federated application out of individual apps cooperating as services, each one optimized for its particular task. As I said earlier, this would have been prohibitively expensive and impractical in the past, but is now attainable for smaller and smaller providers. The cloud platform has subsumed the role of the economy of scale vehicle that was previously required from large applications deployed on premise.

The vision of the polyglot future has been already declared for programming languages by Dave Fecak et al and also for storage solutions by Martin Fowler. A common thread in both is ‘best fit for the task’. In the former, additional angle was employment prospects of polyglot developers, while in the latter was a caveat that polyglot persistence comes with the expense of additional complexity.

I think that my point of view is subtly different: in the cloud platforms, complexity of managing many different systems is mitigated by the fact that the platform is responsible for most of it. On the other hand, while CFOs are salivating over the prospects of slashed IT expenses in hosted solutions, I am more intrigued by another unexpected side-effect: using the right tool for the right job and using multiple tools within the same project is now attainable for smaller players. All of a sudden, an application running multiple servers, each using a different database type is trivially possible for even the smallest startups.

We will never be asked again to put cucumber on our pizza just because the team as a whole has decided so, even though we hate cucumber. Conversely, we CAN put cucumber just on our pizza even though the rest of the team doesn’t like it if works wonderfully in our unique combination. In the new world of cloud platforms, we can all make gourmet pizza-style applications, each carefully put together, each uniquely delicious, and each surprisingly affordable. The opinionated chef from Grazie now seems almost prophetic: in the cloud, we DO want to select various ingredients to complement each other, because substitutions will undermine the desired effect and slow down our service(s).

You can almost pass the restaurant’s menu as the white paper.

© Dejan Glozic, 2013

Pulling Back from Extreme AJAX

I always thought that Jesse-James Garret had an awesome name. It is up there with Megan Foxx and that episode of The Simpsons when Homer briefly changed his name to Max Power. How many of us have a name that prompted John Lee Hooker to compare his level of badness to us? Granted, the Jesse in question (Jesse Woodson James) was too busy robbing banks and leading gangs to be a Chief Creative Officer of anything, but the power of the name is undeniable.

JJG, of course, is the guy who coined the phrase ‘Asynchronous JavaScript and XML’ (AJAX). It put a name on the technique that turned JavaScript from a quirky scripting language for coding web site menus to a powerhouse, delivering amazingly interactive Web applications to your browser. If we wanted to be sticklers, the majority of people now use JSON, not the original XML for the X part of the technique, but would we like JJG as much if he called the technique AJAJ? I don’t think so.

AJAX works roughly like this: in the dark ages of Web 1.0, whenever you wanted to react to user input in a significant way, you had to pass the data to the server and get a new rendition of the page. A lot of effort went into hiding this fact – session state was kept, parameters were passed, resources were cached to speed up page redraw, but it was still flashy and awkward – nothing like the creamy smoothness of desktop application UX. Then Microsoft needed some hooks, first in Outlook, then in IE, to make asynchronous requests to the server without the need to reload the page. The feature remained relatively obscure until Google made a splash with Gmail and Google Maps in a way that was browser-agnostic. Then JJG called the whole thing AJAX. You can read it all on Wikipedia.

Fascination with AJAX was long lasting. I guess we were so used to the limitations of Web 1.0 and what browsers were supposed and not supposed to do that for the longest time we had to pinch ourselves when looking at so much interactivity without page reload. I remember watching Google Wave demo at Google IO conference in 2009 and the presenter kept saying “and remember, there are no plug-ins here, this is all done with JavaScript in a regular browser”. Four years after the phrase was born!

There is a self-deprecating American joke that claims that “it is not done until it is overdone”. I live in Canada but when it comes to AJAX we are equally culpable. In our implementation of AJAX in the Rational Jazz Project, we served AJAX for every meal. We had AJAX hors d’oeuvres followed by AJAX soup, AJAX BBQ and then triple-fudge AJAX for desert. No wonder we now suffer from AJAX heartburn. Will somebody pass the Zantac?

See, the problem is that it is easy to overdo AJAX by forgetting that we are not writing desktop applications. We still need to have the healthy respect for the Web and remember that it is a fundamentally different environment than your desktop. Forgetting this fundamental tenet usually leads to a three-stage illness:

  1. Obesity. As in real-world diet, AJAX calories add up, and before you know it, your web page is ready for the Fat Farm. I have seen a page that needed to load more than 1MB of JavaScript (gzipped) before anything useful was shown.
  2. Separation Anxiety. Since it takes so much effort to load a page and make it useful, fat AJAX apps fear browser separation and do everything in their power to avoid a refresh. It can degenerate all the way to ‘one page apps’ that stay around like a bad penny, all the while faking ‘pages’ with DIVs that are hidden and shown via CSS.
  3. The God Complex. The final stage of this malady is that since we are not leaving the browser, and since any nontrivial app needs to manage a number of objects and resources, we come up with elaborate schemes for switching between them, lying to the browser about our navigation history, and finally fully re-implementing a good number of features that browsers already do, only much better. Pop-up menus, history management, tab management – we essentially ask the browser to load our app (painfully) and then go take a nap for a couple of hours – we will take over.

Should we lay this at the doorstep of Microsoft, Google and JJG? It would be easy but it would not help us. Nobody asked us to lose control of ourselves and amass such JavaScript love handles and triple chins. No AJAX Hello World snippet has 1MB of JavaScript. AJAX is wonderful when used in moderation. Large, slow to load AJAX pages are a problem even on the desktop, and are a complete non-starters in the increasingly important mobile world. Even if you ignore the problem of moving huge JavaScript payload over mobile networks, JavaScript engines in smartphone browsers have much smaller room to play and can cripple or downright terminate your app. By the way, this kind of defeats the purpose of so called ‘hybrid apps’ where you install a web app as native using PhoneGap/Cordova shim. That only eliminates the transport part but all the issues with running JavaScript on the phone remain. I am not saying that hybrid apps don’t have their place, only that they are not exempt from JavaScript weight watching.

How does a page wake up one morning and realize nothing in the wardrobe closet fits any more? Writing 1MB worth of JavaScript takes a lot of effort – it does not happen overnight. One of the reasons are desktop and server side developers writing client code but still somehow wanting to think they are in their comfort zone. A lot of Java developers think JavaScript is Java’s silly little brother. They devise reality-distortion force fields that allow them to pretend they are still writing in strongly typed languages (I am looking at you, GWT). Entire Java libraries are converted to JavaScript without a lot of scrutiny. I am strongly against it. As I always say, si fueris Romanae, Romano vivito more. If your code will end up running in a browser as JavaScript, why don’t you write in JavaScript and take advantage of all unique language features, instead of losing things in translation and generating suboptimal code?

But it is not (just) suboptimal code generation that causes the code bloat. It is the whole mindset. Proper strongly typed OO languages encourage creating elaborate class hierarchies and overbuilding your code by using every pattern from the Gang Of Four book at least once. That way the porky code lies. Coding for the web requires acute awareness that every line of code counts and rewards brevity and leanness. As it turns out, you pay for each additional line of JavaScript three times:

  1. More code takes longer to send to the browser over the moody and unpredictable networks
  2. More code takes longer for the browser to parse it before being able to use it
  3. More code will take more time to run in the browser (I know that JavaScript VMs have gotten better lately, but in the real life I still need to perform an occasional act of scriptocide – JavaScript performance problem didn’t entirely go away)

And what would an article by a middle-aged architect be without an obligatory “kids these days” lament? Many of the newly minted developers start out on the client side. They become very good at coaxing the browser into doing wonderful things, and this affinity naturally degenerates into doing ALL your work on the client. The server essentially becomes a resource delivery layer, sending JSON to JavaScript frameworks that are doing all of the interesting work in-browser. This is wrong. The proper way of looking at your application is that it consists of the following parts:

  1. Storage (Model)
  2. Server side logic (Controller)
  3. Server side rendering engine (View)
  4. Network transport layer
  5. Browser that renders HTML using CSS and reacting to user interactions via JavaScript

Notice how JavaScript is one of 7 aspects you need to care about. If you look at some implementations these days, you would think all the other aspects only exist to deliver or otherwise support JavaScript. It is the equivalent of a body builder that focused all his training on his right arm biceps. A balanced approach would yield much better results.

OK, Dejan, quit pontificating. What do we do to avoid dangers of irresponsible AJAXing in the future?

Well, armed with all this hard won experience, we have decided to take the opportunity of the new Jazz Platform project to wipe the slate clean and do things right. We are not alone in this – there is evidence that the pendulum has swung back from Extreme AJAX towards a healthier diet where all the Web App food groups are represented. For example, a lot of people read a blog post about Twitter performance-driven redesign, driven by exactly the same concerns I listed above. More recently, I stumbled upon an article by Mathias Schäfer arguing for a similarly mature and sane approach that embraces both the tried and true Web architectures and the interactivity that only JavaScript can provide.

You will get much better results by using the proper tool for the job. Want to access stored data fast? Pay attention to which storage solution to pick, and how to structure your data to take advantage of the particular properties of your choice. Need to show some HTML? Just serve a static document. Need to serve a dynamic document that will not change for the session? Use server side MVC to render it (I am talking a concept here, not frameworks – many server-side frameworks suffer from similar bloat but it is less detrimental on the server). Use the power of HTML and CSS to do as much rendering, animation, transition effects as you can (you will be surprised what is possible today with HTML5 and CSS3 alone). Then, and only then, use JavaScript to render widgets and deliver interactivity that wowed us so much in the recent past. But even then, don’t load JavaScript widgets cold – take advantage of the server-side MVC to prime them with markup and data so that no trip back to the server is necessary to render the initial page. You can use asynchronous requests later to incrementally update parts of the page as users go about using it (instead of aborting it half way because “this thing takes forever to load, I don’t have that kind of time”.

Eight years since AJAX has entered our vocabulary, the technique has matured and the early excesses are being gradually weeded out. We now know where overindulgence in JavaScript can lead us and must be constantly vigilant as we are writing new web applications for an increasingly diverse browser and mobile clients. We understand that AJAX is a tool in our toolbox, and as long as we don’t view it as a hammer and all our problems as nails, we will be OK. We can be bad like Jesse-James, guilt-free.

© Dejan Glozic, 2013

A Guide To Storage for ADD Types

800px-FACTS_ubt_2

If you read my bio on the About page, you can see that I called myself an ‘architect’. I made an explicit promise to apologize for that in a dedicated post at some point. According to a 2003 article by Martin Fowler, I may never be able to work at ThoughtWorks if this IBM thing does not work out, unless Dave Rice has changed his mind in the ensuing 10 years and is not in a grumpy mood any more. Now, if you check my tag line, it says that I care 30% about design, 70% about code, and there are no percentages left for caring how things are stored. As far as I am concerned, they can be dumped in the attic, next to a painting of me getting younger.

Nevertheless, as an architect I should care about storage, and that means going beyond putting obligatory cylinder-shaped objects on my slides. How did I ever get this far without having to worry about it? Through the genius of specialization. You see, when you are participating in a development of a large, multi-component Web applications customers install on premise, through the sheer economy of scale it makes sense that I worry about UIs and somebody else worries about the storage gore. You get to interact with nice objects that know how to persist themselves. You are not lazy, you are efficient. But the world is rapidly changing, going towards the hosted deployment, and from big monolithic web applications to collections of apps running inside a warm bosom of CloudFoundry, Heroku and other cloud platforms. In this architecture, we all need to be polymaths and fret about all aspects of modern app development, soup to nuts.

On the cloud, storage is considered ‘implementation detail’, something you are directed at via attached resources. Adam Wiggins has channeled his inner L. Ron Hubbard by establishing his Church of 12 Factors (kudos for pulling it off without one reference to space ships), and using storage as an attached resource is one of the 12 commandments. What does all this mean? It means that in this particular game of specialization chicken, I lost and have to deeply care about storage for the first time. I don’t need to install and manage databases (cloud platforms have a buffet to chose from), but nobody is going to make that network request for me any more.

And would you look at that? Just when I needed to care about databases, somebody threw a big friggin’ rock into the sleepy storage pond. For many years, you could use any particular DB type, as long as it was relational, where people like to shout in their SQL commands. Then Johan Oskarsson wanted to organize an event to discuss open source distributed databases, and used a #nosql hashtag. The rest is, as they say, history, and we now have to consider the weird and wonderful world of schemaless storage as well. Particularly if we care about clustering, because everybody knows that “Mongo DB is Web Scale” (you will need to Google this yourself, those foul-mouthed bears are NSFW).

His conflicted thoughts on architects notwithstanding, I found Martin Fowler a great resource in my deranged binge of DB knowledge drinking. His introduction to NoSQL databases is great, and so is the 1 hour video from the GOTO Aarhus conference linked from that page. But my general feeling of catching up with the world was akin coming to a party 6 hours late, where most of the food is gone, some guests are not feeling well (to put it mildly) and through their personal experience I know which cocktails not to drink and why Vodka and Red Bull are not such great bedfellows.

Here is a sample of what I have learned, in no particular order. This is a big topic so consider this part #1 of a longer article. If you are like me and you like to spend your time on the ‘outside in’ aspect of your apps, this may help you pick your storage poison and be done with it. If you are an expert in DBs, feel free to point at me and laugh in contempt:

  1. Objects in the relational view mirror are harder than they appear. Objects are round, tables are square, and I don’t need to work hard to turn that into a tired phrase, don’t I? ORM can never be perfect, so there is no use arguing that your approximation is 2% closer to the asymptote than mine. Throwing more resources at it can turn so bad that Ted Neward declared it Vietnam of Computer Science in 2004. To the phrase ‘all is fair in love and war’ we can add ‘and also in ORM’ and be done with it. One popular solution is to do shallow mapping of your key properties to columns to allow for fast queries and stuff the rest of the aggregate data hierarchy into a serialized LOB. If you really, really need to query the data in the LOB later, you can build an inverted index like folks at FriendFeed did and blogged about in 2009.
  2. Don’t fake schemaless storage by using EAV (entity-attribute-value) tables. Bill Karwin was on a mission at some point to convince everybody who cared that they sucked. I can only add that if you truly need that kind of open-ended freedom, RDF triple stores seem to be a better fit. And you get to use SPARQL query language, which always makes me laugh because it reminds me of Mr. Sparkle from a Simpsons episode.
  3. Do store trees of nodes in tables. The curse of EAVs is limited to aggregate data, not to uniform trees of nodes. It is OK if some rows in well designed tables have additional columns for capturing parent-child relationships. Joe Celko seems to be a go to authority on adjacency lists so it is wise to start there, then fan out as needed.
  4. We are moving away from databases as integration platforms. DBAs rise to the level of demigods was facilitated by the fact that a well designed RDB can serve as an integration platform for many applications written to slice and dice the data in new and unexpected ways. Today, services are the portals where we go for our data, and as long as interchange formats are stable, databases can be relegated to the implementation detail. Both RDF and JSON-LD are designed to give you a level of ‘unexpected data utility’ of the web of linked data that no single database can give you.
  5. RDBs are hard to scale. The whole movement towards NoSQL databases was inspired by their affinity to data partitioning (sharding for capacity scaling and/or replication for fault tolerance). On a cloud platform, not only that your app can be clustered as demand grows, but so can an array of NoSQL DB nodes if you start producing a ton of data.
  6. NoSQL databases have no JOINs, and have dropped ACID. According to the CAP theorem, a distributed system cannot guarantee all three of: consistency, availability and partition tolerance. This leads to a system where your app is eventually consistent, but can occasionally have brief intervals of stupidity. There is anecdotal evidence of NoSQL databases loosing data in the real world deployments. It is disconcerting to view your database as an airline that can loose your luggage every once in a while. Now you may need a second database as a backup in case your first database partied too hard and cannot quite remember where all your JSON documents went.
  7. NoSQL databases are attractive to all-JS outfits. The flip side – imagine a system where you write your client side using a collection of JavaScript toolkits, use XHR to send objects as JSON to the server, which is itself written using Node.js. After preparing your data you send it to MongoDB, as JSON again, where it will be stored as a JSON document. You can then use JavaScript to apply map/reduce and fetch interesting projections on the data, or query on JSON properties, or simply fetch the entire JSON document back when you need all the data. Apart from clustering, NoSQL DBs are just a nicer environment for app developers, particularly the recently weaned ones that have no attention span for strongly typed languages.

This can give you a flavor of where my head is right now. I am just trying to be practical about it. I know that there is no free lunch, but ‘kids these days’ know how to make things new and shiny. I know an adult in me should still prefer RDBs for their battle hardened dependability and predictability, but look at Mongoose JS library and tell me you are not swayed by the simplicity (and nobody is yelling at you):

    var mongoose = require('mongoose');
    mongoose.connect('mongodb://localhost/test');

    var Cat = mongoose.model('Cat', { name: String });
    var kitty = new Cat({ name: 'Zildjian' });
    kitty.save(function (err) {
       if (err) // ...
          console.log('meow');
    });

This is as close as I have come to replicating that ‘objects that know how to persist themselves’ feeling of yore, minus the quagmire of ORM. It is hard to resist the siren call of NoSQL. But I think the biggest lesson of my forray into the wonderful world of storage is that today you can change your mind about it. As long as you truly hide your storage choices as implementation detail, you can switch later, so stop agonizing, pick one and move on to the more interesting areas, like user experience.

© Dejan Glozic, 2013