Thursday 12 November 2009

Square Abstractions

Managing complexity is at the heart of Software Engineering, and abstraction is the tool by which we accomplish this.  But what do our abstractions look like, and how should we judge them?

Abstractions should be square.

Or cubic. Possibly n-dimensional hypercubes.  But not rectangles.  And lines are right out.  G.A. Miller wrote a classic psychology paper in 1956 with the far-reaching conclusion that, for uni-dimensional data sets, humans have a typical classification capacity of between 2 and 3 bits - that is, between 4 and 8 items.  His paper is titled 'The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information'.  How does this apply to software abstraction? It gives us a quantitative key to judging whether an abstraction (which implies a reduction in complexity) is of sufficient quality.  It also gives us a clue about what to do when an abstraction still retains too much complexity: add another dimension.

By square abstractions, I mean that a good set of abstractions in the software domain, from an arbitrarily complex starting point to the most understandable abstraction of that idea, should have approximately equal complexity in each dimension.  If the result is that each (and all, since we have decreed equality) dimension of abstraction is still too complex, we must re-dimension, refactor, and re-abstract.

Soap bubbles form perfect spheres not just because they find it aesthetically pleasing, but because they are most comfortable like that.  It takes the least effort.  In software we should similarly strive to find the solution which satisfies the constraints with the least energy.  Spheres might be nature's solution, but in software we tend to seek orthogonal abstractions - leading to squares, cubes, hypercubes, and so on.

Getting practical for a moment, remember that every program, library, and API is an abstraction.  An application consisting of a single 100,000-line file (yes, really...) might be perfectly good internally, but is missing out on a key abstraction in terms of translation units, modules, or whatever else maps to files.  So split it into one hundred 1000-line files: we've added a dimension and reduced the maximum unidimensional complexity.  But we should continue - 100 is more than an order of magnitude greater than our magic 7 plus or minus 2.  Directories, packages, folders: another level of abstraction.  And because we are being square, we aim to have approximately 10 directories with 10 files in each.  This stretches 7 +/- 2, but not so far that any more abstraction would necessarily be helpful - adding a dimension has a cost too.
Why 100 files of 1000 lines, and not 316 files of 316 lines?  Because not all abstractions have the same cost, and we can apply additional abstractions within those files.  Like, um, classes, methods and functions.
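
The arithmetic above can be sketched in a few lines of Python.  This is only an illustration, not a tool: the function names `smallest_side` and `square_split` are made up for this post, and the default limit of 10 items per dimension is my assumption (a slightly stretched 7 +/- 2).  It also treats every dimension as equally costly, which the paragraph above points out is not really true.

```python
def smallest_side(total_items, dims):
    """Smallest whole number s such that s ** dims >= total_items."""
    side = 1
    while side ** dims < total_items:
        side += 1
    return side


def square_split(total_items, limit=10):
    """Return (dims, side): the fewest equal-sized dimensions needed so
    that no single dimension holds more than `limit` items.

    `limit` is an assumed comfort threshold, not a measured constant.
    """
    if total_items < 1 or limit < 2:
        raise ValueError("need total_items >= 1 and limit >= 2")
    dims = 1
    while True:
        side = smallest_side(total_items, dims)
        if side <= limit:
            return dims, side
        dims += 1  # still too complex: add another dimension


if __name__ == "__main__":
    # 100 files: too many for one flat list, so two dimensions of 10.
    print(square_split(100))
    # 100,000 lines, if every level were treated the same.
    print(square_split(100_000))
```

With a limit of 10, the 100 files come out as 2 dimensions of 10 (ten directories of ten files), and 100,000 lines would need 5 dimensions of 10 if every level were held to the same limit.  Tighten the limit to 9 and the same 100,000 lines want 6 dimensions of 7.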

So next time you (or I) think about adding that 100th method to our widget API, think about adding a new dimension instead.  And if it isn't obvious what that new dimension might be, then get creative and invent something new.
