The Volume and Surface Area of Computer Programs

I’ve been refactoring some software. It’s not a fun task, exactly, but there’s something strangely satisfying about it — it’s a bit like folding socks. In the past, I have found the concepts of “coupling” (bad) and “cohesion” (good) useful for this kind of work. But the definitions of those concepts have always seemed a bit vague to me. The Wikipedia page linked above offers the following definition of cohesion: “the degree to which the elements of a module belong together.” The definition of coupling isn’t much better: “a measure of how closely connected two routines or modules are.”

What does that mean? Belong together in what sense? Closely connected how? Aren’t those essentially synonyms?

If I were in a less lazy mood, I’d look up the sources cited for those definitions, and find other, more reliable (?) sources than Wikipedia. But I’ve done that before, and couldn’t find anything better. And as I’ve gained experience, I’ve developed a sense that, yes, some code is tightly coupled — and just awful to maintain — and some code is cohesive without being tightly coupled — and a joy to work with. I still can’t give good definitions of coupling or cohesion. I just know them when I see them. Actually, I’m not even sure I always know them when I see them. So I’ve spent a lot of time trying to figure out a more precise way to describe these two concepts. And recently, I’ve been thinking about an analogy that might help explain the difference — it links the relationship between cohesion and coupling to the relationship between volume and surface area.

The analogy begins with the idea of interfaces. Programmers often spend long hours thinking about how to define precise channels of communication between pieces of software. And many programming styles — object-oriented development, for example — emphasize the distinction between private and public components of a computer program. Programs that don’t respect that distinction are often troublesome because there are many different ways to modify the behavior of those programs. If there are too many ways, and if all those ways are used, then it becomes much more difficult to predict the final behavior that will result.

Suppose we think of interfaces and public variables as the surface of a program, and think of private variables and methods as being part of the program’s interior — as contributing to its volume. In a very complex program, enforcing this distinction between public and private becomes like minimizing the surface area of the program. As the program gets more and more complex, it becomes more and more important to hide that complexity behind a much simpler interface, and if the interface is to remain simple, its complexity must increase more slowly than the complexity of the overall program.

What if the relationship between these two rates — the rate of increase in complexity of the interface and the rate of increase in complexity of the overall program — is governed by a law similar to the law governing the relationship between surface area and volume? Consider a cube. As the cube grows in size, its volume grows faster than its surface area. To be precise (and forgive me if this seems too obvious to be worth stating) its volume is $x^3$, while its surface area is $6 \times x^2$ for a given edge-length $x$.

Biologists have used this pattern to explain why the cells of organisms rarely grow beyond a certain size. As they grow larger, the nutrient requirements of the interior of the cell increase more quickly than the surface’s capacity to transmit nutrients; if the cell keeps growing, its interior will eventually starve. The cell can avoid that fate for a while by changing its shape to expand its surface area. But the cell can only do that for so long, because the expanded surface area costs more and more to maintain. Eventually, it must divide into two smaller cells or die.

Might this not also give us a model that explains why it’s difficult to develop large, complex programs without splitting them into smaller parts? On one hand, we have pressure to minimize interface complexity; on the other, we have pressure to transmit information into the program more efficiently. As the program grows, the amount of information it needs increases, but if the number of information inlets increases proportionally, then soon, it becomes too complex to understand or maintain. For a while, we can increase the complexity of the program while keeping the interface simple enough just by being clever. But eventually, the program’s need for external information overwhelms even the most clever design. The program has to be divided into smaller parts.

So what do “coupling” and “cohesion” correspond to in this model? I’m not sure exactly; the terms might not be defined clearly enough to have precise analogs. But I think coupling is closely related to — returning to the cell analogy now — the nutrient demand of the interior of the cell. If that demand goes unchecked, the cell will keep expanding its surface area. It will wrinkle its outer membrane into ever more complex and convoluted shapes, attempting to expose more of its interior to external nutrient sources. At this point in the analogy, the underlying concept becomes visible; here, the cell is tightly coupled to its environment. After this coupling exceeds some threshold, the cell’s outer membrane becomes too large and complex to maintain.

In turn, cohesion is closely related to the contrary impulse — the impulse to reduce surface area.

Imagine a drop of water on a smooth surface. It rests lightly on the surface seeming almost to lift itself up. Why? It turns out that surfaces take more energy to maintain than volumes; they cost more. Molecules in the interior of the drop can take configurations that molecules on the surface can’t. Some of those configurations have lower energy than any possible configuration on the surface, so molecules on the surface will tend to “fall” into the interior. The drop will try to minimize its surface area, in much the way a marble in a bowl will roll to the bottom. And the shape with the lowest surface-area-to-volume ratio is a sphere.

We have different words for this depending on context. This phenomenon is the same phenomenon we sometimes name “surface tension.” Water striders can glide across the surface of a pond because the water wants to minimize its surface area. The water does not adhere to their limbs; it coheres; it remains decoupled. These ways of thinking about coherence and coupling make the concepts seem a bit less mysterious to me.