Improving on abstractions

As you add more and more features to your software, the code inevitably becomes more complex. Every feature you add increases the complexity of the source code, and the more complex the source code is, the harder it becomes to add new features. If you do nothing about it, you eventually end up in a nightmare where you can’t make a change without breaking existing functionality, and adding even the simplest feature takes a ridiculous amount of time. When that happens, you’re probably missing some abstractions.

Abstractions model core concepts of the domain. When you have good abstractions, working with them becomes easy because you don’t need to think about low-level details. However, in my experience, creating a good abstraction can be difficult. This article explains the concept of abstraction and why it is so powerful. Next, it explores various flaws that can creep into abstractions and lists some properties of good abstractions. Read on to learn more!

What is an abstraction?

I find it useful to think about an abstraction as a “layer of concern”: a set of related concepts on the same logical level. Take for example the file system: when you talk about a file system, you talk in terms of files and directories, and both have a path on disk and contents. You’re not interested in where exactly on disk the files and directories are stored; that is not part of the concept.

Of course, the files require actual physical bits to be stored. The implementation of a given file system requires access to the disk. To this end, it uses disk hardware that is made accessible by the operating system: this is another abstraction. The disk storage abstraction consists of those lower level concepts. So you can also think about an abstraction as a defined responsibility.

The interesting point is this: an abstraction lets you think in terms of its concepts and hides lower level details.

Modeling an abstraction

In practice, an abstraction is an interface that defines the concepts and the possible interactions with them. This can be an interface definition in your favorite programming language:

interface PneumaticCylinder {
    void in();
    void out();
}

However, an interface definition in a programming language is not the only way to model an abstraction:

<order id="...">
    <item id="...">
        <product id="..." />
        <count>3</count>
        <price>3.55</price>
    </item>
</order>

This XML snippet describes an order with an item. The format is based around the abstraction of an order, which contains items, each of which refers to a product and has a count and a price.

Why use abstractions?

The key property of a good abstraction is that it hides implementation details. The file system abstraction tells you nothing about the actual storage medium. This means that you, as a user of the abstraction, don’t need to deal with it either. That makes the abstraction much easier to use: your code that uses the file system is not cluttered with disk access, so it is easy to reason about.

This leads to the other advantage: by relieving all clients from dealing with these details, those details become replaceable. The file system implementation could be replaced by a dummy version for testing that behaves according to the same interface. But it is also possible to implement the file system to operate on top of a network protocol, and the client code keeps functioning with no change at all! Good abstractions make the application much more flexible.
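To make the replaceability concrete, here is a minimal Java sketch. All names in it (FileSystem, InMemoryFileSystem, NoteKeeper and their methods) are made up for illustration and do not refer to any real library; a real file system abstraction would offer much more.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical file system abstraction: clients see only paths and contents.
interface FileSystem {
    void write(String path, String contents);

    // Empty when nothing has been written to the path.
    Optional<String> read(String path);
}

// Dummy implementation for tests: everything stays in memory, no disk involved.
class InMemoryFileSystem implements FileSystem {
    private final Map<String, String> files = new HashMap<>();

    @Override
    public void write(String path, String contents) {
        files.put(path, contents);
    }

    @Override
    public Optional<String> read(String path) {
        return Optional.ofNullable(files.get(path));
    }
}

// Client code depends on the interface only, so any implementation fits.
class NoteKeeper {
    private final FileSystem fileSystem;

    NoteKeeper(FileSystem fileSystem) {
        this.fileSystem = fileSystem;
    }

    void save(String title, String text) {
        fileSystem.write(pathFor(title), text);
    }

    Optional<String> load(String title) {
        return fileSystem.read(pathFor(title));
    }

    private static String pathFor(String title) {
        return "/notes/" + title + ".txt";
    }
}
```

NoteKeeper would work unchanged with a disk-backed or network-backed FileSystem, because it never mentions how the contents are stored.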

In my experience, there is also a third advantage: when building on good abstractions, the code tends to be grouped into a set of responsibilities. Having responsibilities clearly defined makes it easier to reason about the code. This makes the application easily extensible and reduces maintenance effort.

Flawed abstractions

So by now it is clear that abstractions make things easier. However, getting to a good abstraction is not that easy. It takes some experience to recognize a good abstraction and extract it from an existing code base. And it is easy to introduce flaws when you model the abstraction in your program. Often, you don’t even see those flaws until you want to reuse the abstraction in a different context, because that’s when the flaws get in the way.

The first kind of flaw occurs so often that it has its own name: the ‘leaky abstraction’. It is an abstraction that leaks underlying concepts to its clients. Take for example the XML order above: it shows that orders, items and products have an id, which has nothing to do with the concepts themselves. What actually leaks here is the fact that those entities are stored in a database. A leaky abstraction creates difficulties for clients of the abstraction: when you want to add a product, you must arrange a unique id. This makes the client’s code more complex, because it has to deal with ids right within the order handling logic.
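One way to plug such a leak, sketched in Java under the assumption that ids are purely a storage concern, is to let the implementation hand out ids itself. The Product, Order and StoredOrder types below are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain type: a product has a name and a price, but no id.
final class Product {
    final String name;
    final long priceInCents;

    Product(String name, long priceInCents) {
        this.name = name;
        this.priceInCents = priceInCents;
    }
}

// The client-facing abstraction talks about products and counts only.
interface Order {
    void add(Product product, int count);

    long totalInCents();
}

// The implementation needs ids for storage, so it assigns them internally.
class StoredOrder implements Order {
    private static final class Row {
        final long id; // storage detail, never visible to clients
        final Product product;
        final int count;

        Row(long id, Product product, int count) {
            this.id = id;
            this.product = product;
            this.count = count;
        }
    }

    private final List<Row> rows = new ArrayList<>();
    private long nextId = 1;

    @Override
    public void add(Product product, int count) {
        rows.add(new Row(nextId++, product, count));
    }

    @Override
    public long totalInCents() {
        long total = 0;
        for (Row row : rows) {
            total += row.product.priceInCents * row.count;
        }
        return total;
    }
}
```

A client adds an item with `order.add(new Product("tea", 355), 3)` and never has to arrange an id.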

Another related type of flaw is that the abstraction contains too much. For instance, take the following abstraction description:

interface Vehicle {
    Engine getEngine();
    ...
}

The flaw here is that not all Vehicle implementations have an Engine (OK, I made that one up, but still). So you run into problems when you want to make an implementation of a Vehicle without an Engine: that simply doesn’t fit. At such a point, you need to refactor, which includes fixing all clients. Even if you decide to let getEngine return null, you still must fix your clients because they need to deal with the new decision.
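One possible refactoring, shown here only as a sketch, is to keep in Vehicle what every vehicle has and to move getEngine into a narrower interface. MotorVehicle, Bicycle, Car, wheelCount and the contents of Engine are assumptions made up for this example.

```java
// Hypothetical engine type, reduced to a single property for the example.
final class Engine {
    final int horsepower;

    Engine(int horsepower) {
        this.horsepower = horsepower;
    }
}

// The base abstraction contains only what holds for every vehicle.
interface Vehicle {
    int wheelCount();
}

// Only vehicles that really have an engine expose one.
interface MotorVehicle extends Vehicle {
    Engine getEngine();
}

class Bicycle implements Vehicle {
    @Override
    public int wheelCount() {
        return 2;
    }
}

class Car implements MotorVehicle {
    private final Engine engine = new Engine(120);

    @Override
    public int wheelCount() {
        return 4;
    }

    @Override
    public Engine getEngine() {
        return engine;
    }
}
```

Clients that need an engine ask for a MotorVehicle, so a Bicycle can never reach them and no null check is needed.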

Flawed or missing abstractions tend to show up as the following problems:

Huge functions (you’re missing abstractions on the functional level, that is, you’re mixing low level and higher level code in a single function);
Duplicate code (you’re missing a generalization);
Using string variables all over the place, where strings require a certain format that needs to be validated everywhere (you’re missing a data type);
It is very difficult to create another implementation because you’re missing some key information.
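The missing data type is the easiest of these to illustrate. The sketch below assumes a product code of the form "AB-1234"; both the format and the ProductCode class are invented for this example.

```java
// Hypothetical value type that replaces a formatted string.
final class ProductCode {
    private final String value;

    private ProductCode(String value) {
        this.value = value;
    }

    // The only way to obtain a ProductCode, so the format is checked in one place.
    static ProductCode parse(String text) {
        if (text == null || !text.matches("[A-Z]{2}-\\d{4}")) {
            throw new IllegalArgumentException("Not a product code: " + text);
        }
        return new ProductCode(text);
    }

    // Safe without further checks: the constructor is only reachable through parse.
    String prefix() {
        return value.substring(0, 2);
    }

    @Override
    public String toString() {
        return value;
    }
}
```

Code that receives a ProductCode can rely on the format being valid, so the validation no longer has to be repeated wherever the value is used.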

Fixing flawed abstractions

When you recognize that something about your abstraction is clumsy, you might want to fix it. This often looks difficult, because both the implementation and all clients of the abstraction must be changed. Using the following general approach, you can fix a flaw in an abstraction incrementally. For the sake of simplicity, we’re assuming an interface here, but the same approach is valid for other kinds of abstractions.

If you need to replace a method, first define the new method in the interface next to the old one. Implement the method so that it works as desired.
Update the clients. Not all clients have to be updated at once, but you can only finish this step once the last client has been updated.
Remove the old method from the interface and its implementations.
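As a sketch of what the code could look like between the first and the last step, suppose you want to rename in and out on the PneumaticCylinder interface. The new names retract and extend, the SimulatedCylinder class and its isExtended helper are hypothetical and exist only for this illustration.

```java
// The interface during the migration: old and new methods live side by side.
interface PneumaticCylinder {
    /** Old method; removed in the last step, once no client calls it. */
    @Deprecated
    void in();

    /** Old method; removed in the last step, once no client calls it. */
    @Deprecated
    void out();

    /** New method, added next to the old one in the first step. */
    void retract();

    /** New method, added next to the old one in the first step. */
    void extend();
}

class SimulatedCylinder implements PneumaticCylinder {
    private boolean extended = false;

    @Override
    public void retract() {
        extended = false;
    }

    @Override
    public void extend() {
        extended = true;
    }

    // The old methods delegate to the new ones, so clients that have not
    // been updated yet keep working while the migration is in progress.
    @Deprecated
    @Override
    public void in() {
        retract();
    }

    @Deprecated
    @Override
    public void out() {
        extend();
    }

    // Helper for this sketch only; it is not part of the abstraction.
    boolean isExtended() {
        return extended;
    }
}
```

Marking the old methods as deprecated lets the compiler point out the clients that still need updating.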

Growing abstractions

You need to define abstractions only when you’re facing some level of complexity in your code. Abstracting requires effort, so you’d better avoid it when not needed.

Take for example a simple reader for numbers. Your first version consists of a simple function that receives a string and outputs a bounded integer. A straightforward implementation can be written in a single function. But time and again new features are requested, and each time the function grows. You add fractions, negative numbers, complex numbers, Roman numerals and so on. You break a set of supporting functions out of the initial function to keep it maintainable. But one day, you also need to add expressions. At that moment, your code has become so complex that you should split it into a scanner that splits the input into tokens and a parser that interprets them. When you do that, a scanner abstraction emerges.
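The split could look roughly like the following Java sketch. It is deliberately tiny: it handles only non-negative integers, + and -, and performs no validation of the token order. TokenScanner, SimpleScanner and SumParser are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner abstraction: turns raw text into tokens, nothing more.
interface TokenScanner {
    List<String> tokens(String input);
}

class SimpleScanner implements TokenScanner {
    @Override
    public List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is not a token itself
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                result.add(input.substring(start, i));
            } else if (c == '+' || c == '-') {
                result.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return result;
    }
}

// The parser works on tokens only; it never inspects individual characters.
class SumParser {
    private final TokenScanner scanner;

    SumParser(TokenScanner scanner) {
        this.scanner = scanner;
    }

    int evaluate(String input) {
        int result = 0;
        int sign = 1;
        for (String token : scanner.tokens(input)) {
            if (token.equals("-")) {
                sign = -sign; // every minus flips the sign of the next number
            } else if (!token.equals("+")) {
                result += sign * Integer.parseInt(token);
                sign = 1;
            }
        }
        return result;
    }
}
```

With this split, support for a new number notation would go into the scanner, while new expression rules would go into the parser.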

As you saw previously, fixing a flawed abstraction can be difficult because it involves making breaking changes (removing the old method). That makes it important to prevent these flaws. The main thing you can do to prevent problems is to keep your abstraction as simple as possible, and to extend it only when there is a clear need and the addition fits well within the model. For example, the PneumaticCylinder abstraction only makes it possible to move the cylinder in or out; it is not possible to get its current state. That will only be added when some client requires it.

How to recognize a good abstraction

As I said before, it is difficult to recognize flaws in an abstraction until you need to use it in new ways. But still, there are a number of properties that are found in good abstractions:

No temporal dependencies: you don’t need to call methodA first before methodB unless methodB requires methodA’s output.
Tell, don’t ask: favor sending commands over requesting information.
Methods in the interface are cohesive: they operate on the same concept.
Command Query Separation: either a method performs an operation, or it returns some information, but not both.
All methods are at the same abstraction level. No disk optimization or caching functions in the file system abstraction.
Impossible things cannot be expressed: no moveTo method in PneumaticCylinder when some cylinders cannot be positioned.
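A few of these properties can be shown together in one Java sketch. It assumes that positionable cylinders get their own interface; PositionableCylinder, SimulatedPositionableCylinder, positionInMillimeters and the stroke parameter are all invented for the example.

```java
// The basic abstraction from earlier in the article: commands only.
interface PneumaticCylinder {
    void in();

    void out();
}

// moveTo exists only here, so it cannot be called on a cylinder that
// cannot be positioned: the impossible thing cannot be expressed.
interface PositionableCylinder extends PneumaticCylinder {
    // Command: changes state and returns nothing.
    void moveTo(int millimeters);

    // Query: returns information and changes nothing.
    int positionInMillimeters();
}

class SimulatedPositionableCylinder implements PositionableCylinder {
    private final int strokeInMillimeters;
    private int position = 0;

    // Fully usable right after construction; there is no separate init()
    // that must be called first, so no temporal dependency.
    SimulatedPositionableCylinder(int strokeInMillimeters) {
        this.strokeInMillimeters = strokeInMillimeters;
    }

    @Override
    public void in() {
        position = 0;
    }

    @Override
    public void out() {
        position = strokeInMillimeters;
    }

    @Override
    public void moveTo(int millimeters) {
        if (millimeters < 0 || millimeters > strokeInMillimeters) {
            throw new IllegalArgumentException("Out of range: " + millimeters);
        }
        position = millimeters;
    }

    @Override
    public int positionInMillimeters() {
        return position;
    }
}
```

All methods stay on one abstraction level: they talk about cylinder positions and say nothing about valves or air pressure.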