Optimizations and optimizations

When it comes to computer technology, we live in incredible times. Even a low-end smartphone has far more computing power than the first supercomputers. Even older mid-range smartphones can perform over 100 MFLOPS (million floating-point operations per second) per core, and a recent phone easily achieves over a billion floating-point operations per second across all cores. And that's just a smartphone; desktop and server processors are much more powerful still.

Then why do we have to wait so often? Why do programs stop responding every now and then? How come so many programs take so long to search through relatively small amounts of data? Apparently, it is easy to write a program that runs slowly.


It is clear that optimization matters. If your program takes too long to run, users will complain: nobody wants to wait, and sluggishness worsens the user experience.

But how to optimize? Or, in other words: how do you write software that responds fast enough for the user?

Low level optimization

Decades ago, when computers were slow and had very little RAM, developers had to squeeze out every bit of performance they could get. A large part of their time was devoted to optimizing for the hardware and reducing RAM usage: reusing memory where possible, using low-level language constructs and assembly. Developers had to know how many cycles specific instructions cost, and that a division by two performs badly and should be replaced by a right shift. I'm talking about tricks that trade code readability for performance, like Duff's device or bit fiddling.
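To make the shift trick concrete, here is a minimal Java sketch (the class and method names are mine, chosen for illustration). It also shows why the trick is subtler than it looks: the two forms only agree for non-negative inputs.

```java
class ShiftTrick {
    // Readable version: let the compiler pick the fastest instruction.
    static int halve(int n) {
        return n / 2;
    }

    // The old-school "optimization": right shift by one.
    // Caveat: for negative odd numbers the shift rounds toward negative
    // infinity, while integer division rounds toward zero, so the two
    // are only equivalent for non-negative inputs.
    static int halveByShift(int n) {
        return n >> 1;
    }
}
```

A modern compiler emits the shift itself where it is safe, so writing `n / 2` loses nothing.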

Now I'm certainly not saying that you should neglect performance entirely, but trading readability for performance is generally not a good idea. Code is meant to be read by humans (otherwise we wouldn't have invented higher-level languages), and your teammates (and your future self) would certainly like to be able to maintain what you write. There is no point in having a piece of code that is super fast at doing something you don't want and cannot change. Therefore I think the three rules of optimization are very sensible here:

  1. Don’t
  2. Don’t yet
  3. Profile first

There are multiple reasons to delay optimization as long as possible. As stated before, low-level optimization harms maintainability. Optimizing also costs time, so you don't want to waste that time on code that doesn't need it.

There's another side to it as well: humans turn out to be notoriously bad at predicting which parts of the code are bottlenecks. You may think that some piece will run slowly and cause a performance problem, while in reality the actual problem lies somewhere else. Only a good measurement can reveal this, which means profiling the application. Postponing optimizations until they are needed prevents you from wasting time on optimizations that yield no result.

Also, modern compilers do a very good job of optimizing source code. A compiler not only knows how the target architecture works, it can also analyze how the pieces of code are connected. For example, a compiler can combine information about the call graph with knowledge of the hardware architecture to decide whether to inline a method. This is information that humans do not always have, unless you are willing to reanalyze your source code at every compilation. If you compile to an intermediate language (.NET or Java), the runtime environment can additionally optimize based on how the program is actually used. You simply cannot do that as a developer.

All in all, I try to stay away from low-level optimizations unless I really have to, and I've never experienced problems with that.

But how about that introduction then? If you don’t optimize, then how will your program run fast enough?

Optimization by design

The other type of optimization is what I would call “optimization by design”: working towards a design that is efficient for the computer to run (but also for the developer to work with). Too often have I seen design choices that result in an enormous amount of overhead.

One example is putting responsibilities in the wrong place, so that invariants must be validated everywhere. Encapsulating those responsibilities enforces the invariants by design: a good design strives to make breaking invariants impossible, making the checks redundant.
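A small Java sketch of this idea, using a hypothetical `Percentage` value type: the invariant is validated once, in the constructor, so no caller ever needs to re-check the range.

```java
final class Percentage {
    private final int value;

    // The invariant "0 <= value <= 100" is enforced here, and only here.
    Percentage(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("value must be in [0, 100]: " + value);
        }
        this.value = value;
    }

    int value() {
        // Callers need no range check: the invariant holds by construction.
        return value;
    }
}
```

Every method that accepts a `Percentage` can rely on the range without validating it again, which removes both the redundant checks and the risk of forgetting one.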

A comparable situation is null checks: instead of checking for null everywhere, you can guarantee by design that references never point to null. For example, null checking can be done once in the composition root, or you can apply the Null Object pattern.
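A minimal sketch of the Null Object pattern in Java (the `Logger` interface and class names are illustrative, not from the original text): a do-nothing implementation stands in for null, so consumers never branch on it.

```java
interface Logger {
    void log(String message);
}

final class ConsoleLogger implements Logger {
    public void log(String message) {
        System.out.println(message);
    }
}

// The null object: a valid Logger that intentionally does nothing.
final class NullLogger implements Logger {
    public void log(String message) {
        // no-op by design
    }
}

final class Service {
    private final Logger logger;

    // The composition root passes either a real logger or a NullLogger,
    // so this class never has to check for null.
    Service(Logger logger) {
        this.logger = logger;
    }

    void doWork() {
        logger.log("working...");
    }
}
```

`new Service(new NullLogger()).doWork()` runs without any null check anywhere in `Service`.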

Algorithms and data types

As part of this, I also count the choice of algorithms. A classical example is choosing which sorting algorithm to use, but choosing the right data type matters just as much. The other day, I came across a module that stored its data in a list, while the most frequently performed operation was a search based on some key. Searching a list boils down to a linear search: on average, half of the elements must be inspected before the item is found. A dictionary can perform lookups much faster. For instance, with 1000 elements, a lookup in the list requires on average 500 comparisons, whereas a lookup in a tree-based dictionary requires at most log₂ 1000 ≈ 10 comparisons, and a hash-based dictionary does even better with amortized constant time. So in this particular example, a dictionary outperforms a list by more than an order of magnitude. (Keep in mind that this depends on the size of the collection: if you typically store only 3 elements, a list might perform better. If you do more insertions and removals than lookups, a list may also still beat a dictionary. And of course, the order of the elements must not matter.)
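The contrast can be sketched in a few lines of Java (the key names and collection size are made up for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LookupComparison {

    // List: looking up a key is a linear scan; on average half
    // the elements are inspected before the match is found.
    static boolean findInList(List<String> keys, String key) {
        return keys.contains(key); // O(n) comparisons
    }

    // HashMap: the key hashes straight to its bucket;
    // amortized constant time, no scan at all.
    static Integer findInMap(Map<String, Integer> index, String key) {
        return index.get(key);
    }

    static List<String> buildList(int n) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add("key" + i);
        }
        return keys;
    }

    static Map<String, Integer> buildMap(int n) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < n; i++) {
            index.put("key" + i, i);
        }
        return index;
    }
}
```

Both calls return the same answer; the difference only shows up in how much work they do per lookup, which is exactly why profiling and big-O reasoning, not intuition, should drive the choice.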

Another situation involved an algorithm that had to keep track of a set of different events being detected. A particular event could occur more than once, but had to be counted only once. At first, I added events to a list. This was obviously not very handy: each time an event occurred, I had to scan the list to check whether it was already there before adding it. And because I had to await a set of events, I also had to scan the list again to find out whether all events had been received. Soon I found myself refactoring the list into a hash set. Insertion into a hash set does not care whether the item already exists: adding a duplicate simply leaves the set unchanged. Inserting straight away saved me the effort of scanning first, so it was much faster after all.
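A minimal Java sketch of that refactoring, with made-up event names standing in for the real domain events:

```java
import java.util.HashSet;
import java.util.Set;

class EventTracker {
    // Hypothetical events we are waiting for; the real set
    // would come from the domain.
    static final Set<String> EXPECTED = Set.of("STARTED", "CONNECTED", "READY");

    private final Set<String> seen = new HashSet<>();

    // Adding a duplicate leaves the set unchanged, so no
    // "is it already there?" scan is needed before inserting.
    void record(String event) {
        seen.add(event);
    }

    // containsAll replaces the second scan of the old list-based version.
    boolean allReceived() {
        return seen.containsAll(EXPECTED);
    }
}
```

Both scans of the list version disappear: `record` is a single hash insert, and `allReceived` is a set comparison.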
