The initialize anti-pattern
There is a general rule that a constructor should not do actual work. If someone creates an object, there must be no file access, for example. This rule makes sense for more than one reason: work done in a constructor makes the class harder to test, and keeping the constructor trivial leaves more flexibility to change the class later on.
Many people are aware of this rule. But every now and then, you run into a dilemma: what if your class needs some actual initialization? Say your class operates on a database table, and the table needs to be initialized once. Often, people solve this dilemma by introducing a public Initialize() method that does the actual initialization. I think this is not a good idea, and I’m going to explain why.
What is it?
There may be many reasons that something needs to be initialized before objects of a class can be used. Take, for instance, a class ProductUpdater that creates the database table when it does not exist yet. Using the initialize pattern, we might model it as follows:
public class ProductUpdater
{
    public ProductUpdater(Database database)
    { ... }
    public void Initialize()
    {
        /* Create the table if it does not exist yet */
    }
    public void Insert(Product newProduct)
    {
        /* Insert into the table */
    }
    public void Remove(Product oldProduct)
    {
        /* Remove from the table */
    }
}
An instance can be created using:
var updater = new ProductUpdater(database);
updater.Initialize();
updater.Insert(product);
What’s wrong with it?
There are many things wrong with this pattern. I’ll list a few here.
If you look at the public interface of the class ProductUpdater, you’ll see the following members: Initialize(), Insert() and Remove(). From the method names alone, it is clear what Insert() and Remove() do. This is not the case for Initialize(), however. It does not fit in the contract, because it makes no promise whatsoever. What would Initialize mean? Would it interact with the database? Or just set up some internal things in the class? The method name is just fuzzy.
This immediately leads to questions when ProductUpdater objects are used. Do I really have to call Initialize()? What would happen if I forget? Would I always notice it? Can I safely call Initialize() multiple times on an object? And if so, would that impact performance?
To make things worse, when object structures get more complex, the initialization calls get more obscure as well. Take the next example and suppose classes A, B and C all have Initialize() methods:
public class A
{
    public A(B dep1, C dep2) { ... }
}
Would class A be responsible for calling B.Initialize() and C.Initialize()? Or are they guaranteed to be initialized when passed to A’s constructor? Or can B and C be assumed to be initialized when A.Initialize() is called? This problem is made even worse when the calls to Initialize() are separated from the object creation.
Not only does initialization become more complex, but so does the implementation of the class. You need to guard against calling class members without calling Initialize() first, and possibly also against calling Initialize() more than once. I see this most often solved like this:
public class ProductUpdater
{
    // Tracks whether Initialize() has already run.
    private bool mInitialized;
    public void Initialize()
    {
        if (mInitialized)
        {
            // throw or exit function
            return;
        }
        // initialize here
        mInitialized = true;
    }
    public void Insert(Product newProduct)
    {
        // NotInitializedException is assumed to be a custom exception type.
        if (!mInitialized)
            throw new NotInitializedException();
        // operational code here
    }
    public void Remove(Product oldProduct)
    {
        if (!mInitialized)
            throw new NotInitializedException();
        // operational code here
    }
}
But this is not a 100% safe fix, as it might fail in case the object is accessed from multiple threads. As you can also see, the same error handling code must be included in every public member of the class, so there is repetitive work.
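To illustrate the threading concern, here is a minimal sketch that assumes a lock is used to make the guard in Initialize() safe. The names GuardedUpdater, mLock, InitializationRuns and GuardDemo are hypothetical and exist only for this illustration; the counter stands in for the real initialization work.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for ProductUpdater; only the guard logic is shown.
public class GuardedUpdater
{
    private readonly object mLock = new object();
    private bool mInitialized;

    // Counts how often the initialization body actually ran.
    public int InitializationRuns { get; private set; }

    public void Initialize()
    {
        // Without the lock, two threads could both read mInitialized == false
        // and both run the initialization body.
        lock (mLock)
        {
            if (mInitialized)
                return;
            InitializationRuns++; // stands in for "create the table"
            mInitialized = true;
        }
    }
}

public static class GuardDemo
{
    // Calls Initialize() from many threads and reports how often it ran.
    public static int Run()
    {
        var updater = new GuardedUpdater();
        Parallel.For(0, 100, _ => updater.Initialize());
        return updater.InitializationRuns;
    }
}
```

Under these assumptions the initialization body runs once, but the lock is yet more code that exists only to support the pattern, and Insert() and Remove() still need their own checks.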
One last drawback that I want to mention is that this pattern introduces temporal coupling. As a developer, you need to remember to call Initialize() every time you create a new instance. There is nothing that helps you remember it at design time. The compiler won’t complain if you do this:
var updater = new ProductUpdater(database);
updater.Insert(product);
updater.Initialize();
Now this is a trivial example where the issue is easy to spot, but it still postpones the feedback to runtime, even though this is a coding issue. When initialization becomes more complex, this issue will become more prevalent. This, in turn, makes the code hard to change.
What to do instead?
I think that the initialize pattern is an anti-pattern because it solves a problem in the wrong way. It tries to shoehorn functionality into a class whose actual responsibility is something else.
Using this insight, the solution is simply to split off the additional functionality from the class. This can be done by modeling the table as a separate class:
public class Database
{
    ...
    public Table GetTable(string name) { ... }
}
public class Table { ... }
public class ProductUpdater
{
    private readonly Table mProducts;
    public ProductUpdater(Table productTable)
    {
        mProducts = productTable;
    }
    public void Insert(Product newProduct)
    {
        /* Insert into the table */
    }
    public void Remove(Product oldProduct)
    {
        /* Remove from the table */
    }
}
You cannot have a Table instance without an actual database table, so creation of the table is guaranteed by design. The need for an Initialize() method is gone, and all its associated drawbacks are gone as well.
It is interesting to see that this solution effectively inverts the call order. With the Initialize() implementation, the call order is as follows: create ProductUpdater → Initialize() to create the table → use object. Introducing the Table dependency inverts this to: create Table → create ProductUpdater → use object.