Syntax and semantics

When you compile your source code, the compiler interprets your program based on the language’s syntax, which is defined as a set of rules that determine which constructions are valid. For example, the line int y = 0; is a valid statement in Java, C++ or C# when it appears inside a method definition. If your program does not obey the language’s syntax, then compilation will fail.

In order for your program to do something useful, the rules must also have some meaning. What does a construction mean? What can I do with it? That is what semantics is about. So semantics is about the meaning of parts of the language, while syntax is about how to correctly write them down.

It is clear that syntax and semantics are tightly coupled. It is pointless to define a language construct without giving it a meaning. Equally pointless is having a language feature that cannot be expressed.

List

But syntax and semantics are also important in higher-level constructions. Take, for example, the following interface:

public interface ISomething<T> {
    public void Add(T item);
    public void Remove(T item);
    public bool Contains(T item);
    public int Size();
}

Every implementation of ISomething<T> must implement the listed methods: Add, Remove, Contains and Size. Syntactically, it defines two things: an implementation of ISomething<T> must implement these exact methods, and client code must call the methods exactly as defined, with the correct parameters.

But there’s also semantics: what does this actually mean? What can I do with this interface?

Suppose ISomething<T> is actually some kind of list. In that case, we could define the following behavior:

  • After calling Add(item), item is part of the list. We can express this more formally: after calling Add(item), Contains(item) will return true, and the return value of Size() will have increased by 1.
  • When I create a new instance, then the list is empty (Size() returns 0). Calling Contains(item) for any item will return false.
  • After I call Remove(item), Contains(item) will return false only if no occurrences of item remain in the list.
  • If I call Add(item) multiple times, then the item will be added multiple times to the list.
  • Remove(item) will remove the first occurrence of item from the list, or do nothing if item was not part of the list.

We can define more rules. We might also define different rules. But the important thing is that every implementation of ISomething<T> must adhere to these rules, because the client code strictly depends on them. (Note that the rules above are expressed in terms of how a consumer can use the list; I have said nothing about a possible implementation.)
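The rules above can be written down as executable checks. What follows is a minimal sketch, assuming a hypothetical SimpleList<T> class that wraps the built-in List<T>; both SimpleList<T> and the Check helper are names made up for this example. Note that the checks only talk to the ISomething<T> interface, just like a consumer would.

```csharp
using System;
using System.Collections.Generic;

// Exercise the rules purely through the interface, as client code would.
ISomething<string> list = new SimpleList<string>();
Check(list.Size() == 0, "a new list is empty");
Check(!list.Contains("a"), "a new list contains nothing");

list.Add("a");
list.Add("a");
Check(list.Contains("a"), "after Add(item), Contains(item) is true");
Check(list.Size() == 2, "every Add increases Size() by 1, duplicates included");

list.Remove("a");
Check(list.Contains("a"), "Remove takes away only the first occurrence");
list.Remove("a");
Check(!list.Contains("a"), "no occurrences are left");
list.Remove("a"); // item is not in the list: does nothing
Check(list.Size() == 0, "removing a missing item changes nothing");

Console.WriteLine("All list rules hold.");

// Hypothetical helper: throws when a rule is violated.
static void Check(bool condition, string rule)
{
    if (!condition) throw new InvalidOperationException("Rule violated: " + rule);
}

public interface ISomething<T>
{
    void Add(T item);
    void Remove(T item);
    bool Contains(T item);
    int Size();
}

// Hypothetical implementation with list semantics.
public sealed class SimpleList<T> : ISomething<T>
{
    private readonly List<T> items = new List<T>();

    public void Add(T item) => items.Add(item);
    public void Remove(T item) => items.Remove(item); // removes the first occurrence, if any
    public bool Contains(T item) => items.Contains(item);
    public int Size() => items.Count;
}
```

Because the checks never mention SimpleList<T> after construction, the same checks could be reused against any other class that claims to be a list.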

Set

A set is a collection that can contain an item at most one time: an item is a member of a set, or it isn’t. We can also define an interface to describe a set.

public interface ISet<T> {
    public void Add(T item);
    public void Remove(T item);
    public bool Contains(T item);
    public int Size();
}

Perhaps surprisingly, this interface is almost the same as ISomething<T>: it differs only in its name. However, since this is a set, it may behave differently. So it may have different semantics:

  • After calling Add(item), item is part of the set. We can express this more formally: after calling Add(item), Contains(item) will return true. The return value of Size() will increase by 1 only if the item was not already in the set.
  • When I create a new instance, then the set is empty (Size() returns 0). Calling Contains(item) for any item will return false.
  • After I call Remove(item), then Contains(item) will return false regardless of whether the item was added before.
  • If I call Add(item) multiple times, then subsequent calls do nothing.
  • Remove(item) will remove the item from the set, or do nothing if item was not part of the set.
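The set rules can be checked in the same way. This is a sketch under the assumption of a hypothetical SimpleSet<T> class that wraps the built-in HashSet<T>; SimpleSet<T> and the Check helper are invented for illustration. The ISet<T> declared here is the one from this article, which happens to share its name with an interface in System.Collections.Generic.

```csharp
using System;

ISet<string> set = new SimpleSet<string>();
Check(set.Size() == 0, "a new set is empty");
Check(!set.Contains("a"), "a new set contains nothing");

set.Add("a");
set.Add("a"); // the second call does nothing
Check(set.Contains("a"), "after Add(item), Contains(item) is true");
Check(set.Size() == 1, "Size() counts an item at most once");

set.Remove("a");
Check(!set.Contains("a"), "after Remove(item), Contains(item) is false");
set.Remove("a"); // item is not in the set: does nothing
Check(set.Size() == 0, "removing a missing item changes nothing");

Console.WriteLine("All set rules hold.");

// Hypothetical helper: throws when a rule is violated.
static void Check(bool condition, string rule)
{
    if (!condition) throw new InvalidOperationException("Rule violated: " + rule);
}

// Declared in the global namespace, so inside this file it takes precedence
// over System.Collections.Generic.ISet<T>.
public interface ISet<T>
{
    void Add(T item);
    void Remove(T item);
    bool Contains(T item);
    int Size();
}

// Hypothetical implementation with set semantics.
public sealed class SimpleSet<T> : ISet<T>
{
    private readonly System.Collections.Generic.HashSet<T> items =
        new System.Collections.Generic.HashSet<T>();

    public void Add(T item) => items.Add(item);       // ignored if already present
    public void Remove(T item) => items.Remove(item); // ignored if not present
    public bool Contains(T item) => items.Contains(item);
    public int Size() => items.Count;
}
```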

Replacing implementations

Consuming code can use a set in a different way from a list. If you know you have a set, you can call Add as often as you like; you know for sure that the item is added at most once. So there is no need to call Contains beforehand to avoid adding the item multiple times. Size() also has a subtly different meaning: for a list, it means the total number of items, including duplicates; but for a set, it means the number of unique items, because duplicates don’t occur in the set.

Since a set behaves differently from a list, this also means that you cannot just replace a list with a set, even when their interfaces are syntactically the same: semantically they are different, so instances behave differently.

Mocking

A number of mocking libraries exist that can provide you with an instance of a given interface. For example, look at this snippet that uses the NSubstitute mocking library:

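using System.Collections.Generic;
using NSubstitute;

var list = Substitute.For<IList<int>>();
list.Add(1);
list.Add(2);
var found = list.Contains(1);   // false: the substitute stored nothing
var count = list.Count;         // 0: the built-in IList<T> exposes Count rather than Size()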

NSubstitute can create an object for the given IList<int> interface. The resulting object is however not a list at all. It is an object that satisfies the syntax of the IList<int> interface (so it can be passed around as such), but it makes no effort whatsoever to satisfy its semantics. NSubstitute generates a mock implementation: it records calls, and you can test that certain calls were made. But it does not implement a list at all.

This shows a problem with mocking libraries: although they make it easy to create an interface implementation, that implementation is often semantically broken. Therefore, I prefer to pass around actual implementations whenever it is reasonably possible.
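To make the point concrete without depending on a library, here is a hand-written stand-in that behaves roughly like an unconfigured mock: it records calls and returns default values. This is only an illustration under my own assumptions; RecordingSomething<T> is a hypothetical class and says nothing about how NSubstitute works internally.

```csharp
using System;
using System.Collections.Generic;

var mock = new RecordingSomething<int>();
mock.Add(1);
mock.Add(2);

Console.WriteLine(mock.Contains(1));               // False: nothing was stored
Console.WriteLine(mock.Size());                    // 0
Console.WriteLine(string.Join(", ", mock.Calls));  // Add(1), Add(2), Contains(1), Size()

public interface ISomething<T>
{
    void Add(T item);
    void Remove(T item);
    bool Contains(T item);
    int Size();
}

// Hypothetical stand-in: satisfies the syntax of ISomething<T>, ignores its semantics.
public sealed class RecordingSomething<T> : ISomething<T>
{
    // Every call is written down so a test can inspect it afterwards.
    public List<string> Calls { get; } = new List<string>();

    public void Add(T item) => Calls.Add($"Add({item})");
    public void Remove(T item) => Calls.Add($"Remove({item})");

    public bool Contains(T item)
    {
        Calls.Add($"Contains({item})");
        return default; // always false
    }

    public int Size()
    {
        Calls.Add("Size()");
        return default; // always 0
    }
}
```

The object can be passed anywhere an ISomething<int> is expected, yet none of the list rules hold for it.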

Liskov substitution

The above is an example of the Liskov Substitution Principle (LSP). It states that you can only substitute one implementation of an interface for another if the behavior that client code relies on stays the same.

Note also that it is possible to make a class Set<T> that implements ISomething<T>. The compiler won’t warn you, but you’re still violating the Liskov Substitution Principle. LSP is a semantic constraint: it is about what you can expect from the class, which means that the compiler cannot validate against this principle. That is also why a mock of ISomething<T> compiles and can be passed around, even though it does not behave like a list.
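A small sketch of such a violation, assuming two hypothetical implementations of ISomething<T>: SimpleList<T> with list semantics and Set<T> with set semantics. The client method AddTwiceAndCount is also invented for this example; it is written against the list rules.

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(AddTwiceAndCount(new SimpleList<string>())); // 2
Console.WriteLine(AddTwiceAndCount(new Set<string>()));        // 1

// Client code that relies on the list rule "every Add increases Size() by 1".
static int AddTwiceAndCount(ISomething<string> items)
{
    items.Add("a");
    items.Add("a");
    return items.Size();
}

public interface ISomething<T>
{
    void Add(T item);
    void Remove(T item);
    bool Contains(T item);
    int Size();
}

// Hypothetical implementation that honors the list rules.
public sealed class SimpleList<T> : ISomething<T>
{
    private readonly List<T> items = new List<T>();

    public void Add(T item) => items.Add(item);
    public void Remove(T item) => items.Remove(item);
    public bool Contains(T item) => items.Contains(item);
    public int Size() => items.Count;
}

// Compiles fine as an ISomething<T>, but silently drops duplicates.
public sealed class Set<T> : ISomething<T>
{
    private readonly HashSet<T> items = new HashSet<T>();

    public void Add(T item) => items.Add(item);
    public void Remove(T item) => items.Remove(item);
    public bool Contains(T item) => items.Contains(item);
    public int Size() => items.Count;
}
```

Both calls type-check, and only running the code reveals that the second implementation breaks what the client expects.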

Bonus: semantics of Equals

Sometimes we have to implement an Equals method on a class. Have you thought about the rules that you need this implementation to satisfy? Every Equals implementation must conform to (at least) the following semantic rules:

  • If a.Equals(b), then b.Equals(a)
  • If a.Equals(b) and b.Equals(c), then a.Equals(c) must be true as well
  • a.Equals(a) must always return true
  • If a.Equals(b), then a.GetHashCode() == b.GetHashCode()
  • If a is not null, then a.Equals(null) must return false
  • If a type B inherits from A, then things get interesting. b.Equals(a) must return false because a cannot always be substituted for b, hence a.Equals(b) must also return false. In short: a.Equals(b) must only return true if a and b are of the exact same type.
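One way to satisfy these rules is to compare the runtime types before comparing any fields. The sketch below assumes two hypothetical classes, Point and ColoredPoint, and uses HashCode.Combine, which is available in current .NET versions; it is one possible approach, not the only one.

```csharp
#nullable enable
using System;

var a = new Point(1, 2);
var b = new Point(1, 2);
var c = new ColoredPoint(1, 2, "red");

Console.WriteLine(a.Equals(a));                         // True
Console.WriteLine(a.Equals(b) && b.Equals(a));          // True
Console.WriteLine(a.GetHashCode() == b.GetHashCode());  // True
Console.WriteLine(a.Equals(null));                      // False
Console.WriteLine(a.Equals(c));                         // False: different types
Console.WriteLine(c.Equals(a));                         // False: different types

public class Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }

    public override bool Equals(object? obj)
    {
        // Null or a different runtime type (including subclasses) is never equal.
        if (obj is null || obj.GetType() != GetType()) return false;
        var other = (Point)obj;
        return X == other.X && Y == other.Y;
    }

    // Equal objects must produce equal hash codes, so hash the compared fields.
    public override int GetHashCode() => HashCode.Combine(X, Y);
}

public class ColoredPoint : Point
{
    public string Color { get; }

    public ColoredPoint(int x, int y, string color) : base(x, y) { Color = color; }

    // base.Equals already rejects null and objects of another type.
    public override bool Equals(object? obj) =>
        base.Equals(obj) && obj is ColoredPoint other && Color == other.Color;

    public override int GetHashCode() => HashCode.Combine(base.GetHashCode(), Color);
}
```

Comparing with GetType() rather than the is operator is what keeps the relation symmetric across the inheritance hierarchy.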
