Syntax and semantics
When you compile your source code, the compiler interprets your program based on the language’s syntax, which is defined as a set of rules that determine which constructions are valid. For example, the line int y = 0; is a valid statement in Java, C++ or C# when it appears inside a method definition. If your program does not obey the language’s syntax, then compilation will fail.
In order for your program to do something useful, the rules must also have some meaning. What does a construction mean? What can I do with it? That is what semantics is about. So semantics is about the meaning of parts of the language, while syntax is about how to correctly write them down.
It is clear that syntax and semantics are tightly correlated. It is pointless to define a language construct without giving it a meaning. Equally pointless is having a language feature that cannot be expressed.
List
But syntax and semantics are also important in higher-level constructions. Take, for example, the following interface:
public interface ISomething<T> {
    // Interface members are implicitly public; an explicit modifier is an error before C# 8.
    void Add(T item);
    void Remove(T item);
    bool Contains(T item);
    int Size();
}
Every implementation of ISomething<T> must implement the listed methods: Add, Remove, Contains and Size. Syntactically, it defines two things: an implementation of ISomething<T> must implement these exact methods, and client code must call the methods exactly as defined, with the correct parameters.
But there’s also semantics: what does this actually mean? What can I do with this interface?
Suppose ISomething<T> is actually some kind of list. In that case, we could define the following behavior:
- After calling Add(item), then item is part of the list. We can express this more formally: after calling Add(item), then Contains(item) will return true. The return value of Size() will increase by 1.
- When I create a new instance, then the list is empty (Size() returns 0). Calling Contains(item) for any item will return false.
- After I call Remove(item), then Contains(item) will return false if no occurrences of item are in the list anymore.
- If I call Add(item) multiple times, then the item will be added multiple times to the list.
- Remove(item) will remove the first occurrence of item from the list, or do nothing if item was not part of the list.
We can define more rules. We might also define different rules. But the important thing is that every implementation of ISomething<T> must adhere to these rules, because the client code strictly depends on them. (Note that the rules above are expressed in terms of how a consumer can use the list; I have said nothing about a possible implementation.)
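To make the list rules concrete, here is a minimal sketch of one possible implementation. The class name SimpleList<T> is my own invention for illustration, and backing it with List<T> is just one of many options that would satisfy the rules.

```csharp
using System.Collections.Generic;

public interface ISomething<T>
{
    void Add(T item);
    void Remove(T item);
    bool Contains(T item);
    int Size();
}

// Hypothetical list-like implementation (the name is an assumption).
public sealed class SimpleList<T> : ISomething<T>
{
    private readonly List<T> items = new List<T>();

    // Always appends, so duplicates are kept and Size() grows by 1.
    public void Add(T item) => items.Add(item);

    // List<T>.Remove removes only the first occurrence
    // and does nothing if the item is not present.
    public void Remove(T item) => items.Remove(item);

    public bool Contains(T item) => items.Contains(item);

    public int Size() => items.Count;
}
```

With this sketch, adding the same item twice gives a Size() of 2, and a single Remove leaves one occurrence behind, so Contains still returns true.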
Set
A set is a collection that can contain an item at most one time: an item is a member of a set, or it isn’t. We can also define an interface to describe a set.
public interface ISet<T> {
    // Interface members are implicitly public; an explicit modifier is an error before C# 8.
    void Add(T item);
    void Remove(T item);
    bool Contains(T item);
    int Size();
}
Perhaps surprisingly, this interface is almost the same as ISomething<T>: it differs only in its name. However, since this is a set, it may behave differently, so it may have different semantics:
- After calling Add(item), then item is part of the set. We can express this more formally: after calling Add(item), then Contains(item) will return true. The return value of Size() will increase by 1 if the item was not added before.
- When I create a new instance, then the set is empty (Size() returns 0). Calling Contains(item) for any item will return false.
- After I call Remove(item), then Contains(item) will return false regardless of whether the item was added before.
- If I call Add(item) multiple times, then subsequent calls do nothing.
- Remove(item) will remove the item from the set, or do nothing if item was not part of the set.
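The set rules above could be satisfied by something like the following sketch. The class name SimpleSet<T> is hypothetical, and I fully qualify HashSet<T> so that the ISet<T> declared here is not confused with the interface of the same name in System.Collections.Generic.

```csharp
public interface ISet<T>
{
    void Add(T item);
    void Remove(T item);
    bool Contains(T item);
    int Size();
}

// Hypothetical set-like implementation (the name is an assumption).
public sealed class SimpleSet<T> : ISet<T>
{
    private readonly System.Collections.Generic.HashSet<T> items =
        new System.Collections.Generic.HashSet<T>();

    // HashSet<T>.Add leaves the set unchanged when the item is already present.
    public void Add(T item) => items.Add(item);

    // Removes the item, or does nothing if it was never added.
    public void Remove(T item) => items.Remove(item);

    public bool Contains(T item) => items.Contains(item);

    // Number of unique items.
    public int Size() => items.Count;
}
```

The member signatures are identical to those of the list interface; only the behavior behind them differs.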
Replacing implementations
Consuming code can use a set differently from a list. If you know you have a set, you can call Add as often as you like; you know for sure that the item is added at most once, so there is no need to call Contains beforehand to avoid duplicates. Size() also has a subtly different meaning: for a list, it is the total number of items, including duplicates; for a set, it is the number of unique items, because duplicates don’t occur in a set.
Since a set behaves differently from a list, this also means that you cannot just replace a list with a set, even when their interfaces are syntactically the same: semantically they are different, so instances behave differently.
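The difference in consuming code can be illustrated with the standard .NET collections List<T> and HashSet<T>, which I use here as stand-ins for the list and the set described above. The helper class Tags and its AddOnce methods are hypothetical names for this sketch.

```csharp
using System.Collections.Generic;

public static class Tags
{
    // List semantics: Add always appends, so the caller guards against duplicates.
    public static void AddOnce(List<string> tags, string tag)
    {
        if (!tags.Contains(tag))
        {
            tags.Add(tag);
        }
    }

    // Set semantics: adding an existing item changes nothing, so no guard is needed.
    public static void AddOnce(HashSet<string> tags, string tag)
    {
        tags.Add(tag);
    }
}
```

Dropping the Contains check is only safe in the second overload. The same unguarded call on a list would compile without complaint and quietly produce duplicates.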
Mocking
A number of mocking libraries exist that can provide you with an instance of a given interface. For example, look at this snippet, which uses the NSubstitute mocking library:
var list = Substitute.For<IList<int>>();
list.Add(1);
list.Add(2);
bool found = list.Contains(1); // false: the substitute only records the calls
int count = list.Count;        // 0: the .NET IList<T> exposes Count, not Size()
NSubstitute can create an object for the given IList<int> interface. The resulting object is however not a list at all. It is an object that satisfies the syntax of the IList<int> interface (so it can be passed around as such), but it makes no effort whatsoever to satisfy its semantics. NSubstitute generates a mock implementation: it records calls, and you can test that certain calls were made. But it does not implement a list at all.
This shows a problem with mocking libraries: although they make it easy to create an interface implementation, that implementation is often broken. Therefore, I prefer to pass around actual implementations whenever it is reasonably possible.
Liskov substitution
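The above is an example of the Liskov Substitution Principle (LSP). It states that an object of a subtype must be usable wherever its supertype is expected without breaking the behavior that client code relies on. For interfaces, this means that you can only swap one implementation for another if the outside behavior stays the same.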
Note also that it is possible to make a class Set<T> that implements ISomething<T>. The compiler won’t warn you, but you’re still violating the Liskov Substitution Principle. LSP is a semantic constraint: it is about what you can expect from the class, which means that the compiler cannot validate against this principle. That is also why a mock of ISomething<T> technically works.
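Since the compiler cannot check semantics, the usual fallback is to express some of the rules as executable checks. The sketch below shows a Set<T> that compiles against ISomething<T>, plus a hypothetical helper ListContract.Holds that tests a few of the list rules; the helper name and the specific rules it checks are my own choices for illustration.

```csharp
using System.Collections.Generic;

public interface ISomething<T>
{
    void Add(T item);
    void Remove(T item);
    bool Contains(T item);
    int Size();
}

// Compiles fine: the syntax of ISomething<T> is satisfied.
public sealed class Set<T> : ISomething<T>
{
    private readonly HashSet<T> items = new HashSet<T>();

    public void Add(T item) => items.Add(item);
    public void Remove(T item) => items.Remove(item);
    public bool Contains(T item) => items.Contains(item);
    public int Size() => items.Count;
}

public static class ListContract
{
    // Checks a few of the list rules against a freshly created, empty instance.
    public static bool Holds(ISomething<int> fresh)
    {
        // A new instance must be empty.
        if (fresh.Size() != 0 || fresh.Contains(42)) return false;

        fresh.Add(42);
        fresh.Add(42);
        // Every Add must grow a list by 1, duplicates included.
        if (fresh.Size() != 2) return false;

        fresh.Remove(42);
        // Only the first occurrence is removed, so one must remain.
        return fresh.Contains(42) && fresh.Size() == 1;
    }
}
```

Calling ListContract.Holds(new Set<int>()) returns false, because the second Add does not increase Size(). The violation only shows up at run time, in a check that someone has to write by hand.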
Bonus: semantics of Equals
Sometimes we have to implement an Equals method on a class. Have you thought about the rules that you need this implementation to satisfy? Every Equals implementation must conform to (at least) the following semantic rules:
- If a.Equals(b), then b.Equals(a)
- If a.Equals(b) and b.Equals(c), then a.Equals(c) must be true as well
- a.Equals(a) must always return true
- If a.Equals(b), then a.GetHashCode() == b.GetHashCode()
- If a is not null, then a.Equals(null) must return false
- If a type B inherits from A, then things get interesting. b.Equals(a) must return false because a cannot always be substituted for b, hence a.Equals(b) must also return false. In short: a.Equals(b) must only return true if a and b are the exact same type.
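As a sketch of how these rules can be honored in practice, consider the hypothetical classes Point and LabeledPoint below (both names are invented for this example). The exact-type check via GetType() is what keeps Equals symmetric across the inheritance hierarchy. I use HashCode.Combine for the hash, which assumes .NET Core 2.1 or later.

```csharp
using System;

public class Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    public override bool Equals(object obj)
    {
        // null is never equal, and only the exact same runtime type can be equal.
        if (obj is null || obj.GetType() != GetType()) return false;
        var other = (Point)obj;
        return X == other.X && Y == other.Y;
    }

    // Built from the same fields as Equals, so equal objects get equal hash codes.
    public override int GetHashCode() => HashCode.Combine(X, Y);
}

public class LabeledPoint : Point
{
    public string Label { get; }

    public LabeledPoint(int x, int y, string label) : base(x, y)
    {
        Label = label;
    }

    // base.Equals already guarantees that obj is a LabeledPoint, so the cast is safe.
    public override bool Equals(object obj) =>
        base.Equals(obj) && Label == ((LabeledPoint)obj).Label;

    public override int GetHashCode() => HashCode.Combine(base.GetHashCode(), Label);
}
```

With these definitions, new Point(1, 2).Equals(new LabeledPoint(1, 2, "x")) is false, and so is the reverse call, which matches the last rule in the list.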