Much Ado About Null

(The code samples in this post can also be found in this GitHub repo. You can also click on any code sample to open it in Scastie.)

My name is Brad and I’m a software engineer at Axon. Something that’s interesting about engineering at Axon is that we write most of our backend services in Scala.

A little while ago, I made a comment on a new engineer’s pull request saying, “Hey, this value is being returned from a Java library that we’re using and could be null under some circumstances. You may want to consider wrapping it in an Option to handle that case.” They responded, “Sure thing, but out of curiosity, why do Scala developers hate null values so badly?”

I was honestly a little taken aback by that question. During my first week at the company, I also came across the No Null Values guidance while writing a pull request and, even though that advice felt quite intuitive, I had just kind of accepted it without really questioning where that norm came from.

I had to go do some research to answer the question. This post is about what I found out.

Let’s Talk About Type Systems

Before we get into the nitty gritty about null specifically, we need to understand the type ecosystem that it fits into.

In the Scala type system (and in the underlying JVM type system), one way that we can think of a type is as a collection of data and the operations we can perform on that data. For example, the Int type has 32 bits of data representing a whole number and has operations like +, -, >, <, and == among others. At a slightly higher level of abstraction, the Array type has a buffer of data representing a sequence of elements and methods like size, head, tail, append, and others which allow us to read and modify the elements.

A key feature of object oriented type systems is polymorphism, which allows us to extend types to (maybe naively) share behavior and avoid duplicating code:
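A minimal sketch of such a hierarchy might look like the following. Only the names Animal, Dog, eat, bark, spot, and someAnimal come from the discussion; the method bodies and printed strings are assumptions made for illustration.

```scala
// Base class: every Animal can eat. (The printed text is made up for this sketch.)
class Animal {
  def eat(): Unit = println("nom nom")
}

// Child class: a Dog inherits eat() and adds bark().
class Dog extends Animal {
  def bark(): Unit = println("Woof!")
}

val spot: Dog = new Dog
spot.bark() // defined directly on Dog
spot.eat()  // inherited from Animal

val someAnimal: Animal = spot // allowed: every Dog is also an Animal
someAnimal.eat()              // fine: eat() is a member of Animal
// someAnimal.bark()          // does not compile: bark is not a member of Animal
```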

In this example, we have a base class, Animal, which is extended by a child class, Dog. We’re obviously able to assign a Dog instance to a value whose type is Dog, spot, and invoke the methods of the type Dog (i.e. bark) on that value. This makes sense because spot’s type is Dog. We’re also able to access the methods and properties of the Animal type, since any instance of Dog is also an instance of Animal because of polymorphism. This is why we’re able to invoke spot.eat, even though an eat method isn’t (directly) defined on the Dog type.

Additionally, we’re also able to assign Dog instances to values whose type is Animal, like someAnimal. Again, this is because any instance of Dog is also an instance of Animal. There’s an important caveat to note though. Even though the actual type of the instance assigned to someAnimal is Dog, we only have access to the properties and methods of Animal because the type of the value someAnimal is only Animal. This is why spot.bark works but someAnimal.bark causes an error, even though both of those values are pointing at the same instance.

Let’s Talk About Type Hierarchies

In Scala, all types descend (directly or indirectly) from a common base class called Any, which is very similar to Java’s Object:

A simplified diagram of the Scala type system

Most of the primitive types extend AnyVal (short for Any Value Type) and nearly everything else, including most of the custom classes that you’ve ever written, extends AnyRef (short for Any Reference Type).

It’s time for a pop quiz. Without looking, where do you think Null fits into this type hierarchy?

(Side note - Null with a capital ‘N’ is the type of the null with a lowercase ‘n’ and null with a lowercase ‘n’ is the singleton instance of the Null type.)

Tricky, right? My first guess would be maybe off in the corner by itself. It doesn’t have any data, methods, or properties, so it seems unlikely that it would be a subtype of either AnyVal or AnyRef. However, we’re also given that everything descends from Any at some point, so it has to exist somewhere in this hierarchy. Maybe it’s just dangling off of Any?

If that were true though, how do we explain the following?
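As a minimal sketch, assuming default compiler settings (without Scala 3's opt-in explicit nulls mode), the puzzling assignment looks like this:

```scala
// Compiles without complaint, even though null is clearly not a list of integers.
val myList: List[Int] = null
```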

The value myList has a type of List[Int], and yet we’ve somehow assigned null to it. The rules of polymorphism say that a value of type A can only be assigned an instance of A or of a subtype of A. Null isn’t a List, so the only remaining possibility is that it must be a subtype of List.

Before you think about that too hard, it gets worse. Null is somehow also a subtype of both Either and this throwaway case class too:
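A sketch of those assignments, again assuming default compiler settings. The case class Throwaway and its fields are hypothetical, invented purely for this example.

```scala
// Hypothetical throwaway case class, made up for this sketch.
case class Throwaway(id: Int, label: String)

// Both of these compile: null is accepted for any reference type.
val myEither: Either[String, Int] = null
val myThrowaway: Throwaway = null
```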

Thankfully, there is a limit to this insanity. If we try to assign null to an Int, Boolean, or other types that extend AnyVal, we start seeing compilation errors:
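A short sketch of where the compiler draws the line. The failing lines are left commented out so the snippet still compiles, and the value names are hypothetical.

```scala
// Each line below is rejected by the compiler if uncommented,
// because Null is not a subtype of these value types.
// val myInt: Int = null
// val myBoolean: Boolean = null

// By contrast, a value whose static type is Null is accepted wherever
// a reference type is expected.
val typedNull: Null = null
val asString: String = typedNull
val asList: List[Int] = typedNull
```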

This means that Null fits into our type hierarchy roughly like this:

And it turns out that our intuition here is right on the money. From the Scala docs:

Null is […] at the bottom of the Scala type hierarchy.

Null is the type of the null literal. It is a subtype of every type except those of value classes. Value classes are subclasses of AnyVal, which includes primitive types such as Int, Boolean, and user-defined value classes.

Since Null is not a subtype of value types, null is not a member of any such type. For instance, it is not possible to assign null to a variable of type scala.Int.

This is the first half of our answer: Null is a subtype of every reference type, that is, of every type that extends AnyRef.

The L in SOLID

There’s a rule in object oriented programming called the Liskov Substitution Principle. In layman’s terms, it says (among other things) that if we have a parent type P and a child type C, C has to provide at least as much behavior as P does.

In our earlier example, Dog extends Animal, and so Dog gets Animal’s eat method for free. If we overrode the eat method on Dog to also print “*wags tail*”, we would have to be careful to still preserve the original behavior of the base method:
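One way to do that, sketched under the assumption that Animal’s eat simply prints a line of text, is to delegate to the base method before adding the new behavior. The feed function is a hypothetical caller added to show why this matters.

```scala
class Animal {
  def eat(): Unit = println("nom nom") // assumed base behavior
}

class Dog extends Animal {
  def bark(): Unit = println("Woof!")

  // Preserve the inherited behavior by calling super, then extend it.
  override def eat(): Unit = {
    super.eat()
    println("*wags tail*")
  }
}

// Hypothetical caller that depends only on Animal's behavior.
def feed(animal: Animal): Unit = animal.eat()
```

Because Dog.eat still does everything Animal.eat does, feed keeps working when it is handed a Dog.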

This is because if we don’t preserve the behavior of the base method, any other part of our program which depends on that base behavior could break. In other words, the child type Dog wouldn’t be substitutable for its parent type Animal.

The Contradiction of Null

Let’s again consider the Null type. We’ve already established that the type system considers Null to be a subtype of any reference type. Behaviorally though, it’s a different story. Null doesn’t provide the behavior of any of the methods of any of its parent types. (In fact, the only behavior it provides is throwing NullPointerExceptions.)

In effect, Null subtypes all reference types syntactically but not semantically. This contradiction, together with the fact that there’s no way to opt out of this behavior, is the reason that functional programmers have such a beef with null.
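The gap between the two can be seen in a small sketch. The value names are hypothetical, and Try is used only to capture the failure so that it can be inspected.

```scala
import scala.util.Try

// Syntactically fine: the type checker accepts null as a String.
val name: String = null

// Semantically broken: invoking any String method on it fails at runtime.
val lengthAttempt: Try[Int] = Try(name.length)
// lengthAttempt is a Failure wrapping a NullPointerException
```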

To give a concrete example, let’s say that I’m writing a library and one of my functions takes a reference type as a parameter. There’s a special value, null, that my function will accept as input syntactically but which will also cause my function to implode if it isn’t specifically handled. The only thing I can do about that is to litter my code with expressions that check for that one value and then throw an exception to communicate that whoever called my method with null shouldn’t have done that:
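A defensive check of that kind might look like the sketch below. The function shout and its error message are hypothetical, invented for illustration.

```scala
// Hypothetical library function that takes a reference type as a parameter.
def shout(message: String): String = {
  // The type system cannot rule out null here, so we check by hand.
  if (message == null) {
    throw new IllegalArgumentException("message must not be null")
  }
  message.toUpperCase + "!"
}
```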

So What Do You Do About That?

The general solution to the problems presented by null pointers is a design pattern called the Null Object Pattern. To implement the Null Object Pattern, you define an interface and extend it with Real Object and Null Object classes, representing the presence and absence of data respectively:

The Null Object Pattern makes it easy to specify whether your function accepts nullable values or not. If you want your method to accept potentially nullable values, you accept an argument of type Object Interface, which can then either be a Real Object or Null Object. If you do not want to accept nullable values, your method should only accept an argument of type Real Object.

The Null Object Pattern also gives us much greater flexibility around how the Null Object behaves. Whereas the null pointer only throws a NullPointerException any time we try to dereference it, the properties and methods of the Null Object can be implemented however we please. We could throw an exception, but we could also return another Null Object, emit logs or metrics, do nothing at all, or anything else that’s helpful in our context.
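The pattern described above can be sketched in a few lines of Scala. Every name here (Greeter, RealGreeter, NullGreeter, welcome, welcomeStrict) is hypothetical, chosen only to mirror the Object Interface, Real Object, and Null Object roles.

```scala
// The Object Interface.
trait Greeter {
  def greet(name: String): String
}

// The Real Object: represents data that is present.
class RealGreeter extends Greeter {
  def greet(name: String): String = s"Hello, $name!"
}

// The Null Object: represents absence. We decide how it behaves;
// in this sketch it quietly returns an empty greeting.
object NullGreeter extends Greeter {
  def greet(name: String): String = ""
}

// Accepts either a real or an absent greeter.
def welcome(greeter: Greeter): String = greeter.greet("Brad")

// Accepts only real greeters; passing NullGreeter here does not compile.
def welcomeStrict(greeter: RealGreeter): String = greeter.greet("Brad")
```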

Let’s Talk About Options

Scala implements the Null Object Pattern using a generic type called Option[A]. Option[A] is implemented by two child types, Some[A] and None, which are analogous to the Real Object and Null Object types.

To operate on the data contained within an Option, we have to handle the possibility that the data doesn’t exist. The way that we do that is by using the map and getOrElse methods:

The map method is a higher order function which takes another function as input. If the Option is defined (i.e. is Some instead of None), it applies the given function to the wrapped value and returns a new Option with the result. If the Option is undefined, it just stays None with no transformation being applied. At the end of the calculation (which could be chained through multiple map invocations), we use getOrElse, which either unwraps the transformed value or returns the default value that we provide if the Option is undefined.
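The chain described above can be sketched as follows. The values and the arithmetic are arbitrary and exist only to show both paths.

```scala
val present: Option[Int] = Some(20)
val absent: Option[Int] = None

// Each map only runs when a value is present; getOrElse supplies the fallback.
val presentResult: Int = present.map(_ + 1).map(_ * 2).getOrElse(0) // 42
val absentResult: Int = absent.map(_ + 1).map(_ * 2).getOrElse(0)   // 0
```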

Like any good successor, the Option type also provides a convenience method for adapting its predecessor. The Option constructor can wrap arbitrary values, converting non-null values to Somes while converting null to None:
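A minimal sketch of that conversion, using made-up values:

```scala
// Option(...) inspects its argument: non-null becomes Some, null becomes None.
val fromValue: Option[String] = Option("hello")     // Some("hello")
val fromNull: Option[String] = Option(null: String) // None
```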

If we ever find a method that returns nullable values, we can wrap the output in an Option so that we can handle the potentially missing data in a functional way.

Scala’s Null Containment Strategy

Some of you are now probably asking, “So if Options are the new hotness, why does Scala bother with Null at all? Why did the language designers bother including it in the type system when they have a vastly superior alternative?”

The answer to that question is that Null is part of the price we pay for compatibility with the JVM ecosystem. Not having to re-write every Java library is a critical feature for the Scala language. A necessary cost of that compatibility is accepting the baggage that comes with it. Part of that baggage is null.

The Scala community hasn’t taken this problem lying down though. Rather, the community has settled on the following approach for handling this piece of baggage:

Do not introduce any new nullable values
Immediately wrap existing nullable values in Options
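As a sketch of the second rule, consider java.util.HashMap, whose get method returns null for a missing key. The settings map, its contents, and the lookup helper are hypothetical; the point is that the null is wrapped at the boundary and never escapes into the rest of the Scala code.

```scala
import java.util.{HashMap => JHashMap}

// A Java collection standing in for any Java API that may return null.
val settings = new JHashMap[String, String]()
settings.put("region", "us-west")

// Wrap the nullable result immediately, so callers only ever see an Option.
def lookup(key: String): Option[String] = Option(settings.get(key))

val region: String = lookup("region").getOrElse("unknown")   // "us-west"
val timeout: String = lookup("timeout").getOrElse("unknown") // "unknown"
```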

Ultimately, this containment strategy usually takes the form of Pull Request comments that say “Hey, this value could be null. You should consider wrapping it in an Option.”

And with that, we’ve come full circle.

Wrapping Up

To summarize what we’ve learned today:

Null is a subtype of every reference type syntactically but not semantically.
This contradiction breaks polymorphism, meaning that we have to litter our code with null checks.
The problems of null pointers are solved generally by the Null Object Pattern and in Scala by the Option type.
Scala includes the Null type to maintain cross compatibility with the JVM ecosystem.
Scala’s Null Containment Strategy is to never introduce new nullable values and to convert existing nullable values to Options as soon as they are encountered.

I hope y’all learned something today.

Happy Scala-ing!