Much Ado About Null
(The code samples in this post can also be found in this GitHub repo. You can also click on any code sample to open it in Scastie.)
My name is Brad and I’m a software engineer at Axon. Something that’s interesting about engineering at Axon is that we write most of our backend services in Scala.
A little while ago, I made a comment on a new engineer’s pull request saying, “Hey, this value is being returned from a Java library that we’re using and could be null under some circumstances. You may want to consider wrapping it in an Option to handle that case.” They responded, “Sure thing, but out of curiosity, why do Scala developers hate null values so badly?”
I was honestly a little taken aback by that question. During my first week at the company, I also came across the No Null Values guidance while writing a pull request and, even though that advice felt quite intuitive, I had just kind of accepted it without really questioning where that norm came from.
I had to go do some research to answer the question. This post is about what I found out.
Let’s Talk About Type Systems
Before we get into the nitty gritty about null specifically, we need to understand the type ecosystem that it fits into.
In the Scala type system (and in the underlying JVM type system), one way that we can think of a type is as a collection of data and the operations we can perform on that data. For example, the Int type has 32 bits of data representing a whole number and has operations like +, -, >, <, and == among others. At a slightly higher level of abstraction, the Array type has a buffer of data representing a sequence of elements and methods like size, head, tail, appended, and others which allow us to read and modify the elements.
A key feature of object oriented type systems is polymorphism, which allows us to extend types to (maybe naively) share behavior and avoid duplicating code:
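A minimal sketch of such a hierarchy, using the names from the discussion that follows. The method bodies and the enclosing PolymorphismDemo object are assumptions made for illustration:

```scala
object PolymorphismDemo {
  // Base class: every Animal can eat.
  class Animal {
    def eat: String = "nom nom"
  }

  // Child class: inherits eat from Animal and adds bark.
  class Dog extends Animal {
    def bark: String = "woof"
  }

  val spot: Dog = new Dog
  val someAnimal: Animal = spot // the same instance, viewed as an Animal

  def main(args: Array[String]): Unit = {
    println(spot.bark)      // bark is defined on Dog
    println(spot.eat)       // eat is inherited from Animal
    println(someAnimal.eat) // eat is available through the Animal type
    // someAnimal.bark      // does not compile: bark is not a member of Animal
  }
}
```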
In this example, we have a base class, Animal, which is extended by a child class, Dog. We’re obviously able to assign a Dog instance to a value whose type is Dog, spot, and invoke the methods of the type Dog (i.e. bark) on that value. This makes sense because spot’s type is Dog. We’re also able to access the methods and properties of the Animal type, since any instance of Dog is also an instance of Animal because of polymorphism. This is why we’re able to invoke spot.eat, even though an eat method isn’t (directly) defined on the Dog type.
Additionally, we’re also able to assign Dog instances to values whose type is Animal, like someAnimal. Again, this is because any instance of Dog is also an instance of Animal. There’s an important caveat to note though. Even though the actual type of the instance assigned to someAnimal is Dog, we only have access to the properties and methods of Animal because the declared type of the value someAnimal is only Animal. This is why spot.bark works but someAnimal.bark causes a compilation error, even though both of those values are pointing at the same instance.
Let’s Talk About Type Hierarchies
In Scala, all types descend (directly or indirectly) from a common base class called Any, which is very similar to Java’s Object:
Most of the primitive types extend AnyVal (short for Any Value Type) and nearly everything else, including most of the custom classes that you’ve ever written, extends AnyRef (short for Any Reference Type).
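As a rough illustration of the two branches, the sketch below assigns a few familiar values to AnyVal, AnyRef, and Any. The Custom class and the value names are made up for the example:

```scala
object HierarchyDemo {
  // Value types sit under AnyVal.
  val number: AnyVal = 42
  val flag: AnyVal = true

  // Reference types, including our own classes, sit under AnyRef.
  class Custom
  val text: AnyRef = "hello"
  val items: AnyRef = List(1, 2, 3)
  val custom: AnyRef = new Custom

  // Both branches meet at Any.
  val everything: List[Any] = List(number, flag, text, items, custom)
}
```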
It’s time for a pop quiz. Without looking, where do you think Null fits into this type hierarchy?
(Side note - Null with a capital ‘N’ is the type of null with a lowercase ‘n’, and null with a lowercase ‘n’ is the singleton instance of the Null type.)
Tricky, right? My first guess would be maybe off in the corner by itself. It doesn’t have any data, methods, or properties, so it seems unlikely that it would be a subtype of either AnyVal or AnyRef. However, we’re also given that everything descends from Any at some point, so it has to exist somewhere in this hierarchy. Maybe it’s just dangling off of Any?
If that were true though, how do we explain the following?
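A minimal sketch of the kind of assignment in question, assuming default compiler settings (that is, without Scala 3’s opt-in -Yexplicit-nulls flag). The enclosing object name is an assumption:

```scala
object NullListDemo {
  // Compiles without complaint, even though null is clearly not a list of integers.
  val myList: List[Int] = null
}
```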
The value myList has a type of List[Int], and yet we’ve somehow assigned null to it. The rules of polymorphism say that a value of type A can only be assigned an instance of A or of a subtype of A. Null isn’t a List, so the only remaining possibility is that it must be a subtype of List.
Before you think about that too hard, it gets worse. Null is somehow also a subtype of both Either and this throwaway case class too:
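Something along these lines, again assuming default compiler settings. The Throwaway case class and its fields are hypothetical:

```scala
object NullEverywhereDemo {
  // A throwaway case class defined only for this example.
  case class Throwaway(id: Int, label: String)

  // Both assignments compile, so Null must conform to both types.
  val myEither: Either[String, Int] = null
  val myThrowaway: Throwaway = null
}
```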
Thankfully, there is a limit to this insanity. If we try to assign null to an Int, Boolean, or other types that extend AnyVal, we start seeing compilation errors:
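A small sketch of where the compiler draws the line. The failing assignments are left commented out so that the rest of the block still compiles, and the value names are assumptions:

```scala
object ValueTypesDemo {
  // Uncommenting either line is rejected at compile time,
  // because Null is not a subtype of any value type.
  // val myInt: Int = null
  // val myBoolean: Boolean = null

  // Reference types, on the other hand, still accept null.
  val myString: String = null
}
```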
This means that Null fits into our type hierarchy roughly like this:
And it turns out that our intuition here is right on the money. From the Scala docs:
Null is […] at the bottom of the Scala type hierarchy.
Null is the type of the null literal. It is a subtype of every type except those of value classes. Value classes are subclasses of AnyVal, which includes primitive types such as Int, Boolean, and user-defined value classes.
Since Null is not a subtype of value types, null is not a member of any such type. For instance, it is not possible to assign null to a variable of type scala.Int.
This is the first half of our answer: Null is a subtype of every reference type (that is, every type that extends AnyRef).
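One way to check this claim is to ask the compiler for subtype evidence using the standard library’s <:< type. This is only a sketch, assuming default compiler settings, and the value names are invented for the example:

```scala
object NullSubtypeDemo {
  // Each line compiles only if the compiler can prove that
  // Null is a subtype of the type on the right-hand side.
  val nullIsAString: Null <:< String = implicitly[Null <:< String]
  val nullIsAList: Null <:< List[Int] = implicitly[Null <:< List[Int]]
  val nullIsAnAnyRef: Null <:< AnyRef = implicitly[Null <:< AnyRef]

  // No such evidence exists for value types, so this does not compile:
  // implicitly[Null <:< Int]
}
```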
The L in SOLID
There’s a rule in object oriented programming called the Liskov Substitution Principle. In layman’s terms, it says (among other things) that if we have a parent type P and a child type C, C has to provide at least as much behavior as P does.
In our earlier example, Dog extends Animal, and so Dog gets Animal’s eat method for free. If we overrode the eat method on Dog to also print “*wags tail*”, we would have to be careful to still preserve the original behavior of the base method:
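A sketch of an override that keeps the base behavior by delegating to super. For ease of checking, the methods here return strings rather than printing; that choice, the "nom nom" text, and the feed helper are assumptions:

```scala
object LiskovDemo {
  class Animal {
    def eat: String = "nom nom"
  }

  class Dog extends Animal {
    // Extends the base behavior instead of replacing it.
    override def eat: String = super.eat + " *wags tail*"
  }

  // Code written against Animal relies on the base behavior being present.
  def feed(animal: Animal): String = animal.eat
}
```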
This is because if we don’t preserve the behavior of the base method, any other part of our program which depends on that base behavior could break. In other words, the child type Dog wouldn’t be substitutable for its parent type Animal.
The Contradiction of Null
Let’s again consider the Null type. We’ve already established that the type system considers Null to be a subtype of any reference type. Behaviorally though, it’s a different story. Null doesn’t provide the behavior of any of the methods of any of its parent types. (In fact, the only behavior it provides is throwing NullPointerExceptions.)
In effect, Null subtypes all reference types syntactically but not semantically. This contradiction, together with the fact that there’s no way to opt out of this behavior, is the reason that functional programmers have such a beef with null.
To give a concrete example, let’s say that I’m writing a library and one of my functions takes a reference type as a parameter. There’s a special value, null, that my function will accept as input syntactically but which will also cause my function to implode if it isn’t specifically handled. The only thing I can do about that is to litter my code with expressions that check for that one value and then throw an exception to communicate that whoever called my method with null shouldn’t have done that:
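A sketch of what such a defensive check tends to look like. The shout function and its error message are hypothetical:

```scala
object NullCheckDemo {
  // A hypothetical library function that takes a reference type.
  def shout(message: String): String = {
    // The type system lets null through, so we have to reject it by hand.
    if (message == null) {
      throw new IllegalArgumentException("message must not be null")
    }
    message.toUpperCase + "!"
  }
}
```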
So What Do You Do About That?
The general solution to the problems presented by null pointers is a design pattern called the Null Object Pattern. To implement the Null Object Pattern, you define an interface and extend it with Real Object and Null Object classes, representing the presence and absence of data respectively:
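A minimal sketch of the pattern in Scala. The Greeter interface and its two implementations are invented for illustration and do not come from any particular library:

```scala
object NullObjectDemo {
  // The interface shared by the "present" and "absent" cases.
  trait Greeter {
    def greet(name: String): String
  }

  // Real Object: data is present, so it does real work.
  class RealGreeter extends Greeter {
    def greet(name: String): String = s"Hello, $name!"
  }

  // Null Object: data is absent, so it quietly does nothing useful.
  object NoGreeter extends Greeter {
    def greet(name: String): String = ""
  }

  // Accepts either case by asking for the interface.
  def welcome(greeter: Greeter, name: String): String =
    greeter.greet(name)

  // Accepts only the "present" case by asking for the Real Object type.
  def welcomeLoudly(greeter: RealGreeter, name: String): String =
    greeter.greet(name).toUpperCase
}
```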
The Null Object Pattern makes it easy to specify whether your function accepts nullable values or not. If you want your method to accept potentially nullable values, you accept an argument of type Object Interface, which can then either be a Real Object or Null Object. If you do not want to accept nullable values, your method should only accept an argument of type Real Object.
The Null Object Pattern also gives us much greater flexibility around how the Null Object behaves. Whereas the null pointer only throws a NullPointerException any time we try to dereference it, the properties and methods of the Null Object can be implemented however we please. We could throw an exception, but we could also return another Null Object, emit logs or metrics, do nothing at all, or anything else that’s helpful in our context.
Let’s Talk About Options
Scala implements the Null Object Pattern using a generic type called Option[A]. Option[A] is implemented by two child types, Some[A] and None, which are analogous to the Real Object and Null Object types.
To operate on the data contained within an Option, we have to handle the possibility that the data doesn’t exist. The way that we do that is by using the map and getOrElse methods:
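A short sketch of chaining map calls and finishing with getOrElse. The describeLength helper and its default message are assumptions:

```scala
object OptionDemo {
  // Hypothetical helper: describes the length of a name that may be missing.
  def describeLength(maybeName: Option[String]): String =
    maybeName
      .map(_.trim)                  // runs only if a name is present
      .map(_.length)                // chained transformation on the wrapped value
      .map(n => s"$n characters")
      .getOrElse("no name given")   // default used when the Option is None
}
```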
The map method is a higher order function which takes another function as input. If the Option is defined (i.e. is Some instead of None), it applies the given function to the wrapped value and returns a new Option with the result. If the Option is undefined, it just stays None with no transformation being applied. At the end of the calculation (which could be chained through multiple map invocations), we use getOrElse, which either unwraps the transformed value or returns the default value that we provide if the Option is undefined.
Like any good successor, the Option type also provides a convenience method for adapting its predecessor. The Option constructor (strictly speaking, the apply method on the Option companion object) can wrap arbitrary values, converting non-null values to Somes while converting null to None:
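For instance, in a sketch like the following (the value names are assumptions):

```scala
object OptionWrapDemo {
  val nullableString: String = null

  // A non-null value becomes a Some.
  val wrappedValue: Option[String] = Option("hello")

  // A null value becomes None instead of being wrapped in a Some.
  val wrappedNull: Option[String] = Option(nullableString)
}
```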
If we ever find a method that returns nullable values, we can wrap the output in an Option so that we can handle the potentially missing data in a functional way.
Scala’s Null Containment Strategy
Some of you are now probably asking, “So if Options are the new hotness, why does Scala bother with Null at all? Why did the language designers bother including it in the type system when they have a vastly superior alternative?”
The answer to that question is that Null is part of the price we pay for compatibility with the JVM ecosystem. Not having to re-write every Java library is a critical feature for the Scala language. A necessary cost of that compatibility is accepting the baggage that comes with it. Part of that baggage is null.
The Scala community hasn’t taken this problem lying down though. Rather, the community has settled on the following approach for handling this piece of baggage:
Do not introduce any new nullable values
Immediately wrap existing nullable values in Options
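The two rules above can be sketched as follows, using java.util.HashMap (whose get method returns null for a missing key) as a stand-in for a Java library. The settings map and the lookup helper are hypothetical:

```scala
object ContainmentDemo {
  import java.util.{HashMap => JHashMap}

  // Stand-in for state owned by a Java library.
  val settings = new JHashMap[String, String]()
  settings.put("region", "us-west")

  // The Java API may return null, so we wrap the result at the boundary.
  // Callers only ever see an Option and never a raw null.
  def lookup(key: String): Option[String] = Option(settings.get(key))
}
```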
Ultimately, this containment strategy usually takes the form of Pull Request comments that say “Hey, this value could be null. You should consider wrapping it in an Option.”
And with that, we’ve come full circle.
Wrapping Up
To summarize what we’ve learned today:
Null is a subtype of every reference type syntactically but not semantically.
This contradiction breaks polymorphism, meaning that we have to litter our code with null checks.
The problems of null pointers are solved generally by the Null Object Pattern and in Scala by the Option type.
Scala includes the Null type to maintain cross compatibility with the JVM ecosystem.
Scala’s Null Containment Strategy is to never introduce new nullable values and to convert existing nullable values to Options as soon as they are encountered.
I hope y’all learned something today.
Happy Scala-ing!