WTF (What the Function) is a For-Comprehension

(An earlier version of this post had embedded Scastie windows. While they were really cool, they were greatly hurting page performance, so I had to replace them with static images. You can still reach the Scastie windows by clicking on each image.)

(The code samples in this post can also be found in this GitHub repo.)

My name is Brad and I’m a software engineer at Axon. Something that’s interesting about engineering at Axon is that we use Scala to build our backend services. I did not know Scala before starting here a year ago, and I had to pick it up along the way.

In the last couple of months, I’ve been helping onboard a couple of new engineers. Due to our atypical software stack, that means a large part of being an onboarding buddy is also being a Scala teacher. As I’ve been reviewing what I learned about Scala the first time around, I’ve deepened my knowledge, and I want to write down what I’ve (re-)learned so that others can benefit as well.

This post is about a topic that took a long time to click for me: For-Comprehensions. The big idea that I want to leave you with is that for-comprehensions are a multiplication-like operation.

Let’s Talk about For-Comprehensions

For-comprehensions were a pretty unintuitive concept for me. A large part of my confusion grew from the partial naming collision that exists between for-comprehensions and for-loops. (It also didn’t help that the official Scala getting-started docs explicitly describe for-comprehensions as iterative, looping structures.)

There are definitely cases where you could safely think about for-comprehensions as if they were nested for-loops:
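As a minimal sketch of one of those cases (the ranges are arbitrary): with plain collections and no yield, a for-comprehension runs its body once for every combination of elements, exactly like two nested for-loops would.

```scala
import scala.collection.mutable.ListBuffer

// Collects every (i, j) pair visited, to show the loop-like visiting order.
val pairs = ListBuffer.empty[(Int, Int)]

// Without `yield`, the inner generator runs to completion
// for each element of the outer one, just like nested for-loops.
for {
  i <- 1 to 2
  j <- 1 to 3
} {
  pairs += ((i, j))
  println(s"i = $i, j = $j")
}
```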

Fundamentally, for-comprehensions are not for-loops, and thinking about them as if they were stops being helpful the moment you come across code that looks like this:
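Here is a small sketch of the kind of code I mean. It assumes Scala 2.12 or later, where Either is right-biased, and `divide` is a hypothetical helper. There is nothing to loop over here: the comprehension simply stops at the first Left.

```scala
// Hypothetical helper: reports division by zero as a Left instead of throwing.
def divide(a: Int, b: Int): Either[String, Int] =
  if (b == 0) Left(s"cannot divide $a by zero") else Right(a / b)

val result: Either[String, Int] = for {
  x <- divide(100, 5) // Right(20)
  y <- divide(x, 0)   // Left("cannot divide 20 by zero")
  z <- divide(y, 2)   // never evaluated
} yield z

println(result) // Left(cannot divide 20 by zero)
```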

I think that a much better way to think about for-comprehensions is as a tool that we can use to ‘multiply’ monads together.

If you’re new to Scala or functional programming, you’re probably now asking questions like “What the hell is a monad?” and “Why would you ever want to multiply them together?”

The answer to the first question is a doozy, but the (highly) condensed version is that a monad is anything with a flatMap method. Think of collections like Lists, Sets, Vectors, and Sequences as well as more abstract data types like Options, Eithers, and Trys.
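To make that condensed definition a little more concrete, here is a sketch of what the compiler does with a for-comprehension: it rewrites the generators into nested flatMap calls, with a final map for the yield. (Strictly speaking, that means a type needs map as well as flatMap to be used with yield; the names below are illustrative.)

```scala
val xs = List(1, 2)
val ys = List("a", "b")

// The for-comprehension...
val sugared = for {
  x <- xs
  y <- ys
} yield s"$x$y"

// ...is rewritten by the compiler into roughly this chain of flatMap and map.
val desugared = xs.flatMap(x => ys.map(y => s"$x$y"))

println(sugared)   // List(1a, 1b, 2a, 2b)
println(desugared) // List(1a, 1b, 2a, 2b)
```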

As for the second question, that’s what the rest of the article is for 🙂.

Let’s talk about Sequences

To explain what I mean by ‘multiplying’ monads together, I want to walk through some examples, and I want to start with the one that I consider most intuitive: multiplying Sequences.

‘Multiplying’ sequences (and collections more generally) means pairing every element of each one with every element of the others, which is formally called taking the Cartesian Product.

For example, if we wanted to list the squares of a chess board, we could do that succinctly with a for-comprehension:
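Here is a minimal sketch, assuming columns are represented as the characters A through H, rows as the integers 1 through 8, and each square as a string like A1:

```scala
val columns = ('A' to 'H').toList // assumed column labels
val rows = (1 to 8).toList        // assumed row numbers

// Every column paired with every row: the Cartesian Product.
val squares = for {
  column <- columns
  row    <- rows
} yield s"$column$row"

println(squares.take(3)) // List(A1, A2, A3)
println(squares.size)    // 64
```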

If we wanted to make a 3D chess board, that’s easy too. All we would need to do is add another dimension to the comprehension and include it in the yield clause:
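A sketch of that, assuming the same A to H columns and 1 to 8 rows, plus a board with three layers (the layer count is just an illustrative choice):

```scala
val columns = ('A' to 'H').toList
val rows = (1 to 8).toList
val layers = List(1, 2, 3) // assumption: a three-layer board

// A third generator adds a third dimension to the product.
val squares3d = for {
  column <- columns
  row    <- rows
  layer  <- layers
} yield (column, row, layer)

println(squares3d.take(3)) // List((A,1,1), (A,1,2), (A,1,3))
```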

I want to take a moment here and really drill into the idea that for-comprehensions are a multiplication-like operation. It’s not just that combining every element of our collections (like rows, columns, and layers) with every element of the others is formally called a Cartesian Product. It’s also that the size of the result is the product of the sizes of our input dimensions:
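Assuming a three-layer board with 8 columns and 8 rows, the arithmetic works out as:

```latex
|\text{columns} \times \text{rows} \times \text{layers}|
  = |\text{columns}| \cdot |\text{rows}| \cdot |\text{layers}|
  = 8 \cdot 8 \cdot 3 = 192
```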

If we shortened the layers list to two or one elements, we would see 8 X 8 X 2 = 128 and 8 X 8 X 1 = 64 respectively.


What do you think would happen if we set layers to the empty list though? (Think about it for a second before trying it out yourself.)

If you said that the output size would collapse to 0, you are correct! When we empty out our layers list, its size becomes 0 and our expression for the result size becomes 8 X 8 X 0 = 0.
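A quick sketch to check the answer, with columns A to H, rows 1 to 8, and an empty layers list:

```scala
val columns = ('A' to 'H').toList
val rows = (1 to 8).toList
val layers = List.empty[Int] // no layers at all

val squares3d = for {
  column <- columns
  row    <- rows
  layer  <- layers // an empty generator: nothing to pair with
} yield (column, row, layer)

println(squares3d.size) // 0, because 8 X 8 X 0 = 0
println(squares3d)      // List()
```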

Guards and Assignments: A Brief Tangent

Before we get to Options and Eithers, I want to take a moment to discuss the two other important capabilities of for-comprehensions.

There are essentially three things we can do inside the braces of for-comprehensions:

  1. We can use the generator operator, ‘<-’, to extract the elements of our collection (or monad).

  2. We can use ‘guards’ to filter our result set.

  3. We can bind expressions to values with the assignment operator, ‘=’.

We’ve already looked at generators in the previous sections, so let’s look at guards next.

There are frequently cases where we only want to look at some subset of a Cartesian Product. We can use guards to filter our result set based on some condition.

In our chess board example, if we only wanted to look at the squares in every other row, we could do that like this:
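A minimal sketch, assuming columns A to H and rows 1 to 8, that keeps only the even-numbered rows with a guard:

```scala
val columns = ('A' to 'H').toList

val evenRowSquares = for {
  column <- columns
  row    <- 1 to 8 if row % 2 == 0 // guard: keep only even-numbered rows
} yield s"$column$row"

println(evenRowSquares.size)    // 32
println(evenRowSquares.take(4)) // List(A2, A4, A6, A8)
```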

Similarly, if we wanted to only consider the black squares of our board, we could do that like this:
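A minimal sketch; `isBlack` is a hypothetical helper that relies on the fact that A1 is a dark square and the colors alternate from there:

```scala
val columns = ('A' to 'H').toList
val rows = (1 to 8).toList

// Hypothetical helper: a square is dark when its column number
// (A = 1 through H = 8) plus its row number is even, so A1 is dark.
def isBlack(row: Int, column: Char): Boolean =
  ((column - 'A' + 1) + row) % 2 == 0

val blackSquares = for {
  column <- columns
  row    <- rows
  if isBlack(row, column) // guard over two dimensions, on its own line
} yield s"$column$row"

println(blackSquares.size)    // 32
println(blackSquares.take(4)) // List(A1, A3, A5, A7)
```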

(On a stylistic note, if your guard only filters on a single dimension or generator, it’s common to put the guard after that generator as in row <- 1 to 8 if row % 2 == 0. If the guard filters on multiple dimensions, then it’s common to put it on its own line as in if isBlack(row, column).)

There’s a caveat to note with guards. Guards are only supported on types with a withFilter method. For example, Eithers do not have this method and therefore do not support guards in for-comprehensions.

It’s also possible to do value assignments within a for-comprehension. These work much like value assignments outside of for-comprehensions, but they exist within the context of each element of the Cartesian Product. For (an admittedly contrived) example, if we wanted to output the chess squares with lowercase columns instead of uppercase ones, we could do that like this:
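A minimal sketch, assuming uppercase columns A to H and rows 1 to 8. The assignment binds a new value for each (column, row) combination without adding a dimension:

```scala
val columns = ('A' to 'H').toList
val rows = (1 to 8).toList

val lowercaseSquares = for {
  column      <- columns
  row         <- rows
  lowerColumn = column.toLower // assignment: a new name, not a new dimension
} yield s"$lowerColumn$row"

println(lowercaseSquares.take(3)) // List(a1, a2, a3)
println(lowercaseSquares.size)    // 64
```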

I want to call out a pretty important distinction between the generator operator, <-, and the assignment operator, =. The generator operator is used to ‘unpack’ collections (and monads generally), while the assignment operator only evaluates an expression and saves the result to a named value. This means that the generator operator adds a new dimension to our multiplication, but the assignment operator does not. Let’s look at an example:
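A sketch of such an example; the contents of the three lists are illustrative assumptions, each with three elements so the sizes match the discussion that follows:

```scala
val letters = List('a', 'b', 'c')
val numbers = List(1, 2, 3)
val colors  = List("red", "green", "blue")

val combos = for {
  letter <- letters // List[Char] on the right, Char on the left: a new dimension
  number <- numbers // List[Int] on the right, Int on the left: another dimension
  color  = colors   // List[String] on both sides: no new dimension
} yield (letter, number, color)

println(combos.size) // 9, not 27
println(combos.head) // (a,1,List(red, green, blue))
```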

In this example, we declare three lists: letters, numbers, and colors. In our for-comprehension, we use <- to unpack letters into a letter value. This unwraps the List[Char] on the right of the <- into a plain old Char on the left side. The same goes for numbers.

However, we do not unpack colors with <-. Rather, we just assign it to the value color. This has two effects:

  1. It doesn’t unwrap the value or change its type in any way. It’s a List[String] on the right side of the = and it remains a List[String] on the left.

  2. It also doesn’t add a dimension to our ‘multiplication’. As we can see in the output, because we’re only unpacking letters (size 3) and numbers (size 3) with <- but not colors, the size of our output is just 3 X 3 = 9 instead of 3 X 3 X 3 = 27.

This distinction between <- and = is important and I have been bitten by it on multiple occasions.

Let’s talk about Options

Now back to the main thread. You may now be thinking to yourself, “Cool story, but I still have no clue what’s happening in the Either example that you gave at the top”, and that’s fair. We’re still not quite there yet, but I want to use Options in for-comprehensions as a bridge to help us get there.

If you’re new to Scala, an Option is a type that is used to represent the potential presence or absence of a value. They’re similar to null pointers but they’re much safer to use. Option[A] is extended by two child types, Some[A] and None, which represent the presence and absence of the value respectively:
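For a quick refresher, here is a small sketch of Some and None in action (the values are arbitrary):

```scala
val present: Option[Int] = Some(5) // a value is there
val absent: Option[Int]  = None    // no value

// getOrElse makes us decide what to do when the value is missing,
// instead of risking a NullPointerException.
println(present.getOrElse(0)) // 5
println(absent.getOrElse(0))  // 0

// map only touches the value if there is one.
println(present.map(_ * 2)) // Some(10)
println(absent.map(_ * 2))  // None
```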

If you’re still not feeling confident with Options, I would recommend checking out some resources like the Scala docs and Rock the JVM’s tutorial on Options before continuing on.

The key to using Options in a for-comprehension is understanding that Options can be conceptualized as a special type of List, one with exactly 0 or 1 elements at any time. For the purposes of a for-comprehension, Some(x) behaves exactly the same as List(x) and None behaves exactly the same as List() (also known as Nil). Let’s look at another example.

Let’s say that we have a business that allows people to order bottles of wine through the mail. Let’s also say that we’re using a micro-service architecture. We have an Order Placement service that is responsible for validating and placing orders. This service calls out to an Inventory service to make sure we have sufficient stock to fill the order, a Payments service to place a hold on the customer’s card and verify they can pay, and a Compliance service which helps us navigate the web of liquor laws in the US.

Because this is a micro-service architecture, the web calls that Order Placement makes to the other three services could fail for any number of reasons. In this (overly) simplified example, we’re going to represent those fallible web calls as Options, and the return type as a Boolean which indicates if we should allow the order to be placed. A real-world service would obviously send back much more data, but let’s keep it simple for now.

Allowing for a few artistic liberties, the code for our Order Placement service might look something like this:
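A sketch under those assumptions; the value names are hypothetical, and each response is hard-coded rather than coming from a real web call:

```scala
// Hypothetical downstream responses: Some(...) means the web call succeeded,
// None would mean it failed; the Boolean is that service's go/no-go decision.
val inventoryResponse: Option[Boolean]  = Some(true)
val paymentsResponse: Option[Boolean]   = Some(true)
val complianceResponse: Option[Boolean] = Some(true)

val shouldPlaceOrder: Option[Boolean] = for {
  inStock   <- inventoryResponse
  canPay    <- paymentsResponse
  isAllowed <- complianceResponse
} yield inStock && canPay && isAllowed

println(shouldPlaceOrder) // Some(true)
```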

Let’s walk through what’s happening here. In this example, each of the requests we sent to our downstream services returned successfully (coming back as Some instead of None to represent that there were no network problems) and all three of our services gave the go-ahead to place the order (since the responses were Some(true) instead of Some(false)). In this example, this means that the customer’s preferred bottle of wine was in stock, that they had sufficient money to pay for it, and that they were of legal drinking age and live in a state where we’re licensed to do business.

Some(true) behaves the same in a for-comprehension as List(true), so our for-comprehension evaluated as Some(true) X Some(true) X Some(true) = (true, true, true). That gets piped into our yield expression to become Some(true). If we calculate the size of our result set using the same method that we used for Sequences, we get 1 X 1 X 1 = 1, meaning that our result set has one element, and that element is the value true.

Next, let’s consider what would happen if one of our services had given a “no go” response instead of a “go” response. Let’s say that the customer was over their credit limit and we couldn’t place a preauthorization hold on the card.

In this case, we would be evaluating Some(true) X Some(false) X Some(true), since all services returned responses but one of the responses was “do not place this order”. This would evaluate to (true, false, true), which after being piped through our yield expression would become Some(false). Calculating the size here, we have 1 X 1 X 1 = 1, meaning that our result set still has one element, but it is false in this case.

What would have happened if our Compliance service was unhealthy and wasn’t able to send a response back? In this case, we would be evaluating Some(true) X Some(true) X None, which is very similar to List(true) X List(true) X List(). If we compute the size of this result, we get 1 X 1 X 0 = 0, meaning that Some(true) X Some(true) X None = None. The result will not get piped through the yield expression in this case because there is nothing to send to the yield expression.
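To try these cases yourself, here is a sketch that wraps the same comprehension in a hypothetical shouldPlaceOrder helper and also spells out the List analogue of the missing-response case:

```scala
// Hypothetical helper wrapping the Order Placement for-comprehension.
def shouldPlaceOrder(
    inventory: Option[Boolean],
    payments: Option[Boolean],
    compliance: Option[Boolean]
): Option[Boolean] =
  for {
    inStock   <- inventory
    canPay    <- payments
    isAllowed <- compliance
  } yield inStock && canPay && isAllowed

println(shouldPlaceOrder(Some(true), Some(false), Some(true))) // Some(false)
println(shouldPlaceOrder(Some(true), Some(true), None))        // None

// The List analogue of the last case: 1 X 1 X 0 = 0 elements.
val asLists = for {
  a <- List(true)
  b <- List(true)
  c <- List.empty[Boolean]
} yield a && b && c

println(asLists) // List()
```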

Here’s the key takeaway: In a for-comprehension, None behaves like 0 does in a multiplication. Anything multiplied by 0 is still 0, and any for-comprehension taking in None will itself result in None. In other words, if all inputs to our for-comprehension are defined (i.e. are Some instead of None), the result will be to unwrap each of the Somes and pass the values to the yield clause. If any of the inputs are undefined (i.e. None), then the result is None. This behavior is essentially the same as a Boolean AND operation, which, in my next tie-in to the “for-comprehension is multiplication” metaphor, is also known as Boolean Multiplication.

Now let’s talk about Eithers

There’s an operational problem with our Options example.

Let’s say that you’re the on-call engineer supporting the Orders service and you get an alert saying that orders aren’t being created when they should be. Sure enough, you check the logs and can see that the above for-comprehension is only outputting None. What’s the cause of our problem, though? In our example, we’re representing all possible failure cases, from 4XX to 5XX to timeouts, with the same value: None. To make things even worse, we can’t tell from the output of the for-comprehension alone which of the inputs is causing the problem.

The way that we handle this problem is to use a sister-type to the Option: the Either. Whereas Options are used to represent the presence or absence of a value, Eithers are used to represent one of two possible values. These two alternatives are called Left[A] and Right[B]. In this case, we could use an Either to represent either the data that we requested from our downstream services or an error message explaining why the client wasn’t able to get it for us. This would look like Either[String, Boolean], where the String is an error message and the Boolean is the go/no-go decision from Inventory, Payments, or Compliance.

(If you want more help getting comfortable with Eithers, check out the official docs and Alvin Alexander’s guide here.)

In for-comprehensions, Eithers behave very similarly to Options, with Right[B] acting like Some[B] and Left[A] acting like None, even though a Left holds a value. If we rewrote our wine company example using Eithers instead of Options, it might look something like this:
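A sketch of the Either version, assuming Scala 2.12 or later (where Either is right-biased and can be used directly in a for-comprehension). The names are hypothetical, and all three responses start out as Right(true):

```scala
// Hypothetical responses: Right(decision) on success, Left(errorMessage) on failure.
val inventoryResponse: Either[String, Boolean]  = Right(true)
val paymentsResponse: Either[String, Boolean]   = Right(true)
val complianceResponse: Either[String, Boolean] = Right(true)

val shouldPlaceOrder: Either[String, Boolean] = for {
  inStock   <- inventoryResponse
  canPay    <- paymentsResponse
  isAllowed <- complianceResponse
} yield inStock && canPay && isAllowed

println(shouldPlaceOrder) // Right(true)
```

Swapping any response for something like Left("EXCEEDED TIMEOUT LIMIT") makes the whole expression that Left.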

Try out different values for each service response to get the hang of it.

Just like in the Options example, Right(true) X Right(true) X Right(true) evaluates to (true, true, true), which gets piped into our yield clause and becomes Right(true). Similarly, Right(true) X Right(false) X Right(true) becomes (true, false, true), which our yield turns into Right(false).

The moment that we add a Left("Error Message") to our for-comprehension, it’s just like multiplying by 0, and our expression collapses down to Left("Error Message"). Even though it contains an error message value, it behaves just like None or List().

Astute observers may now be asking, “What happens if multiple services error out at the same time?”, or more specifically, “What happens if our for-comprehension takes in two different Lefts at the same time?” This is an excellent question. If we try out a couple examples in the last code box, we see that:

  • Left("400 BAD REQUEST") X Left("500 INTERNAL SERVER ERROR") X Left("EXCEEDED TIMEOUT LIMIT") evaluates to Left("400 BAD REQUEST")

  • Left("500 INTERNAL SERVER ERROR") X Right(true) X Left("400 BAD REQUEST") evaluates to Left("500 INTERNAL SERVER ERROR")

  • Right(true) X Left("EXCEEDED TIMEOUT LIMIT") X Left("400 BAD REQUEST") evaluates to Left("EXCEEDED TIMEOUT LIMIT")
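A sketch that reproduces those three cases with a hypothetical shouldPlaceOrder helper (again assuming Scala 2.12 or later):

```scala
// Hypothetical helper: the Either version of the Order Placement comprehension.
def shouldPlaceOrder(
    inventory: Either[String, Boolean],
    payments: Either[String, Boolean],
    compliance: Either[String, Boolean]
): Either[String, Boolean] =
  for {
    inStock   <- inventory
    canPay    <- payments
    isAllowed <- compliance
  } yield inStock && canPay && isAllowed

println(shouldPlaceOrder(Left("400 BAD REQUEST"), Left("500 INTERNAL SERVER ERROR"), Left("EXCEEDED TIMEOUT LIMIT")))
// Left(400 BAD REQUEST)
println(shouldPlaceOrder(Left("500 INTERNAL SERVER ERROR"), Right(true), Left("400 BAD REQUEST")))
// Left(500 INTERNAL SERVER ERROR)
println(shouldPlaceOrder(Right(true), Left("EXCEEDED TIMEOUT LIMIT"), Left("400 BAD REQUEST")))
// Left(EXCEEDED TIMEOUT LIMIT)
```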

In general, a for-comprehension of Eithers will evaluate to the first Left that it comes across. Even though two Lefts may look very different to us, from the perspective of our multiplication metaphor they’re all equivalent (just like two Nones), so in principle it could return any of them. In practice, though, evaluation short-circuits: as soon as a Left comes up, the remaining generators are skipped and that first Left is returned.

Bonus Round: Let’s talk about Trys

Another type that behaves similarly to Eithers and Options in a for-comprehension is the Try.

Trys are used in Scala to handle exceptions in a more functional way than regular try-catch blocks. Like Eithers and Options, Trys are also extended by two child classes: Success[A] and Failure[A]. A Try is used to wrap a block of code which may potentially throw an exception. If an exception is thrown, it results in a Failure which contains the exception. If the code block runs without error, the result is a Success which contains the result of the code block:
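A small sketch of both outcomes (the input strings are arbitrary):

```scala
import scala.util.{Failure, Success, Try}

val parsed: Try[Int]    = Try("42".toInt)        // Success(42)
val notParsed: Try[Int] = Try("forty-two".toInt) // Failure wrapping a NumberFormatException

// Pattern matching on the two child classes.
notParsed match {
  case Success(value)     => println(s"Parsed $value")
  case Failure(exception) => println(s"Failed: ${exception.getMessage}")
}
```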

(Here are the Scala docs for Trys and a Rock the JVM guide if you want to review them more before moving on.)

As you have probably guessed by now, we can also use for-comprehensions to ‘multiply’ Trys, with Successes behaving like Rights, Somes, and non-empty Lists and Failures behaving like Lefts, Nones, and Nils. Let’s say that we need to validate the input parameters of a web request. They come in as Strings, but we need to verify that they’re valid Integers. (We would obviously want to perform more thorough and complex validations in a real-world application, but bear with me):
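A minimal sketch; parsePage and its two parameters are hypothetical stand-ins for real request parameters:

```scala
import scala.util.Try

// Hypothetical validation: both parameters must parse as Ints.
def parsePage(pageParam: String, pageSizeParam: String): Try[(Int, Int)] =
  for {
    page     <- Try(pageParam.toInt)
    pageSize <- Try(pageSizeParam.toInt)
  } yield (page, pageSize)

println(parsePage("2", "50"))   // Success((2,50))
println(parsePage("two", "50")) // a Failure wrapping a NumberFormatException
```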

As before, try out different values in the code block to see how they behave.

Let’s Wrap Up

(Good job for making it this far 🙂. This post ended up being a bit longer than I anticipated.)

To wrap up, our major takeaway is that for-comprehensions represent a multiplication-like operation for monadic types.

  • For collection monads like Sequences and Sets, this means calculating the Cartesian Product.

  • For two-state monads like Options, Eithers, and Trys, this means doing a Boolean AND. If all inputs are in the ‘happy path state’ (e.g. Some, Right, Success), the values are unwrapped and passed to the yield clause. If any values are ‘unhappy path’ values (e.g. None, Left, or Failure), the expression collapses to the first ‘unhappy’ input value.

I hope this guide to for-comprehensions has been helpful.

Happy Scala-ing!
