Length-Indexed Lists

module Tutorial.Dependent.LengthIndexedLists

%default total

The answer to the issues described above is of course: Dependent types. Before we proceed to our example, first consider how Idris recursively defines the natural numbers (here affixed with apostrophes to avoid introducing a conflict with the actual definition of Nat, which you can find here for reference)):

data Nat' : Type where
  Z' : Nat'
  S' : Nat' -> Nat'

In this scheme, 0 is represented by Z, 1 is represented by S Z, 2 is represented by S (S Z), and so on. Idris does this automatically so if you enter Z or S Z into the REPL, it will return 0 or 1. Note that the only function inherently available to act on a value of type Nat is our data constructor S, which represents the successor function, i.e. adding 1.

Also note that in Idris, every Nat can be represented as either a Z or an S n where n is another Nat. Much as every List a can be represented as either a Nil or an x :: xs (where x is an a and xs is a List a), this informs our pattern matching when solving problems.

Now we can consider the textbook introductory example of dependent types, the vector, which is a list indexed by its length:

public export
data Vect : (len : Nat) -> (a : Type) -> Type where
  Nil  : Vect 0 a
  (::) : (x : a) -> (xs : Vect n a) -> Vect (S n) a

Before we move on, please compare this with the implementation of Seq in the section about algebraic data types. The constructors are exactly the same: Nil and (::). But there is an important difference: Vect, unlike Seq or List, is not a function from Type to Type, it is a function from Nat to Type to Type. Go ahead! Open the REPL and verify this! The Nat argument (also called an index) represents the length of the vector here. Nil has type Vect 0 a: A vector of length zero. Cons has type a -> Vect n a -> Vect (S n) a: It is exactly one element longer (S n) than its second argument, which is of length n.

Let's experiment with this idea to gain a better understanding. There is only one way to come up with a vector of length zero:

ex1 : Vect 0 Integer
ex1 = Nil

The following, on the other hand, leads to a type error (a pretty complicated one, actually):

failing "Mismatch between: S ?n and 0."
  ex2 : Vect 0 Integer
  ex2 = [12]

The problem: [12] gets desugared to 12 :: Nil, but this has the wrong type! Since Nil has type Vect 0 Integer here, 12 :: Nil has type Vect (S 0) Integer, which is identical to Vect 1 Integer. Idris verifies, at compile time, that our vector is of the correct length!

ex3 : Vect 1 Integer
ex3 = [12]

So, we found a way to encode the length of a list-like data structure in its type, and it is a type error if the number of elements in a vector does not agree with then length given in its type. We will shortly see several use cases, where this additional piece of information allows us to be more precise in the types and rule out additional programming mistakes. But first, we need to quickly clarify some terminology.

Type Indices versus Type Parameters

Vect is not only a generic type, parameterized over the type of elements it holds, it is actually a family of types, each of them associated with a natural number representing it's length. We also say, the type family Vect is indexed by its length.

The difference between a type parameter and an index is, that the latter can and does change across data constructors, while the former is the same for all data constructors. Or, put differently, we can learn about the value of an index by pattern matching on a value of the type family, while this is not possible with a type parameter.

Let's demonstrate this with a contrived example:

data Indexed : Nat -> Type where
  I0 : Indexed 0
  I3 : Indexed 3
  I4 : String -> Indexed 4

Here, Indexed is indexed over its Nat argument, as values of the index changes across constructors (I chose some arbitrary value for each constructor), and we can learn about these values by pattern matching on Indexed values. We can use this, for instance, to create a Vect of the same length as the index of Indexed:

fromIndexed : Indexed n -> a -> Vect n a

Go ahead, and try implementing this yourself! Work with holes, pattern match on the Indexed argument, and learn about the expected output type in each case by inspecting the holes and their context.

Here is my implementation:

fromIndexed I0     va = []
fromIndexed I3     va = [va, va, va]
fromIndexed (I4 _) va = [va, va, va, va]

As you can see, by pattern matching on the value of the Indexed n argument, we learned about the value of the n index itself, which was necessary to return a Vect of the correct length.

Length-Preserving `map`

Function bogusMapList behaved unexpectedly, because it always returned the empty list. With Vect, we need to be true to the types here. If we map over a Vect, the argument and output type contain a length index, and these length indices will tell us exactly, if and how the lengths of our vectors are modified:

map3_1 : (a -> b) -> Vect 3 a -> Vect 1 b
map3_1 f [_,y,_] = [f y]

map5_0 : (a -> b) -> Vect 5 a -> Vect 0 b
map5_0 f _ = []

map5_10 : (a -> b) -> Vect 5 a -> Vect 10 b
map5_10 f [u,v,w,x,y] = [f u, f u, f v, f v, f w, f w, f x, f x, f y, f y]

While these examples are quite interesting, they are not really useful, are they? That's because they are too specialized. We'd like to have a general function for mapping vectors of any length. Instead of using concrete lengths in type signatures, we can also use variables as already seen in the definition of Vect. This allows us to declare the general case:

mapVect' : (a -> b) -> Vect n a -> Vect n b

This type describes a length-preserving map. It is actually more instructive (but not necessary) to include the implicit arguments as well:

mapVect : {0 a,b : _} -> {0 n : Nat} -> (a -> b) -> Vect n a -> Vect n b

We ignore the two type parameters a, and b, as these just describe a generic function (note, however, that we can group arguments of the same type and quantity in a single pair of curly braces; this is optional, but it sometimes helps making type signatures a bit shorter). The implicit argument of type Nat, however, tells us that the input and output Vect are of the same length. It is a type error to not uphold to this contract. When implementing mapVect, it is very instructive to follow along and use some holes. In order to get any information about the length of the Vect argument, we need to pattern match on it:

mapVect _ Nil       = ?impl_0
mapVect f (x :: xs) = ?impl_1

At the REPL, we learn the following:

Tutorial.Dependent> :t impl_0
 0 a : Type
 0 b : Type
 0 n : Nat
------------------------------
impl_0 : Vect 0 b


Tutorial.Dependent> :t impl_1
 0 a : Type
 0 b : Type
   x : a
   xs : Vect n a
   f : a -> b
 0 n : Nat
------------------------------
impl_1 : Vect (S n) b

The first hole, impl_0 is of type Vect 0 b. There is only one such value, as discussed above:

mapVect _ Nil       = Nil

The second case is again more interesting. We note, that xs is of type Vect n a, for an arbitrary length n (given as an erased argument), while the result is of type Vect (S n) b. So, the result has to be one element longer than xs. Luckily, we already have a value of type a (bound to variable x) and a function from a to b (bound to variable f), so we can apply f to x and prepend the result to a yet unknown remainder:

mapVect f (x :: xs) = f x :: ?rest

Let's inspect the new hole at the REPL:

Tutorial.Dependent> :t rest
 0 a : Type
 0 b : Type
   x : a
   xs : Vect n a
   f : a -> b
 0 n : Nat
------------------------------
rest : Vect n b

Now, we have a Vect n a and need a Vect n b, without knowing anything else about n. We could learn more about n by pattern matching further on xs, but this would quickly lead us down a rabbit hole, since after such a pattern match, we'd end up with another Nil case and another cons case, with a new tail of unknown length. Instead, we can invoke mapVect recursively to convert the remainder (xs) to a Vect n b. The type checker guarantees, that the lengths of xs and mapVect f xs are the same, so the whole expression type checks and we are done:

mapVect f (x :: xs) = f x :: mapVect f xs

Zipping Vectors

Let us now have a look at bogusZipList: We'd like to pairwise merge two lists holding elements of (possibly) distinct types through a given binary function. As discussed above, the most reasonable thing to do is to expect the two lists as well as the result to be of equal length. With Vect, this can be expressed and implemented as follows:

export
zipWith : (a -> b -> c) -> Vect n a -> Vect n b -> Vect n c
zipWith f []        []         = Nil
zipWith f (x :: xs) (y :: ys)  = f x y :: zipWith f xs ys

Now, here is an interesting thing: The totality checker (activated throughout this source file due to the initial %default total pragma) accepts the above implementation as being total, although it is missing two more cases. This works, because Idris can figure out on its own, that the other two cases are impossible. From the pattern match on the first Vect argument, Idris learns whether n is zero or the successor of another natural number. But from this it can derive, whether the second vector, being also of length n, is a Nil or a cons. Still, it can be informative to add the impossible cases explicitly. We can use keyword impossible to do so:

zipWith _ [] (_ :: _) impossible
zipWith _ (_ :: _) [] impossible

It is - of course - a type error to annotate a case in a pattern match with impossible, if Idris cannot verify that this case is indeed impossible. We will learn in a later section what to do, when we think we are right about an impossible case and Idris is not.

Let's give zipWith a spin at the REPL:

Tutorial.Dependent> zipWith (*) [1,2,3] [10,20,30]
[10, 40, 90]
Tutorial.Dependent> zipWith (\x,y => x ++ ": " ++ show y) ["The answer"] [42]
["The answer: 42"]
Tutorial.Dependent> zipWith (*) [1,2,3] [10,20]
... Nasty type error ...

Simplifying Type Errors

It is amazing to experience the amount of work Idris can do for us and the amount of things it can infer on its own when things go well. When things don't go well, however, the error messages we get from Idris can be quite long and hard to understand, especially for programmers new to the language. For instance, the error message in the last REPL example above was pretty long, listing different things Idris tried to do together with the reason why each of them failed.

If this happens, it often means that a combination of a type error and an ambiguity resulting from overloaded function names is at work. In the example above, the two vectors are of distinct length, which leads to a type error if we interpret the list literals as vectors. However, list literals are overloaded to work with all data types with constructors Nil and (::), so Idris will now try other data constructors than those of Vect (the ones of List and Stream from the Prelude in this case), each of which will again fail with a type error since zipWith expects arguments of type Vect, and neither List nor Stream will work.

If this happens, prefixing overloaded function names with their namespaces can often simplify things, as Idris no longer needs to disambiguate these functions:

Tutorial.Dependent> zipWith (*) (Dependent.(::) 1 Dependent.Nil) Dependent.Nil
Error: When unifying:
    Vect 0 ?c
and:
    Vect 1 ?c
Mismatch between: 0 and 1.

Here, the message is much clearer: Idris can't unify the lengths of the two vectors. Unification means: Idris tries to at compile time convert two expressions to the same normal form. If this succeeds, the two expressions are considered to be equivalent, if it doesn't, Idris fails with a unification error.

As an alternative to prefixing overloaded functions with their namespace, we can use the to help with type inference:

Tutorial.Dependent> zipWith (*) (the (Vect 3 _) [1,2,3]) (the (Vect 2 _) [10,20])
Error: When unifying:
    Vect 2 ?c
and:
    Vect 3 ?c
Mismatch between: 0 and 1.

It is interesting to note, that the error above is not "Mismatch between: 2 and 3" but "Mismatch between: 0 and 1" instead. Here's what's going on: Idris tries to unify integer literals 2 and 3, which are first converted to the corresponding Nat values S (S Z) and S (S (S Z)), respectively. The two patterns match until we arrive at Z vs S Z, corresponding to values 0 and 1, which is the discrepancy reported in the error message.

Creating Vectors

So far, we were able to learn something about the lengths of vectors by pattern matching on them. In the Nil case, it was clear that the length is 0, while in the cons case the length was the successor of another natural number. This is not possible when we want to create a new vector:

failing "Mismatch between: S ?n and n."
  fill : a -> Vect n a

You will have a hard time implementing fill. The following, for instance, leads to a type error:

  fill va = [va,va]

The problem is, that the callers of our function decide about the length of the resulting vector. The full type of fill is actually the following:

fill' : {0 a : Type} -> {0 n : Nat} -> a -> Vect n a

You can read this type as follows: For every type a and for every natural number n (about which I know nothing at runtime, since it has quantity zero), given a value of type a, I'll give you a vector holding exactly n elements of type a. This is like saying: "Think about a natural number n, and I'll give you n apples without you telling me the value of n". Idris is powerful, but it is not a clairvoyant.

In order to implement fill, we need to know what n actually is: We need to pass n as an explicit, unerased argument, which will allow us to pattern match on it and decide - based on this pattern match - which constructors of Vect to use:

export
replicate : (n : Nat) -> a -> Vect n a

Now, replicate is a dependent function type: The output type depends on the value of one of the arguments. It is straight forward to implement replicate by pattern matching on n:

replicate 0     _  = []
replicate (S k) va = va :: replicate k va

This is a pattern that comes up often when working with indexed types: We can learn about the values of the indices by pattern matching on the values of the type family. However, in order to return a value of the type family from a function, we need to either know the values of the indices at compile time (see constants ex1 or ex3, for instance), or we need to have access to the values of the indices at runtime, in which case we can pattern match on them and learn from this, which constructor(s) of the type family to use.

Keyboard shortcuts