Do Blocks, Desugared

module Tutorial.IO.DoUnsugared

import Data.List1
import Data.String
import Data.Vect

import Tutorial.IO.PureSideEffects

%default total

Here's an important piece of information: There is nothing special about do blocks. They are just syntactic sugar, which is converted to a sequence of operator applications. With syntactic sugar, we mean syntax in a programming language that makes it easier to express certain things in that language without making the language itself any more powerful or expressive. Here, it means you could write all the IO programs without using do notation, but the code you'll write will sometimes be harder to read, so do blocks provide nicer syntax for these occasions.

Consider the following example program:

sugared1 : IO ()
sugared1 = do
  str1 <- getLine
  str2 <- getLine
  str3 <- getLine
  putStrLn (str1 ++ str2 ++ str3)

The compiler will convert this to the following program before disambiguating function names and type checking:

desugared1 : IO ()
desugared1 =
  getLine >>= (\str1 =>
    getLine >>= (\str2 =>
      getLine >>= (\str3 =>
        putStrLn (str1 ++ str2 ++ str3)
      )
    )
  )

There is a new operator ((>>=)) called bind in the implementation of desugared1. If you look at its type at the REPL, you'll see the following:

Main> :t (>>=)
Prelude.>>= : Monad m => m a -> (a -> m b) -> m b

This is a constrained function requiring an interface called Monad. We will talk about Monad and some of its friends in the next chapter. Specialized to IO, bind has the following type:

Main> :t (>>=) {m = IO}
>>= : IO a -> (a -> IO b) -> IO b

This describes a sequencing of IO actions. Upon execution, the first IO action is being run and its result is being passed as an argument to the function generating the second IO action, which is then also being executed.

You might remember, that you already implemented something similar in an earlier exercise: In Algebraic Data Types, you implemented bind for Maybe and Either e. We will learn in the next chapter, that Maybe and Either e too come with an implementation of Monad. For now, suffice to say that Monad allows us to run computations with some kind of effect in sequence by passing the result of the first computation to the function returning the second computation. In desugared1 you can see, how we first perform an IO action and use its result to compute the next IO action and so on. The code is somewhat hard to read, since we use several layers of nested anonymous function, that's why in such cases, do blocks are a nice alternative to express the same functionality.

Since do block are always desugared to sequences of applied bind operators, we can use them to chain any monadic computation. For instance, we can rewrite function eval by using a do block like so:

evalDo : String -> Either Error Integer
evalDo s = case forget $ split isSpace s of
  [x,y,z] => do
    v1 <- readInteger x
    op <- readOperator y
    v2 <- readInteger z
    Right $ op v1 v2
  _       => Left (ParseError s)

Don't worry, if this doesn't make too much sense yet. We will see many more examples, and you'll get the hang of this soon enough. The important thing to remember is how do blocks are always converted to sequences of bind operators as shown in desugared1.

Binding Unit

Remember our implementation of friendlyReadHello? Here it is again:

friendlyReadHello' : IO ()
friendlyReadHello' = do
  _ <- putStrLn "Please enter your name."
  readHello

The underscore in there is a bit ugly and unnecessary. In fact, a common use case is to just chain effectful computations with result type Unit (()), merely for the side effects they perform. For instance, we could repeat friendlyReadHello three times, like so:

friendly3 : IO ()
friendly3 = do
  _ <- friendlyReadHello
  _ <- friendlyReadHello
  friendlyReadHello

This is such a common thing to do, that Idris allows us to drop the bound underscores altogether:

friendly4 : IO ()
friendly4 = do
  friendlyReadHello
  friendlyReadHello
  friendlyReadHello
  friendlyReadHello

Note, however, that the above gets desugared slightly differently:

friendly4Desugared : IO ()
friendly4Desugared =
  friendlyReadHello >>
  friendlyReadHello >>
  friendlyReadHello >>
  friendlyReadHello

Operator (>>) has the following type:

Main> :t (>>)
Prelude.>> : Monad m => m () -> Lazy (m b) -> m b

Note the Lazy keyword in the type signature. This means, that the wrapped argument will be lazily evaluated. This makes sense in many occasions. For instance, if the Monad in question is Maybe the result will be Nothing if the first argument is Nothing, in which case there is no need to even evaluate the second argument.

Do, Overloaded

Because Idris supports function and operator overloading, we can write custom bind operators, which allows us to use do notation for types without an implementation of Monad. For instance, here is a custom implementation of (>>=) for sequencing computations returning vectors. Every value in the first vector (of length m) will be converted to a vector of length n, and the results will be concatenated leading to a vector of length m * n:

flatten : Vect m (Vect n a) -> Vect (m * n) a
flatten []        = []
flatten (x :: xs) = x ++ flatten xs

(>>=) : Vect m a -> (a -> Vect n b) -> Vect (m * n) b
as >>= f = flatten (map f as)

It is not possible to write an implementation of Monad, which encapsulates this behavior, as the types wouldn't match: Monadic bind specialized to Vect has type Vect k a -> (a -> Vect k b) -> Vect k b. As you see, the sizes of all three occurrences of Vect have to be the same, which is not what we expressed in our custom version of bind. Here is an example to see this in action:

modString : String -> Vect 4 String
modString s = [s, reverse s, toUpper s, toLower s]

testDo : Vect 24 String
testDo = DoUnsugared.do
  s1 <- ["Hello", "World"]
  s2 <- [1, 2, 3]
  modString (s1 ++ show s2)

Try to figure out how testDo works by desugaring it manually and then comparing its result with what you expected at the REPL. Note, how we helped Idris disambiguate, which version of the bind operator to use by prefixing the do keyword with part of the operator's namespace. In this case, this wasn't strictly necessary, although Vect k does have an implementation of Monad, but it is still good to know that it is possible to help the compiler with disambiguating do blocks.

Of course, we can (and should!) overload (>>) in the same manner as (>>=), if we want to overload the behavior of do blocks.

Modules and Namespaces

Every data type, function, or operator can be unambiguously identified by prefixing it with its namespace. A function's namespace typically is the same as the module where it was defined. For instance, the fully qualified name of function eval would be Tutorial.IO.eval. Function and operator names must be unique in their namespace.

As we already learned, Idris can often disambiguate between functions with the same name but defined in different namespaces based on the types involved. If this is not possible, we can help the compiler by prefixing the function or operator name with a suffix of the full namespace. Let's demonstrate this at the REPL:

Tutorial.IO> :t (>>=)
Prelude.>>= : Monad m => m a -> (a -> m b) -> m b
Tutorial.IO.>>= : Vect m a -> (a -> Vect n b) -> Vect (m * n) b

As you can see, if we load this module in a REPL session and inspect the type of (>>=), we get two results as two operators with this name are in scope. If we only want the REPL to print the type of our custom bind operator, is is sufficient to prefix it with IO, although we could also prefix it with its full namespace:

Tutorial.IO> :t IO.(>>=)
Tutorial.IO.>>= : Vect m a -> (a -> Vect n b) -> Vect (m * n) b
Tutorial.IO> :t Tutorial.IO.(>>=)
Tutorial.IO.>>= : Vect m a -> (a -> Vect n b) -> Vect (m * n) b

Since function names must be unique in their namespace and we still may want to define two overloaded versions of a function in an Idris module, Idris makes it possible to add additional namespaces to modules. For instance, in order to define another function called eval, we need to add it to its own namespace (note, that all definitions in a namespace must be indented by the same amount of whitespace):

namespace Foo
  export
  eval : Nat -> Nat -> Nat
  eval = (*)

-- prefixing `eval` with its namespace is not strictly necessary here
testFooEval : Nat
testFooEval = Foo.eval 12 100

Now, here is an important thing: For functions and data types to be accessible from outside their namespace or module, they need to be exported by annotating them with the export or public export keywords.

The difference between export and public export is the following: A function annotated with export exports its type and can be called from other namespaces. A data type annotated with export exports its type constructor but not its data constructors. A function annotated with public export also exports its implementation. This is necessary to use the function in compile-time computations. A data type annotated with public export exports its data constructors as well.

In general, consider annotating data types with public export, since otherwise you will not be able to create values of these types or deconstruct them in pattern matches. Likewise, unless you plan to use your functions in compile-time computations, annotate them with export.

Bind, with a Bang

Sometimes, even do blocks are too noisy to express a combination of effectful computations. In this case, we can prefix the effectful parts with an exclamation mark (wrapping them in parentheses if they contain additional whitespace), while leaving pure expressions unmodified:

getHello : IO ()
getHello = putStrLn $ "Hello " ++ !getLine ++ "!"

The above gets desugared to the following do block:

getHello' : IO ()
getHello' = do
  s <- getLine
  putStrLn $ "Hello " ++ s ++ "!"

Here is another example:

bangExpr : String -> String -> String -> Maybe Integer
bangExpr s1 s2 s3 =
  Just $ !(parseInteger s1) + !(parseInteger s2) * !(parseInteger s3)

And here is the desugared do block:

bangExpr' : String -> String -> String -> Maybe Integer
bangExpr' s1 s2 s3 = do
  x1 <- parseInteger s1
  x2 <- parseInteger s2
  x3 <- parseInteger s3
  Just $ x1 + x2 * x3

Please remember the following: Syntactic sugar has been introduced to make code more readable or more convenient to write. If it is abused just to show how clever you are, you make things harder for other people (including your future self!) reading and trying to understand your code.

Keyboard shortcuts

Idris 2 Tutorial

Do Blocks, Desugared

Binding Unit

Do, Overloaded

Modules and Namespaces

Bind, with a Bang