Mon­ads are one of the hot­test top­ics in func­tional pro­gram­ming, and argu­ably sim­plify the con­struc­tion of a whole class of sys­tems. Which makes it sur­pris­ing that they’re so opaque and hard to under­stand to people who’s main exper­i­ence is in imper­at­ive or object-oriented languages.

There are a lot of explan­a­tions of, and tutori­als on, mon­ads, but most of them seem to take one of two per­spect­ives: either start with a con­crete example, usu­ally in I/O hand­ling, and work back, or start from the abstract math­em­at­ical for­mu­la­tion and work for­wards. This sounds reas­on­able, but appar­ently neither works well in prac­tice — at least, judging from the com­ments one receives from  intel­li­gent and able pro­gram­mers who hap­pen not to have an extens­ive func­tional pro­gram­ming or abstract math­em­at­ical back­ground. Such a core concept shouldn’t be hard to explain, though, so I thought I’d try a dif­fer­ent tack: mon­ads from the per­spect­ive of lan­guage design.

In Pas­cal, C or Java, state­ments are sep­ar­ated (or ter­min­ated) by semi­colons. This is usu­ally regarded as a piece of syn­tax, but let’s look at it slightly dif­fer­ently. Think of the semi­colon as an oper­ator that takes two pro­gram frag­ments and com­bines them together to form a big­ger frag­ment. For example:


int x = 4;
int y = x * 3;
printf("%d", y);

We have three pro­gram frag­ments. The semi­colon oper­ator at the end of the first line takes the frag­ment on its left-hand side and com­bines it with the frag­ment on its right-hand side. Essen­tially the semi­colon defines how the RHS is affected by the code on the LHS: in this case the RHS code is eval­u­ated in an envir­on­ment that includes a bind­ing of vari­able x, effect­ively resolv­ing what is oth­er­wise a free vari­able. Sim­il­arly, the semi­colon at the end of the second line causes the third line to be eval­u­ated in an envir­on­ment that include y. The mean­ing of the semi­colon is hard-wired into the lan­guage (C, in this case) and defines how code frag­ments are sequenced and their effects propagated.

Now from this per­spect­ive, a monad is a pro­gram­mable semi­colon. A monad allows the applic­a­tion pro­gram­mer, rather than the lan­guage designer, to determ­ine how a sequence of code is put together, and how one frag­ment affects those that come later.

Let’s revert to Haskell. In a slightly sim­pli­fied form, a monad is a type class with the fol­low­ing signature:


class Monad m where
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b

So a monad is a con­struc­ted type wrapping-up some under­ly­ing ele­ment type that defines two func­tions, return and (»=). The first func­tion injects an ele­ment of the ele­ment type into the mon­adic type. The second takes an ele­ment of the mon­adic type and a func­tion that maps an ele­ment that monad’s ele­ment type to some other mon­adic type, and returns an ele­ment of this second mon­adic type.

The simplest example of a monad is Haskell’s Maybe type, which rep­res­ents either a value of some under­ly­ing ele­ment type or the absence of a value:


data Maybe a = Just a
| Nothing

Maybe is an instance of Monad, simply by vir­tue of defin­ing the two func­tions that the type class needs:


instance Monad Maybe where
return a = Just a
Just a >>= f = f a
Nothing >>= _ = Nothing

return injects an ele­ment of a into an ele­ment of Maybe a. (»=) takes an ele­ment of Maybe a and a func­tion from a to Maybe b. If the ele­ment of Maybe a it’s passed is of the form Just a, it applies the func­tion to the ele­ment value a. If, how­ever, the ele­ment is Noth­ing, it returns Noth­ing without eval­u­at­ing the function.

It’s hard to see what this type has to do with sequen­cing, but bear with me. Haskell provides a do con­struc­tion which gives rise to code like the following:


do v <- if b == 0 then Nothing
else Just (a / b)
return 26 / v

Intu­it­ively this looks like a sequence of code frag­ments, so we might infer that the con­di­tional executes first and binds a value to v, and then the next line com­putes with that value — which is in fact what hap­pens, but with a twist. The way in which the frag­ments relate is not pre-defined by Haskell. Instead, the rela­tion­ship between the frag­ments is determ­ined by the val­ues of which monad the frag­ments manip­u­late (usu­ally expressed as which monad the code executes in). The do block is just syn­tactic sugar for a styl­ised use of the two monad func­tions. The example above expands to:


(if b == 0 then Nothing else Just (a / b)) >>= (\v -> return (26 / v))

So the do block is syn­tax that expands into user-defined code, depend­ing on the monad that the expres­sions within it use. In this case, we execute the first expres­sion and then com­pose it with the func­tion on the right-hand side of the (»=) oper­ator. The defin­i­tion says that, if the left-hand side value is Just a, the res­ult is that we call the RHS passing the ele­ment value; if the LHS is Noth­ing, we return Noth­ing imme­di­ately. The res­ult is that, if any code frag­ment in the com­pu­ta­tion returns Noth­ing, then the entire com­pu­ta­tion returns Noth­ing, since all sub­sequent com­pos­i­tions will imme­di­ately short-circuit: the Maybe type acts like a simple excep­tion that escapes from the com­pu­ta­tion imme­di­ately Noth­ing is encountered. So the monad struc­ture intro­duces what’s nor­mally regarded as a con­trol con­struct, entirely within the lan­guage. It’s fairly easy to see that we could provide “real” excep­tions by hanging an error code off the fail­ure value. It’s also fairly easy to see that the monad sequences the code frag­ments and aborts when one of the “fails”. In C we can think of the same func­tion being provided by the semi­colon “oper­ator”, but with the cru­cial dif­fer­ence that it is the lan­guage, and not the pro­gram­mer, that decides what hap­pens, one and for all. Mon­ads reify the con­trol of sequen­cing into the language.

To see how this can be made more gen­eral, let’s think about another monad: the list type con­structor. Again, to make lists into mon­ads we need to define return and (»=) with appro­pri­ate types. The obvi­ous injec­tion is to turn a singleton into a list:


instance Monad [] where
return a = [a]

The defin­i­tion of (»=) is slightly more inter­est­ing: which func­tion of type [a] -> (a -> [b]) -> [b] is appro­pri­ate? One could choose to select an ele­ment of the [a] list at ran­dom and apply the func­tion to it, giv­ing a list [b] — a sort of non-deterministic applic­a­tion of a func­tion to a set of pos­sible argu­ments. (Actu­ally this might be inter­est­ing in the con­text of pro­gram­ming with uncer­tainty, but that’s another topic.) Another defin­i­tion — and the one that Haskell actu­ally chooses — is to apply the func­tion to all the ele­ments of [a], tak­ing each a to a list [b], and then con­cat­en­at­ing the res­ult­ing lists together to form one big list:


l >>= f = concat (map f l)

What hap­pens to the code frag­ments in a do block now? The monad threads them together using the two basic func­tions. So if we have code such as:


do x <- [1..10]
y <- [20..30]
return (x, y)

What hap­pens? The first and second frag­ments clearly define lists, but what about the third, which seems to define a pair? To see what hap­pens, we need to con­sider all the frag­ments together. Remem­ber, each frag­ment is com­bined with the next by apply­ing con­cat (map f l). If we expand this out, we get:


concat (map (\x -> concat (map (\y -> return (x, y)) [20..30])) [1..10])

So to sum­mar­ise, Haskell provides a do block syn­tax that expands to a nes­ted sequence of mon­adic func­tion calls. The actual func­tions used depend on the mon­adic type in the do block, so the pro­gram­mer can define how the code frag­ments relate to one another. Com­mon mon­ads include some simple types, but also I/O oper­a­tions and state bind­ings, allow­ing Haskell to per­form oper­a­tions that are typ­ic­ally regarded as imper­at­ive without los­ing its lazi­ness. The Haskell tutorial explains the I/O syn­tax.

What can we say about mon­ads from the per­spect­ive of lan­guage design? Essen­tially they reify sequen­cing, in a func­tional style. They only work as seam­lessly as they do because of Haskell’s flex­ible type sys­tem (allow­ing the defin­i­tion of new mon­ads), and also because of the do syn­tax: without the syn­tactic sugar, most mon­adic code is incom­pre­hens­ible. The real power is that they allow some very com­plex func­tion­al­ity to be abstrac­ted into func­tions and re-used. Con­sider the Maybe code we used earlier: without the “escape” provided by the Maybe monad, we’d have to guard each state­ment with a con­di­tional to make sure there wasn’t a Noth­ing returned at any point. This quickly gets tire­some and error-prone: the monad encap­su­lates and enforces the desired beha­viour. When you real­ise that one can also com­pose mon­ads using monad trans­formers, lay­er­ing mon­adic beha­viours on top of each other (albeit with some con­tor­tions to keep the type sys­tem happy), it becomes clear that this is a very power­ful capability.

I think one can also eas­ily identify a few draw­backs, though. One that imme­di­ately springs to mind is that mon­ads reify one con­struc­tion, of the many that one might choose. A more gen­eral meta-language, like the use of meta-objects pro­to­cols or aspects, or struc­tured lan­guage and com­piler exten­sions, would allow even more flex­ib­il­ity. A second — per­haps with wider impact — is that one has to be intim­ately famil­iar with the monad being used before one has the slight­est idea what a piece of code will do. The list example above is not obvi­ously a list com­pre­hen­sion, until one recog­nises the “idiom” of the list monad. Thirdly, the choice of mon­adic func­tion defin­i­tions isn’t often canon­ical, so there can be a cer­tain arbit­rar­i­ness to the beha­viour. It’d be inter­est­ing to con­sider gen­er­al­isa­tions of mon­ads and lan­guage con­structs to address these issues, but for the mean­time one can use them to abstract a whole range of func­tion­al­ity in an inter­est­ing way. Good luck!