Older blog entries for johnw (starting at number 76)

Emacs: Pattern Matching with pcase

Emacs: Pattern Matching with pcase

This is a tutorial on how to use the pcase macro in modern flavors of GNU Emacs.

Exact matches

All data fits into some kind of pattern. The most explicit pattern is a description of the data itself. Let’s consider the following value as a running example:

'(1 2 (4 . 5) "Hello")

Explicitly stated, this is a list of four elements, where the first two elements are the integers 1 and 2, the third is a cons consisting of a car of 4 and a cdr of 5, and the fourth is the string "Hello". This states an explicit pattern we can match against using an equality test:

(equal value '(1 2 (4 . 5) "Hello"))

Pattern matches

Where patterns become useful is when we want to generalize a bit. Let’s say we want to do a similar equality test, but we don’t care what the final string’s contents are, only that it’s a string. Even though it’s simply state, this becomes quite difficult using an equality test:

(and (equal (subseq value 0 3) '(1 2 (4 .5)))
     (stringp (nth 3 value)))

What we would prefer is a more direct language for encoding our description of the family of values we’d like to match against. The way we said in English was: the first three elements exactly so, and the last element, any string. This is how we’d phrase that using `pcase’:

(pcase value
  (`(1 2 (4 . 5) ,(pred stringp))
    (message "It matched!")))

Think of pcase as a form of cond, where instead of evaluating each test for non-nil, it compares a series of patterns against the value under consideration (often called the “scrutinee” in the literature). There can be many patterns, and the first one wins, as with cond.

Capturing matches

But pcase can go one step further: Not only can we compare a candidate value against a family of possible values described by their pattern, we can also “capture” sub-values from that pattern for later use. Continuing from the last example, let’s say we want to print the string that match, even though we didn’t care about the contents of the string for the sake of the match:

(pcase value
  (`(1 2 (4 . 5) ,(and (pred stringp) foo))
    (message "It matched, and the string was %s" foo)))

Whenever a naked symbol like foo occurs as a logical pattern (see next section), the part of the value being matched at that position is bound to a local variable of the same name.

Logical and literal patterns

To master pcase, there are two types of patterns you must know: Logical patterns, and literal, or quoted, patterns. Logical patterns describe the kind of data we’d like to match against, and other special actions to take when it matches; and quoted patterns are the “literal” aspect, stating the exact form of a particular match.

Literal patterns are by far the easiest to think about. To match against any atom, string, or list of the same, the corresponding literal pattern is that exact value. So the literal pattern "foo" matches the string "foo", 1 matches the atom 1, etc.

pcase matches against a list of logical patterns, so to use a literal pattern, we must quote it, unless it consists entirely of self-quoting atoms:

(pcase value
  ('sym (message "Matched the symbol `sym'"))
  ((1 2) (message "Matched the list (1 2)")))

Literal patterns may also be introduced using a backquote, in which case commas may be used to place logical patterns within them, in exactly the same way that quoting and anti-quoting works for macros. For example:

(pcase value
  (`(1 2 ,(or 3 4))
   (message "Matched either the list (1 2 3) or (1 2 4)")))

More on logical patterns

There are many special logical patterns. Let’s consider them one by one.

Underscore _

To match against anything whatsoever, no matter its type or value, use underscore. Thus to match against a list containing anything at all at its head, we’d use:

(pcase value
  (`(,_ 1 2)
   (message "Matched a list of anything followed by (1 2)")))

Symbol

When performing a match, if a symbol occurs within a logical pattern, it binds whatever was found at that position to a local symbol of the same name. Some examples will help to make this clearer:

(pcase value
  (`(1 2 ,foo 3)
   (message "Matched 1, 2, something now bound to foo, and 3"))
  (foo
   (message "Match anything at all, and bind it to foo!"))
  (`(,the-car . ,the-cdr))
   (message "Match any cons cell, binding the car and cdr locally"))

The reason for doing this is two-fold: Either to refer to a previous match later in the pattern (where it is compared using eq), or to make use of a matched value within the related code block:

(pcase value
  (`(1 2 ,foo ,foo 3)
   (message "Matched (1 2 %s %s 3)" foo)))

(or PAT ...) and (and PAT ...)

We can express boolean logic within a pattern match using the or and and Patterns:

(pcase value
  (`(1 2 ,(or 3 4)
     ,(and (pred stringp)
           (pred (string> "aaa"))
           (pred (lambda (x) (> (length x) 10)))))
   (message "Matched 1, 2, 3 or 4, and a long string "
            "that is lexically greater than 'aaa'")))

pred predicates

Arbitrary predicates can be applied to matched elements, where the predicate will be passed the object that matched. As in the previous example, lambdas can be used to form arbitrarily complex predicates, with their own logic. See above for examples.

guard expressions

At any point within a match, you may assert that something is true by inserting a guard. This might consult some other variable to confirm the validity of a pattern at a given time, or it might reference a local symbol that was earlier bound by the match itself, as described above:

(pcase value
  (`(1 2 ,foo ,(guard (and (not (numberp foo)) (/= foo 10)))
   (message "Matched 1, 2, anything, and then anything again, "
            "but only if the first anything wasn't the number 10"))))

Note that in this example, the guard occurs at a match position, so even though the guard doesn’t refer to what is being matched, if it passes, then whatever occurs at that position (the fourth element of the list), would be an unnamed successful matched. This is rather bad form, so we can be more explicit about the logic here:

(pcase value
  (`(1 2 ,(and foo (guard (and (not (numberp foo)) (/= foo 10)))) _)
   (message "Matched 1, 2, anything, and then anything again, "
            "but only if the first anything wasn't the number 10"))))

This means the same, but associates the guard with the value it tests, and makes it clear that we don’t care what the fourth element is, only that it exists.

Pattern let bindings

Within a pattern we can match sub-patterns, using a special form of let that has a meaning specific to `pcase’:

(pcase value
  (`(1 2 ,(and foo (let 3 foo)))
   (message "A weird way of matching (1 2 3)")))

This example is a bit contrived, but it allows us to build up complex guard patterns that might match against values captured elsewhere in the surrounding code:

(pcase value1
  (`(1 2 ,foo)
   (pcase value2
     (`(1 2 ,(and (let (or 3 4) foo) bar))
      (message "A nested pcase depends on the results of the first")))))

Here the third value of value2 – which must be a list of exactly three elements, starting with 1 and 2 – is being bound to the local variable bar, but only if foo was a 3 or 4. There are many other ways this logic could be expressed, but this gives you a test of how flexibly you can introduce arbitrary pattern matching of other values within any logical pattern.

pcase-let and pcase-let*

That’s all there is to know about pcase! The other two utilities you might like to use are pcase-let and pcase-let*, which do similar things to their logical pattern counter-part let, but as regular Lisp forms:

(pcase-let ((`(1 2 ,foo) value1)
            (`(3 4 ,bar) value2))
  (message "value1 is a list of (1 2 %s); value2 ends with %s"
           foo bar))

Note that pcase-let does not fail, and always executes the correspond forms unless there is a type error. That is, value1 above is not required to fit the form of the match exactly. Rather, every binding that can paired is bound to its corresponding element, but every binding that cannot is bound to nil:

(pcase-let ((`(1 2 ,foo) '(10)))
  (message "foo = %s" foo))   => prints "foo = nil"

(pcase-let ((`(1 2 ,foo) 10))
  (message "foo = %s" foo))   => Lisp error, 10 is not a list

(pcase-let ((`(1 2 ,foo) '(3 4 10)))
  (message "foo = %s" foo))   => prints "foo = 10"

Thus, pcase-let can be thought of as a more expressive form of destructuring-bind.

The pcase-let* variant, like let*, allows you to reference bound local symbols from prior matches.

(pcase-let* ((`(1 2 ,foo) '(1 2 3))
             (`(3 4 ,bar) (list 3 4 foo)))
  (message "foo = %s, bar = %s" foo bar))  => foo = 3, bar = 3

However, if you name a symbol with same name in a later logical pattern, it is not used as an eq test, but rather shadows that symbol:

(pcase-let* ((`(1 2 ,foo) '(1 2 3))
             (`(3 4 ,foo) '(3 4 5)))
  (message "1 2 %s" foo))

This prints out "1 2 5", rather than the current match.

Syndicated 2016-01-21 00:00:00 from Lost in Technopolis

Simpler conduit library based on monadic folds

Simpler conduit library based on monadic folds

Recently I was playing around with the core types in the conduit library (attempting to change leftovers so you could only unget values you had read), when I stumbled across a formulation of those types that lead to some interesting simplifications.

Before I jump in, let’s review what any effectful streaming library should aim to accomplish. The basics are:

  1. Iterate over values within a structure, or produced by a computation.
  2. Cleanup resources involved in that computation once they are no longer needed.
  3. Allow processing to be composed nicely, forming a “pipeline” from the initial source to a final sink.
  4. It would be nice if any part of the pipeline could decide when to terminate.

What I discovered during my exploration is that all four of these requirements can be captured using simple, monadic folds, like foldM. Here is the type of foldM:

foldM :: Monad m => (a -> b -> m a) -> a -> [b] -> m a

We can obtain a slightly easier function type for our needs by reversing the arguments:

sourceList :: Monad m => [b] -> a -> (a -> b -> m a) -> m a

This says that given a list of elements of type b, sourceList returns a function that knows how to generate a result type a from a starting value by folding over every element of that list. We might trivially sum lists of integers as follows:

sourceList [1..10] 0 $ \acc x -> return $ acc + x

We can abstract our summing function into a sink that works on any source of integers:

sumC :: (Num a, Monad m)
     => (a -> (a -> a -> m a) -> m a) -> m a
sumC await = await 0 $ \acc x -> return $ acc + x

sumC is a higher-order function that takes a fold closure obtained from sourceList [1..10] above. (I call the closure await, although it’s behavior is a lot closer to a folding-variant of the awaitForever function from conduit). await wants a starting state, and a function to fold that state over the incoming elements.

Both of these are regular, higher-order functions, so we can build a pipeline using nothing more than function application:

sumC (sourceList [1..10])

Notice how close this is to the non-streaming version sum (id [1..10]); and if we execute the pipeline using runIdentity, the two are identical.

Adding type synonyms

Since the “fold closure” argument is cumbersome to restate, let’s restate it as a type synonym:

type Source m a r = r -> (r -> a -> m r) -> m r

With this synonym, the example source and sink become:

sourceList :: Monad m => [a] -> Source m a r
sumC :: (Num a, Monad m) => Source m a a -> m a

Another pattern we’ll start noticing pretty shortly is that every “sink” is a fold from a Source down to its result type. We can capture this using another type synonym:

type Sink a m r = Source m a r -> m r

It’s not really necessary, but it advertises to the reader that we’re defining a sink. Likewise, a “conduit” is always a mapping from one source to another where the result type is common:

type Conduit a m b r = Source m a r -> Source m b r

In cases where the result types must differ (for example, the dropC function in simple-conduit), we cannot use these type synonyms, but they are handy in the majority of cases.

With these synonyms, the types of our sources and sinks should start looking familiar to users of the regular conduit library (mapC here is based on conduit-combinators):

sourceList :: Monad m => [a] -> Source m a r
mapC :: Monad m => (a -> b) -> Conduit a m b r
sumC :: (Num a, Monad m) => Sink a m a

Conduit has special operators for connecting sources with sinks, and for mapping sources to sources. We don’t need them, since we’re just applying functions to functions, but we can define them as synonyms easily enough:

infixl 1 $=
($=) :: a -> (a -> b) -> b
($=) = flip ($)

infixr 2 =$
(=$) :: (a -> b) -> (b -> c) -> a -> c
(=$) = flip (.)

infixr 0 $$
($$) :: a -> (a -> b) -> b
($$) = flip ($)

We can now express the pipeline in three different ways:

sumC (mapC (+1) (sourceList [1..10]))

sumC $ mapC (+1) $ sourceList [1..10]

sumC $= mapC (+1) $$ sourceList [1..10]

This will perhaps seem more compelling if we use a file:

mapM_C putStrLn (sourceFile "hello.hs")

This action prints the contents of the given file, doing so in constant space and without employing lazy I/O. It handles opening and closing of the file for us, and deals properly cleanup in the case of exceptions.

Early termination

There is just one detail we haven’t implemented yet, and that is the ability for segments in the pipeline to abort processing early. To encode this, we need some short-circuiting behavior, which sounds like a job for Either:

type Source m a r =
    r -> (r -> a -> m (Either r r)) -> m (Either r r)

Once we start implementing sources and sinks, it will be much more convenient to use EitherT instead of returning an Either value:

type Source m a r =
    r -> (r -> a -> EitherT r m r) -> EitherT r m r

This way the monadic action of EitherT provides the short-circuiting behavior, rather than having to encode that explicitly in various places.

And that’s it! As simple as it is, this set of types is expressive enough to implement many of the combinators from the original conduit library. Of course, it’s not nearly as capable, but it’s leaner, easier to understand the core types, and significantly faster in some situations (computation of simple pipelines over Identity on my machine were about 45% faster).

Consumers and producers

One thing that conduit makes very easy to do is to abstract Sinks and Conduits as Consumers, and Sources and Conduits as Producers. Based on our presentation above such an abstraction is not possible. However, we can regain some of the generality with a helper function: You can turn sinks into conduits using a new combinator, returnC:

sinkList $ returnC $ sumC $ mapC (+1) $ sourceList [1..10]

Syndicated 2014-06-06 00:00:00 from Lost in Technopolis

Notes on Free monads

Notes on Free monads

The following article is just a few notes on the nature of the Free monad.

> {-# LANGUAGE DeriveFunctor #-}
> {-# LANGUAGE GeneralizedNewtypeDeriving #-}
> {-# LANGUAGE UndecidableInstances #-}
>
> module FreeMaybe where
>
> import Control.Monad (join)
> import Control.Monad.Writer.Class

There can be just two values of type Maybe a: Nothing and Just a. Now let’s look at the free monad of Maybe a, Free Maybe a:

> data Free f a = Pure a | Free (f (Free f a))
>
> instance Functor f => Functor (Free f) where
>     fmap f (Pure a)   = Pure (f a)
>     fmap f (Free ffa) = Free $ fmap (fmap f) ffa
>
> instance Functor f => Monad (Free f) where
>     return = Pure
>     Pure a >>= f = f a
>     Free ffa >>= f = Free $ fmap (>>= f) ffa
>
> instance (Show a, Show (f (Free f a))) => Show (Free f a) where
>     showsPrec d (Pure a) = showParen (d > 10) $
>         showString "Pure " . showsPrec 11 a
>     showsPrec d (Free m) = showParen (d > 10) $
>         showString "Free " . showsPrec 11 m

There are four “shapes” that values of Free Maybe a can take:

Pure a
Free Nothing
Free (Just (Free (Just (... (Free Nothing)))))
Free (Just (Free (Just (... (Free (Pure a))))))

In terms of whether a Free Maybe a represents an a or not, Free Maybe a is equivalent to Maybe a. However, Maybe a is right adjoint to Free Maybe a, meaning that it forgets the structure of Free Maybe a – namely, which of the four shapes above the value was, and how many occurences of Free (Just there were.

Why would you ever use Free Maybe a? Precisely if you cared about the number of Justs. Now, say we had a functor that carried other information:

> data Info a = Info { infoExtra :: String, infoData :: a }
>     deriving (Show, Functor)

Then Free Info a is isomorphic to if infoExtra had been [String]:

> main :: IO ()
> main = do
>     print $ Free (Info "Hello" (Free (Info "World" (Pure "!"))))

Which results in:

>>> main
Free (Info {infoExtra = "Hello",
            infoData = Free (Info {infoExtra = "World", infoData = Pure "!"})})
it :: ()

But now it’s also a Monad, even though we never defined a Monad instance for Info:

> main :: IO ()
> main = do
>     print $ do
>         x <- Free (Info "Hello" (Pure "!"))
>         y <- Free (Info "World" (Pure "!"))
>         return $ x ++ y

This outputs:

>>> foo
Free (Info {infoExtra = "Hello",
            infoData = Free (Info {infoExtra = "World", infoData = Pure "!!"})})
it :: ()

This works because the Free monad simply accumulates the states of the various functor values, without “combining” them as a real monadic join would have done. Free Info a has left it up to us to do that joining later.

Syndicated 2013-09-23 00:00:00 from Lost in Technopolis

Using monad-control with monad transformers

Using monad-control with monad transformers

This article assumes familiarity with monads and monad transformers. If you’ve never had an occasion to use lift yet, you may want to come back to it later.

The Problem

What is the problem that monad-control aims to solve? To answer that, let’s back up a bit. We know that a monad represents some kind of “computational context”. The question is, can we separate this context from the monad, and reconstitute it later? If we know the monadic types involved, then for some monads we can. Consider the State monad: it’s essentially a function from an existing state, to a pair of some new state and a value. It’s fairly easy then to extract its state and later use it to “resume” that monad:

import Control.Applicative
import Control.Monad.Trans.State

main = do
    let f = do { modify (+1); show <$> get } :: StateT Int IO String
    
    (x,y) <- runStateT f 0
    print $ "x = " ++ show x   -- x = "1"
    
    (x',y') <- runStateT f y
    print $ "x = " ++ show x'  -- x = "2"

In this way, we interleave between StateT Int IO and IO, by completing the StateT invocation, obtaining its state as a value, and starting a new StateT block from the prior state. We’ve effectively resumed the earlier StateT block.

Nesting calls to the base monad

But what if we didn’t, or couldn’t, exit the StateT block to run our IO computation? In that case we’d need to use liftIO to enter IO and make a nested call to runStateT inside that IO block. Further, we’d want to restore any changes made to the inner StateT within the outer StateT, after returning from the IO action:

import Control.Applicative
import Control.Monad.Trans.State
import Control.Monad.IO.Class

main = do
    let f = do { modify (+1); show <$> get } :: StateT Int IO String

    flip runStateT 0 $ do
        x <- f
        y <- get
        y' <- liftIO $ do
            print $ "x = " ++ show x   -- x = "1"

            (x',y') <- runStateT f y
            print $ "x = " ++ show x'  -- x = "2"
            return y'
        put y'

A generic solution

This works fine for StateT, but how can we write it so that it works for any monad tranformer over IO? We’d need a function that might look like this:

foo :: MonadIO m => m String -> m String
foo f = do
    x <- f
    y <- getTheState
    y' <- liftIO $ do
        print $ "x = " ++ show x

        (x',y') <- runTheMonad f y
        print $ "x = " ++ show x'
        return y'
    putTheState y'

But this is impossible, since we only know that m is a Monad. Even with a MonadState constraint, we would not know about a function like runTheMonad. This indicates we need a type class with at least three capabilities: getting the current monad tranformer’s state, executing a new transformer within the base monad, and restoring the enclosing transformer’s state upon returning from the base monad. This is exactly what MonadBaseControl provides, from monad-control:

class MonadBase b m => MonadBaseControl b m | m -> b where
    data StM m :: * -> *
    liftBaseWith :: (RunInBase m b -> b a) -> m a
    restoreM :: StM m a -> m a

Taking this definition apart piece by piece:

  1. The MonadBase constraint exists so that MonadBaseControl can be used over multiple base monads: IO, ST, STM, etc.

  2. liftBaseWith combines three things from our last example into one: it gets the current state from the monad transformer, wraps it an StM type, lifts the given action into the base monad, and provides that action with a function which can be used to resume the enclosing monad within the base monad. When such a function exits, it returns a new StM value.

  3. restoreM takes the encapsulated tranformer state as an StM value, and applies it to the parent monad transformer so that any changes which may have occurred within the “inner” transformer are propagated out. (This also has the effect that later, repeated calls to restoreM can “reset” the transformer state back to what it was previously.)

Using monad-control and liftBaseWith

With that said, here’s the same example from above, but now generic for any transformer supporting MonadBaseControl IO:

{-# LANGUAGE FlexibleContexts #-}

import Control.Applicative
import Control.Monad.Trans.State
import Control.Monad.Trans.Control

foo :: MonadBaseControl IO m => m String -> m String
foo f = do
    x <- f
    y' <- liftBaseWith $ \runInIO -> do
        print $ "x = " ++ show x   -- x = "1"

        x' <- runInIO f
        -- print $ "x = " ++ show x'

        return x'
    restoreM y'

main = do
    let f = do { modify (+1); show <$> get } :: StateT Int IO String

    (x',y') <- flip runStateT 0 $ foo f
    print $ "x = " ++ show x'   -- x = "2"

One notable difference in this example is that the second print statement in foo becomes impossible, since the “monadic value” returned from the inner call to f must be restored and executed within the outer monad. That is, runInIO f is executed in IO, but it’s result is an StM m String rather than IO String, since the computation carries monadic context from the inner transformer. Converting this to a plain IO computation would require calling a function like runStateT, which we cannot do without knowing which transformer is being used.

As a convenience, since calling restoreM after exiting liftBaseWith is so common, you can use control instead of restoreM =<< liftBaseWith:

y' <- restoreM =<< liftBaseWith (\runInIO -> runInIO f)

-- becomes...
y' <- control $ \runInIO -> runInIO f

Another common pattern is when you don’t need to restore the inner transformer’s state to the outer transformer, you just want to pass it down as an argument to some function in the base monad:

foo :: MonadBaseControl IO m => m String -> m String
foo f = do
    x <- f
    liftBaseDiscard forkIO $ f

In this example, the first call to f affects the state of m, while the inner call to f, though inheriting the state of m in the new thread, but does not restore its effects to the parent monad transformer when it returns.

Now that we have this machinery, we can use it to make any function in IO directly usable from any supporting transformer. Take catch for example:

catch :: Exception e => IO a -> (e -> IO a) -> IO a

What we’d like is a function that works for any MonadBaseControl IO m, rather than just IO. With the control function this is easy:

catch :: (MonadBaseControl IO m, Exception e) => m a -> (e -> m a) -> m a
catch f h = control $ \runInIO -> catch (runInIO f) (runInIO . h)

You can find many function which are generalized like this in the packages lifted-base and lifted-async.

Syndicated 2013-09-21 00:00:00 from Lost in Technopolis

A whirlwind tour of conduits

A whirlwind tour of conduits

While talking with people on IRC, I’ve encountered enough confusion around conduits to realize that people may not know just how simple they are. For example, if you know how to use generators in a language like Python, then you know pretty much everything you need to know about conduits.

The basics

Let’s take a look at them step-by-step, and I hope you’ll see just how easy they are to use. We’re also going to look at them without type signatures first, so that you get an idea of the usage patterns, and then we’ll investigate the types and see what they mean.

Everything in conduit begins with the Source, which yields data as it is demanded. The dumbest possible form of source is an empty source:

empty = return ()

The next dumbest is a source that yields only a single value:

single = yield 1

In order to use any Source, I must ultimately connected it with a Sink. Sinks are nothing more than code which awaits values from a Source. Let’s look at an example in Python, where these concepts are features of the language itself:

def my_generator():
    for i in range(1, 10):
        yield i

for j in my_generator():
    print j

Here we have a generator (aka Source): a function which simply yields values. This generator is being passed to for statement that consumes the values from it and binds them one by one to a variable j. It then prints each value after it is consumed.

The equivalent code using conduit employs a different syntax, but the general “shape” of the code is the same:

import Control.Monad
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Loops (whileJust_)
import Data.Conduit

myGenerator = forM_ [1..9] yield

main = myGenerator $$
           whileJust_ await $ \j -> 
               liftIO $ print j

I can make the code a little bit closer to Python’s example (making the call to await implicit) if I use Data.Conduit.List:

import Control.Monad
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Loops (whileJust_)
import Data.Conduit
import qualified Data.Conduit.List as CL

myGenerator = forM_ [1..9] yield

main = myGenerator $$ 
           CL.mapM_ $ \j -> 
               liftIO $ print j

Just regular code

Neither Sources nor Sinks have to be special functions, however. They are just regular code written in the ConduitM monad transformer:

import Data.Conduit
import Control.Monad.IO.Class (liftIO)

main = do
    (do yield 10
        yield 20
        yield 30)
        $$
        (do liftIO . print =<< await
            liftIO . print =<< await
            liftIO . print =<< await
            liftIO . print =<< await)

Each time await is called, it returns a value that was yielded by the source wrapped in Just, or it returns Nothing to indicate the source has no more values to offer.

There, now you know the basics of the conduit library.

Conduits

Between sources and sinks, there is a third kind of conduit, which is actually called just Conduit. A Conduit sits between sources and sinks, and is able to call both yield and await, applying some kind of transformation or filter to the data coming from the source, before it reaches the sink. In order to use a Conduit, you must fuse it to either a source or a sink, creating a new source/sink which has the action of the Conduit bound to it. For example:

import Data.Conduit
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Loops (whileJust_)

main = do
    (do yield 10
        yield 20
        yield 30)
        $=
        (do whileJust_ await $ \x ->
                yield (x * 2))
        $$
        (do liftIO . print =<< await
            liftIO . print =<< await
            liftIO . print =<< await
            liftIO . print =<< await)

This example fuses a conduit that doubles the incoming values from the source to its left. We could equivalently have fused it with the sink to the right. In most cases it doesn’t matter whether you fuse to sources or to sinks; it mainly comes into play when you are using such fusion to create building blocks that will be used later.

Use the types, Luke

Now that we have the functionality of conducts down, let’s take a look at their types so that any errors you may encounter are less confusing.

A source has the type Source m Foo, where m is the base monad and Foo is the type of what you want to pass to yield.

A sink has the corresponding type Sink m Foo a, to indicate that await returns values of type Maybe Foo, while the monadic operation of the sink returns a value of type a.

A conduit between these two would have type Conduit Foo m Foo.

You’re probably going to see the type ConduitM in your types errors too, since the above three are all synonyms for it. It’s a more general type that these three specialized types. The correspondences are:

type Source m o    = ConduitM () o m ()
type Sink i m r    = ConduitM i Void m r
type Conduit i m o = ConduitM i o m ()

The Void you see in there is just enforcing the fact that sinks cannot call yield.

What’s next?

Beyond this, most of the conduit library is a bunch of combinators to make them more convenient to use. In a lot of cases, you can reduce conduit code down to something which is just as brief and succinct as what you might write in languages with native support for such operations. It’s a testiment to Haskell, rather, that it doesn’t need to be a syntactic feature to be both useful and concise.

And what about pipes, and the other competing libraries in this space? In many ways they are each equivalent to what I’ve described above. If you want to use pipes, just write respond and request instead of yield and await, and you’re pretty much good to go! The operators for binding and fusing are different too, but what they accomplish is likewise the same.

If you’re interested in learning more about conduit and how to use it, check out the author’s own tutorial.

Syndicated 2013-07-16 00:00:00 from Lost in Technopolis

Update of gitlib libraries on Hackage, plus git-monitor

Update of gitlib libraries on Hackage, plus git-monitor

I’ve decided after many months of active development to release version 1.0.1 of gitlib and its related libraries to Hackage. There is still more code review to done, and much documentation to be written, but this gets the code out there, which has been working very nicely at FP Complete for about six months now.

The more exciting tool for users may be the git-monitor utility, which passively and efficiently makes one-minute snapshots of a single Git working tree while you work. I use it continually for the repositories I work on during the day. Just run git-monitor -v in a terminal window, and start making changes. After about a minute you should see commit notifications appearing in the terminal window.

Syndicated 2013-06-30 00:00:00 from Lost in Technopolis

Nightly builds of GHC HEAD for Ubuntu 12.04.2 LTS

Nightly builds of GHC HEAD for Ubuntu 12.04.2 LTS

Chatting with merijn on #haskell, I realized I have a file server running Ubuntu in a VM that’s idle most of the time, so I decided to set up a jenkins user there and make use of it as a build slave in the evenings. This means that at http://ghc.newartisans.com, you’ll now find nightly builds of GHC HEAD for Ubuntu as well (64-bit). It also includes fulltest and nofib results for each build.

Syndicated 2013-06-19 00:00:00 from Lost in Technopolis

Temporary mirror of comonad.com

Temporary mirror of comonad.com

Until the Comonad Reader comes back online, I have a temporary mirror setup at http://comonad.newartisans.com. It’s a bit old (Sep 2012), but has some classics like “Free Monads for Less”. It is missing the “Algebra of Applicatives”, though, since I hadn’t run the mirror in a while.

Syndicated 2013-06-19 00:00:00 from Lost in Technopolis

15 Jul 2010 (updated 15 Jul 2010 at 17:07 UTC) »

A word on Haskell Monads and C++

After spending a good while trying to understand monads in Haskell, and why the Haskell world is so fascinated by them, I finally understand why they aren’t as exciting to other languages, or why they are completely missing from languages like C++: because they’re mostly already there.

At its simplest, a monad is an abstraction of a value which knows how to apply functions to that value, returning a new monad. In other words, it’s a way to turn values into little packages that wrap additional functionality around that value. Sounds a lot like what an object does…

But this doesn’t tell you what’s exciting about them, from Haskell’s point of view. Another way of looking at them, without going into the wheres and whys, is this: In a lazily-evaluated, expression-based language, monads let you express sequenced, interdependent computation.

Consider the following two code examples. First, in C++:

  #include <iostream>
int main() {
  std::cout << "Hello, world!"
            << "  This is a sample"
            << " of using a monad in C++!"
            << std::endl;
  return 0;
}

And the same code in Haskell:

  module Main where
main :: IO ()
main = do putStr "Hello, world!"
          putStr "  This is a sample"
          putStr " of using a monad in C++!"
          putStr "\n"

What the IO monad in the second example is doing is making the sequenced evaluation of the print statements possible using a nice, normal looking syntax. The C++ code doesn’t need monads to do this, because it already embodies the concept of abstracted values (here, the iostream passed between insertion operators) and sequenced computation (because it’s not lazy).

To compare Monads with C++:

  1. Monads are abstractions of values. So are most C++ objects.

  2. Monads permit functions to be applied to the “contained” value, returning a a new version of the monad. C++ objects provide methods, where the mutated object is the new version.

  3. Monads provide a way to encapsulate values in new monads. C++ objects have constructors.

As another example, consider the case where you have to call five functions on an integer, each using the return value of the last:

  j(i(h(g(f(10))))

This is an identical operation in both Haskell and C++. But what if the return value of each function wasn’t an integer, but an “object” that could either be an integer, or an uninitialized value? In most languages, there’s either a type, or syntax, for this concept:

  C++      boost::optional<int>
C#       int?
Java     Integer
Haskell  Maybe Int

If each function returns one of these, but takes a real integer, it means we have to check the “null” status of each return value before calling the next function. In C++ this leads to a fairly common idiom:

  if (boost::optional<int> x1 = f(10))
  if (boost::optional<int> x2 = g(*x1))
    if (boost::optional<int> x3 = h(*x2))
      if (boost::optional<int> x4 = i(*x3))
        j(*x4);

Note that not only are these calls sequential, but due to the meaning of optionality, they are also inherently short-circuiting. If f returns none, none of the other functions get called.

Haskell can do this type of thing natively as well, and it looks similar:

  case f 10 of
  Nothing -> Nothing
  Just x1 -> 
    case g x1 of
      Nothing -> Nothing
      Just x2 -> 
        case h x2 of
          Nothing -> Nothing
          Just x3 -> 
            case i x3 of
              Nothing -> Nothing
              Just x4 -> j x4

But it’s ugly as sin. In C++, we can be evil and flatten things out using basic features of the language, assuming we pre-declare the variables:

  (   (x1 = f(10))
 && (x2 = g(*x1))
 && (x3 = h(*x2))
 && (x4 = i(*x3))
 && (x5 = j(*x4)), x5)

Or you can eliminate the use of temporaries altogether by creating a wrapper class:

  template <typename T> struct Maybe {
  boost::optional<T> value;

  Maybe() {}
  Maybe(const T& t) : value(t) {}
  Maybe(const Maybe& m) : value(m.value) {}

  Maybe operator>>(boost::function<Maybe<T>(const T&)> f) const {
    return value ? f(*value) : *this;
  }
};

If we change our functions to return Maybe<int> instead of just boost::optional<T>, it allows us to write this:

  f(10) >> g >> h >> i >> j

Which in Haskell is written almost the same way:

  f 10 >>= g >>= h >>= i >>= j

But where Haskell needs Monads to make this type of thing reasonable and concise, C++ doesn’t. We get passing around of object state between function calls as part of the core language, and there are many different ways to express it. However, if you confined C++ to function definitions and return statements only – where all function arguments were pass-by-value – then things like Monads would become an essential technique for passing knowledge between calls.

So it’s not that you can’t use Monads in C++, it’s just that they require enough extra machinery, and aren’t unique enough compared to core features of the language, that there isn’t the same level of motivation for them as there is in Haskell, where they can really add to the expressiveness of code.

Syndicated 2010-07-15 12:20:06 (Updated 2010-07-15 16:50:38) from Lost in Technopolis

30 Oct 2009 (updated 10 Nov 2009 at 09:12 UTC) »

A C++ gotcha on Snow Leopard

I’ve seen this issue mentioned in some random and hard to reach places on the Net, so I thought I’d re-express it here for those who find Google sending them this way.

On Snow Leopard, Apple decided to build g++ and the standard C++ library with “fully dynamic strings” enabled. What this means for you relates to the empty string.

When fully dynamic strings are off (as was true in Leopard), there exists a single global variable representing the empty string. This variable lives in the data segment of libstdc++, and so it does not exist on the heap. Whenever a string is deconstructed, the standard library would check whether that string’s address matches matches the empty string’s: if so, it does nothing; if not, it calls free.

With fully dynamic strings on, there is no global empty string. All strings are on the heap, and once their reference count goes to zero, they get deallocated. Where this creates a problem is if you mix and match code. If a library that does have fully dynamic strings enabled (aka the standard library) receives an empty string from code which does not have it enabled (aka, the app you just built), it will try to free it and your application will crash.

Here’s a reproducible case for this issue using Boost:

  #include <string>
#include <sstream>
#include <boost/variant.hpp>

int main()
{
  std::ostringstream buf;
  boost::variant<bool, std::string> data;
  data = buf.str();
  data = false;
  return 0;
}

In this case – which really happened to me – I created an empty string by calling ostringstream::str(). Since I don’t have fully dynamic string on, its address is in data space, not on the heap. I pass this string to boost::variant, which makes a copy of that address. Later, when the variant is reassigned false, it calls ~basic_string to deconstruct the string. Since my standard library is compiled with fully dynamic strings, the destructor for basic_string doesn’t recognize that its the “special” empty string, so it tries to free it.

The solution to this problem is three-fold:

  1. You must be using the g++ that comes with Xcode, or if you build your own (say, via MacPorts), you must configure it using --enable-fully-dynamic-string. I’ve already submitted a patch to this effect to the MacPorts crew.

  2. All libraries must be compiled with -D_GLIBCXX_FULLY_DYNAMIC_STRING.

  3. Your own code must be compiled with -D_GLIBCXX_FULLY_DYNAMIC_STRING.

You’ll know if this issue is biting you by looking at a stack trace in gdb. You’ll see a crash somewhere inside basic_string’s _M_destroy (which calls free). Move up the trace a bit and check whether the string it’s trying to free is 0 bytes long.

To recap: what’s happened is that an empty string constructed by code without fully dynamic strings got deallocated by code that was. That is, most likely you, or a library you built, handed an empty std::string to the system library.

Syndicated 2009-10-30 09:35:32 (Updated 2009-11-10 08:12:39) from Lost in Technopolis

67 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!