Wednesday, September 5, 2012

Useful pure functional programming

I guess all programmers used to the imperative model gets stunned wondering how crazy the pure functional model is. After all, how could we do anything useful without side effects, printing to the terminal, opening a window, producing logs and variables? OMG.

I did get stunned with Haskell and because of a happy insistency I realized that the problem was in my head and not in the pure functional model.

Differences

Living for a long time in the context of an imperative world made me get used to think in a specific sequential way. I always needed to tell the computer how to do every computational step, what states to hold, what variables to update, when to return some value, etc.

On the other hand, in the pure functional world, I'm forced to think in a way to transform data. Instead of thinking on each step, now I need to think how to transform some data into another, i.e. how to extract the thing I want from the thing I have.

The classical examples to show this difference involves list manipulation.

Example #1

To calculate the sum of a list in the imperative style, you need to control the current sum state, which needs to be updated at each element.

int sum(int[] list) {
  int result = 0;
  for (int i : list)
    result += i;
  return result;
}

In the functional world, you need to transform a list into a single number that represents the sum of all elements, i.e. the sum of an empty list will always be zero and the sum of any other list can be extracted by adding its first value and the sum of the rest of the list.

sum [] = 0
sum (x:xs) = x + sum xs

Example #2

When we need to maintain control over two lists in the imperative world, one for input and one for output, we need to include the current index of each list in the "bag of things we need to reason about in order to avoid silly errors".

String[] toText(int[] list) {
    String[] result = new String[list.length];
    for (int i = 0; i < list.length; ++i)
        result[i] = Integer.toString(list[i]);
    return result;
}

The same example in the functional world stays simple as the sum of the list, the only difference is that we transform the list using the list constructor instead of the plus operator.

toText [] = []
toText (x:xs) = show x : toText xs

Motivation

The pure functional model can do many more things beyond these classical examples in a way that I consider to be pretty and with free benefits. This post is yet another try to demonstrate this. As I'm quite new to the pure functional world and have no profound knowledge of Monads, Functors, Arrows and all that stuff that looks brilliant on the hand of those who knows how to use it, I guess this post will be easy to follow, as long as you have a basic understanding of Haskell.

Calculator

We'll create a calculator in Haskell. All code shown will be pure, with no side effects, including a DSL to write tests, the test executor itself and two other test case transformations, one to write the test in text form and one to write the same text in JUnit form.

Test cases

First, let's define the test cases' format. They are a sequence of actions and assertions:

data TestSequence =
    Do Action TestSequence
  | Check Assertion TestSequence
  | Done

With that data definition, the test cases can be written like this:

test =
  Do a.
  Do b.
  Do c.
  Check d$
  Done

The need to write "Done" at the end of the test case bothers me. To avoid that, we can create a type synonym to open-ended tests:

type Test = TestSequence -> TestSequence

Now the tests can be written as:

test =
  Do a.
  Do b.
  Do c.
  Check d

To me, this is a valid small DSL to write our tests cases.

The only action available to our user will be to push the buttons and the only information that could be checked is the display:

data Action =
    Press Button

data Assertion =
    DisplayHasNumber Int

data Button =
    Zero
  | One
  | Two
  | Three
  | Four
  | Five
  | Six
  | Seven
  | Eight
  | Nine
  | Plus
  | Minus
  | Times
  | Divide
  | Equals
  | Clear
    deriving (Show)

First test

The first step is done. We already have a DSL to write tests that type-check. Time to write our first test:

sample :: Test
sample =
    Do (Press One).
    Do (Press Plus).
    Do (Press One).
    Do (Press Equals).
    Check (DisplayHasNumber 2).

    Do (Press Clear).
    Check (DisplayHasNumber 0).

    Do (Press Two).
    Do (Press Zero).
    Check (DisplayHasNumber 20).
    Do (Press Divide).
    Do (Press Two).
    Check (DisplayHasNumber 2).
    Do (Press Equals).
    Check (DisplayHasNumber 10)

Test transformation

Note that the test case is actually a data structure, giving us the possibility to transform it. To make the transformation easier, let's create a function that applies some generic function to each step, creating a list with the results:

unroll :: (TestSequence -> a) -> Test -> [a]
unroll f t = g (t Done)
  where g Done = [f Done]
        g v@(Do _ next) = f v : g next
        g v@(Check _ next) = f v : g next

Test case -> Text

We can transform the test case into a text:

prettyPrint :: Test -> String
prettyPrint = unlines . unroll prettyPrintTestSequence

prettyPrintTestSequence :: TestSequence -> String
prettyPrintTestSequence s =
    case s of
      Done              -> "end"
      Do action _       -> prettyPrintAction action
      Check assertion _ -> prettyPrintAssertion assertion

prettyPrintAction :: Action -> String
prettyPrintAction (Press button) = 
    "press " ++ prettyPrintButton button

prettyPrintButton :: Button -> String
prettyPrintButton = map toLower . show

prettyPrintAssertion :: Assertion -> String
prettyPrintAssertion (DisplayHasNumber number) = 
    "the display should be showing the number " ++ 
    show number

Let's try in GHCi:

\> putStr $ prettyPrint sample
press one
press plus
press one
press equals
the display should be showing the number 2
press clear
the display should be showing the number 0
press two
press zero
the display should be showing the number 20
press divide
press two
the display should be showing the number 2
press equals
the display should be showing the number 10
end

Test case -> JUnit

We can translate our test case into JUnit:

generateJUnit :: Test -> String
generateJUnit = 
    ("@Test\npublic void test() {\n" ++) . 
    unlines .
    unroll generateJUnitTestSequence

generateJUnitTestSequence :: TestSequence -> String
generateJUnitTestSequence s =
    case s of
      Done      -> "}"
      Do a _    -> generateJUnitAction a
      Check a _ -> generateJUnitAssertion a

generateJUnitAction :: Action -> String
generateJUnitAction (Press b) =
    generateJUnitButton b ++ ".press();"

generateJUnitButton :: Button -> String
generateJUnitButton b = "getButton" ++ show b ++ "()"

generateJUnitAssertion :: Assertion -> String
generateJUnitAssertion (DisplayHasNumber n) =
    "assertEquals(" ++ show n ++ ", getDisplayNumber());"

At GHCi:

\> putStr $ generateJUnit sample
@Test
public void test() {
getButtonOne().press();
getButtonPlus().press();
getButtonOne().press();
getButtonEquals().press();
assertEquals(2, getDisplayNumber());
getButtonClear().press();
assertEquals(0, getDisplayNumber());
getButtonTwo().press();
getButtonZero().press();
assertEquals(20, getDisplayNumber());
getButtonDivide().press();
getButtonTwo().press();
assertEquals(2, getDisplayNumber());
getButtonEquals().press();
assertEquals(10, getDisplayNumber());
}

Concrete test

And of course, we can use our test case data structure to actually test a calculator implementation:

data TestResult =
    Ok
  | Failed FailureMessage

type FailureMessage = String

instance Show TestResult where
    show Ok = "Test passed"
    show (Failed m) = "Test failed: " ++ m

checkTest :: Test -> TestResult
checkTest t = 
    evalState 
    (threadCheckState . unroll checkTestSequence $ t)
    mkCalculator

threadCheckState :: [State Calculator TestResult] ->
                    State Calculator TestResult 
threadCheckState = go 0
  where go _ [] = return Ok
        go n (x:xs) = x >>= (f n xs)
        f n xs Ok = go (n + 1) xs
        f n _ (Failed m) = 
          return . Failed $
            "Step " ++ show n ++ ". " ++ m

checkTestSequence :: TestSequence -> 
                     State Calculator TestResult
checkTestSequence Done = return Ok
checkTestSequence (Do a _) = checkAction a
checkTestSequence (Check a _) = checkAssertion a

checkAction :: Action -> State Calculator TestResult
checkAction (Press b) = do
    modify $ pressButton b 
    return Ok

checkAssertion :: Assertion -> State Calculator TestResult
checkAssertion (DisplayHasNumber n) =
    get >>= \c ->
      if displayNumber c == n
        then return Ok
        else return . Failed $ 
          "Wrong number in display, should be " ++
          show n ++ " but was " ++ show (displayNumber c)

Calculator's core

Finally, to type-check our tester, we need to define a first version of our calculator. Here is your opportunity to try your own. I've produced the following code to pass the test:

data Calculator = Calculator {
    displayNumber :: Int
  , operation :: Maybe (Int -> Int -> Int)
  , savedNumber :: Int
  }

pressButton :: Button -> Calculator -> Calculator
pressButton b =
  case b of
    Zero    -> appendNumber 0
    One     -> appendNumber 1
    Two     -> appendNumber 2
    Three   -> appendNumber 3
    Four    -> appendNumber 4
    Five    -> appendNumber 5
    Six     -> appendNumber 6
    Seven   -> appendNumber 7
    Eight   -> appendNumber 8
    Nine    -> appendNumber 9
    Plus    -> saveOperation (+)
    Minus   -> saveOperation (-)
    Times   -> saveOperation (*)
    Divide  -> saveOperation div
    Equals  -> performOperation
    Clear   -> clear

appendNumber :: Int -> Calculator -> Calculator
appendNumber i c = 
  c { 
    displayNumber = (displayNumber c) * 10 + i 
  }

saveOperation :: (Int -> Int -> Int) -> 
                 Calculator -> Calculator
saveOperation f c = 
  c { 
    savedNumber = (displayNumber c)
  , displayNumber = 0
  , operation = Just f
  }

performOperation :: Calculator -> Calculator
performOperation c = 
  c { 
    savedNumber = newNumber
  , displayNumber = newNumber 
  }
  where newNumber = 
          case (operation c) of
            Nothing -> displayNumber c
            Just f  -> let a = savedNumber c
                           b = displayNumber c 
                        in
                           f a b

clear :: Calculator -> Calculator
clear = const mkCalculator

mkCalculator :: Calculator
mkCalculator = 
  Calculator { 
    displayNumber = 0
  , operation = Nothing
  , savedNumber = 0
  }

Running the tests

Now we can run the tester against our implementation at GHCi, let's do it:

\> checkTest sample
Test passed

This means our calculator is good enough for our use case. To assure the test case can capture an unexpected behavior, let's introduce a bug changing the value of the multiplication on the function "appendNumber" from "10" to "100" and run the test again:

\> checkTest sample
Test failed: Step 9. Wrong number in display, should be 20 but was 200

Wonderful.

Summary

We've built a basic code for a working calculator that is good enough for a simple use case. We can add new tests and keep factoring our code, TDD style. All that was made in a 100% pure way, with no dependency upon IO, state, variables or logs.

Beyond the benefit to be able to do any kind of transformation in the test cases, we can also run as many transformations and as many tests we want in parallel, because the test and the calculator are thread-safe by the simple fact that they are pure.

We did this in only 251 lines of code. How many classes or lines are needed to do the same in your favorite language?

I hope this post can enlighten you to think outside the box of the imperative world. There are many advantages in learning a pure functional language, even if you use imperative languages at work.

The code is available at GitHub: https://gist.github.com/3354394.

4 comments:

  1. I believe that your comparison of functional/imperative isn't fair : your imperative algorithms use constant stack, while the functional ones don't. You should write the tail recursive functions, and you'll get slightly uglier code.
    Something like:


    sum = sum' 0
    sum' r 0 = r
    sum' r n = sum' (r+n) (n-1)

    ReplyDelete
    Replies
    1. The main point is that you need to think in terms of data transformation. Even if you accumulate the sum in the function argument, you still declare it in a "transformation recipe" instead of a step-by-step computation:

      sumA [] = 0
      sumA (x:xs) = sumA' x xs
      sumA' a [] = 0
      sumA' a (x:xs) = sumA' (a + x) xs

      Still, when you want to transform a list of things into a single thing, you may use the existing higher order functions:

      import Data.List (foldl1')
      sumB = foldl1' (+)

      Delete
  2. Usually I like it when Haskell beats Java :) But in this case, the comparison seems unfair.

    In Example #2, in the beginning of your post, you claim that "when we need to maintain control over two lists in the imperative world, one for input and one for output, we need to include the current index of each list in the 'bag of things we need to reason about in order to avoid silly errors'". I think this is just the difference between low-level arrays and high-level collections. With collections, one would write the following in Java:

      List<String> result = new ArrayList<String>();
      for (int i : list) {
        result.add(Integer.toString(i));
      }
      return result;

    This code has the same structure as your code for sum in Example #1: new List() corresponds to 0, and the call of add corresponds to the use of +=. The implementation does not contain any indexes. (In Scala, mutable collections even provide an operator += as an alternative to add, which makes this pattern even more visible).


    The design of your Java example is actually somewhat functional: The method accepts an array of integers, and returns an entirely different array of strings. Ok, the method uses mutation internally to create the resulting array, but the overall idea is functional.


    I would say that the following variant catches the essence of imperative programming better:

      public class Global {
        public static Object[] data;

        void toText(Object[] data) {
          for (int i = 0; i < data.length; i++) {
            data[i] = Integer.toString((Integer)data[i]);
          }
        }
      }

    ReplyDelete
    Replies
    1. Agreed.

      It seems that a fair comparison is hard to achieve. The Haskell code can get more complicated if you use accumulating arguments and want to deal with lazy thunks overflow. The Java version can get easier using collections or even worse using global state.

      In my point of view, the culture of the imperative programmer is to accept state and step-by-step recipes as natural. Haskell forced me to change: stop thinking in steps, start thinking in transformations.

      The O.O. has a nice touch with that too, but in a much more subtle way (as Java does not make a big push to stop you from using what you already know): stop thinking in procedures, start thinking in message passing.

      Delete