Archive for April, 2009

Sudoku Zipper

April 23, 2009

I will post about an interesting idiom in Functional Programming , that I came across recently – Zipper. zipper provides us with
the ability to traverse pure functional programming data structures with surprising agility and efficiency , without losing the purity.

module Main where

import Sudoku.Parser as P

import Sudoku.Data as D

import System.Environment(getArgs)

import Data.Tree

import Data.Tree.Zipper

The imports are similar to the earlier one but with an additional module loaded – the zipper.

There’s just a minor change in the main, we pass the sudoku to zipperSol, which solves the sudoku and prints the solution.

main = do args <- getArgs

          case args of

            (f:[])   -> do x <- parseSudokuFromFile f

                           zipperSol x

            _        -> do usage

usage = do print "Usage:"

           print "STest <file>"

We create the tree the way we did earlier.

sudokuTree s = Node s (map sudokuTree (explore s))

explore s =  case (getUnfilledCell s) of

               Just (c,pos) -> filter validSudoku [ (fillSudoku s pos val) |  val <- getCells c ]

               Nothing      -> []

getCells (Ttv  lst) = map Sld lst

getCells  x =     [x]

Then, pass it on to the fromTree function of Data.Tree.Zipper module to get a zipper. Thus the below function takes a Sudoku and
constructs a zipper out of it.

sudokuZipper s = fromTree $ sudokuTree s

Now the solution is using Depth First Search. Search for nodes in the tree in depth first manner and if a solution is found, print it and otherwise
report inability.

zipperSol s = case (dfs $ sudokuZipper s) of

                Nothing -> do print "No solution found :-(  "

                (Just x) ->do print  x

The function to perform depth first search is simple too. Given a treeLocation, we check if it is a leaf. If it is a leaf, we need to handle it differently.
Otherwise, we call dfs on the first child of the tree in a recursive manner.
The different handling of the leaf node is this: see, if it is a solution. If it is, report it. Otherwise, delete this leaf and perform the depth first
search on the new tree formed.
Data.Tree.Zipper.delete function returns a May be TreeLoc. The new location will be the right sibling of the treeLoc that we deleted or, if
there’s no right sibling, the parent (which will become a leaf).
it is a leaf node.

dfs :: TreeLoc Sudoku -> Maybe Sudoku

dfs treeLoc = case firstChild treeLoc of

                 Nothing  -> handleLeafNode treeLoc

                 Just x   -> dfs x

handleLeafNode treeLoc =

    case (solvedLoc treeLoc) of

      Just solution -> Just solution

      Nothing ->  case (delete treeLoc) of

                   Nothing -> Nothing

                   (Just x) -> dfs x

solvedLoc :: TreeLoc Sudoku -> Maybe Sudoku

solvedLoc (Loc (Node x _) _ _ _) = if (hasEmptyCells x) then

                                      Nothing

                                   else (Just x)

hasEmptyCells s = case (getUnfilledCell s) of

                    Nothing  -> False

                    Just _   -> True

Elegant and cool. This program is slightly slower than my earlier one ( it takes 9 seconds to solve a sudoku that was solved in 8 seconds by the
earlier, tree based one) and the memory consumption is slightly on the higher side as well. However I can choose my search strategy.
I would be able to move from depth first search to breadth first search or best first search without breaking a sweat.

Lazy Evaluation

April 22, 2009

While I understood the idea of lazy evaluation for a while now, today I saw it in action for the first time. I intend to share
what I found in this post. In the process, I hope to make amends for the ridiculous code that I had posted earlier.  Much of the credit for improvement in my haskellfu goes to  Real World Haskell and to Haskell School of Expression.

The problem is a relatively simple one – to solve a sudoku.

1 Preliminaries

module Main where

import Sudoku.Parser as P
import Sudoku.Data as D
import System.Environment(getArgs)
import Data.Tree

Out of the above imports, Sudoku.Parser is an un-important module whose responsibility is to parse a given text file containing
the input sudoku and to create a Sudoku data structure, which is defined in the next module in the list – the Sudoku.Data.
Apart from defining the data , the Data module also exports the functions needed to manipulate the in-memory representation of the sudoku.
These two modules are not really relevant for this post. That is, the current algorithm is “independant” of the way the Sudoku is
represented in memory.

However, in order to give the context, (especially for the later sections where I present the profiling data),
I’ve used an Array of Cells and an integer representing the unit of the Sudoku. So a normal, 9 X 9 , sudoku will be an Array
of (Int,Int) ranging from (0,0) to (80,80) and the unit is 3 to indicate the size of the array. (The same representation can deal
with a 16 X 16 or a 25 X 25 or higher sudoku as well ( although – I could not find test data to try it).
And a cell is defined somewhat like below:

data Cell = Given Int
            | ONEOF [Int]
            | Sld Int

An empty Cell will be represented as a list of the values it can take. Given cell is represented as an Int. In order to perform trial and error,
the cell with a list will be replaced by a Sld n where n is a number taken from ONEOF’s list.

Going back to the imports,
System.Environment is imported for accessing the name of the file from command line arguments.
The next one that was imported – Data.Tree, is a part of the standard Haskell libraries and forms the crux of this algorithm.
It provides a (lazy) tree data type and related functions.

2 Data.Tree

The tree is defined as

data Tree a = Node { rootLabel :: a, subForest :: Forest}

where, Forest is a synonym for a list of Trees.

The brute-force approach to solve the problem is to create a (lazy) tree of sudokus, such that the given tree is at the root. The children of a given node,
(that is, the subForest of the Tree) are generated by taking one of the unfilled cell in the sudoku of the node , and substituting with one of the possible
values that this cell can take. In other words, each child of a node represents a trial for a particular cell in the sudoku.
However, we ensure that the sudokus at each node are “well-formed” sudokus, which follow the rules of the sudoku. Consequently, we have , at the leaves,
either a solved sudoku or a wrong choice that’s been taken.

The creation of such a tree is surprisingly pithy

sudokuTree s = Node s (map sudokuTree (explode s))

explode s =  case (getUnfilledCell s) of
               Just (c,pos) -> filter validSudoku
                          [ (fillSudoku s pos val) |  val <- getCells c ]
               Nothing      -> []

As can be seen from the use of filter, we are ensuring that only valid Sudokus can form the subforest of a node.
And the process is repeated recursively.

A side note on the functions from Sudoku.Data module:

  1. The getUnfilledCell returns Maybe (Cell,(Int,Int)) that, is a cell and its position if there are empty cells , nothing otherwise.
  2. validSudoku is a function which takes a sudoku and returns false if it breaks any of the rules of sudoku , true otherwise.

Since we are interested mainly in the leaves of the tree, we now define

getLeaves (Node a []) = [a]
getLeaves (Node a xs) = concatMap getLeaves xs

which is again a simple straight forward function.

Thats it. The things left are the main function and a couple of helper functions.
The main function has to perform the following actions:

  1. reading the file name from the command line argument
  2. passing the file name to a parser combinator that parses the file and returns a sudoku
  3. creating a tree using this sudoku, and from the leaves of which, filtering out those sudokus that are filled
  4. printing the list of such sudokus.
main = do args <- getArgs
          case args of
            (f:[])   -> do x <- parseSudokuFromFile f
                           print  $ filter
                                  (\x -> not $ hasEmptyCells x)
                                  (getLeaves $ sudokuTree x)
            _        -> do usage

usage = do print "Usage:"
           print "STest <file>"

getCells (Ttv  lst) = map Sld lst
getCells  x =     [x]

hasEmptyCells s = case (getUnfilledCell s) of
                    Nothing  -> False
                    Just _   -> True

3 Running the code

I compiled the code with −O2 option (ghc) and then ran the code with −S option on a few puzzles from WebSudoku.
I here present some interesting data related to one of the puzzle for which it took 19 seconds to solve (the only one that took more than 5
seconds to solve).
The problem is as below (it is an evil category puzzle taken from the same site).

805000000
920000007
000650090
400810020
000204000
060079005
070086000
500000069
000000402

here 0 represents a space. The output was as follows:

:~/src/hascode> time ./SudokuTree Test/Sudoku/test3
"Parsed Sudoku"
[8,3,5,7,9,2,6,1,4
9,2,6,1,4,8,5,3,7
7,4,1,6,5,3,2,9,8
4,9,7,8,1,5,3,2,6
3,5,8,2,6,4,9,7,1
1,6,2,3,7,9,8,4,5
2,7,4,9,8,6,1,5,3
5,8,3,4,2,1,7,6,9
6,1,9,5,3,7,4,8,2
]

real    0m19.607s
user    0m19.589s
sys     0m0.016s

and when I ran it with the −S option of RTS, it contained the following astonishing data:

   5,488,579,640 bytes allocated in the heap
      49,910,436 bytes copied during GC
          45,804 bytes maximum residency (39 sample(s))
          66,900 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 10449 collections,     0 parallel,  0.35s,  0.42s elapsed
  Generation 1:    39 collections,     0 parallel,  0.00s,  0.01s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   17.74s  ( 17.94s elapsed)
  GC    time    0.35s  (  0.42s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   18.09s  ( 18.37s elapsed)

  %GC time       1.9%  (2.3% elapsed)

  Alloc rate    309,370,701 bytes per MUT second

  Productivity  98.0% of total user, 96.6% of total elapsed

The number of bytes allocated seemed staggering! Incidentally, I computed the number of nodes in the tree, using the Data.Tree.flatten function
and for the above problem, it is 68480! That is, 68480 sudokus – each of which has 9 X 9 Cells !
It ofcourse is lazy evaluation (and the garbage collection), in action – At no point in time, the whole tree is in memory. Only the portion of the
tree that is needed is built and the other nodes are not (yet) evaluated. And ofcourse those nodes that are not needed anymore
are garbage collected.
A further proof of the impact of lazy evaluation comes when I modify the program such that it prints the first solution that it finds (that is, we stop
when we get the first solution). I do it by modifying only the main function

main = do args <- getArgs
          case args of
            (f:[])   -> do x <- parseSudokuFromFile f
                           print  $ filter
                                  (\x -> not $ hasEmptyCells x)
                                  (getLeaves $ sudokuTree x)
            _        -> do usage

to

main = do args <- getArgs
          case args of
            (f:[])   -> do x <- parseSudokuFromFile f
                           print $ head  $ filter
                                  (\x -> not $ hasEmptyCells x)
                                  (getLeaves $ sudokuTree x)
            _        -> do usage

That is, I did not change the way the tree is constructed nor did I interfere in the recursive function (In imperative-speak, I did not add a “break”
statement), I am only printing the first member of the list and ignoring the rest (where present).

And this time, the same sudoku is solved under 8 seconds, which is relatively unsurprising,
but the profiling data (from − S RTS option) provided me with the shock of the day

   2,565,699,092 bytes allocated in the heap
      23,344,784 bytes copied during GC
          40,828 bytes maximum residency (19 sample(s))
          62,928 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  4884 collections,     0 parallel,  0.21s,  0.20s elapsed
  Generation 1:    19 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    8.74s  (  8.84s elapsed)
  GC    time    0.21s  (  0.21s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    8.95s  (  9.05s elapsed)

  %GC time       2.3%  (2.3% elapsed)

  Alloc rate    293,539,944 bytes per MUT second

  Productivity  97.7% of total user, 96.6% of total elapsed

Perhaps it does not need an expert of such matters to say that in the second case the portion of the tree that was evaluated was only about a half of the
earlier.( Going by the figures under the sections related to GC like “bytes copied during GC” and the number of “collections”).

Although what I said here may not be new for most haskellers, it still gives me goose bumps thinking of the way evaluation must have happened and how
different it is from what I am used to.