Are the Knight Riders totally out of Semis?

May 7, 2009

Much as I hate the phrase “The Tournament is wide Open” (which ranks right up there with “DLF maximum” and “Citi moment of success” in terms of irritability and frequency), especially when I have to hear it from the likes of Ravi Shastry during one of those fugly presentation ceremonies, am enough intrigued by it to see if one can apply one’s haskellfu to throw more light on the alleged width of the tournament’s opening.

In short, I want a (haskell) program that takes as input:

  1. points table and
  2. future matches
  3. A team name

and can answer the following question :

under what conditions can the team reach the last four?

Assuming that at the end of each match the net run rate(NRR) will be incremented by 0.5 for the winner and decremented by 0.5 for the loser and that a match could also result in a draw (thanks to the the rain),in which case the teams share a point each (and with no change in the NRRs), the tree of possibilities will have a small matter of 322 nodes (considering there are 22 more matches to go , as I type this).  And 320 is just 348,67,84,401 ! I know for sure I would not be able to write the program in java/C++ as easily as  I can do it in Haskell.

So, Here I go,

module Main where

import System.IO

import Data.Tree

import System.Environment

import qualified Data.Map as M
import qualified Data.List as L

A match is a record of two fields teamA , teamB and points won by teamA and points won by teamB

data Match = Match (String,Int) (String, Int)

                deriving Show

Each team has a unique identifier to represent it, a collection of points and net-run rate

The teams can be compared on the basis of their points. If the points are identical, then their net-run rates are identified

data Team = Team{teamName:: String,

                 teamScore      :: Int,

                 teamNetRunRate :: Float


                deriving (Eq,Show)

compTeams (Team _ t1 n1) (Team _ t2 n2)

   = if (t1 > t2) then GT

     else if (t1 < t2) then LT

          else if (n1 > n2) then GT

               else if (n1 < n2) then LT

                    else EQ

instance Ord Team where

 compare  = compTeams

we can also add three functions to the team that lets us add a win, lose and a draw

addWin (Team t1 s r) = Team t1 (s + 2) (r + 0.5)

addLoss (Team t1 s r) = Team t1 s (r - 0.5)

addDraw (Team t1 s r) = Team t1 (s+1) r

A tournament can be defined as a list of matches that were already played and a list of planned clashes between teams.
However, I will also need a map between String and a team in order to use just the strings

data Tournament = Tourney { teamMap :: M.Map String Team,

                            played  :: [Match],

                            toPlay  :: [(String,String)]


                      deriving Show

Thus, this tournament presents a snapshot of the tournament at any given instant.
Given these, identifying the top 4 is trivial

topFour :: Tournament  -> [Team]

topFour t = drop  4 (L.sort $ M.elems $ teamMap t)

To conduct a match, we need to take a potential clash out of the “toPlay” list, convert it into a Match and add it to
the played list and then update the teamMap with appropriate points depending on who won the particular match.
In order to do so, we need a new data type to indicate the result of a match

data MatchRes = AWon | BWon | Draw

As mentioned earlier, conducting a match is akin to taking a given tournament and a result and getting a new snapshot of the tournament.

conductMatch :: Tournament -> MatchRes -> Tournament

conductMatch t@(Tourney _ _ []) _   = t -- should it be an error

conductMatch t@(Tourney teams pld topl) mres  

  =   Tourney (updateTeams teams newMatch) (newMatch:pld) (tail topl)

       where newMatch = clashResToMatch (head topl) mres

The two functions used above are defined below:

clashResToMatch (s1,s2) AWon = Match (s1,2) (s2, 0)

clashResToMatch (s1,s2) BWon = Match (s1,0) (s2, 2)

clashResToMatch (s1, s2) _   = Match (s1, 1) (s2, 1)

updateTeams tMap (Match (t1,p1) (t2,p2))

  = addPointsToTeam t2 p2 (addPointsToTeam t1 p1 tMap)

addPointsToTeam t1 0 tMap = M.update (\x -> Just $ addLoss x) t1 tMap

addPointsToTeam t1 1 tMap = M.update (\x -> Just $ addDraw x) t1 tMap

addPointsToTeam t1 2 tMap = M.update (\x -> Just $ addWin x) t1 tMap

addPointsToTeam _ _ _     = error "Invalid points in addPointsToTeam"

Now in order to generate all the combinations, we need a function “explode”, that takes a tournament and gives a list of tournaments such that
in each of these we have one possible result.

explode (Tourney _ _ []) = []

explode t = map (conductMatch t) [AWon, BWon, Draw]

Once the “explode” is in place, creating a tournament “tree” which provides all the possible combinations is very simple:

tourneyTree t = Node t (map tourneyTree (explode t))

getLeaves (Node a []) = [a]

getLeaves (Node a xs) = concatMap getLeaves xs

We are left with dealing with the input files and accessing the filter, which is pretty straight forward.

main = do args <- getArgs

          case args of 

            (i1:i2:t:[])   -> do tourney <- readInput i1 i2

                                 let lvs = filter (\x -> hasTeamInTopFour t x) (getLeaves $ tourneyTree tourney)

                                 case lvs of

                                   (hd:rst) -> do print hd

                                   []  -> do print "No chance, dude"

            _        -> do usage

usage = do print "Usage:"

           print "TWOpen InputFile outputFile <Team Name>"

           print "Team Name is one of : Delhi, Mumbai, Chennai, Kolkata, Hyd, Blore, Jaipur, Mohali"

hasTeamInTopFour tName t = tName `elem` (map teamName $ topFour t)

Here, readInput is an IO action that takes a couple of files and returns the tournament snapshot as depicted in the input.

It is trivial as am using the built-in “read” function.

readInput t1 t2 = do inpStr1 <- readFile t1

                     inpStr2 <- readFile t2

                     let tmMap = getteamMap $ getList1 $ lines inpStr1

                     let t2 = getList2 (lines inpStr2)

                     return $ Tourney tmMap [] t2

getteamMap ts = M.fromList $ zip (map teamName ts) ts

getList1 strs = map (getTeam . L.words) strs

getList2 strs = map (getClashes . L.words) strs

getTeam (t1:t2:t3:[]) = Team t1 (read t2) (read t3)

getTeam _             = error "Invalid team"

getClashes (t1:t2:[]) = (t1,t2)

getClashes _          = error " invalid clash"

Running the file with the following Points table (which is the latest not counting the ongoing Chennai Vs Mohali game) and trying out various
teams, we can see that even KKR still has an outside chance of reaching the game:

Delhi  10  0.158

Mumbai 7  0.516

Kolkata  3  -1.097

Chennai  9  1.36

Blore  8  -0.395

Mohali  8  -0.5

Hyd  10   0.152

Jaipur 11  0.05

and the following schedule:

Delhi Mumbai

Hyd Mohali

Chennai Jaipur

Kolkata Delhi

Blore Mumbai

Hyd Jaipur

Blore Kolkata

Mohali Mumbai

Hyd Delhi

Mumbai Jaipur

Chennai Blore

Mohali Delhi

Chennai Mumbai

Hyd Kolkata

Jaipur Delhi

Mohali Hyd

Chennai Kolkata

Delhi Blore

Chennai Mohali

Jaipur Kolkata

Hyd Blore

Mumbai Delhi

and checking out the options for Kolkata, the program runs for about 2 full minutes and presents a scenario under which the beleagured team can reach the semis. Possibly there are others as well!
If the future matches go as below

Match ("Mumbai",2) ("Delhi",0),

Match ("Hyd",2) ("Blore",0),

Match ("Jaipur",0) ("Kolkata",2),

Match ("Chennai",2) ("Mohali",0),

Match ("Delhi",0) ("Blore",2),

Match ("Chennai",0) ("Kolkata",2),

Match ("Mohali",2) ("Hyd",0),

Match ("Jaipur",2) ("Delhi",0),

Match ("Hyd",0) ("Kolkata",2),

Match ("Chennai",2) ("Mumbai",0),

Match ("Mohali",2) ("Delhi",0),

Match ("Chennai",2) ("Blore",0),

Match ("Mumbai",2) ("Jaipur",0),

Match ("Hyd",2) ("Delhi",0),

Match ("Mohali",2) ("Mumbai",0),

Match ("Blore",0) ("Kolkata",2),

Match ("Hyd",2) ("Jaipur",0),

Match ("Blore",2) ("Mumbai",0),

Match ("Kolkata",2) ("Delhi",0),

Match ("Chennai",2) ("Jaipur",0),

Match ("Hyd",2) ("Mohali",0),

Match ("Delhi",2) ("Mumbai",0)

the final points table would look like:

Hyd -> points  = 18, NetRunRate = 1.152

Chennai -> points = 17, NetRunRate = 2.8600001

Mohali -> points = 14, NetRunRate = 0.0 

Kolkata -> points = 13, NetRunRate = 1.403

Jaipur -> points = 13, NetRunRate = -1.45 

Blore -> points = 12, NetRunRate = -0.895

Delhi -> points = 12, NetRunRate = -2.342

Mumbai -> points = 11, NetRunRate = -0.48400003

So you see, The tournament is indeed wide open!

Sudoku Zipper

April 23, 2009

I will post about an interesting idiom in Functional Programming , that I came across recently – Zipper. zipper provides us with
the ability to traverse pure functional programming data structures with surprising agility and efficiency , without losing the purity.

module Main where

import Sudoku.Parser as P

import Sudoku.Data as D

import System.Environment(getArgs)

import Data.Tree

import Data.Tree.Zipper

The imports are similar to the earlier one but with an additional module loaded – the zipper.

There’s just a minor change in the main, we pass the sudoku to zipperSol, which solves the sudoku and prints the solution.

main = do args <- getArgs

          case args of

            (f:[])   -> do x <- parseSudokuFromFile f

                           zipperSol x

            _        -> do usage

usage = do print "Usage:"

           print "STest <file>"

We create the tree the way we did earlier.

sudokuTree s = Node s (map sudokuTree (explore s))

explore s =  case (getUnfilledCell s) of

               Just (c,pos) -> filter validSudoku [ (fillSudoku s pos val) |  val <- getCells c ]

               Nothing      -> []

getCells (Ttv  lst) = map Sld lst

getCells  x =     [x]

Then, pass it on to the fromTree function of Data.Tree.Zipper module to get a zipper. Thus the below function takes a Sudoku and
constructs a zipper out of it.

sudokuZipper s = fromTree $ sudokuTree s

Now the solution is using Depth First Search. Search for nodes in the tree in depth first manner and if a solution is found, print it and otherwise
report inability.

zipperSol s = case (dfs $ sudokuZipper s) of

                Nothing -> do print "No solution found :-( "

                (Just x) ->do print  x

The function to perform depth first search is simple too. Given a treeLocation, we check if it is a leaf. If it is a leaf, we need to handle it differently.
Otherwise, we call dfs on the first child of the tree in a recursive manner.
The different handling of the leaf node is this: see, if it is a solution. If it is, report it. Otherwise, delete this leaf and perform the depth first
search on the new tree formed.
Data.Tree.Zipper.delete function returns a May be TreeLoc. The new location will be the right sibling of the treeLoc that we deleted or, if
there’s no right sibling, the parent (which will become a leaf).
it is a leaf node.

dfs :: TreeLoc Sudoku -> Maybe Sudoku

dfs treeLoc = case firstChild treeLoc of

                 Nothing  -> handleLeafNode treeLoc

                 Just x   -> dfs x

handleLeafNode treeLoc =

    case (solvedLoc treeLoc) of

      Just solution -> Just solution

      Nothing ->  case (delete treeLoc) of

                   Nothing -> Nothing

                   (Just x) -> dfs x

solvedLoc :: TreeLoc Sudoku -> Maybe Sudoku

solvedLoc (Loc (Node x _) _ _ _) = if (hasEmptyCells x) then


                                   else (Just x)

hasEmptyCells s = case (getUnfilledCell s) of

                    Nothing  -> False

                    Just _   -> True

Elegant and cool. This program is slightly slower than my earlier one ( it takes 9 seconds to solve a sudoku that was solved in 8 seconds by the
earlier, tree based one) and the memory consumption is slightly on the higher side as well. However I can choose my search strategy.
I would be able to move from depth first search to breadth first search or best first search without breaking a sweat.

Lazy Evaluation

April 22, 2009

While I understood the idea of lazy evaluation for a while now, today I saw it in action for the first time. I intend to share
what I found in this post. In the process, I hope to make amends for the ridiculous code that I had posted earlier.  Much of the credit for improvement in my haskellfu goes to  Real World Haskell and to Haskell School of Expression.

The problem is a relatively simple one – to solve a sudoku.

1 Preliminaries

module Main where

import Sudoku.Parser as P
import Sudoku.Data as D
import System.Environment(getArgs)
import Data.Tree

Out of the above imports, Sudoku.Parser is an un-important module whose responsibility is to parse a given text file containing
the input sudoku and to create a Sudoku data structure, which is defined in the next module in the list – the Sudoku.Data.
Apart from defining the data , the Data module also exports the functions needed to manipulate the in-memory representation of the sudoku.
These two modules are not really relevant for this post. That is, the current algorithm is “independant” of the way the Sudoku is
represented in memory.

However, in order to give the context, (especially for the later sections where I present the profiling data),
I’ve used an Array of Cells and an integer representing the unit of the Sudoku. So a normal, 9 X 9 , sudoku will be an Array
of (Int,Int) ranging from (0,0) to (80,80) and the unit is 3 to indicate the size of the array. (The same representation can deal
with a 16 X 16 or a 25 X 25 or higher sudoku as well ( although – I could not find test data to try it).
And a cell is defined somewhat like below:

data Cell = Given Int
            | ONEOF [Int]
            | Sld Int

An empty Cell will be represented as a list of the values it can take. Given cell is represented as an Int. In order to perform trial and error,
the cell with a list will be replaced by a Sld n where n is a number taken from ONEOF’s list.

Going back to the imports,
System.Environment is imported for accessing the name of the file from command line arguments.
The next one that was imported – Data.Tree, is a part of the standard Haskell libraries and forms the crux of this algorithm.
It provides a (lazy) tree data type and related functions.

2 Data.Tree

The tree is defined as

data Tree a = Node { rootLabel :: a, subForest :: Forest}

where, Forest is a synonym for a list of Trees.

The brute-force approach to solve the problem is to create a (lazy) tree of sudokus, such that the given tree is at the root. The children of a given node,
(that is, the subForest of the Tree) are generated by taking one of the unfilled cell in the sudoku of the node , and substituting with one of the possible
values that this cell can take. In other words, each child of a node represents a trial for a particular cell in the sudoku.
However, we ensure that the sudokus at each node are “well-formed” sudokus, which follow the rules of the sudoku. Consequently, we have , at the leaves,
either a solved sudoku or a wrong choice that’s been taken.

The creation of such a tree is surprisingly pithy

sudokuTree s = Node s (map sudokuTree (explode s))

explode s =  case (getUnfilledCell s) of
               Just (c,pos) -> filter validSudoku
                          [ (fillSudoku s pos val) |  val <- getCells c ]
               Nothing      -> []

As can be seen from the use of filter, we are ensuring that only valid Sudokus can form the subforest of a node.
And the process is repeated recursively.

A side note on the functions from Sudoku.Data module:

  1. The getUnfilledCell returns Maybe (Cell,(Int,Int)) that, is a cell and its position if there are empty cells , nothing otherwise.
  2. validSudoku is a function which takes a sudoku and returns false if it breaks any of the rules of sudoku , true otherwise.

Since we are interested mainly in the leaves of the tree, we now define

getLeaves (Node a []) = [a]
getLeaves (Node a xs) = concatMap getLeaves xs

which is again a simple straight forward function.

Thats it. The things left are the main function and a couple of helper functions.
The main function has to perform the following actions:

  1. reading the file name from the command line argument
  2. passing the file name to a parser combinator that parses the file and returns a sudoku
  3. creating a tree using this sudoku, and from the leaves of which, filtering out those sudokus that are filled
  4. printing the list of such sudokus.
main = do args <- getArgs
          case args of
            (f:[])   -> do x <- parseSudokuFromFile f
                           print  $ filter
                                  (\x -> not $ hasEmptyCells x)
                                  (getLeaves $ sudokuTree x)
            _        -> do usage

usage = do print "Usage:"
           print "STest <file>"

getCells (Ttv  lst) = map Sld lst
getCells  x =     [x]

hasEmptyCells s = case (getUnfilledCell s) of
                    Nothing  -> False
                    Just _   -> True

3 Running the code

I compiled the code with −O2 option (ghc) and then ran the code with −S option on a few puzzles from WebSudoku.
I here present some interesting data related to one of the puzzle for which it took 19 seconds to solve (the only one that took more than 5
seconds to solve).
The problem is as below (it is an evil category puzzle taken from the same site).


here 0 represents a space. The output was as follows:

:~/src/hascode> time ./SudokuTree Test/Sudoku/test3
"Parsed Sudoku"

real    0m19.607s
user    0m19.589s
sys     0m0.016s

and when I ran it with the −S option of RTS, it contained the following astonishing data:

   5,488,579,640 bytes allocated in the heap
      49,910,436 bytes copied during GC
          45,804 bytes maximum residency (39 sample(s))
          66,900 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 10449 collections,     0 parallel,  0.35s,  0.42s elapsed
  Generation 1:    39 collections,     0 parallel,  0.00s,  0.01s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   17.74s  ( 17.94s elapsed)
  GC    time    0.35s  (  0.42s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   18.09s  ( 18.37s elapsed)

  %GC time       1.9%  (2.3% elapsed)

  Alloc rate    309,370,701 bytes per MUT second

  Productivity  98.0% of total user, 96.6% of total elapsed

The number of bytes allocated seemed staggering! Incidentally, I computed the number of nodes in the tree, using the Data.Tree.flatten function
and for the above problem, it is 68480! That is, 68480 sudokus – each of which has 9 X 9 Cells !
It ofcourse is lazy evaluation (and the garbage collection), in action – At no point in time, the whole tree is in memory. Only the portion of the
tree that is needed is built and the other nodes are not (yet) evaluated. And ofcourse those nodes that are not needed anymore
are garbage collected.
A further proof of the impact of lazy evaluation comes when I modify the program such that it prints the first solution that it finds (that is, we stop
when we get the first solution). I do it by modifying only the main function

main = do args <- getArgs
          case args of
            (f:[])   -> do x <- parseSudokuFromFile f
                           print  $ filter
                                  (\x -> not $ hasEmptyCells x)
                                  (getLeaves $ sudokuTree x)
            _        -> do usage


main = do args <- getArgs
          case args of
            (f:[])   -> do x <- parseSudokuFromFile f
                           print $ head  $ filter
                                  (\x -> not $ hasEmptyCells x)
                                  (getLeaves $ sudokuTree x)
            _        -> do usage

That is, I did not change the way the tree is constructed nor did I interfere in the recursive function (In imperative-speak, I did not add a “break”
statement), I am only printing the first member of the list and ignoring the rest (where present).

And this time, the same sudoku is solved under 8 seconds, which is relatively unsurprising,
but the profiling data (from − S RTS option) provided me with the shock of the day

   2,565,699,092 bytes allocated in the heap
      23,344,784 bytes copied during GC
          40,828 bytes maximum residency (19 sample(s))
          62,928 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  4884 collections,     0 parallel,  0.21s,  0.20s elapsed
  Generation 1:    19 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    8.74s  (  8.84s elapsed)
  GC    time    0.21s  (  0.21s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    8.95s  (  9.05s elapsed)

  %GC time       2.3%  (2.3% elapsed)

  Alloc rate    293,539,944 bytes per MUT second

  Productivity  97.7% of total user, 96.6% of total elapsed

Perhaps it does not need an expert of such matters to say that in the second case the portion of the tree that was evaluated was only about a half of the
earlier.( Going by the figures under the sections related to GC like “bytes copied during GC” and the number of “collections”).

Although what I said here may not be new for most haskellers, it still gives me goose bumps thinking of the way evaluation must have happened and how
different it is from what I am used to.

Snowflake fractal

May 16, 2008

In this article, I will write a haskell program that displays a snowflake haskell. This is given as a problem in one of the best books on Haskell – The Haskell School of Expression, by Proffessor Paul Hudak. One can see a sample snowflake here on the math forum.
I am using a haskell package called gtk2hs ,which contains a special module needed for the code of that book.

module Snowflake where

import Graphics.SOE.Gtk

1 Setting the stage

I will start with a “command” (haskell expressions/functions that yield actions when they are applied) that is given in the book.
This command takes a window as an input and waits for the user to press the space bar. Once it is pressed, it will close the window.
This is used in a very elegant piece of code that displays Sierpinski’s triangle

spaceClose :: Window -> IO ()
spaceClose w
    = do k<- getKey w
         if k == ' ' then closeWindow w
                     else spaceClose w

Now a function that limits the size of the smallest ’star’.An example for how to define global constants in haskell

minSize :: Int
minSize = 8

2 Recursive star

The idea is to draw a Star of David – by overlapping two equilateral triangles such that
their circum-centers coincide. Now, in this Graphics module that I am using, one can draw a polygon by invoking a
polygon command with a list of x-y co-ordinates.
But due to some sort of quirkiness with the definition of polygon this module, it is necessary that the last element in the list is the same as the first pair of co-ordinates.
Hence, in order to draw a triangle, I will have to invoke the
polygon command using a list containing 4 co-ordinates.
So, I will define my data type as a record containing 2 sets of lists (one for each triangle) and also a size.

data StarOfDavid = SoD {trA :: [(Int,Int)],trB ::[(Int,Int)],szSoD::Int}

That done, I now need a function which takes a pair of x,y co-ordinates and the size (that is the radius of the circum-circle) and give me
This function is based on elementary trignometry and is pretty straight forward.
the StarOfDavid

getVertices :: (Int,Int) ->Int -> StarOfDavid
getVertices (x,y) r = SoD {trA = [(x,y-r), (x - (mulCos30 r), y + r `div` 2),(x + (mulCos30 r), y + r `div` 2 ),(x,y-r)],
                           trB = [(x,y+r),(x - (mulCos30 r), y - r `div` 2),(x + (mulCos30 r), y - r `div` 2),(x,y+r)],
                           szSoD = r}

mulCos30 ::Int -> Int
mulCos30 x = round ( (fromIntegral x) * 0.866 )

Now I need a command that takes a Window and a record corresponding to StarOfDavid, and draw it on the window, if and only if, the size is greater than the minSize defined earlier. Also, since the fractal can be drawn by drawing six stars at one-third the size at each of the six ‘corners’ of the Star and repeating the process till we reach the minimum size.
To implement all that in haskell is simple and elegant as below.

drawStarOfDavid :: Window -> StarOfDavid -> Color -> IO()
drawStarOfDavid w sod c =
 if (szSoD sod) <= minSize then return ()
 else do drawInWindow w (withColor c (polygon (trA sod)))
         drawInWindow w (withColor c (polygon (trB sod)))
         sequence_ (map (\x-> drawStarOfDavid w x c) (nextStarCenters sod))

nextCenters :: StarOfDavid -> [(Int,Int)]
nextCenters dos = (take 3 (trA dos) ) ++ (take 3 (trB dos))

nextStarCenters :: StarOfDavid -> [StarOfDavid]
nextStarCenters dos = map (\x-> getVertices x newsiz ) (nextCenters dos)
                        where newsiz = div (szSoD dos) 3

3 The main method

Finally, the main method, a command that invokes the drawStarOfDavid command.

 = runGraphics (
     do w <- openWindow
            "Snowflake Fractal" (400,400)
        drawStarOfDavid w (getVertices (250,250) 150)  Cyan
        spaceClose w

That’s it. Of course the picture that is generated is mono-chromatic and not as colorful as the math-forum one that I linked earlier.
(Also, the site has been generated using latex and hevea).
Snowflake Fractal

Monadic combinators for Sudoku

May 11, 2008

In this blog, I try to get a few monadic combinators to solve Sudokus and see if they can get any better than what I got earlier.  Again, the focus is very narrow – only to understand monads.

I am too lazy to start afresh. , along with the array module, I will also import the Sudoku module that I defined earlier.
And of course I will have to include the Monad module.

module MonSudoku where

import Data.Array
import Sudoku
import Control.Monad

Now I will define a new data type called State to hold the array and the current position.
But I will define it as a generic record that can hold any pair of data structure and position combination. Ideally, I should also
store the access method (for example the ‘!’ operator in case of an array, and a ‘!!’ in case of list). But I will pass it one for
now. The focus of the blog is very narrow – to be able to implement a non-trivial instance of monad.
Coming back to the data types, I will define a new type called GenericSolver using the newtype construction.
This ‘newtype’ defines a new container for all the functions that take a given state and returns a list of triplets.
We will see later, what the triplet is composed of.
Also, I will define a ‘type’ called “Solver a” that has Sudoku as the data structure and (row,col) as the position.

type Solver a = GenSolver Sudoku (Int,Int) a

newtype GenSolver ds pos a = Solver ( State ds pos -> [(a,pos,ds)] )

data State ds pos = State {
       dataStruct :: ds,
       thePosition :: pos

1 The first monad

Now I will define two functions – solveReturn and SolveBind so as to make GenSolver ds pos an instance of Monad.
The very purpose of this blog (and ,in fact, for this approach for solving Sudokus) is creation of an instance of Monad.

solveReturn  :: a -> GenSolver ds pos a
solveReturn x
    = Solver (\state ->[(x,(thePosition state),(dataStruct state))] )

solveBind :: GenSolver ds pos a -> (a -> GenSolver ds pos b) -> GenSolver ds pos b
solveBind (Solver s) f
   = Solver (\state ->
      concat ( solveBind' (s state) f )

solveBind' :: [(a,pos,ds)] ->(a -> GenSolver ds pos b)->[[(b,pos,ds)]]
solveBind' []   f            = []
solveBind' ((a,pos,ds):xs) f  =  ((f2 (State ds pos)):(solveBind' xs f))
                                where Solver f2 = f a

Now, I can proudly define the GenSolver as an instance of Monad using:

instance Monad (GenSolver ds pos) where
  return x   = solveReturn x
  p >>= f    = solveBind p f

Now that I have my monad, I will be able to use the do notation.

2 Primitive solvers

A few primitive functions first.

2.1 Item solver

Item solver takes a Sudoku and a position and gives a new sudoku such that :

  1. if the cell at the position is filled, then returns the same Sudoku with an empty list
  2. if the cell is not filled, it provides a list of all the possible values that cell can take

As of now, I am not sure if there can be a solver which is more primitive than this – that which I can use to define this.
But that is beside the point.

item :: Solver [Int]
item = Solver ( \state ->
                           sudoku = dataStruct state
                           pos    = thePosition state
                      case ( sudoku ! pos) of
                            FilledCell v -> [([v],pos, sudoku)]
                            EmptyCell (x:xs) -> [((x:xs),pos,sudoku)]
                            EmptyCell []  -> []

2.2 Propogator

I will need a function that can be bound to the item solver. That is, a function of type

 [Int]->Solver ()

To help me out in that I will first write a function that takes an Int first

prop1 :: Int->Solver ()
prop1 x = Solver (\state ->
                               sudoku = dataStruct state
                               pos    = thePosition state
                               sudoku' = sudoku // [(pos,FilledCell x)]

propogate :: [Int] ->Solver()
propogate  []      = Solver (\state -> [])
propogate  [h]     = prop1 h
propogate  (x:xs)  = Solver (\state->
                              Solver f1 = prop1 x
                              Solver frest = propogate xs
                              (f1 state)  ++ (frest state)

There now, I have most things that I need.
Firstly, I have an “item” function that given a Sudoku and an unfilled position, gives me a list of values that position can take.
Secondly, I have a function ‘propogate’ that can take a list of values for a Sudoku-position pair and gives me a list
of Sudokus such that each of them have the same same position updated with one of the values of the list.
Then, one of the sudokus from this list should lead me to the solution.
Thirdly, by virtue of the fact that the “Solver” is an instance of Monad, I should be able to use the “do” notation.

3 New Algorithm

It is now easy to imagine a small enhancement to the naive algorithm for solving sudokus:

  1. Apply the naive algorithm on a given Sudoku.
  2. Check if the Sudoku is filled. If filled, that is the answer. Stop
  3. Find the position of the cell containing only two possible values.
  4. If there is no such a position, return an empty list. Either it is a wrong “thread” or I am up against some devilishly evil puzzle
    that I cannot solve.
  5. Apply the item function on the Sudoku at the position gives me a list of possible values for that position.
  6. Using propogate, we get two sudokus, each of which takes one of the possible values at that position.
  7. Take one of these sudokus and go to step 3. If it returns a Sudoku, then it is the answer. Stop.
    If it returns an empty list, then try the next one.
  8. If I have no more sudokus to try, raise your hands

3.1 A few more primitives

For the sake of simplicity, let me call the cell that can have only two possible values in an unfilled Sudoku as a “forker”.
I now need a function to get a forker. I will also write a small wrapper over loopTillDone that applies the naive algorithm as
best as possible.

getNextForker :: Sudoku->(Int,Int) -> (Int,Int)
getNextForker _ (9,0) = error "NoSuchCell"
getNextForker sudoku (row,col) = let
                                      nextCol = mod (col + 1) 9
                                      nextRow = if (nextCol == 0 ) then row + 1 else row
                                    case (sudoku ! (row,col) ) of
                                      FilledCell v -> getNextForker sudoku (nextRow,nextCol)
                                      EmptyCell l  -> if ( (length l) == 2) then (row,col)
                                                      else getNextForker sudoku (nextRow,nextCol)

naiveAlgo :: Sudoku -> (Bool,Sudoku)
naiveAlgo sudoku = let
                    sudoku' = loopTillDone sudoku
                       ( not (hasEmptyCells sudoku' (0,0)),sudoku')

naSolves :: a -> Solver Bool
naSolves    x         = Solver (\state ->
                                     sudoku = dataStruct state
                                     pos    = thePosition state
                                     (yOrN,sudoku') = naiveAlgo sudoku
                                   [(yOrN , pos, sudoku')]

resetPos :: a -> Solver ()
resetPos x = Solver (\state ->
                            sudoku = dataStruct state
                            pos = getNextForker sudoku (0,0)
                            [( (),pos,sudoku)]

The function naSolves and resetPos are Solvers that ignore the input. naSolves checks if any of the Sudoku has already been solved.
resetPos updates the state so that the position is pointing to the next forker.

4 Combinators

I am now all set to write a few combinators.
Let me imaginatively name the combinators as oneRound, twoRounds.threeRounds and fourRounds
As their corny names suggest, they apply the enhanced algorithm once, twice ,thrice and four times respectively.
They are as below:

oneRound  :: Solver Bool
oneRound              = do{
                             x <- item;
                             y <- propogate x;
                             v <- naSolves y;
                             return v

twoRounds  :: Solver Bool
twoRounds             = do{
                             v <- oneRound;
                             if v then
                                 return v
                                  resetPos v;
                                  v' <- oneRound;
                                  return v'

threeRounds ::Solver ()
threeRounds            = do{
                             v <- oneRound;
                             if v then
                                 return ()
                                  resetPos v;
                                  return ();

fourRounds :: Solver Bool
fourRounds         = do{
                             v <- twoRounds;
                             if v then
                                 return v
                                  resetPos v;
                                  v' <-twoRounds;
                                  return v';

eightRounds :: Solver Bool
eightRounds         = do{
                             v <- fourRounds;
                             if v then
                                 return v
                                  resetPos v;
                                  v' <-fourRounds;
                                  return v';

5 Testing them out

Now a function that takes the input as a list (list of list of Ints) and runs one of the combinators and returns the Sudoku

5.1 To apply the combinators

scanListForFilled :: [(a,b,Sudoku)]-> Int -> Int
scanListForFilled [] pos = pos
scanListForFilled ((_,_,sudoku):xs) pos = if (hasEmptyCells sudoku (0,0))  then scanListForFilled xs (pos + 1)
                                          else  pos

runThreeRound :: [Int] -> IO()
runThreeRound list = let
                        sudoku1             = fromList list (0,0) getInitSudoku
                        (p,sudNaive)        = naiveAlgo sudoku1
                        Solver solverFun    = threeRounds
                     if p then
                         do {
                             print sudNaive;
                             return ()
                         x = solverFun (State sudNaive (getNextForker sudNaive (0::Int,0::Int)) )
                         pos = scanListForFilled x (0)
                         (_,_,endSud) = x !! pos
                            print $ showSudoku endSud;
                            print endSud;
                            return ()

runEightRound :: [Int] -> IO()
runEightRound list = let
                        sudoku1             = fromList list (0,0) getInitSudoku
                        (p,sudNaive)        = naiveAlgo sudoku1
                        Solver solverFun    = eightRounds
                     if p then
                         do {
                             print sudNaive;
                             return ()
                         x = solverFun (State sudNaive (getNextForker sudNaive (0::Int,0::Int)) )
                         pos = scanListForFilled x (0)
                         (_,_,endSud) = x !! pos
                            print $ showSudoku endSud;
                            print endSud;
                            return ()

runFourRound :: [Int] -> IO()
runFourRound list = let
                        sudoku1             = fromList list (0,0) getInitSudoku
                        (p,sudNaive)        = naiveAlgo sudoku1
                        Solver solverFun    = fourRounds
                     if p then
                         do {
                             print sudNaive;
                             return ()
                         x = solverFun (State sudNaive (getNextForker sudNaive (0::Int,0::Int)) )
                         pos = scanListForFilled x (0)
                         (_,_,endSud) = x !! pos
                            print $ showSudoku endSud;
                            print endSud;
                            return ()

5.2 Testdata

A few sample puzzles from

hard1 ::[[Int]]
hard1 = [[0, 2, 9, 0, 7, 0, 0, 0, 6],
         [0, 0, 0, 0, 0, 1, 9, 8, 0],
         [0, 0, 0, 9, 0, 8, 0, 0, 0],
         [0, 0, 0, 5, 0, 0, 0, 7, 4],
         [5, 0, 0, 0, 0, 0, 0, 0, 1],
         [4, 7, 0, 0, 0, 2, 0, 0, 0],
         [0, 0, 0, 3, 0, 9, 0, 0, 0],
         [0, 3, 5, 1, 0, 0, 0, 0, 0],
         [6, 0, 0, 0, 4, 0, 5, 2, 0]]

hard2 :: [[Int]]
hard2 = [[0,4,8,0,0,6,0,0,0],

evil1 ::[[Int]]
evil1 = [[0, 0, 0, 0, 9, 4, 0, 8, 3],
         [0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 8, 6, 7, 0, 0, 0, 2, 0],
         [1, 3, 2, 0, 0, 0, 6, 0, 0],
         [0, 0, 9, 0, 0, 0, 3, 0, 0],
         [0, 0, 4, 0, 0, 0, 9, 5, 8],
         [0, 4, 0, 0, 0, 2, 5, 3, 0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0],
         [5, 9, 0, 8, 4, 0, 0, 0, 0]]

evil2 :: [[Int]]
evil2 = [[0,0,4,0,3,0,0,8,0],

It did manage to give a correct answer for evil1 above and also the evil puzzle that I used in the earlier blog.
However It gave a wrong answer (containing two 4s in a row!) and it thinks it solves the problem. I haven’t investigated it completely.
Am more interested in understanding the monads than solving Sudokus.

Simple Sudoku Solver

May 9, 2008

In this post and the next, I will document my attempts in writing a sudoku solver in haskell. In the current post, I will present a native implementation..

1 Problem statement

Sudoku needs no introduction. A sample is given below.

C0 C1 C2 C3 C4 C5 C6 C7 C8
r0 5 4 7 3 9
r1 7 6 9 5
r2 7 6 8
r3 8 7 3
r4 6 7 1 2 8
r5 3 2 1
r6 6 9 3
r7 8 1 4 7
r8 2 1 4 6 8

1.1 constraints

The idea is to fill the empty spaces with digits 1 to 9 so that the grid satisfies the following conditions:

  1. each row contains exactly one occurence of each of the digits from 1 to 9.
  2. each column contains exactly one occurence of each of the digits from 1 to9.
  3. each block contains exactly one occurence of each of the digits from 1 to 9, where block is a 3X3 grid. An example block
    is the one at the top left corner containing the following cells – (0,0),(0,1),(0,2),(1,0),(1,1),
    There are nine such blocks. The ninth block comprising (6,6),(6,7),(6,8),
    (7,6),(7,7),(7,8),(8,6),(8,7) and (8,8).

2 Solution approach

2.1 Cell

The basic unit here is a cell. So let us consider a data-type Cell defined as below:

data Cell =  FilledCell Int
           | EmptyCell [Int]

where, the empty cell represents a list of possible values that cell can hold. Initially, the list of possible values is the list
[1,2,3,4,5,6,7,8,9] and when it becomes a single-celled list, the cell moves into a filledCell.

A two-dimensional array (or a list of lists) is an almost obvious choice as the data structure for representing the whole grid at various stages of the solution.

So, the type for Sudoku can be:

 type Sudoku = Array (Int,Int) Cell

We will be given with an initial Sudoku containing a lots of EmptyCells and a few FilledCells.

2.2 Algorithm

A naive algorithm for solving the Sudoku can be described as:

  1. check the type of a cell.
  2. if it is a filled cell, move to the next one
  3. if the cell is an empty cell with a unary list, convert it into a FilledCell and move to the next one.
  4. if the cell has more than one possible values,get the values from all the related cells.
    Narrow the scope of the cell by removing all these values from the list of the cell. And move on to the next cell.
  5. repeat the step for all the cells.

3 Implementation

Starting off with a few lines about module and data definitions,

3.1 Data types and modules

module Sudoku where

import Data.Array

data Cell =  FilledCell Int
           | EmptyCell [Int]
           deriving Show

type Sudoku = Array (Int,Int) Cell

3.2 Input

Let’s say that we get the input in the form of a list of lists. An example is given below:

puzzle ::[[Int]]
puzzle =[[0, 0, 5, 0, 4, 0, 7, 3, 9],
         [0, 7, 6, 0, 0, 9, 5, 0, 0],
         [0, 0, 0, 0, 7, 0, 6, 8, 0],
         [0, 8, 0, 0, 0, 7, 3, 0, 0],
         [6, 0, 7, 0, 1, 0, 2, 0, 8],
         [0, 0, 3, 2, 0, 0, 0, 1, 0],
         [0, 6, 9, 0, 3, 0, 0, 0, 0],
         [0, 0, 8, 1, 0, 0, 4, 7, 0],
         [2, 1, 4, 0, 6, 0, 8, 0, 0]] 

evil ::[[Int]]
evil = [[0, 0, 0, 0, 0, 0, 4, 0, 0],
        [1, 0, 8, 3, 0, 0, 2, 0, 5],
        [0, 6, 0, 0, 0, 9, 0, 0, 3],
        [0, 0, 0, 7, 0, 5, 0, 8, 0],
        [0, 0, 9, 0, 0, 0, 7, 0, 0],
        [0, 7, 0, 8, 0, 1, 0, 0, 0],
        [9, 0, 0, 4, 0, 0, 0, 2, 0],
        [5, 0, 1, 0, 0, 3, 8, 0, 7],
        [0, 0, 4, 0, 0, 0, 0, 0, 0]]

hard :: [[Int]]
hard = [[0,0,7,0,2,0,0,1,0],

medium :: [[Int]]
medium = [[0, 0, 0, 2, 0, 0, 7, 0, 0],
          [2, 5, 7, 0, 0, 0, 4, 6, 0],
          [0, 6, 0, 7, 0, 9, 0, 0, 0],
          [0, 2, 0, 3, 0, 7, 6, 0, 0],
          [5, 0, 8, 0, 0, 0, 3, 0, 4],
          [0, 0, 6, 4, 0, 1, 0, 5, 0],
          [0, 0, 0, 9, 0, 8, 0, 3, 0],
          [0, 3, 9, 0, 0, 0, 8, 4, 7],
          [0, 0, 2, 0, 0, 4, 0, 0, 0]]

I need a function to convert this into a Sudoku.

But before that I need to construct a basic Grid containing all empty cells

getInitSudoku :: Sudoku
getInitSudoku = array ((0,0),(8,8)) ([((i,j),EmptyCell [1..9]) | i <- [0..8], j<-[0..8] ])

Now a function that takes a single dimensional list of Integers (easily obtained from something like puzzle above using a foldr and a ++ ).

fromList :: [Int]->(Int,Int)->Sudoku->Sudoku
fromList [] _ sudoku = sudoku
fromList (x:xs) (row,col) sudoku = let
                                    nextCol = mod (col + 1) 9
                                    nextRow = if (nextCol == 0 ) then row +1 else row
                                  in if x == 0  then fromList xs (nextRow,nextCol) sudoku
                                      else fromList xs (nextRow,nextCol) (sudoku // [((row,col),(FilledCell x))])

Here,I will define a function that takes two lists A and B as input and returns a list containing only those elements of A that are not there in B.
For example, if the input is [1,8,7,6,3] and [9,8,3] then the output should be [1,7,6] (8 and 3 being members in second list too.)
Since the lists that this function will be dealing with will have lengths of about 9 or 10, I will put a blind eye on
the fact that elem is of O(n) and is slow for large values of n. I am defining the function here as an operator.
So, here goes the function:

(#) :: Eq a =>[a] ->[a]->[a]
(#) [] b = []
(#) a [] = a
(#) (ah:atail) b = if ( ah `elem` b) then ((#) atail b)
       else (ah: ((#) atail b))

3.3 Identifying related cells

Now I’ll write a function to identify the related cells. That is given a pair of coordinates, I will get a list of (Int,Int)s containing
the indices of all the relevant cells.
For example, if my input is (0,0) the expected output is
[(0,1),(0,2),(0,3)…(0..8),(1,0),(2,0),(3,0) …(8,0),(0,0),(0,1),(0,2),(1,0),(1,1),

Let me divide into three simpler steps :

  1. A function that takes the row-column pair and gives the list of all indices in the current row.
  2. A function that takes the row-column pair and gives the list of all indices in the current row
  3. A function that takes the row-column pair and gives the list of all indices in the current block

Functions for the first two are beautiful in a lazy functional language like haskell.

interestedRow :: (Int,Int) -> [(Int,Int)]
interestedRow (row,col) = zip [row,row..row] [0..8]

interestedColumn :: (Int,Int) -> [(Int,Int)]
interestedColumn (row,col) = zip [0..8] [col,col..col]

The third function is a bit agricultural compared to the above two. The idea here is to identify the “block-indices”.
In other words, there are nine blocks in a 3X3 matrix. Their indices are (0,0),(0,1),(0,2) …(2,1),(2,2).
If we know the block indices, the list is straight forward – (3*br,3*bc), (3*br + 1,3*bc) … (3*br +2,3*bc+2)

interestedBlock :: (Int,Int) -> [(Int,Int)]
interestedBlock (row,col) = let
                               br = 3 * (  row `div` 3 )
                               bc = 3 *( col `div` 3 )
                            in [(br,bc),(br,bc+1),(br,bc+2),

With the above three functions in place, the function which gives all the relevant indices is straight forward.
We concatenate the lists from above functions and remove the current index.

relevantIndices = \x -> ( ( (interestedRow x) ++ (interestedColumn x) ++ (interestedBlock x) ) # [x] )

Now it is time to define a function that takes the Sudoku and a list of indices and returns a list of non-zero values at the indices

relatedVals :: Sudoku->[(Int,Int)]->[Int]
relatedVals sudoku []      = []
relatedVals sudoku (x:xs)  = case (sudoku ! x ) of
                                  (FilledCell v) -> ( v : (relatedVals sudoku xs) )
                                  (EmptyCell l)  -> (relatedVals sudoku xs)

4 Reduction

In this section, I’ll define the functions that try to reduce the scope of the empty cells.

4.1 At the cellular level

The first reduction function, simpleReduction operates at the level of a cell.
It takes the Sudoku and an index and returns a new Sudoku such that the cell at the index is updated either with a FilledCell or an emptyCell with reduced scope.

simpleReduction :: Sudoku ->(Int,Int) -> Sudoku
simpleReduction sudoku (row,col) = case (sudoku ! (row,col)) of
                                      FilledCell v       -> sudoku
                                      EmptyCell (solo:[])-> sudoku // [( (row,col),FilledCell solo)]
                                      EmptyCell list     -> sudoku // (getNewList sudoku (row,col) list)

getNewList :: Sudoku -> (Int,Int)-> [Int] -> [((Int,Int),Cell)]
getNewList sudoku (row,col) lst =  let
                                     neededIndices  = relevantIndices (row,col)
                                     otherValues    = relatedVals sudoku neededIndices
                                     newlst         = lst # otherValues
                                   in  case newlst of
                                          (h:[]) -> [((row,col),(FilledCell h))]
                                          (x:xs) -> [((row,col),(EmptyCell (x:xs)))]

4.2 A recursive function on the whole

Now I need a function that applies the function, simpleReduction on all the cells.

reduceSudoku1 sudoku (9,0)    = sudoku
reduceSudoku1 sudoku (row,col)= let
                                       nextCol = mod (col + 1) 9
                                       nextRow = if (nextCol == 0 ) then row + 1 else row
                                      reduceSudoku1 (simpleReduction sudoku (row,col)) (nextRow,nextCol)

Now a function to check if there are any empty cells still left in the grid.

hasEmptyCells sudoku (9,0)     = False
hasEmptyCells sudoku (row,col) = let
                                       nextCol = mod (col + 1) 9
                                       nextRow = if (nextCol == 0 ) then row + 1 else row
                                      case (sudoku ! (row,col) ) of
                                        FilledCell v -> hasEmptyCells sudoku (nextRow,nextCol)
                                        EmptyCell l  -> True

numEmptyCells :: Sudoku ->(Int,Int)->Int->Int
numEmptyCells sudoku (9,0) count    = count
numEmptyCells sudoku (row,col)count = let
                                         nextCol = mod (col + 1) 9
                                         nextRow = if (nextCol == 0 ) then row + 1 else row
                                       case (sudoku ! (row,col) ) of
                                          FilledCell v -> numEmptyCells sudoku (nextRow,nextCol) count
                                          EmptyCell l  -> numEmptyCells sudoku (nextRow,nextCol) (count + 1)

That leaves me with a function that applies the reduceSudoku1 function, a few times, say 10 times.
It breaks, if it encounters a completely filled sudoku before.

loop10Max sudoku count
         | (count < 10) &&  (hasEmptyCells sudoku (0,0))  = loop10Max (reduceSudoku1 sudoku (0,0)) (count + 1)
         | otherwise                                      = (sudoku,count)

A better function would be something that would apply reduceSudoku1 till it is useful.
Such a function will continue applying the naive algorithm as long as there is some improvement of applying the algo.The moment it finds no improvement, it breaks.

loopTillDone :: Sudoku -> Sudoku
loopTillDone sudoku = loopTillDoneHelper sudoku (numEmptyCells sudoku (0,0) 0)

loopTillDoneHelper :: Sudoku->Int -> Sudoku
loopTillDoneHelper sudoku count = let
                                     nextRound = reduceSudoku1 sudoku (0,0)
                                     countNext = numEmptyCells nextRound (0,0) 0
                                    if (countNext < count) then loopTillDoneHelper nextRound countNext
                                    else nextRound

5 Trying it out

Now I can try the stuff out on some sample Sudokus. I will load the current file (Sudoku.lhs) in ghci and run the functions using:

let list = foldr (++) []  puzzle
let sudoku1 = fromList list (0,0) getInitSudoku
let (s,c) = loop10Max sudoku1 0

Just typing ‘s’ on the command prompt will print out the solved Sudoku. However, all is not well. Trying the problem on harder puzzles
(using ‘evil’ or ‘hard’ in place of ‘puzzle’ in the above), I can see that the naive algorithm I used so far is pretty inadequate for
complicated puzzles. We have been filling only those cells that can be filled because of data in related cells. This does not know how to make a guess and see if it is correct and backtrack if the guess happens to be a wrong one.
In the next chapter, I’ll see how I can rectify this.

PS: This page has been written in literate haskell and has been converted into html using hevea

First steps into haskell Type Classes

May 4, 2008

A Generic Z Algorithm

As mentioned earlier, the biggest problem with the ZAlgo funtion defined in the previous blog is
that it works for only a fixed types of inputs. It takes a String and produces a ByteString (apart from ZAlgoDS). And that is a gross injustice to the algorithm which works for just about any sequences.
The algorithm works for sequences of all types – sequences of integers, characters, even sequences of Strings or any ADT for that matter.
In this chapter, we will make the ZAlgo function handle all types of data.

In this blog, I will make the ZAlgo function, generic enough to handle other types of sequences also.
That is, the new module which I will define now can be used for processing any kinds of sequences.

I’ll continue to use the Data.Array for holding the Z Array(and not consider any generic-ness here), for the sake of simplicity although there are other options available in Standard libraries of the haskell to hold similar structure with perhaps better performance.

module GenZ where

import Data.Array

import Data.ByteString.Char8 as B

The second import(Data.ByteString.Char8 ) is used only for the instance declaration (in the usage section). It is not used in the main algo

type Zs = Array Int Int

data ZAlgoDS = ZAlgoDS{  zeds ::Zs, rSoFar ::Int, lSoFar::Int} deriving Show

Now we need a class for which the functions can be defined. A class in haskell is a collection of related types.

Class Declaration

class MyString m  where
           zEqAtIndex  :: (m,Int)->(m,Int)->Bool
           zLength     :: m -> Int

This is the type we will use while defining the functions to create a generic algorithm.

Comparison of the sequences

compareStr :: (MyString m)=> (m,Int)->(m,Int)->Int->Int
compareStr (s1,pos1) (s2,pos2) soFar
            |  (pos1 < zLength  s1) || (pos2 < zLength s2) =  let
                                                                      compRes = zEqAtIndex (s1,pos1) (s2,pos2)
                                                                      if compRes    then
                                                   compareStr (s1,(pos1 + 1)) (s2, (pos2 + 1)) (soFar + 1)
                                                                      else soFar

            |  otherwise                                        = soFar

As can be seen above, we barely made two changes :

  1. We used zEqAtIndex to compare the items at respective positions
  2. we are using zLength instead of B.length


Moving on, getZAt method can be modified in similar fashion

getZAt :: (MyString m)=> m -> Int ->Int
getZAt str 0 =  zLength str
getZAt str pos
         |  (pos >0) && (pos < zLength str) = compareStr (str,0) (str,pos) 0
         |  otherwise = error Ïndex out of range"

Initializing the z array

There need not be any change in this function.

getInitZArray :: Int -> Zs
getInitZArray lngth = array (0,lngth) ( (0,(lngth+1)) : [(i,0)  -  i <- [1..lngth]])

zAlgo and its helper functions

In zAlgo and its helper functions, the only change we need to do is using the zLength instead of the library function

zAlgo str (ZAlgoDSzeds = zds,rSoFar=r,lSoFar = l)  pos
	 -  (pos >= zLength str) = ZAlgoDSzeds = zds, rSoFar =r , lSoFar =l
	 -  (pos >= r ) = zAlgo str (helper1 str pos zds r l) (pos +1)
	 -  otherwise = zAlgo str (helper2 str pos zds r l) (pos + 1)

Both the first of the helper functions

helper1 str pos zs r l = let
			   zpos = getZAt str pos
                           r' = pos + zpos - 1
                           if (zpos == 0 ) then ZAlgoDSzeds = zs,rSoFar =r,lSoFar =l
	                   else ZAlgoDS zeds = (zs // [(pos,zpos)]), rSoFar = r',lSoFar = pos

helper2 str pos zs r l = let
			   k' = pos - l
                           zk' = zs ! k'
                           beta = r - pos +1
                           if ( zk' <= beta ) then ZAlgoDSzeds = (zs//[(pos,zk')]),rSoFar =r,lSoFar =l
				q  = compareStr (str,beta) (str,r) 0
                                r' = r + q
                                l' = pos
	                        zpos = r' - l' + 1
                               in ZAlgoDSzeds = (zs //[(pos,zpos)]),rSoFar =r',lSoFar=l'

remain largely unaltered.

That completes generalizing the algorithm, so that it can process any sequence of characters.Now comes the complicated part.


I’llo use the same functions above for two different types of sequences.

I’ll first define a couple of dummy functions to compare at indices – one for Binary Strings and one for Lists-Of-Strings

compAtIndices :: (B.ByteString,Int)->(B.ByteString,Int) -> Bool
compAtIndices (s1,pos1) (s2,pos2) = (B.index s1 pos1) == (B.index s2 pos2)

data LoS = LoS [String] deriving (Show)

compStrAtIndices :: (LoS,Int)->(LoS,Int)->Bool
compStrAtIndices (LoS sList1,pos1) (LoS sList2,pos2) = ((sList1 !! pos1) == (sList2 !! pos2))

getLength :: LoS -> Int
getLength (LoS slist) = Prelude.length slist

Now a couple of instances

instance MyString  B.ByteString  where
      zEqAtIndex    = compAtIndices
      zLength       = B.length

instance MyString  LoS  where
      zEqAtIndex    = compStrAtIndices
      zLength       = getLength

Wrapper Function

Now, the wrapper functions. One for ByteString and one for LoS

zMain :: String -> (B.ByteString,ZAlgoDS)
zMain str = let
              byteString  = B.pack str
              zinit       = getInitZArray ((B.length byteString) - 1)
              zDataStruct = ZAlgoDS zeds = zinit,rSoFar = 0,lSoFar = 0
              (byteString,(zAlgo byteString zDataStruct 1))

zMain' :: [String] -> (LoS,ZAlgoDS)
zMain' strList = let
                   zinit       = getInitZArray ((Prelude.length strList) - 1)
                   zDataStruct = ZAlgoDS zeds = zinit,rSoFar = 0,lSoFar = 0
                     (LoS strList,(zAlgo (LoS strList) zDataStruct 1))

Now, we are done.
we can use zMain to process strings and zMain’ to process lists of strings.

In general, we can thus use the same algorithm to process a linear list/sequence of any data types. The OOPs equivalent of this is , perhaps the strategy design pattern.

Z Algorithm in Haskell

April 29, 2008

I tried to implement, in haskell, the Z Algorithm given in Dan Gusfield’s Algorithms on Strings,trees and Sequences.

It is not the prettiest, fastest and definitely not the most elegant implementation of this algorithm in haskell,but it sure taught me a trick or two. In this post, I’ll limit myself with a naive-implementation of the algorithm. And in later blogs, I will attempt at improving the same. Here’s the code – warts and all.

module ZAlgo where

import Data.ByteString.Char8 as B
import Data.Array

type Zs = Array Int Int

data ZAlgoDS = ZAlgoDS { zeds ::Zs, rSoFar ::Int, lSoFar::Int} deriving Show

compareStr :: (B.ByteString,Int)->(B.ByteString,Int)->Int ->Int
compareStr (s1,pos1) (s2,pos2) soFar
           | (pos1 < B.length  s1) &&  (pos2 < B.length s2) =  let
                                                                  c1 = B.index s1 pos1
                                                                  c2 = B.index s2 pos2
                                                                  if(c1 == c2)    then
                                      compareStr (s1,(pos1 + 1)) (s2, (pos2 + 1)) (soFar + 1)
                                                                  else soFar

           | otherwise                                        = soFar

getZAt ::B.ByteString -> Int-> Int
getZAt str 0 =  B.length str
getZAt str pos
        | (pos >0) && (pos < B.length str) = compareStr (str,0) (str,pos) 0
        | otherwise = error "Index out of range"

getInitZArray :: Int -> Zs
getInitZArray lngth = array (0,lngth) ( (0,(lngth+1)) : [(i,0) | i <- [1..lngth]])

zAlgo :: B.ByteString -> ZAlgoDS -> Int -> ZAlgoDS
zAlgo str (ZAlgoDS{zeds = zds,rSoFar=r,lSoFar = l})  pos
	| (pos >= B.length str) = ZAlgoDS{zeds = zds, rSoFar =r , lSoFar =l}
	| (pos >= r ) = zAlgo str (helper1 str pos zds r l) (pos +1)
	| otherwise = zAlgo str (helper2 str pos zds r l) (pos + 1)

helper1 :: B.ByteString -> Int -> Zs -> Int ->Int -> ZAlgoDS
helper1 str pos zs r l = let
			   zpos = getZAt str pos
                           r' = pos + zpos - 1
                           if (zpos == 0 ) then ZAlgoDS{zeds = zs,rSoFar =r,lSoFar =l}
	                   else ZAlgoDS{ zeds = (zs // [(pos,zpos)]), rSoFar = r',lSoFar = pos}

helper2 :: B.ByteString -> Int -> Zs -> Int ->Int -> ZAlgoDS
helper2 str pos zs r l = let
			   k' = pos - l
                           zk' = zs ! k'
                           beta = r - pos +1
                           if ( zk' <= beta ) then ZAlgoDS{zeds = (zs//[(pos,zk')]),rSoFar =r,lSoFar =l}
				q  = compareStr (str,beta) (str,r) 0
                                r' = r + q
                                l' = pos
	                        zpos = r' - l' + 1
                               in ZAlgoDS{zeds = (zs //[(pos,zpos)]),rSoFar =r',lSoFar=l'}

zMain :: String -> (B.ByteString,ZAlgoDS)
zMain str = let
              byteString  = B.pack str
              zinit       = getInitZArray ((B.length byteString) - 1)
              zDataStruct = ZAlgoDS {zeds = zinit,rSoFar = 0,lSoFar = 0}
              (byteString,(zAlgo byteString zDataStruct 1))

A small Main.hs – that takes the string to be preprocessed and prints it out:

module Main where

import ZAlgo
import System

main = do
      args <- getArgs
      print $ show (zMain (head args))

To compile the code : ghc –make Main.hs

and to run ./Main <string-to-be-preprocessed>

I’ve also played around with a few interesting options are to use the “-prof” option of the ghc to see some profiling data and using the -fhpc option to get the code coverage for a given input. More about them later.
Not counting the amateurish function names and the very-inelegant-for-haskell-standards code (and even potential bugs!), there’s a major drawback with the implementation. Wheras, the algorithm is pretty generic – it can handle any types of sequences, the above code can handle only a sequence of characters (that too a single byte characters, I think). In the next post, I will rectify that.

The Fourth Virtue – Virtuous Programmer 6

October 15, 2007

Like his language, Larry’s virtuous programmer needs an upgrade.Apart from laziness,impatience and hubris, a good programmer should inculcate a fourth virtue – Infidelity. Infidelity towards one’s favourite programming language. One should not be too faithful when it comes to programming languages.(To use a possible mixed metaphor, you can roughly translate it as “Thou shalt covet thy neighbour’s language”. )
  I wasn’t always like this. I was brought up on the shores of C where the C God was revered.There was no other God but the C God. And I was fairly happy structuring (and even object-orienting) and building my little sand-castles using those asterisks and ampersands – generally from scratch. Every one around me were honest, hard-working fishermen endowed with enormous patience and rippling muscles. I was fairly skillfull with my asterisk and was taught to always clean up after am done using it  – I was proud of many such good habits the religion gave me.
 Then Java came to our scene. Like a five-star sea-front resort. With a bunch of promises – platform independance, of automatic memory management and a monopoly on the ‘internet-ional’ market.
 Although there was some initial suspicion, it was tough not to get curious about it. Sure enough, the platform independance was the real thing, there were applets, there was swing, there was networking on a totally different level, and many more. And there were no core dumps. That you don’t have to clean the mess you created did not seem that bad a habit, after all. I was soon a convert. I swore in the name of Its Holiness ,The Java Virtual Machine. I got into ecstasy everytime I wrote a class loader. Every sunday I went to church to listen to the Gang of four and their design patterns.I sincerely believed that the path to heaven was paved with eclipse plugins, byte-code engineering (asm), classloaders, aspectjs, springs, PMDs and Jameleons.
 However, all along, I indulged in perl – but it wasn’t a religion per se. I thought every one who believed in God should know how to sing in perl, so that you can actively participate in the bhajans, keertans and the choir group. I did come across some artistes par excellence who were deeply religious about Perl itself, but I believed in only the Java God and did not give perl the status of religion. However I was faithful enough about it to not to go anywhere near the Python.
   I was leading such a religious and serene life when I suddenly came  to know about this enlightened one, from Japan – Ruby.It seemed as if, Bodhidharma came to west from the east, this time.Ruby is a bubbly, charming, youthful, friendly and a humourous zen master who seemed to have an answer for every thing. He was born in Japan but seemed to have seen a lot of world already and he’s well versed in english and computer science. And is he able? Man, He could do everything a perl programmer could do – only more elegantly. And when it comes to sophistication, even Java paled before him, one has to admit. There are already a lot of people who hang around him and who consider him to be the greatest God. But he definitely seems to be one guy who should not be judged by the company he keeps. The only trouble was that, the more you interact with this guy,even on a friendly level, the farther you go astray from the Java God. Religion start looking like a foolish sentiment.
 One day curiosity got the better of me, and I went to meet Ruby’s parents. His father , Mr SmallTalk, was an articulate, expressive, strong, principled and talented man who should have been a lot more famous than what he is. Probably he should have talked more.
  However, it is his mother’s side that am currently obsessed with.Ruby’s mother, Lady Lisp is an elegant, rich and a very creative Goddess. Although a tad old, she is very active and has quite a few talented devotees. And she has a whole bunch of close and not-so-close cousins. Each of them is a very beautiful, skillful, talented, lovely goddess worth dying for. After a few fumbling, but nevertheless paradigm-shifting, interactions with these lovely ladies,the light dawned on me. Infidelity is a virtue.If you are too religious about your God, you may end up mastering pointer arithmetic or design patterns or GC options but may miss out on lambda calculus , Left recursion, parser combinators, side-effects.There are quite a few things a real programmer should well be aware of. (Perhaps, it gotta be: “Should be well aware of”.)

Object Oriented Programming (and imperative programming) is just one side of the story. On the other side of this yang, there’s the lovely Yin too – the functional programming. And both Yin and Yang are integral part of the great Tao.
This site talks about my attempts at wooing some of the angels among the languages – Haskell, OCaml , Scheme, Scala and Forth – each with more Yin than Yang about them.Especially Haskell. She is pure Yin. And consequently the toughest to court. She falls for you pretty easily, if you are a PhD in computer science and if you have an especially strong fetish for obscure theories of mathematics. I am severely lacking in both these aspects, but I will try to overcome it with determination. Like all determined efforts do, this one’s going to look ugly. But who cares?. Once she’s mine, the others aren’t  going to be tough to get hold of. Be fore warned though. Am no natural cassanova.My attempts are bound to have more than a touch of gaucherie about them.

From my next post onwards, I will try to solve a set of problems in Haskell. Assuming I survive the ordeal, I will then attempt the same problems in OCaml,and then in Scheme, and may be in Forth too but finally in Scala. (I have a feeling that I might eventually settle down with scala but the path am choosing is probably necessary if I have to do full justice to all the features available in Scala). The  idea of this blog is to consider problems belonging to different areas of programming: parsing, string manipulation, numerical programs, web applications and ,hopefully,some aspects of parallel programming too.
Come by again in a short while,to see a few programs on haskell (there wont be too much of nonsense. That I promise.)


Get every new post delivered to your Inbox.