Z Algorithm in Haskell

I tried to implement, in haskell, the Z Algorithm given in Dan Gusfield’s Algorithms on Strings,trees and Sequences.

It is not the prettiest, fastest and definitely not the most elegant implementation of this algorithm in haskell,but it sure taught me a trick or two. In this post, I’ll limit myself with a naive-implementation of the algorithm. And in later blogs, I will attempt at improving the same. Here’s the code – warts and all.


module ZAlgo where

import Data.ByteString.Char8 as B
import Data.Array

type Zs = Array Int Int

data ZAlgoDS = ZAlgoDS { zeds ::Zs, rSoFar ::Int, lSoFar::Int} deriving Show

compareStr :: (B.ByteString,Int)->(B.ByteString,Int)->Int ->Int
compareStr (s1,pos1) (s2,pos2) soFar
           | (pos1 < B.length  s1) &&  (pos2 < B.length s2) =  let
                                                                  c1 = B.index s1 pos1
                                                                  c2 = B.index s2 pos2
                                                                in
                                                                  if(c1 == c2)    then
                                      compareStr (s1,(pos1 + 1)) (s2, (pos2 + 1)) (soFar + 1)
                                                                  else soFar

           | otherwise                                        = soFar

getZAt ::B.ByteString -> Int-> Int
getZAt str 0 =  B.length str
getZAt str pos
        | (pos >0) && (pos < B.length str) = compareStr (str,0) (str,pos) 0
        | otherwise = error "Index out of range"

getInitZArray :: Int -> Zs
getInitZArray lngth = array (0,lngth) ( (0,(lngth+1)) : [(i,0) | i <- [1..lngth]])

zAlgo :: B.ByteString -> ZAlgoDS -> Int -> ZAlgoDS
zAlgo str (ZAlgoDS{zeds = zds,rSoFar=r,lSoFar = l})  pos
	| (pos >= B.length str) = ZAlgoDS{zeds = zds, rSoFar =r , lSoFar =l}
	| (pos >= r ) = zAlgo str (helper1 str pos zds r l) (pos +1)
	| otherwise = zAlgo str (helper2 str pos zds r l) (pos + 1)

helper1 :: B.ByteString -> Int -> Zs -> Int ->Int -> ZAlgoDS
helper1 str pos zs r l = let
			   zpos = getZAt str pos
                           r' = pos + zpos - 1
                         in
                           if (zpos == 0 ) then ZAlgoDS{zeds = zs,rSoFar =r,lSoFar =l}
	                   else ZAlgoDS{ zeds = (zs // [(pos,zpos)]), rSoFar = r',lSoFar = pos}

helper2 :: B.ByteString -> Int -> Zs -> Int ->Int -> ZAlgoDS
helper2 str pos zs r l = let
			   k' = pos - l
                           zk' = zs ! k'
                           beta = r - pos +1
                         in
                           if ( zk' <= beta ) then ZAlgoDS{zeds = (zs//[(pos,zk')]),rSoFar =r,lSoFar =l}
	                   else
                              let
				q  = compareStr (str,beta) (str,r) 0
                                r' = r + q
                                l' = pos
	                        zpos = r' - l' + 1
                               in ZAlgoDS{zeds = (zs //[(pos,zpos)]),rSoFar =r',lSoFar=l'}

zMain :: String -> (B.ByteString,ZAlgoDS)
zMain str = let
              byteString  = B.pack str
              zinit       = getInitZArray ((B.length byteString) - 1)
              zDataStruct = ZAlgoDS {zeds = zinit,rSoFar = 0,lSoFar = 0}
            in
              (byteString,(zAlgo byteString zDataStruct 1))

A small Main.hs – that takes the string to be preprocessed and prints it out:


module Main where

import ZAlgo
import System

main = do
      args <- getArgs
      print $ show (zMain (head args))

To compile the code : ghc –make Main.hs

and to run ./Main <string-to-be-preprocessed>

I’ve also played around with a few interesting options are to use the “-prof” option of the ghc to see some profiling data and using the -fhpc option to get the code coverage for a given input. More about them later.
Not counting the amateurish function names and the very-inelegant-for-haskell-standards code (and even potential bugs!), there’s a major drawback with the implementation. Wheras, the algorithm is pretty generic – it can handle any types of sequences, the above code can handle only a sequence of characters (that too a single byte characters, I think). In the next post, I will rectify that.

This entry was posted on April 29, 2008 at 9:03 pm and is filed under haskell. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

One Response to “Z Algorithm in Haskell”

cabinet stomatologic iasi Says:
May 6, 2012 at 9:52 am | Reply
Look here for more:…

[…]I am no longer certain the place you’re getting your information, but great topic.[…]…

Mediocre To The Core