I tried to implement, in haskell, the Z Algorithm given in Dan Gusfield’s Algorithms on Strings,trees and Sequences.
It is not the prettiest, fastest and definitely not the most elegant implementation of this algorithm in haskell,but it sure taught me a trick or two. In this post, I’ll limit myself with a naive-implementation of the algorithm. And in later blogs, I will attempt at improving the same. Here’s the code – warts and all.
module ZAlgo where
import Data.ByteString.Char8 as B
import Data.Array
type Zs = Array Int Int
data ZAlgoDS = ZAlgoDS { zeds ::Zs, rSoFar ::Int, lSoFar::Int} deriving Show
compareStr :: (B.ByteString,Int)->(B.ByteString,Int)->Int ->Int
compareStr (s1,pos1) (s2,pos2) soFar
| (pos1 < B.length s1) && (pos2 < B.length s2) = let
c1 = B.index s1 pos1
c2 = B.index s2 pos2
in
if(c1 == c2) then
compareStr (s1,(pos1 + 1)) (s2, (pos2 + 1)) (soFar + 1)
else soFar
| otherwise = soFar
getZAt ::B.ByteString -> Int-> Int
getZAt str 0 = B.length str
getZAt str pos
| (pos >0) && (pos < B.length str) = compareStr (str,0) (str,pos) 0
| otherwise = error "Index out of range"
getInitZArray :: Int -> Zs
getInitZArray lngth = array (0,lngth) ( (0,(lngth+1)) : [(i,0) | i <- [1..lngth]])
zAlgo :: B.ByteString -> ZAlgoDS -> Int -> ZAlgoDS
zAlgo str (ZAlgoDS{zeds = zds,rSoFar=r,lSoFar = l}) pos
| (pos >= B.length str) = ZAlgoDS{zeds = zds, rSoFar =r , lSoFar =l}
| (pos >= r ) = zAlgo str (helper1 str pos zds r l) (pos +1)
| otherwise = zAlgo str (helper2 str pos zds r l) (pos + 1)
helper1 :: B.ByteString -> Int -> Zs -> Int ->Int -> ZAlgoDS
helper1 str pos zs r l = let
zpos = getZAt str pos
r' = pos + zpos - 1
in
if (zpos == 0 ) then ZAlgoDS{zeds = zs,rSoFar =r,lSoFar =l}
else ZAlgoDS{ zeds = (zs // [(pos,zpos)]), rSoFar = r',lSoFar = pos}
helper2 :: B.ByteString -> Int -> Zs -> Int ->Int -> ZAlgoDS
helper2 str pos zs r l = let
k' = pos - l
zk' = zs ! k'
beta = r - pos +1
in
if ( zk' <= beta ) then ZAlgoDS{zeds = (zs//[(pos,zk')]),rSoFar =r,lSoFar =l}
else
let
q = compareStr (str,beta) (str,r) 0
r' = r + q
l' = pos
zpos = r' - l' + 1
in ZAlgoDS{zeds = (zs //[(pos,zpos)]),rSoFar =r',lSoFar=l'}
zMain :: String -> (B.ByteString,ZAlgoDS)
zMain str = let
byteString = B.pack str
zinit = getInitZArray ((B.length byteString) - 1)
zDataStruct = ZAlgoDS {zeds = zinit,rSoFar = 0,lSoFar = 0}
in
(byteString,(zAlgo byteString zDataStruct 1))
A small Main.hs – that takes the string to be preprocessed and prints it out:
module Main where
import ZAlgo
import System
main = do
args <- getArgs
print $ show (zMain (head args))
To compile the code : ghc –make Main.hs
and to run ./Main <string-to-be-preprocessed>
I’ve also played around with a few interesting options are to use the “-prof” option of the ghc to see some profiling data and using the -fhpc option to get the code coverage for a given input. More about them later.
Not counting the amateurish function names and the very-inelegant-for-haskell-standards code (and even potential bugs!), there’s a major drawback with the implementation. Wheras, the algorithm is pretty generic – it can handle any types of sequences, the above code can handle only a sequence of characters (that too a single byte characters, I think). In the next post, I will rectify that.
May 6, 2012 at 9:52 am |
Look here for more:…
[…]I am no longer certain the place you’re getting your information, but great topic.[…]…