A Generic Z Algorithm
As mentioned earlier, the biggest problem with the ZAlgo funtion defined in the previous blog is
that it works for only a fixed types of inputs. It takes a String and produces a ByteString (apart from ZAlgoDS). And that is a gross injustice to the algorithm which works for just about any sequences.
The algorithm works for sequences of all types – sequences of integers, characters, even sequences of Strings or any ADT for that matter.
In this chapter, we will make the ZAlgo function handle all types of data.
In this blog, I will make the ZAlgo function, generic enough to handle other types of sequences also.
That is, the new module which I will define now can be used for processing any kinds of sequences.
I’ll continue to use the Data.Array for holding the Z Array(and not consider any generic-ness here), for the sake of simplicity although there are other options available in Standard libraries of the haskell to hold similar structure with perhaps better performance.
module GenZ where
import Data.Array
import Data.ByteString.Char8 as B
The second import(Data.ByteString.Char8 ) is used only for the instance declaration (in the usage section). It is not used in the main algo
type Zs = Array Int Int
data ZAlgoDS = ZAlgoDS{ zeds ::Zs, rSoFar ::Int, lSoFar::Int} deriving Show
Now we need a class for which the functions can be defined. A class in haskell is a collection of related types.
1 Class Declaration
class MyString m where
zEqAtIndex :: (m,Int)->(m,Int)->Bool
zLength :: m -> Int
This is the type we will use while defining the functions to create a generic algorithm.
2 Comparison of the sequences
compareStr :: (MyString m)=> (m,Int)->(m,Int)->Int->Int
compareStr (s1,pos1) (s2,pos2) soFar
| (pos1 < zLength s1) || (pos2 < zLength s2) = let
compRes = zEqAtIndex (s1,pos1) (s2,pos2)
in
if compRes then
compareStr (s1,(pos1 + 1)) (s2, (pos2 + 1)) (soFar + 1)
else soFar
| otherwise = soFar
As can be seen above, we barely made two changes :
- We used zEqAtIndex to compare the items at respective positions
- we are using zLength instead of B.length
3 getZAt
Moving on, getZAt method can be modified in similar fashion
getZAt :: (MyString m)=> m -> Int ->Int
getZAt str 0 = zLength str
getZAt str pos
| (pos >0) && (pos < zLength str) = compareStr (str,0) (str,pos) 0
| otherwise = error Ïndex out of range"
4 Initializing the z array
There need not be any change in this function.
getInitZArray :: Int -> Zs
getInitZArray lngth = array (0,lngth) ( (0,(lngth+1)) : [(i,0) - i <- [1..lngth]])
5 zAlgo and its helper functions
In zAlgo and its helper functions, the only change we need to do is using the zLength instead of the library function
zAlgo str (ZAlgoDSzeds = zds,rSoFar=r,lSoFar = l) pos
- (pos >= zLength str) = ZAlgoDSzeds = zds, rSoFar =r , lSoFar =l
- (pos >= r ) = zAlgo str (helper1 str pos zds r l) (pos +1)
- otherwise = zAlgo str (helper2 str pos zds r l) (pos + 1)
Both the first of the helper functions
helper1 str pos zs r l = let
zpos = getZAt str pos
r' = pos + zpos - 1
in
if (zpos == 0 ) then ZAlgoDSzeds = zs,rSoFar =r,lSoFar =l
else ZAlgoDS zeds = (zs // [(pos,zpos)]), rSoFar = r',lSoFar = pos
helper2 str pos zs r l = let
k' = pos - l
zk' = zs ! k'
beta = r - pos +1
in
if ( zk' <= beta ) then ZAlgoDSzeds = (zs//[(pos,zk')]),rSoFar =r,lSoFar =l
else
let
q = compareStr (str,beta) (str,r) 0
r' = r + q
l' = pos
zpos = r' - l' + 1
in ZAlgoDSzeds = (zs //[(pos,zpos)]),rSoFar =r',lSoFar=l'
remain largely unaltered.
That completes generalizing the algorithm, so that it can process any sequence of characters.Now comes the complicated part.
6 Usage
I’llo use the same functions above for two different types of sequences.
I’ll first define a couple of dummy functions to compare at indices – one for Binary Strings and one for Lists-Of-Strings
compAtIndices :: (B.ByteString,Int)->(B.ByteString,Int) -> Bool
compAtIndices (s1,pos1) (s2,pos2) = (B.index s1 pos1) == (B.index s2 pos2)
data LoS = LoS [String] deriving (Show)
compStrAtIndices :: (LoS,Int)->(LoS,Int)->Bool
compStrAtIndices (LoS sList1,pos1) (LoS sList2,pos2) = ((sList1 !! pos1) == (sList2 !! pos2))
getLength :: LoS -> Int
getLength (LoS slist) = Prelude.length slist
Now a couple of instances
instance MyString B.ByteString where
zEqAtIndex = compAtIndices
zLength = B.length
instance MyString LoS where
zEqAtIndex = compStrAtIndices
zLength = getLength
7 Wrapper Function
Now, the wrapper functions. One for ByteString and one for LoS
zMain :: String -> (B.ByteString,ZAlgoDS)
zMain str = let
byteString = B.pack str
zinit = getInitZArray ((B.length byteString) - 1)
zDataStruct = ZAlgoDS zeds = zinit,rSoFar = 0,lSoFar = 0
in
(byteString,(zAlgo byteString zDataStruct 1))
zMain' :: [String] -> (LoS,ZAlgoDS)
zMain' strList = let
zinit = getInitZArray ((Prelude.length strList) - 1)
zDataStruct = ZAlgoDS zeds = zinit,rSoFar = 0,lSoFar = 0
in
(LoS strList,(zAlgo (LoS strList) zDataStruct 1))
Now, we are done.
we can use zMain to process strings and zMain’ to process lists of strings.
In general, we can thus use the same algorithm to process a linear list/sequence of any data types. The OOPs equivalent of this is , perhaps the strategy design pattern.
September 1, 2008 at 5:44 am |
Hello!,