First define a regular expression type:

> import List

> data RE a = Symbol a | Star (RE a) | Sum [RE a] | Prod [RE a]

> deriving (Eq,Show,Ord)

I use lists instead of binary operators to represent products and sums because

addition and multiplication of regular expressions is associative.

> a <*> b = Prod [a,b]

> a <+> b = Sum [a,b]

> zero = Sum []

> one = Prod []

Here's how to make a simple regular expression from a string

> re x = Prod $ map Symbol x

Exercise: Write a proper parser for these things so we can use conventional regexp notation!

Here's a test to see whether a regular expression matches the empty string:

> acceptsEmpty (Symbol _) = False

> acceptsEmpty (Star _) = True

> acceptsEmpty (Sum l) = or $ map acceptsEmpty l

> acceptsEmpty (Prod l) = and $ map acceptsEmpty l

I've decided to name these derivatives after people. So first I define the McBride derivative which is a generalisation of the dissection operator introduced in Clowns to the left.... The key idea is that if

`d = mcbride c j a`then

`d (x <*> y) = (d x <*> j y) <+> (c x <*> d y)`. This needs to respect the associativity of <*> and one way to achieve this is by choosing

`c`and

`j`that are multiplicative homomorphisms so that, for example,

`c(x <*> y) = c x <*> c y`.

> mcbride _ _ a (Symbol a') = if a==a' then one else zero

> mcbride c j a (Sum p) = Sum $ map (mcbride c j a) p

> mcbride _ _ _ (Prod []) = zero

> mcbride c j a (Prod x) = Sum $ map f (splits x) where

> f (x,y:ys) = Prod $ map c x ++ [mcbride c j a y] ++ map j ys

> mcbride c j a (Star p) = c (Star p) <*> mcbride c j a p <*> j (Star p)

And now I can define all the Leibniz, Fox and Brzozowski derivatives as special cases of the mcbride derivative. The latter each come in left- and right-handed pairs.

> delta p = if acceptsEmpty p then one else zero

> leibniz,fox_l,fox_r,brzozowski_l :: (Ord a,Eq a) => a -> RE a -> RE a

> leibniz a p = simplify $ mcbride id id a p

> fox_l a p = simplify $ mcbride id (const (one)) a p

> fox_r a p = simplify $ mcbride (const (one)) id a p

> brzozowski_l a p = simplify $ mcbride delta id a p

> brzozowski_r a p = simplify $ mcbride id delta a p

`brzozowski_l`is the usual Brzozowski derivative. If

`r`matches some language,

`brzozowski_l a t`matches the sublanguage of strings beginning with

`a`and with that

`a`removed. We can define regular expression matching in the usual way in terms of it:

> matches re s = acceptsEmpty (matches' re s) where

> matches' re [] = re

> matches' re (a:as) = matches' (brzozowski_l a re) as

Now we can start matching.

`ex1`would normally be written as "abc|b*c".

> ex1 = (re "abc") <+> (Star (re "b") <*> Symbol 'c')

> test1 = map (matches ex1) ["abc","bbc","bbbbc","abbbbc"]

`brzozowski_r`chews off a symbol from the right:

> test2 = map (matches (brzozowski_r 'c' ex1)) ["abc","bbc","ab","bbbbb"]

`leibniz a`allows you to leave out one

`a`from anywhere in the regexp.

`ex3`matches "abracadabra" as it might be shouted by a wizard who might lengthen the second last 'a' to make sure the spirits can hear.

`ex4`matches such strings with precisely one missing 'r'.

`ex3`would normally be written as "abracada*bra".

> ex3 = re "abracad" <*> Star (re "a") <*> re "bra"

> test3 = map (matches ex3) ["abracadabra","abracadaaaaabra"]

> ex4 = leibniz 'r' ex3

> test4 = map (matches ex4) ["abracadabra","abacadabra","abracaaaaadaba","abacadaba"]

`fox_l a`makes regular expressions accept any prefix ending just before an

`a`. Combining

`fox_l`and

`fox_r`allows you to search for all elements of a language just between a chosen pair of symbols.

In

`ex5`we build a regexp for certain types of statement in a simple language and then pull out from it the part corresponding to substrings within parentheses. The undifferentiated part of

`ex5`would normally be written as "x = ([0-9]+[0-9]); *".

> ex5 = simplify $ fox_r '(' $ fox_l ')' $

> let digits = Star (Sum (map Symbol "0123456789")) in

> re "x = (" <*> digits <*> re "+" <*> digits <*> re ");" <+> Star (re " ")

> test5 = map (matches ex5) ["1+2","7-4"]

Open Question: There are more derivatives that can be constructed from the parts I've given. What do they do? What other multiplicative homomorphisms can you define? What derivatives do they give? (Hint: consider the property "R matches only a finite language").

Oh...I almost forgot the whole point of this. It's now possible to see that McBride's l-operator is a version of the Brzozowski derivative. In fact, the theorems he proves about l are pretty similar to the theorems that get proved about the Brzozowski derivative such as Theorem 4.4 here.

Some helper functions:

`splits x`returns all the ways of splitting

`x`into pairs

`u`and

`v`such that

`u ++ v == x`and

`v`is non-empty.

> splits [] = []

> splits (a:as) = ([],a:as) : map f (splits as) where

> f (x,y) = (a:x,y)

Some (incomplete) simplification rules for regular expressions

> simplify a = let b = simplify' a in if a==b then b else simplify b where

> simplify' (Prod [a]) = a

> simplify' (Prod a) | zero `elem` a = zero

> simplify' (Prod (a:b)) = case a of

> Prod x -> Prod $ x ++ map simplify b

> a -> Prod $ filter (/= one) $ map simplify (a:b)

>

> simplify' (Sum [a]) = a

> simplify' (Sum (a:b)) = case a of

> Sum x -> Sum $ x ++ map simplify b

> a -> Sum $ nub $ sort $ filter (/= zero) $ map simplify (a:b)

>

> simplify' (Star a) = Star (simplify a)

> simplify' a = a

## No comments:

Post a Comment