### Fox's Ubiquitous Free Derivative pt. 2

As promised earlier, here are some Haskell definitions of derivatives that can be applied to regular expressions. They all give operators that might occasionally be of practical use. But what's more interesting to me is that we now have enough of an intuition pump going between types and regular expression that we can siphon off ideas from papers on types and use them with regexps.

First define a regular expression type:

I use lists instead of binary operators to represent products and sums because

addition and multiplication of regular expressions is associative.

Here's how to make a simple regular expression from a string

Exercise: Write a proper parser for these things so we can use conventional regexp notation!

Here's a test to see whether a regular expression matches the empty string:

I've decided to name these derivatives after people. So first I define the McBride derivative which is a generalisation of the dissection operator introduced in Clowns to the left.... The key idea is that if

And now I can define all the Leibniz, Fox and Brzozowski derivatives as special cases of the mcbride derivative. The latter each come in left- and right-handed pairs.

Now we can start matching.

In

Open Question: There are more derivatives that can be constructed from the parts I've given. What do they do? What other multiplicative homomorphisms can you define? What derivatives do they give? (Hint: consider the property "R matches only a finite language").

Oh...I almost forgot the whole point of this. It's now possible to see that McBride's l-operator is a version of the Brzozowski derivative. In fact, the theorems he proves about l are pretty similar to the theorems that get proved about the Brzozowski derivative such as Theorem 4.4 here.

Some helper functions:

Some (incomplete) simplification rules for regular expressions

First define a regular expression type:

> import List

> data RE a = Symbol a | Star (RE a) | Sum [RE a] | Prod [RE a]

> deriving (Eq,Show,Ord)

I use lists instead of binary operators to represent products and sums because

addition and multiplication of regular expressions is associative.

> a <*> b = Prod [a,b]

> a <+> b = Sum [a,b]

> zero = Sum []

> one = Prod []

Here's how to make a simple regular expression from a string

> re x = Prod $ map Symbol x

Exercise: Write a proper parser for these things so we can use conventional regexp notation!

Here's a test to see whether a regular expression matches the empty string:

> acceptsEmpty (Symbol _) = False

> acceptsEmpty (Star _) = True

> acceptsEmpty (Sum l) = or $ map acceptsEmpty l

> acceptsEmpty (Prod l) = and $ map acceptsEmpty l

I've decided to name these derivatives after people. So first I define the McBride derivative which is a generalisation of the dissection operator introduced in Clowns to the left.... The key idea is that if

`d = mcbride c j a`then`d (x <*> y) = (d x <*> j y) <+> (c x <*> d y)`. This needs to respect the associativity of <*> and one way to achieve this is by choosing`c`and`j`that are multiplicative homomorphisms so that, for example,`c(x <*> y) = c x <*> c y`.

> mcbride _ _ a (Symbol a') = if a==a' then one else zero

> mcbride c j a (Sum p) = Sum $ map (mcbride c j a) p

> mcbride _ _ _ (Prod []) = zero

> mcbride c j a (Prod x) = Sum $ map f (splits x) where

> f (x,y:ys) = Prod $ map c x ++ [mcbride c j a y] ++ map j ys

> mcbride c j a (Star p) = c (Star p) <*> mcbride c j a p <*> j (Star p)

And now I can define all the Leibniz, Fox and Brzozowski derivatives as special cases of the mcbride derivative. The latter each come in left- and right-handed pairs.

> delta p = if acceptsEmpty p then one else zero

> leibniz,fox_l,fox_r,brzozowski_l :: (Ord a,Eq a) => a -> RE a -> RE a

> leibniz a p = simplify $ mcbride id id a p

> fox_l a p = simplify $ mcbride id (const (one)) a p

> fox_r a p = simplify $ mcbride (const (one)) id a p

> brzozowski_l a p = simplify $ mcbride delta id a p

> brzozowski_r a p = simplify $ mcbride id delta a p

`brzozowski_l`is the usual Brzozowski derivative. If`r`matches some language,`brzozowski_l a t`matches the sublanguage of strings beginning with`a`and with that`a`removed. We can define regular expression matching in the usual way in terms of it:

> matches re s = acceptsEmpty (matches' re s) where

> matches' re [] = re

> matches' re (a:as) = matches' (brzozowski_l a re) as

Now we can start matching.

`ex1`would normally be written as "abc|b*c".

> ex1 = (re "abc") <+> (Star (re "b") <*> Symbol 'c')

> test1 = map (matches ex1) ["abc","bbc","bbbbc","abbbbc"]

`brzozowski_r`chews off a symbol from the right:

> test2 = map (matches (brzozowski_r 'c' ex1)) ["abc","bbc","ab","bbbbb"]

`leibniz a`allows you to leave out one`a`from anywhere in the regexp.`ex3`matches "abracadabra" as it might be shouted by a wizard who might lengthen the second last 'a' to make sure the spirits can hear.`ex4`matches such strings with precisely one missing 'r'.`ex3`would normally be written as "abracada*bra".

> ex3 = re "abracad" <*> Star (re "a") <*> re "bra"

> test3 = map (matches ex3) ["abracadabra","abracadaaaaabra"]

> ex4 = leibniz 'r' ex3

> test4 = map (matches ex4) ["abracadabra","abacadabra","abracaaaaadaba","abacadaba"]

`fox_l a`makes regular expressions accept any prefix ending just before an`a`. Combining`fox_l`and`fox_r`allows you to search for all elements of a language just between a chosen pair of symbols.In

`ex5`we build a regexp for certain types of statement in a simple language and then pull out from it the part corresponding to substrings within parentheses. The undifferentiated part of`ex5`would normally be written as "x = ([0-9]+[0-9]); *".

> ex5 = simplify $ fox_r '(' $ fox_l ')' $

> let digits = Star (Sum (map Symbol "0123456789")) in

> re "x = (" <*> digits <*> re "+" <*> digits <*> re ");" <+> Star (re " ")

> test5 = map (matches ex5) ["1+2","7-4"]

Open Question: There are more derivatives that can be constructed from the parts I've given. What do they do? What other multiplicative homomorphisms can you define? What derivatives do they give? (Hint: consider the property "R matches only a finite language").

Oh...I almost forgot the whole point of this. It's now possible to see that McBride's l-operator is a version of the Brzozowski derivative. In fact, the theorems he proves about l are pretty similar to the theorems that get proved about the Brzozowski derivative such as Theorem 4.4 here.

Some helper functions:

`splits x`returns all the ways of splitting`x`into pairs`u`and`v`such that`u ++ v == x`and`v`is non-empty.

> splits [] = []

> splits (a:as) = ([],a:as) : map f (splits as) where

> f (x,y) = (a:x,y)

Some (incomplete) simplification rules for regular expressions

> simplify a = let b = simplify' a in if a==b then b else simplify b where

> simplify' (Prod [a]) = a

> simplify' (Prod a) | zero `elem` a = zero

> simplify' (Prod (a:b)) = case a of

> Prod x -> Prod $ x ++ map simplify b

> a -> Prod $ filter (/= one) $ map simplify (a:b)

>

> simplify' (Sum [a]) = a

> simplify' (Sum (a:b)) = case a of

> Sum x -> Sum $ x ++ map simplify b

> a -> Sum $ nub $ sort $ filter (/= zero) $ map simplify (a:b)

>

> simplify' (Star a) = Star (simplify a)

> simplify' a = a

Labels: haskell, mathematics

## 0 Comments:

Post a Comment

<< Home