peggy—PEG parsing in Python
This morning, to scratch an itch1, I wrote a PEG implementation in Python.
I took a bit of an unusual approach to it, which I’ll explain here and I hope you may find interesting.
I am, I confess, a Haskell programmer on the sly. Haskell is quite interesting for giving one a bit of a different perspective on how code should look.
In particular, here, the design process for the imperative mindset is to ask what a parser does. I think what’s an interesting alternative approach is the declarative mindset, where we ask what a parser is.
To a first approximation, a parser is this (if you’ll forgive the Haskell syntax):
type Parser a = String -> Maybe (a, String)
Transliterating this to English: a parser which parses
is: a function which takes an
a and returns either an
a and the
rest of the unparsed input, or nothing (in case of a parse error).
In Haskell, this takes the shape of a stack of monad transformers. Indeed, if we restate that type with some newtype wrappers:
type Parser = StateT String Maybe
We get most of the functionality we want “for free”. If we throw in the following extra utility function:
dot :: Parser Char dot = do (x:xs) <- get put xs return x
That’s enough for the majority of basic parsing2.
Sequencing is achieved through
>> which both come with the
Applicative instance for
StateT String Maybe. Ordered choice and empty
empty in the Alternative instance for
String Maybe. In fact, the
* operators also come from
Alternative instance as
I think this makes rather a nice example for Haskell: a pretty full set of parsing combinators in five lines of code, one of which is just type signature.
I wanted to translate this formulation into Python to produce a
relatively small and clean PEG implementation. The basic idea was a
Parser wrapper class around these parser functions.
A Python parser function here is one that takes an input string and either:
- Returns a
(parsed value, residue)tuple, or
- Throws a
The other combinators can all be built up from that basic building block.
The implementation is about 140 lines of Python.