Packrat Parsers
Its a long weekend, so I wrote a packrat parser! (AKA Parsing Expression Grammar, or PEG.)
I implemented the example grammar from the original paper. Here its parsing the string 2*(3+4)
:
Holy cow thats a lot of steps for a 7 character string. No wonder modern compilers take ages to run!
Packrat parsers are neat because unlike other parsers you can combine your tokeniser and lexer into a single grammar. There's no need to define token rules separately from your semantic rules.
So for example, the definition of an array literal can simply be:
ArrayLiteral = "[" ListOf<AssignmentExpressionOrElision, ","> "]"
If you want to learn more, here's an online version of Ohm (a clean, full featured JS packrat parser) that you can mess around with: http://www.cdglabs.org/ohm/visualizer/.
Here's a big pile of academic papers: http://bford.info/packrat/
Finally the simple code I wrote is here. My full PEG implementation including the grammar fits in less than 200 lines of code.