Pitfalls
in which I discuss some idioms and gotchas you'll likely hit while building a grammar.
Prefer recurse over lazy for recursive grammars
Parser.lazy defers parser construction until parse time, so it can
close over self-references. The cost is that every invocation rebuilds
the parser sub-tree, so input nested N levels deep triggers N rebuilds,
and their allocations, per parse.
Parser.recurse routes the parser through a stable cell and does not
rebuild it. Declare the recursive position in a top-level def, set it
once, and reference it from sub-parsers. See examples/lisp.carp for the
pattern. Use lazy only when the grammar is shallow or short-lived, or
when performance matters less than clarity.
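To make the cost difference concrete, here is a toy model in Python. It is not the library's API, and every name in it (`lazy`, `Cell`, `builds`, and so on) is hypothetical. It only sketches the idea that a lazy thunk rebuilds the sub-tree each time parsing reaches it, while a cell that is set once gets reused.

```python
from typing import Callable, Optional

# Toy model only: a parser is a function (text, pos) -> new pos, or None on failure.
Parser = Callable[[str, int], Optional[int]]

# Counts how many times each style constructs the parens parser.
builds = {"lazy": 0, "cell": 0}

def char(c: str) -> Parser:
    def run(text: str, pos: int) -> Optional[int]:
        return pos + 1 if text.startswith(c, pos) else None
    return run

def between(open_: str, close: str, inner: Parser) -> Parser:
    def run(text: str, pos: int) -> Optional[int]:
        p = char(open_)(text, pos)
        if p is None:
            return None
        p = inner(text, p)
        if p is None:
            return None
        return char(close)(text, p)
    return run

def optional(inner: Parser) -> Parser:
    def run(text: str, pos: int) -> Optional[int]:
        p = inner(text, pos)
        return pos if p is None else p
    return run

def lazy(thunk: Callable[[], Parser]) -> Parser:
    # The thunk runs on every invocation, so the sub-tree is rebuilt each time.
    def run(text: str, pos: int) -> Optional[int]:
        return thunk()(text, pos)
    return run

def parens_lazy() -> Parser:
    builds["lazy"] += 1
    return between("(", ")", optional(lazy(parens_lazy)))

class Cell:
    """A stable slot: set once, then referenced by sub-parsers."""
    def __init__(self) -> None:
        self.parser: Optional[Parser] = None
    def set(self, parser: Parser) -> None:
        self.parser = parser
    def __call__(self, text: str, pos: int) -> Optional[int]:
        assert self.parser is not None, "cell used before it was set"
        return self.parser(text, pos)

def make_parens_cell() -> Cell:
    cell = Cell()
    builds["cell"] += 1
    cell.set(between("(", ")", optional(cell)))  # built exactly once
    return cell

def run_demo(text: str) -> tuple:
    builds["lazy"] = 0
    builds["cell"] = 0
    lazy_end = parens_lazy()(text, 0)
    cell_end = make_parens_cell()(text, 0)
    return lazy_end, cell_end, builds["lazy"], builds["cell"]
```

In this model, `run_demo("((()))")` parses the same input both ways. The lazy version's build count grows with nesting depth, while the cell version builds once regardless of depth.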
A lazy thunk must call a sibling, not itself
(defn parens []
  (Parser.between \( \)
    (Parser.optional (Parser.lazy (fn [] (parens))))))
This loops at parse time. The self-call inside the thunk does not terminate. Split the recursive position into two functions:
(defn parens-content []
  (Parser.alt (Parser.lazy (fn [] (parens-pair)))
              (Parser.pure ())))
(defn parens-pair []
  (Parser.between \( \) (parens-content)))
Using Parser.recurse against a stable cell sidesteps this entirely.
Use String.byte-slice, not String.slice
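In a custom combinator that extracts a substring, use
String.byte-slice. The library does so internally. String.slice walks
the input twice (chars, then from-chars), whereas byte-slice is a direct
memcpy. The cost difference is two orders of magnitude.
The tradeoff here is UTF-8: byte-slice works in byte offsets, so an
offset that lands inside a multi-byte character cuts that character in
half. Make sure your offsets fall on character boundaries.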
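The UTF-8 hazard is a property of the encoding rather than of Carp, so it can be sketched in Python. The helper `is_char_boundary` is a hypothetical name for illustration, not something the library provides. It relies on the fact that UTF-8 continuation bytes always have the bit pattern `10xxxxxx`.

```python
def is_char_boundary(data: bytes, i: int) -> bool:
    """True if byte offset i does not fall inside a multi-byte UTF-8 sequence."""
    if i == 0 or i == len(data):
        return True
    # Continuation bytes look like 0b10xxxxxx; anything else starts a character.
    return (data[i] & 0xC0) != 0x80

text = "naïve"
data = text.encode("utf-8")  # 'ï' takes two bytes, so this is 6 bytes long

char_slice = text[0:3]   # slicing by characters keeps 'ï' whole
byte_slice = data[0:3]   # slicing by bytes cuts 'ï' after its first byte

def decodes_cleanly(chunk: bytes) -> bool:
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Here offset 3 sits in the middle of `ï`, so `is_char_boundary(data, 3)` is false and `byte_slice` is not valid UTF-8, while offsets 2 and 4 are safe. A custom combinator that computes its offsets from bytes it has actually consumed should normally land on boundaries like these.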
Pass (fn ...) to combinators by value, not by reference
Combinators like Parser.map, Parser.bind, Parser.satisfy, and
Parser.take-while take their function arguments by value:
(Parser.map (Parser.byte \a) (fn [c] (Char.to-int c))) ; right
(Parser.map (Parser.byte \a) &(fn [c] (Char.to-int c))) ; wrong
The &fn form compiles for capture-free closures (Carp hoists those
to static functions) but produces a dangling reference once the
closure captures a local. The failure mode is a runtime segfault, not
a compile error.
This should be resolved in Carp eventually, but until then I want to flag it here.
Parser.parse is strict
Parser.parse p input succeeds only if p consumes all of input.
There is no partial-parse entry point. If you want to ignore trailing
input, append a permissive take-while to your grammar, or trim the
input before parsing.
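The two workarounds can be sketched with a toy model in Python. This is not the library's API: `literal`, `take_while`, `then`, and `parse` are hypothetical stand-ins that only mimic the "must consume everything" rule described above.

```python
from typing import Callable, Optional

# Toy model only: a parser is a function (text, pos) -> new pos, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def literal(s: str) -> Parser:
    def run(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return run

def take_while(pred: Callable[[str], bool]) -> Parser:
    # Consumes characters while pred holds; never fails.
    def run(text: str, pos: int) -> Optional[int]:
        while pos < len(text) and pred(text[pos]):
            pos += 1
        return pos
    return run

def then(first: Parser, second: Parser) -> Parser:
    # Runs first, then second from where first stopped.
    def run(text: str, pos: int) -> Optional[int]:
        mid = first(text, pos)
        return None if mid is None else second(text, mid)
    return run

def parse(p: Parser, text: str) -> bool:
    # Strict: succeeds only if p consumes the whole input.
    return p(text, 0) == len(text)

keyword = literal("let")

strict_ok = parse(keyword, "let")                # whole input consumed
strict_trailing = parse(keyword, "let   ")       # trailing input left over
# Workaround 1: append a permissive take_while that swallows the rest.
permissive = parse(then(keyword, take_while(lambda c: True)), "let   ")
# Workaround 2: trim the input before parsing.
trimmed = parse(keyword, "let   ".rstrip())
```

In this model `strict_trailing` fails while both workarounds succeed. Appending a permissive tail accepts any trailing input at all, so trimming is the narrower choice when only whitespace should be ignored.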