Parser
is a Parsec-style parser combinator library for Carp.
Installation
(load "git@github.com:carpentry-org/parsec@0.3.0")
Usage
(let [p (Parser.seq (Parser.byte \h) (Parser.byte \i))]
  (match (Parser.parse p "hi")
    (Result.Success _) (println "matched")
    (Result.Error e) (IO.errorln &(Parser.format-error &e))))
Lexer
Module
Lexer is a submodule of common-token parsers: whitespace handling, integers, identifiers, and literal symbols.
UTF8
Module
UTF8 is a submodule of codepoint-level parsers layered on the byte-level core. Each codepoint advances the cursor by one column, regardless of byte width.
alt
(Fn [(Parser a), (Parser a)] (Parser a))
(alt p q)
if p fails without consuming input, runs q from the same
cursor. If p fails after consuming input, alt does not backtrack and
p's failure propagates; wrap p in try to allow backtracking.
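The difference between an empty and a consumed failure is easiest to see with two alternatives that share a prefix. The following is a minimal sketch that uses only combinators documented on this page; the input string and the binding names are made up for illustration.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

(defn main []
  (let [;; spelled with seq, so a mismatch on the second byte is a consumed failure
        ab (Parser.seq (Parser.byte \a) (Parser.byte \b))
        ac (Parser.seq (Parser.byte \a) (Parser.byte \c))
        ;; try reports that failure as empty, which lets alt fall through to ac
        p (Parser.alt (Parser.try ab) ac)]
    (match (Parser.parse p "ac")
      (Result.Success _) (IO.println "matched")
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```

Going by the description above, dropping the try should make alt report the failure from ab instead of running ac, because ab has already consumed the a.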
before
(Fn [(Parser a), (Parser b)] (Parser a))
(before p q)
runs p then q, returning p's value and discarding
q's. The mirror image of then.
between
(Fn [(Parser a), (Parser b), (Parser c)] (Parser c))
(between open close p)
runs open, then p, then close, returning
p's value. Once open consumes, all subsequent failures are
consumed failures.
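As an illustration of the argument order (delimiters first, payload parser last), here is a small sketch that reads a name wrapped in angle brackets. The lower? predicate is a hypothetical helper written for this example, not part of the library.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

;; Hypothetical helper: true for ASCII lowercase letters.
(defn lower? [c]
  (let [n (Char.to-int c)]
    (and (>= n 97) (<= n 122))))

(defn main []
  (let [tag (Parser.between (Parser.byte \<) (Parser.byte \>)
                            (Parser.take-while1 lower?))]
    (match (Parser.parse tag "<div>")
      ;; name is the value of the inner parser; the delimiters are discarded
      (Result.Success name) (IO.println &name)
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```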
bind
(Fn [(Parser a), (Fn [a] (Parser b))] (Parser b))
(bind p f)
is the monadic bind. Runs p, then runs (f v) where
v is p's value. The continuation parser is computed from the
value.
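A short sketch of a parser whose second step depends on the first result: it reads one letter and then requires the same letter again. The lower? predicate and the "letter" label are assumptions introduced for the example.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

;; Hypothetical helper: true for ASCII lowercase letters.
(defn lower? [c]
  (let [n (Char.to-int c)]
    (and (>= n 97) (<= n 122))))

(defn main []
  (let [;; the continuation builds a new parser from the byte that was just read
        doubled (Parser.bind (Parser.satisfy lower? @"letter")
                             (fn [c] (Parser.byte c)))]
    (match (Parser.parse doubled "aa")
      (Result.Success _) (IO.println "doubled letter")
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```

This kind of dependency cannot be expressed with seq or then, since their second parser is fixed before any input is read.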
bind-result
(Fn [(Parser a), (Fn [a, (Ref Cursor b)] (Result c ParseErr))] (Parser c))
(bind-result p f)
runs p, then applies f to its value and the
cursor after p. f returns a (Result c ParseErr) directly,
avoiding a per-parse Parser.pure/Parser.fail rebuild that the
equivalent bind formulation would do. The cursor argument lets f
build a ParseErr at the right position when it returns
Result.Error.
byte
(Fn [Char] (Parser Char))
(byte b)
consumes a specific byte (as Char). Fails empty if
the input doesn't match or is at EOF.
chainl
(Fn [(Parser a), (Parser (Fn [a, a] a)), a] (Parser a))
(chainl p op default)
like chainl1 but returns default if p fails on
the first attempt.
chainl1
(Fn [(Parser a), (Parser (Fn [a, a] a))] (Parser a))
(chainl1 p op)
parses one or more occurrences of p separated by
op. op returns a binary function used to combine values
left-associatively. Example: (chainl1 (Lexer.integer) add-op)
parses 1+2+3 as ((1+2)+3).
chainr
(Fn [(Parser a), (Parser (Fn [a, a] a)), a] (Parser a))
(chainr p op default)
like chainr1 but returns default if p fails on
the first attempt.
chainr1
(Fn [(Parser a), (Parser (Fn [a, a] a))] (Parser a))
(chainr1 p op)
parses one or more occurrences of p separated by
op. op returns a binary function used to combine values
right-associatively. Example: with exponentiation as op,
2^3^2 parses as 2^(3^2) = 512.
choice
(Fn [(Array (Parser a))] (Parser a))
(choice ps)
tries each parser in turn; returns the first
success. If all parsers fail empty, the resulting failure merges
their expected sets. A consumed failure of any parser short-circuits
(use try inside if needed).
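A sketch of choice over a handful of keyword literals. Because string is documented as atomic (it fails empty on a partial mismatch), the alternatives here should not need a try wrapper; the literals themselves are only examples.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

(defn main []
  (let [literal (Parser.choice [(Parser.string @"true")
                                (Parser.string @"false")
                                (Parser.string @"null")])]
    (match (Parser.parse literal "false")
      (Result.Success _) (IO.println "literal")
      ;; if every alternative fails empty, the error lists all three expectations
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```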
count
(Fn [Int, (Parser a)] (Parser (Array a)))
(count n p)
runs p exactly n times, collecting results into
an Array. If any iteration fails, propagates with appropriate
consumption.
end-by
(Fn [(Parser a), (Parser b)] (Parser (Array a)))
(end-by p sep)
zero or more p each followed by sep. Returns an
Array (possibly empty). Each element must be terminated by sep.
end-by1
(Fn [(Parser a), (Parser b)] (Parser (Array a)))
(end-by1 p sep)
one or more p each followed by sep. Unlike
sep-by1, the separator acts as a terminator — the last p must
also be followed by sep. Returns an Array of p values.
eof
(Fn [] (Parser ()))
(eof)
succeeds without consuming iff the cursor is at end of input.
When the input is not at end, the error's unexpected field is
populated with the byte that was seen.
fail
(Fn [String] (Parser a))
(fail lbl)
fails empty, with the given label as the expected token.
format-error
(Fn [(Ref ParseErr a)] String)
(format-error e)
renders a ParseErr as a human-readable
String. Format: parse error at LINE:COL: unexpected X; expected: Y, Z, W.
Uses StringBuf internally.
init
(Fn [(Fn [(Ref String a), Int, (Ref Cursor b)] (Reply c))] (Parser c))
creates a Parser.
label
(Fn [String, (Parser a)] (Parser a))
(label name p)
replaces the expected-set on empty failure with
name. Has no effect on consumed failures or successes.
lazy
(Fn [(Fn [] (Parser a))] (Parser a))
(lazy thunk)
defers parser construction until run time. Use this to break eager-construction infinite recursion in recursive grammars.
The thunk must call a sibling function, not the immediately enclosing one. Pattern: split a recursive grammar into two functions, with the lazy thunk in one calling the other.
(sig sexp (Fn [] (Parser SExp)))
(defn list-p []
  (Parser.between open close
    (Parser.many (Parser.lazy (fn [] (sexp))))))
(defn sexp [] (Parser.alt (atom-p) (list-p)))
Each invocation rebuilds the deferred parser. For deep recursion, prefer
Parser.recurse against a pre-built grammar in a top-level def.
lookahead
(Fn [(Parser a)] (Parser a))
(lookahead p)
runs p, then restores the cursor. p's value
is kept, but no input is consumed. p's failures propagate as-is.
many
(Fn [(Parser a)] (Parser (Array a)))
(many p)
runs p zero or more times, collecting results. Stops
on empty failure of p. If p succeeds without consuming, the loop
stops to avoid running forever (do not use many with an
empty-success parser).
many-till
(Fn [(Parser a), (Parser b)] (Parser (Array a)))
(many-till p end)
runs p repeatedly until end succeeds, collecting
p results in an Array. Returns the array (not the end result).
The end parser is tried before each p. If end can fail after
consuming input, wrap it in try.
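A typical use is reading everything up to a terminator. The sketch below collects bytes until the literal */ is seen; the accept-anything predicate and its label are assumptions made for the example. Since string is atomic, the terminator is not wrapped in try here.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

(defn main []
  (let [;; accepts any single byte
        any-byte (Parser.satisfy (fn [_] true) @"any byte")
        body (Parser.many-till any-byte (Parser.string @"*/"))]
    (match (Parser.parse body "abc*/")
      ;; bytes holds only the p results, not the terminator
      (Result.Success bytes) (IO.println &(str (Array.length &bytes)))
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```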
many1
(Fn [(Parser a)] (Parser (Array a)))
(many1 p)
runs p one or more times. Fails (with same
consumption as p's first failure) if p doesn't match at least
once.
not-followed-by
(Fn [(Parser a)] (Parser ()))
(not-followed-by p)
succeeds (without consuming) if p fails;
fails if p succeeds. Useful for negative assertions.
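One common negative assertion is a keyword boundary: accept let only when no identifier character follows it. This is a sketch under that assumption; ident-char? is a hypothetical helper, and parse-partial is used so that trailing input is allowed.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

;; Hypothetical helper: ASCII lowercase letters and digits.
(defn ident-char? [c]
  (let [n (Char.to-int c)]
    (or (and (>= n 97) (<= n 122))
        (and (>= n 48) (<= n 57)))))

(defn main []
  (let [keyword (Parser.before
                  (Parser.string @"let")
                  (Parser.not-followed-by
                    (Parser.satisfy ident-char? @"identifier character")))]
    ;; "letter" starts with "let" but continues with an identifier character
    (match (Parser.parse-partial keyword "letter")
      (Result.Success _) (IO.println "keyword")
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```

With the input "letter" the assertion should fail, so the error branch is the one expected to run; an input such as "let x" should take the success branch.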
optional
(Fn [(Parser a)] (Parser (Maybe a)))
(optional p)
if p succeeds, wraps the value in Just; if p
fails empty, succeeds with Nothing without consuming. Consumed
failures propagate.
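A sketch of an optional leading sign in front of a run of digits. The digit? predicate is a hypothetical helper, and the example assumes the standard Carp accessors Pair.a and Maybe.just? for inspecting the result.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

;; Hypothetical helper: true for ASCII digits.
(defn digit? [c]
  (let [n (Char.to-int c)]
    (and (>= n 48) (<= n 57))))

(defn main []
  (let [number (Parser.seq (Parser.optional (Parser.byte \-))
                           (Parser.take-while1 digit?))]
    (match (Parser.parse number "-42")
      (Result.Success r)
        ;; the first Pair element is Just the sign byte, or Nothing if it was absent
        (if (Maybe.just? (Pair.a &r))
          (IO.println "signed")
          (IO.println "unsigned"))
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```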
parse
(Fn [(Parser a), (Ref String b)] (Result a ParseErr))
(parse p src)
runs p over the entire input. Strict: fails if p
doesn't consume all of src. Returns (Result a ParseErr).
parse-partial
(Fn [(Parser a), (Ref String b)] (Result (Pair a String) ParseErr))
(parse-partial p src)
runs p on the input without requiring it to
consume everything. On success, returns a Pair of the parsed value
and the remaining unconsumed input as a String. On error, returns
the ParseErr.
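To contrast the two entry points, the sketch below runs the same parser over the same input with each. It assumes the standard Carp accessor Pair.b for reading the leftover input; the input string is made up.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

(defn main []
  (let [src "hi!"]
    (do
      ;; strict: "!" is left over, so going by the description this should fail
      (match (Parser.parse (Parser.string @"hi") src)
        (Result.Success _) (IO.println "parse: matched")
        (Result.Error e) (IO.errorln &(Parser.format-error &e)))
      ;; partial: the leftover input comes back as the second Pair element
      (match (Parser.parse-partial (Parser.string @"hi") src)
        (Result.Success r) (IO.println (Pair.b &r))
        (Result.Error e) (IO.errorln &(Parser.format-error &e))))))
```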
placeholder
(Fn [] (Parser a))
(placeholder)
is an uninitialized parser that always fails.
Use as the initial value for a top-level def that will be set! to
the real parser before parsing. Companion to recurse.
recurse
(Fn [(Ref (Parser a) StaticLifetime)] (Parser a))
(recurse pref)
runs a parser referenced by a stable static reference,
without rebuilding it per call. Use for deeply recursive grammars where
lazy would rebuild the parser tree at every level.
Pattern: declare the recursive grammar in a top-level def, set! it
once at startup, and reference it from sub-parsers via recurse.
(def *sexp* (the (Parser SExp) (Parser.placeholder)))
(defn list-p []
  (Parser.between open close
    (Parser.many (Parser.recurse &*sexp*))))
(defn init-grammar []
  (set! *sexp* (Parser.alt (atom-p) (list-p))))
The grammar must be initialized via set! before parsing.
run
(Fn [(Ref (Parser a) b)] (Ref (Fn [(Ref String c), Int, (Ref Cursor d)] (Reply a)) b))
gets the run property of a Parser.
satisfy
(Fn [(Fn [Char] Bool a), String] (Parser Char))
(satisfy pred lbl)
consumes one byte if pred holds, otherwise fails empty.
When the predicate rejects a byte, the error's unexpected field is
populated with the byte that was seen.
sep-by
(Fn [(Parser a), (Parser b)] (Parser (Array a)))
(sep-by p sep)
zero or more p separated by sep. Returns an
Array (possibly empty).
sep-by1
(Fn [(Parser a), (Parser b)] (Parser (Array a)))
(sep-by1 p sep)
one or more p separated by sep. Trailing sep
is not allowed (sep must be followed by another p). Returns an
Array of p values.
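The separator and terminator styles differ only in what happens after the last element. The sketch below runs sep-by1 and end-by1 on inputs that fit each style; lower? and word are hypothetical helpers, and word is a function so that a fresh parser can be built for each use.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

;; Hypothetical helper: true for ASCII lowercase letters.
(defn lower? [c]
  (let [n (Char.to-int c)]
    (and (>= n 97) (<= n 122))))

;; Hypothetical element parser: a run of one or more lowercase letters.
(defn word [] (Parser.take-while1 lower?))

(defn main []
  (do
    ;; separator style: no trailing separator after the last element
    (match (Parser.parse (Parser.sep-by1 (word) (Parser.byte \-)) "a-b-c")
      (Result.Success xs) (IO.println &(str (Array.length &xs)))
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))
    ;; terminator style: every element, including the last, ends with the separator
    (match (Parser.parse (Parser.end-by1 (word) (Parser.byte \-)) "a-b-c-")
      (Result.Success xs) (IO.println &(str (Array.length &xs)))
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```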
seq
(Fn [(Parser a), (Parser b)] (Parser (Pair a b)))
(seq p q)
runs p then q, returning a Pair of their values.
Failure of either propagates with appropriate consumption tracking.
set-run
(Fn [(Parser a), (Fn [(Ref String b), Int, (Ref Cursor c)] (Reply a))] (Parser a))
sets the run property of a Parser.
set-run!
(Fn [(Ref (Parser a) b), (Fn [(Ref String c), Int, (Ref Cursor d)] (Reply a))] ())
sets the run property of a Parser in place.
skip-many
(Fn [(Parser a)] (Parser ()))
(skip-many p)
runs p zero or more times, discarding all
results. Returns (). Avoids building an Array, saving
allocations compared to many. Stops on empty failure of p.
skip-many1
(Fn [(Parser a)] (Parser ()))
(skip-many1 p)
runs p one or more times, discarding all
results. Returns (). Fails if the first application of p fails.
slice-of
(Fn [(Parser a)] (Parser String))
(slice-of p)
runs p; on success, replaces p's value with
the slice of input that p consumed (as a String). On failure,
propagates as-is.
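slice-of is handy when the structure of a match matters less than its text. In this sketch an identifier is recognised as letters followed by optional digits, and the Pair that seq would normally produce is replaced by the matched text. lower? and digit? are hypothetical helpers.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

;; Hypothetical helpers for ASCII lowercase letters and digits.
(defn lower? [c]
  (let [n (Char.to-int c)]
    (and (>= n 97) (<= n 122))))

(defn digit? [c]
  (let [n (Char.to-int c)]
    (and (>= n 48) (<= n 57))))

(defn main []
  (let [ident (Parser.slice-of
                (Parser.seq (Parser.take-while1 lower?)
                            (Parser.take-while digit?)))]
    (match (Parser.parse ident "abc123")
      ;; s is the consumed input as one String, not a Pair of two Strings
      (Result.Success s) (IO.println &s)
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```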
string
(Fn [String] (Parser String))
(string s)
matches the literal string s byte-for-byte.
Atomic: on partial mismatch, fails empty (no consumed failure
mid-string).
take
(Fn [Int] (Parser String))
(take n)
consumes exactly n bytes, returning them as a
String. Fails empty if fewer than n bytes remain.
take-while
(Fn [(Fn [Char] Bool a)] (Parser String))
(take-while pred)
consumes bytes while pred holds. Always
succeeds (possibly with empty result). Returns the consumed slice as
a String.
take-while1
(Fn [(Fn [Char] Bool a)] (Parser String))
(take-while1 pred)
like take-while, but requires at least one byte to match.
then
(Fn [(Parser a), (Parser b)] (Parser b))
(then p q)
runs p then q, discarding p's value and returning
q's. Equivalent to (bind p (fn [_] q)) but does not allocate the
continuation closure per parse.
try
(Fn [(Parser a)] (Parser a))
(try p)
if p fails after consuming, restores the cursor and
reports the failure as empty so alt can backtrack.
update-run
(Fn [(Parser a), (Ref (Fn [(Fn [(Ref String b), Int, (Ref Cursor c)] (Reply a))] (Fn [(Ref String b), Int, (Ref Cursor c)] (Reply a)) d) e)] (Parser a))
updates the run property of a (Parser a) using a function f.