Parser
Parser is a Parsec-style parser combinator library for Carp.
Installation
(load "git@github.com:carpentry-org/parsec@0.1.0")
Usage
(let [p (Parser.seq (Parser.byte \h) (Parser.byte \i))]
  (match (Parser.parse p "hi")
    (Result.Success _) (IO.println "matched")
    (Result.Error e) (IO.errorln &(Parser.format-error &e))))
Lexer
Module
Lexer is a submodule of common-token parsers: whitespace handling, integers, identifiers, and literal symbols.
UTF8
Module
UTF8 is a submodule of codepoint-level parsers layered on the byte-level core. Each codepoint advances the cursor by one column, regardless of byte width.
alt
(Fn [(Parser a), (Parser a)] (Parser a))
(alt p q)
if p fails without consuming, runs q from the same
cursor. If p fails after consuming input, alt does not backtrack; wrap p in try
to allow backtracking.
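A minimal sketch of the empty-failure case, assuming the load line from Installation and the error handling shown in Usage. The two single-byte alternatives are illustrative only.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

; (Parser.byte \a) fails on "b" without consuming, so alt runs
; (Parser.byte \b) from the same cursor.
(defn main []
  (let [p (Parser.alt (Parser.byte \a) (Parser.byte \b))]
    (match (Parser.parse p "b")
      (Result.Success _) (IO.println "matched")
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```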
between
(Fn [(Parser a), (Parser b), (Parser c)] (Parser c))
(between open close p)
runs open, then p, then close, returning
p's value. Once open consumes, all subsequent failures are
consumed failures.
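As an illustration, the sketch below wraps a run of x bytes in parentheses. The name wrapped-xs and the grammar are hypothetical; Int.str and Array.length are assumed to come from Carp's core library.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

; Hypothetical grammar: one or more x bytes wrapped in parentheses.
; Argument order follows the signature above: open, close, then p.
(defn wrapped-xs []
  (Parser.between (Parser.string @"(")
                  (Parser.string @")")
                  (Parser.many1 (Parser.byte \x))))

; On success the value is the inner parser's Array; the delimiters
; are not part of it, so only its length is printed here.
(defn main []
  (match (Parser.parse (wrapped-xs) "(xxx)")
    (Result.Success xs) (IO.println &(Int.str (Array.length &xs)))
    (Result.Error e) (IO.errorln &(Parser.format-error &e))))
```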
bind
(Fn [(Parser a), (Fn [a] (Parser b))] (Parser b))
(bind p f)
is the monadic bind. Runs p, then runs (f v) where
v is p's value. The continuation parser is computed from the
value.
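A small sketch of a value-dependent continuation: the first parser accepts a or b, and the second parser is built from whichever byte was read, so only a doubled byte matches. The name doubled and the label text are illustrative assumptions.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

; Hypothetical grammar: "aa" or "bb", but not "ab" or "ba".
(defn doubled []
  (Parser.bind
    (Parser.satisfy (fn [c] (or (= c \a) (= c \b))) @"a or b")
    ; The continuation receives the byte just parsed and demands it again.
    (fn [c] (Parser.byte c))))

(defn main []
  (match (Parser.parse (doubled) "aa")
    (Result.Success _) (IO.println "matched")
    (Result.Error e) (IO.errorln &(Parser.format-error &e))))
```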
byte
(Fn [Char] (Parser Char))
(byte b)
consumes a specific byte (as Char). Fails empty if
the input doesn't match or is at EOF.
choice
(Fn [(Array (Parser a))] (Parser a))
(choice ps)
tries each parser in turn; returns the first
success. If all parsers fail empty, the resulting failure merges
their expected sets. A consumed failure of any parser short-circuits
(use try inside if needed).
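A sketch with three literal alternatives; the name keyword and the word list are hypothetical. Because string is documented as failing empty on a mismatch, no try is needed around these alternatives.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

; Hypothetical keyword set. Each alternative is tried in order.
(defn keyword []
  (Parser.choice [(Parser.string @"def")
                  (Parser.string @"let")
                  (Parser.string @"fn")]))

(defn main []
  (match (Parser.parse (keyword) "let")
    (Result.Success s) (IO.println &s)
    (Result.Error e) (IO.errorln &(Parser.format-error &e))))
```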
count
(Fn [Int, (Parser a)] (Parser (Array a)))
(count n p)
runs p exactly n times, collecting results into
an Array. If any iteration fails, the failure propagates; it is a
consumed failure if an earlier iteration already consumed input.
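For instance, a fixed-width field of three a bytes could be sketched as follows. Int.str and Array.length are assumed from Carp's core library.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

(defn main []
  ; Exactly three a bytes; the result is an Array of the three Chars.
  (let [p (Parser.count 3 (Parser.byte \a))]
    (match (Parser.parse p "aaa")
      (Result.Success xs) (IO.println &(Int.str (Array.length &xs)))
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```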
fail
(Fn [String] (Parser a))
(fail lbl)
fails empty, with the given label as the expected token.
format-error
(Fn [(Ref ParseErr a)] String)
(format-error e)
renders a ParseErr as a human-readable
String. Format: parse error at LINE:COL: unexpected X; expected: Y, Z, W.
Uses StringBuf internally.
init
(Fn [(Fn [(Ref String a), Int, (Ref Cursor b)] (Reply c))] (Parser c))
creates a Parser.
label
(Fn [String, (Parser a)] (Parser a))
(label name p)
replaces the expected-set on empty failure with
name. Has no effect on consumed failures or successes.
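A sketch combining label with format-error: the literal fails empty on the input below, so the rendered message should name the label rather than the literal. The name greeting and both strings are illustrative.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

; Hypothetical labelled parser: errors mention "greeting".
(defn greeting []
  (Parser.label @"greeting" (Parser.string @"hello")))

(defn main []
  (match (Parser.parse (greeting) "bye")
    (Result.Success _) (IO.println "matched")
    ; format-error takes a reference to the ParseErr and returns a String.
    (Result.Error e) (IO.errorln &(Parser.format-error &e))))
```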
lazy
(Fn [(Fn [] (Parser a))] (Parser a))
(lazy thunk)
defers parser construction until run time. Use this to break eager-construction infinite recursion in recursive grammars.
The thunk must call a sibling function, not the immediately enclosing one. Pattern: split a recursive grammar into two functions, with the lazy thunk in one calling the other.
(sig sexp (Fn [] (Parser SExp)))
(defn list-p []
  (Parser.between open close
    (Parser.many (Parser.lazy (fn [] (sexp))))))
(defn sexp [] (Parser.alt (atom-p) (list-p)))
Each invocation rebuilds the deferred parser. For deep recursion, prefer
Parser.recurse against a pre-built grammar in a top-level def.
lookahead
(Fn [(Parser a)] (Parser a))
(lookahead p)
runs p, then restores the cursor. p's value
is kept, but no input is consumed. p's failures propagate as-is.
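A sketch of peeking at the next byte before committing to a longer match. The grammar is hypothetical: lookahead checks for a leading a, and string then matches the full literal from the same position.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

(defn main []
  ; lookahead leaves the cursor at the start, so string still sees "abc".
  (let [p (Parser.seq (Parser.lookahead (Parser.byte \a))
                      (Parser.string @"abc"))]
    (match (Parser.parse p "abc")
      (Result.Success _) (IO.println "matched")
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```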
many
(Fn [(Parser a)] (Parser (Array a)))
(many p)
runs p zero or more times, collecting results. Stops
on empty failure of p. If p succeeds without consuming, the loop
stops to avoid looping forever (do not use many with an
empty-success parser).
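A sketch that runs the same many parser over an empty and a non-empty input; zero matches is still a success. The helper names many-as and show are hypothetical, and Int.str and Array.length are assumed from Carp's core library.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

; The parser is rebuilt per call because parse takes it by value.
(defn many-as []
  (Parser.many (Parser.byte \a)))

; Prints how many a bytes were collected, or the formatted error.
(defn show [src]
  (match (Parser.parse (many-as) src)
    (Result.Success xs) (IO.println &(Int.str (Array.length &xs)))
    (Result.Error e) (IO.errorln &(Parser.format-error &e))))

(defn main []
  (do
    (show "")
    (show "aaa")))
```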
many1
(Fn [(Parser a)] (Parser (Array a)))
(many1 p)
runs p one or more times. Fails (with the same
consumption as p's first failure) if p doesn't match at least
once.
not-followed-by
(Fn [(Parser a)] (Parser ()))
(not-followed-by p)
succeeds (without consuming) if p fails;
fails if p succeeds. Useful for negative assertions.
optional
(Fn [(Parser a)] (Parser (Maybe a)))
(optional p)
if p succeeds, wraps the value in Just; if p
fails empty, succeeds with Nothing without consuming. Consumed
failures propagate.
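A sketch of an optional prefix: the leading x may be absent, in which case optional yields Nothing and the next parser starts at the same cursor. The grammar is illustrative.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

(defn main []
  ; Hypothetical grammar: an optional x followed by a required y.
  (let [p (Parser.seq (Parser.optional (Parser.byte \x))
                      (Parser.byte \y))]
    (match (Parser.parse p "y")
      (Result.Success _) (IO.println "matched")
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```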
parse
(Fn [(Parser a), (Ref String b)] (Result a ParseErr))
(parse p src)
runs p over the entire input. Strict: fails if p
doesn't consume all of src. Returns (Result a ParseErr).
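The strictness rule can be sketched as follows: the parser matches only the first byte of a two-byte input, so parse is expected to take the error branch because input is left over.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

(defn main []
  ; (Parser.byte \h) consumes "h" but leaves "i" unread.
  (match (Parser.parse (Parser.byte \h) "hi")
    (Result.Success _) (IO.println "matched")
    (Result.Error e) (IO.errorln &(Parser.format-error &e))))
```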
placeholder
(Fn [] (Parser a))
(placeholder)
is an uninitialized parser that always fails.
Use as the initial value for a top-level def that will be set! to
the real parser before parsing. Companion to recurse.
recurse
(Fn [(Ref (Parser a) StaticLifetime)] (Parser a))
(recurse pref)
runs a parser referenced by a stable static reference,
without rebuilding it per call. Use for deeply recursive grammars where
lazy would rebuild the parser tree at every level.
Pattern: declare the recursive grammar in a top-level def, set! it
once at startup, and reference it from sub-parsers via recurse.
(def *sexp* (the (Parser SExp) (Parser.placeholder)))
(defn list-p []
  (Parser.between open close
    (Parser.many (Parser.recurse &*sexp*))))
(defn init-grammar []
  (set! *sexp* (Parser.alt (atom-p) (list-p))))
The grammar must be initialized via set! before parsing.
run
(Fn [(Ref (Parser a) b)] (Ref (Fn [(Ref String c), Int, (Ref Cursor d)] (Reply a)) b))
gets the run property of a Parser.
satisfy
(Fn [(Fn [Char] Bool a), String] (Parser Char))
(satisfy pred lbl)
consumes one byte if pred holds, otherwise fails empty.
sep-by
(Fn [(Parser a), (Parser b)] (Parser (Array a)))
(sep-by p sep)
zero or more p separated by sep. Returns an
Array (possibly empty).
sep-by1
(Fn [(Parser a), (Parser b)] (Parser (Array a)))
(sep-by1 p sep)
one or more p separated by sep. Trailing sep
is not allowed (sep must be followed by another p). Returns an
Array of p values.
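A sketch of a comma-separated list, run once on a well-formed input and once with a trailing separator, which the description above says is rejected. The names xs and show are hypothetical; Int.str and Array.length are assumed from Carp's core library.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

; Hypothetical grammar: x bytes separated by commas, at least one x.
(defn xs []
  (Parser.sep-by1 (Parser.byte \x) (Parser.string @",")))

; Prints the number of items, or the formatted error.
(defn show [src]
  (match (Parser.parse (xs) src)
    (Result.Success items) (IO.println &(Int.str (Array.length &items)))
    (Result.Error e) (IO.errorln &(Parser.format-error &e))))

(defn main []
  (do
    (show "x,x,x")
    ; Trailing separator: sep must be followed by another x.
    (show "x,x,")))
```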
seq
(Fn [(Parser a), (Parser b)] (Parser (Pair a b)))
(seq p q)
runs p then q, returning a Pair of their values.
Failure of either propagates; if p consumed input, a failure of q is
a consumed failure.
set-run
(Fn [(Parser a), (Fn [(Ref String b), Int, (Ref Cursor c)] (Reply a))] (Parser a))
sets the run property of a Parser.
set-run!
(Fn [(Ref (Parser a) b), (Fn [(Ref String c), Int, (Ref Cursor d)] (Reply a))] ())
sets the run property of a Parser in place.
slice-of
(Fn [(Parser a)] (Parser String))
(slice-of p)
runs p; on success, replaces p's value with
the slice of input that p consumed (as a String). On failure,
propagates as-is.
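A sketch that discards the Array built by many1 and keeps the matched text instead. The grammar is illustrative.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

(defn main []
  ; many1 would return an Array of Chars; slice-of swaps that value
  ; for the consumed input as a String.
  (let [p (Parser.slice-of (Parser.many1 (Parser.byte \a)))]
    (match (Parser.parse p "aaa")
      (Result.Success s) (IO.println &s)
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```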
string
(Fn [String] (Parser String))
(string s)
matches the literal string s byte-for-byte.
Atomic: on partial mismatch, fails empty (no consumed failure
mid-string).
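A sketch of why atomicity matters: the two literals share the prefix l, and because a partial mismatch fails empty, alt can move on to the second literal without try. The literals are illustrative.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

(defn main []
  ; "let" mismatches "lambda" at its second byte but consumes nothing,
  ; so the second alternative starts from the same cursor.
  (let [p (Parser.alt (Parser.string @"let") (Parser.string @"lambda"))]
    (match (Parser.parse p "lambda")
      (Result.Success s) (IO.println &s)
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```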
take
(Fn [Int] (Parser String))
(take n)
consumes exactly n bytes, returning them as a
String. Fails empty if fewer than n bytes remain.
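A minimal sketch of reading a fixed number of bytes, assuming the load line from Installation.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

(defn main []
  ; Takes exactly two bytes and returns them as a String.
  (match (Parser.parse (Parser.take 2) "hi")
    (Result.Success s) (IO.println &s)
    (Result.Error e) (IO.errorln &(Parser.format-error &e))))
```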
take-while
(Fn [(Fn [Char] Bool a)] (Parser String))
(take-while pred)
consumes bytes while pred holds. Always
succeeds (possibly with empty result). Returns the consumed slice as
a String.
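A sketch showing that an empty match still succeeds: the input has no leading a bytes, so take-while returns an empty String and the following parser runs from the same cursor. The predicate and grammar are illustrative.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

(defn main []
  ; Hypothetical grammar: any number of a bytes, then a required b.
  (let [p (Parser.seq (Parser.take-while (fn [c] (= c \a)))
                      (Parser.byte \b))]
    (match (Parser.parse p "b")
      (Result.Success _) (IO.println "matched")
      (Result.Error e) (IO.errorln &(Parser.format-error &e)))))
```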
take-while1
(Fn [(Fn [Char] Bool a)] (Parser String))
(take-while1 pred)
like take-while, but requires at least one byte to match.
try
(Fn [(Parser a)] (Parser a))
(try p)
if p fails after consuming, restores the cursor and
reports the failure as empty so alt can backtrack.
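A sketch of the case try is meant for: both alternatives begin with a, so on the input below the first one consumes a before failing on c. The name ab-or-ac and the grammar are hypothetical.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

; Without try, the first alternative's failure would be a consumed
; failure and alt would not run the second alternative.
(defn ab-or-ac []
  (Parser.alt
    (Parser.try (Parser.seq (Parser.byte \a) (Parser.byte \b)))
    (Parser.seq (Parser.byte \a) (Parser.byte \c))))

(defn main []
  (match (Parser.parse (ab-or-ac) "ac")
    (Result.Success _) (IO.println "matched")
    (Result.Error e) (IO.errorln &(Parser.format-error &e))))
```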
update-run
(Fn [(Parser a), (Ref (Fn [(Fn [(Ref String b), Int, (Ref Cursor c)] (Reply a))] (Fn [(Ref String b), Int, (Ref Cursor c)] (Reply a)) d) e)] (Parser a))
updates the run property of a (Parser a) using a function f.