Parser

is a Parsec-style parser combinator library for Carp.

Installation

(load "git@github.com:carpentry-org/parsec@0.3.0")

Usage

(let [p (Parser.seq (Parser.byte \h) (Parser.byte \i))]
  (match (Parser.parse p "hi")
    (Result.Success _) (IO.println "matched")
    (Result.Error e)   (IO.errorln &(Parser.format-error &e))))

Lexer

module

Module

is a submodule of common-token parsers: whitespace handling, integers, identifiers, and literal symbols.

UTF8

module

Module

is a submodule of codepoint-level parsers layered on the byte-level core. Each codepoint advances the cursor by one column, regardless of byte width.

alt

defn

(Fn [(Parser a), (Parser a)] (Parser a))

                        (alt p q)
                    

if p fails without consuming, runs q from the same cursor. If p fails after consuming input, alt does not backtrack and the failure propagates; wrap p in try to allow backtracking.
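The interaction between alt and try can be sketched as follows. This is an illustrative example, not code from the library: the helper names ab, ac, no-backtrack, with-backtrack and accepts? are hypothetical, and the behaviour noted in the comments is what the descriptions of alt and try imply.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

; Two alternatives that start with the same byte.
(defn ab [] (Parser.seq (Parser.byte \a) (Parser.byte \b)))
(defn ac [] (Parser.seq (Parser.byte \a) (Parser.byte \c)))

; On "ac" the first branch consumes \a before failing,
; so alt is not expected to run the second branch.
(defn no-backtrack [] (Parser.alt (ab) (ac)))

; try turns the consumed failure into an empty one,
; which lets alt fall through to the second branch.
(defn with-backtrack [] (Parser.alt (Parser.try (ab)) (ac)))

; Hypothetical helper: true if p accepts the whole input.
(defn accepts? [p src]
  (match (Parser.parse p src)
    (Result.Success _) true
    (Result.Error _) false))
```

Going by the descriptions above, (accepts? (no-backtrack) "ac") should be false while (accepts? (with-backtrack) "ac") should be true.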

any-byte

defn

(Fn [] (Parser Char))

                        (any-byte)
                    

consumes any byte; fails only at end of input.

before

defn

(Fn [(Parser a), (Parser b)] (Parser a))

                        (before p q)
                    

runs p then q, returning p's value and discarding q's. The mirror image of then.

between

defn

(Fn [(Parser a), (Parser b), (Parser c)] (Parser c))

                        (between open close p)
                    

runs open, then p, then close, returning p's value. Once open consumes, all subsequent failures are consumed failures.
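A minimal sketch of between, assuming angle brackets as delimiters. The helpers tagged and tag-name are hypothetical names introduced for illustration.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

; One or more letters wrapped in angle brackets, e.g. "<abc>".
; Argument order follows the signature: open, close, then the inner parser.
(defn tagged []
  (Parser.between (Parser.byte \<)
                  (Parser.byte \>)
                  (Parser.take-while1 (fn [c] (Char.alpha? c)))))

; Returns the text between the brackets, or the formatted error message.
(defn tag-name [src]
  (match (Parser.parse (tagged) src)
    (Result.Success name) name
    (Result.Error e) (Parser.format-error &e)))
```

With this definition, (tag-name "<abc>") should yield "abc", since only the inner parser's value is kept.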

bind

defn

(Fn [(Parser a), (Fn [a] (Parser b))] (Parser b))

                        (bind p f)
                    

is the monadic bind. Runs p, then runs (f v) where v is p's value. The continuation parser is computed from the value.
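Because the continuation is computed from p's value, bind can express context-sensitive formats. The sketch below parses a one-digit length followed by exactly that many bytes. The helpers digit, length-prefixed and payload are hypothetical names, and single-digit lengths are an assumption made to keep the example short.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

; Hypothetical helper: a single ASCII digit as an Int (48 is the code of \0).
(defn digit []
  (Parser.map (Parser.satisfy (fn [c] (Char.num? c)) @"digit")
              (fn [c] (Int.- (Char.to-int c) 48))))

; The number parsed first decides how many bytes the second parser takes.
(defn length-prefixed []
  (Parser.bind (digit) (fn [n] (Parser.take n))))

; Returns the payload, or the formatted error message.
(defn payload [src]
  (match (Parser.parse (length-prefixed) src)
    (Result.Success s) s
    (Result.Error e) (Parser.format-error &e)))
```

Under these assumptions, (payload "3abc") should return "abc", while an input such as "3ab" should produce an error because fewer than three bytes remain.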

bind-result

defn

(Fn [(Parser a), (Fn [a, (Ref Cursor b)] (Result c ParseErr))] (Parser c))

                        (bind-result p f)
                    

runs p, then applies f to its value and the cursor after p. f returns a (Result c ParseErr) directly, which avoids the per-parse Parser.pure/Parser.fail rebuild that the equivalent bind formulation would perform. The cursor argument lets f build a ParseErr at the right position when it returns Result.Error.

byte

defn

(Fn [Char] (Parser Char))

                        (byte b)
                    

consumes a specific byte (as Char). Fails empty if the input doesn't match or is at EOF.

chainl

defn

(Fn [(Parser a), (Parser (Fn [a, a] a)), a] (Parser a))

                        (chainl p op default)
                    

like chainl1 but returns default if p fails on the first attempt.

chainl1

defn

(Fn [(Parser a), (Parser (Fn [a, a] a))] (Parser a))

                        (chainl1 p op)
                    

parses one or more occurrences of p separated by op. op returns a binary function used to combine values left-associatively. Example: (chainl1 (Lexer.integer) add-op) parses 1+2+3 as ((1+2)+3).

chainr

defn

(Fn [(Parser a), (Parser (Fn [a, a] a)), a] (Parser a))

                        (chainr p op default)
                    

like chainr1 but returns default if p fails on the first attempt.

chainr1

defn

(Fn [(Parser a), (Parser (Fn [a, a] a))] (Parser a))

                        (chainr1 p op)
                    

parses one or more occurrences of p separated by op. op returns a binary function used to combine values right-associatively. Example: with exponentiation as op, 2^3^2 parses as 2^(3^2) = 512.
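The difference between chainl1 and chainr1 is easiest to see with a non-associative operator such as subtraction. The following is a sketch only: digit, minus, left-assoc, right-assoc and value-or are hypothetical helpers, and operands are limited to single digits.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

; Hypothetical helper: a single ASCII digit as an Int (48 is the code of \0).
(defn digit []
  (Parser.map (Parser.satisfy (fn [c] (Char.num? c)) @"digit")
              (fn [c] (Int.- (Char.to-int c) 48))))

; Operator parser: consumes \- and yields the function used to combine operands.
(defn minus []
  (Parser.then (Parser.byte \-) (Parser.pure (fn [a b] (Int.- a b)))))

(defn left-assoc [] (Parser.chainl1 (digit) (minus)))
(defn right-assoc [] (Parser.chainr1 (digit) (minus)))

; Returns the parsed value, or fallback when parsing fails.
(defn value-or [p src fallback]
  (match (Parser.parse p src)
    (Result.Success n) n
    (Result.Error _) fallback))
```

For the input "7-2-1", left-assoc should group as (7-2)-1 = 4 and right-assoc as 7-(2-1) = 6.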

choice

defn

(Fn [(Array (Parser a))] (Parser a))

                        (choice ps)
                    

tries each parser in turn; returns the first success. If all parsers fail empty, the resulting failure merges their expected sets. A consumed failure of any parser short-circuits (use try inside if needed).
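A small sketch of choice over literals that share a prefix. It relies on the documented behaviour of string, which fails empty on a partial mismatch, so no try is needed here. The helpers null-like and matched are hypothetical names.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

; "none" and "null" share the prefix "n"; string is documented as atomic,
; so a mismatch is an empty failure and choice moves on to the next parser.
(defn null-like []
  (Parser.choice [(Parser.string @"none")
                  (Parser.string @"null")
                  (Parser.string @"nil")]))

; Returns the literal that matched, or the formatted error message.
(defn matched [src]
  (match (Parser.parse (null-like) src)
    (Result.Success s) s
    (Result.Error e) (Parser.format-error &e)))
```

If an alternative could fail after consuming input (for example a seq of bytes), it would need to be wrapped in try for the later alternatives to be attempted.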

copy

template

(Fn [(Ref (Parser a) b)] (Parser a))

copies a Parser.

count

defn

(Fn [Int, (Parser a)] (Parser (Array a)))

                        (count n p)
                    

runs p exactly n times, collecting results into an Array. If any iteration fails, propagates with appropriate consumption.

delete

template

(Fn [(Parser a)] ())

deletes a Parser. Should usually not be called manually.

end-by

defn

(Fn [(Parser a), (Parser b)] (Parser (Array a)))

                        (end-by p sep)
                    

zero or more p each followed by sep. Returns an Array (possibly empty). Each element must be terminated by sep.

end-by1

defn

(Fn [(Parser a), (Parser b)] (Parser (Array a)))

                        (end-by1 p sep)
                    

one or more p each followed by sep. Unlike sep-by1, the separator acts as a terminator — the last p must also be followed by sep. Returns an Array of p values.

eof

defn

(Fn [] (Parser ()))

                        (eof)
                    

succeeds without consuming iff the cursor is at end of input. When the input is not at end, the error's unexpected field is populated with the byte that was seen.

fail

defn

(Fn [String] (Parser a))

                        (fail lbl)
                    

fails empty, with the given label as the expected token.

format-error

defn

(Fn [(Ref ParseErr a)] String)

                        (format-error e)
                    

renders a ParseErr as a human-readable String. Format: parse error at LINE:COL: unexpected X; expected: Y, Z, W. Uses StringBuf internally.

init

template

(Fn [(Fn [(Ref String a), Int, (Ref Cursor b)] (Reply c))] (Parser c))

creates a Parser.

label

defn

(Fn [String, (Parser a)] (Parser a))

                        (label name p)
                    

replaces the expected-set on empty failure with name. Has no effect on consumed failures or successes.

lazy

defn

(Fn [(Fn [] (Parser a))] (Parser a))

                        (lazy thunk)
                    

defers parser construction until run time. Use this to break eager-construction infinite recursion in recursive grammars.

The thunk must call a sibling function, not the immediately enclosing one. Pattern: split a recursive grammar into two functions, with the lazy thunk in one calling the other.

(sig sexp (Fn [] (Parser SExp)))
(defn list-p []
  (Parser.between open close
    (Parser.many (Parser.lazy (fn [] (sexp))))))
(defn sexp [] (Parser.alt (atom-p) (list-p)))

Each invocation rebuilds the deferred parser. For deep recursion, prefer Parser.recurse against a pre-built grammar in a top-level def.

lookahead

defn

(Fn [(Parser a)] (Parser a))

                        (lookahead p)
                    

runs p, then restores the cursor. p's value is kept, but no input is consumed. p's failures propagate as-is.
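As a brief illustration, the sketch below peeks at the first byte and then reads two bytes from the same position. The helpers peek-then-take and taken are hypothetical names.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

; lookahead checks for \a without consuming it, so take still sees both bytes.
(defn peek-then-take []
  (Parser.then (Parser.lookahead (Parser.byte \a))
               (Parser.take 2)))

; Returns the two bytes taken, or the formatted error message.
(defn taken [src]
  (match (Parser.parse (peek-then-take) src)
    (Result.Success s) s
    (Result.Error e) (Parser.format-error &e)))
```

Given the description above, (taken "ab") should return "ab", which shows that the \a seen by lookahead was left in the input.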

many

defn

(Fn [(Parser a)] (Parser (Array a)))

                        (many p)
                    

runs p zero or more times, collecting results. Stops on empty failure of p. If p succeeds without consuming, the loop stops to avoid looping forever, so do not use many with a parser that can succeed on empty input.

many-till

defn

(Fn [(Parser a), (Parser b)] (Parser (Array a)))

                        (many-till p end)
                    

runs p repeatedly until end succeeds, collecting p results in an Array. Returns the array (not the end result). The end parser is tried before each p. If end can fail after consuming input, wrap it in try.
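A minimal sketch of many-till that collects bytes up to a terminating \! byte. The helpers until-bang and body-length are hypothetical names, and -1 is an arbitrary marker for a parse error.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

; Collects every byte before the first \!; the terminator itself is consumed
; by the end parser but is not part of the result.
(defn until-bang []
  (Parser.many-till (Parser.any-byte) (Parser.byte \!)))

; Number of bytes collected, or -1 if parsing fails.
(defn body-length [src]
  (match (Parser.parse (until-bang) src)
    (Result.Success xs) (Array.length &xs)
    (Result.Error _) -1))
```

Following the description, "abc!" should give 3 and "!" should give 0. An input without the terminator, such as "abc", should fail once any-byte reaches end of input.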

many1

defn

(Fn [(Parser a)] (Parser (Array a)))

                        (many1 p)
                    

runs p one or more times. Fails (with same consumption as p's first failure) if p doesn't match at least once.

map

defn

(Fn [(Parser a), (Fn [a] b)] (Parser b))

                        (map p f)
                    

applies f to the value produced by p.

not-followed-by

defn

(Fn [(Parser a)] (Parser ()))

                        (not-followed-by p)
                    

succeeds (without consuming) if p fails; fails if p succeeds. Useful for negative assertions.
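A common use is matching a keyword only when it is not the prefix of a longer identifier. The sketch below assumes a keyword "let"; kw-let and starts-with-let? are hypothetical helper names.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

; Matches "let" only when the next byte is not alphanumeric.
(defn kw-let []
  (Parser.before
    (Parser.string @"let")
    (Parser.not-followed-by
      (Parser.satisfy (fn [c] (Char.alphanum? c)) @"identifier character"))))

; parse-partial is used so that trailing input after the keyword is allowed.
(defn starts-with-let? [src]
  (match (Parser.parse-partial (kw-let) src)
    (Result.Success _) true
    (Result.Error _) false))
```

With this definition, "let x" should be accepted and "letter" rejected, because the byte after "let" is alphanumeric in the second case.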

optional

defn

(Fn [(Parser a)] (Parser (Maybe a)))

                        (optional p)
                    

if p succeeds, wraps the value in Just; if p fails empty, succeeds with Nothing without consuming. Consumed failures propagate.
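One way to use optional is for a leading sign. This sketch handles a single digit with an optional minus sign. digit, signed-digit and value-or are hypothetical helpers, not part of the library.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

; Hypothetical helper: a single ASCII digit as an Int (48 is the code of \0).
(defn digit []
  (Parser.map (Parser.satisfy (fn [c] (Char.num? c)) @"digit")
              (fn [c] (Int.- (Char.to-int c) 48))))

; The Maybe produced by optional decides whether the digit is negated.
(defn signed-digit []
  (Parser.bind (Parser.optional (Parser.byte \-))
    (fn [sign]
      (match sign
        (Maybe.Just _) (Parser.map (digit) (fn [n] (Int.- 0 n)))
        (Maybe.Nothing) (digit)))))

; Returns the parsed value, or fallback when parsing fails.
(defn value-or [p src fallback]
  (match (Parser.parse p src)
    (Result.Success n) n
    (Result.Error _) fallback))
```

Under these assumptions, "-5" should parse to -5 and "5" to 5. In the second case optional succeeds with Nothing without consuming any input.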

parse

defn

(Fn [(Parser a), (Ref String b)] (Result a ParseErr))

                        (parse p src)
                    

runs p over the entire input. Strict: fails if p doesn't consume all of src. Returns (Result a ParseErr).

parse-partial

defn

(Fn [(Parser a), (Ref String b)] (Result (Pair a String) ParseErr))

                        (parse-partial p src)
                    

runs p on the input without requiring it to consume everything. On success, returns a Pair of the parsed value and the remaining unconsumed input as a String. On error, returns the ParseErr.
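The contrast with the strict parse can be sketched like this. The helpers leading-digits, strict-ok? and remainder are hypothetical names chosen for the example.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

; One or more ASCII digits at the start of the input.
(defn leading-digits []
  (Parser.take-while1 (fn [c] (Char.num? c))))

; Strict: true only if the digits make up the entire input.
(defn strict-ok? [src]
  (match (Parser.parse (leading-digits) src)
    (Result.Success _) true
    (Result.Error _) false))

; Partial: returns whatever follows the digits, or the formatted error message.
(defn remainder [src]
  (match (Parser.parse-partial (leading-digits) src)
    (Result.Success pr) @(Pair.b &pr)
    (Result.Error e) (Parser.format-error &e)))
```

For the input "42abc", strict-ok? should be false because input is left over, while remainder should return "abc".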

placeholder

defn

(Fn [] (Parser a))

                        (placeholder)
                    

is an uninitialized parser that always fails. Use as the initial value for a top-level def that will be set! to the real parser before parsing. Companion to recurse.

prn

template

(Fn [(Ref (Parser a) b)] String)

converts a Parser to a string.

pure

defn

(Fn [a] (Parser a))

                        (pure x)
                    

succeeds without consuming, producing the given value.

recurse

defn

(Fn [(Ref (Parser a) StaticLifetime)] (Parser a))

                        (recurse pref)
                    

runs a parser referenced by a stable static reference, without rebuilding it per call. Use for deeply recursive grammars where lazy would rebuild the parser tree at every level.

Pattern: declare the recursive grammar in a top-level def, set! it once at startup, and reference it from sub-parsers via recurse.

(def *sexp* (the (Parser SExp) (Parser.placeholder)))

(defn list-p []
  (Parser.between open close
    (Parser.many (Parser.recurse &*sexp*))))

(defn init-grammar []
  (set! *sexp* (Parser.alt (atom-p) (list-p))))

The grammar must be initialized via set! before parsing.

run

instantiate

(Fn [(Ref (Parser a) b)] (Ref (Fn [(Ref String c), Int, (Ref Cursor d)] (Reply a)) b))

gets the run property of a Parser.

satisfy

defn

(Fn [(Fn [Char] Bool a), String] (Parser Char))

                        (satisfy pred lbl)
                    

consumes one byte if pred holds, otherwise fails empty. When the predicate rejects a byte, the error's unexpected field is populated with the byte that was seen.

sep-by

defn

(Fn [(Parser a), (Parser b)] (Parser (Array a)))

                        (sep-by p sep)
                    

zero or more p separated by sep. Returns an Array (possibly empty).

sep-by1

defn

(Fn [(Parser a), (Parser b)] (Parser (Array a)))

                        (sep-by1 p sep)
                    

one or more p separated by sep. Trailing sep is not allowed (sep must be followed by another p). Returns an Array of p values.
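The separator and terminator families can be compared side by side. This is a sketch with hypothetical helpers (digit-char, separated, terminated, item-count), using \- as the separator and -1 as an arbitrary marker for a parse error.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

; Hypothetical helper: a single ASCII digit, kept as a Char.
(defn digit-char []
  (Parser.satisfy (fn [c] (Char.num? c)) @"digit"))

; Digits with \- between them, e.g. "1-2-3".
(defn separated [] (Parser.sep-by (digit-char) (Parser.byte \-)))

; Digits each followed by \-, e.g. "1-2-".
(defn terminated [] (Parser.end-by (digit-char) (Parser.byte \-)))

; Number of items parsed, or -1 if parsing fails.
(defn item-count [p src]
  (match (Parser.parse p src)
    (Result.Success xs) (Array.length &xs)
    (Result.Error _) -1))
```

Going by the descriptions, separated should accept "1-2-3" (three items) and the empty input (zero items), while terminated should accept "1-2-" (two items) and reject "1-2-3" because the last digit has no terminator.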

seq

defn

(Fn [(Parser a), (Parser b)] (Parser (Pair a b)))

                        (seq p q)
                    

runs p then q, returning a Pair of their values. Failure of either propagates with appropriate consumption tracking.
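As an illustration, seq and before can be combined to parse a key=value pair. The format and the helpers key-value and value-of are assumptions made for this sketch.

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

; Letters, then \= (discarded by before), then digits, e.g. "port=80".
(defn key-value []
  (Parser.seq
    (Parser.before (Parser.take-while1 (fn [c] (Char.alpha? c)))
                   (Parser.byte \=))
    (Parser.take-while1 (fn [c] (Char.num? c)))))

; Returns the value half of the Pair, or the formatted error message.
(defn value-of [src]
  (match (Parser.parse (key-value) src)
    (Result.Success pr) @(Pair.b &pr)
    (Result.Error e) (Parser.format-error &e)))
```

For "port=80" the resulting Pair should hold "port" and "80". Pair.a gives access to the key in the same way Pair.b does for the value.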

set-run

template

(Fn [(Parser a), (Fn [(Ref String b), Int, (Ref Cursor c)] (Reply a))] (Parser a))

sets the run property of a Parser.

set-run!

template

(Fn [(Ref (Parser a) b), (Fn [(Ref String c), Int, (Ref Cursor d)] (Reply a))] ())

sets the run property of a Parser in place.

skip-many

defn

(Fn [(Parser a)] (Parser ()))

                        (skip-many p)
                    

runs p zero or more times, discarding all results. Returns (). Avoids building an Array, saving allocations compared to many. Stops on empty failure of p.

skip-many1

defn

(Fn [(Parser a)] (Parser ()))

                        (skip-many1 p)
                    

runs p one or more times, discarding all results. Returns (). Fails if the first application of p fails.

slice-of

defn

(Fn [(Parser a)] (Parser String))

                        (slice-of p)
                    

runs p; on success, replaces p's value with the slice of input that p consumed (as a String). On failure, propagates as-is.
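slice-of is handy when only the matched text matters and not the structure built by the inner parser. A sketch of an identifier parser, assuming identifiers are a letter followed by letters or digits (identifier and ident-text are hypothetical helpers):

```carp
(load "git@github.com:carpentry-org/parsec@0.3.0")

; The inner seq would produce a Pair of Char and String;
; slice-of replaces that with the consumed input as one String.
(defn identifier []
  (Parser.slice-of
    (Parser.seq (Parser.satisfy (fn [c] (Char.alpha? c)) @"letter")
                (Parser.take-while (fn [c] (Char.alphanum? c))))))

; Returns the identifier text, or the formatted error message.
(defn ident-text [src]
  (match (Parser.parse (identifier) src)
    (Result.Success s) s
    (Result.Error e) (Parser.format-error &e)))
```

With this definition, (ident-text "x42") should return "x42".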

str

template

(Fn [(Ref (Parser a) b)] String)

converts a Parser to a string.

string

defn

(Fn [String] (Parser String))

                        (string s)
                    

matches the literal string s byte-for-byte. Atomic: on partial mismatch, fails empty (no consumed failure mid-string).

take

defn

(Fn [Int] (Parser String))

                        (take n)
                    

consumes exactly n bytes, returning them as a String. Fails empty if fewer than n bytes remain.

take-while

defn

(Fn [(Fn [Char] Bool a)] (Parser String))

                        (take-while pred)
                    

consumes bytes while pred holds. Always succeeds (possibly with empty result). Returns the consumed slice as a String.

take-while1

defn

(Fn [(Fn [Char] Bool a)] (Parser String))

                        (take-while1 pred)
                    

like take-while, but requires at least one byte to match.

then

defn

(Fn [(Parser a), (Parser b)] (Parser b))

                        (then p q)
                    

runs p then q, discarding p's value and returning q's. Equivalent to (bind p (fn [_] q)) but does not allocate the continuation closure per parse.

try

defn

(Fn [(Parser a)] (Parser a))

                        (try p)
                    

if p fails after consuming, restores the cursor and reports the failure as empty so alt can backtrack.

update-run

instantiate

(Fn [(Parser a), (Ref (Fn [(Fn [(Ref String b), Int, (Ref Cursor c)] (Reply a))] (Fn [(Ref String b), Int, (Ref Cursor c)] (Reply a)) d) e)] (Parser a))

updates the run property of a (Parser a) using a function f.