Parser

Parser is a Parsec-style parser combinator library for Carp.

Installation

(load "git@github.com:carpentry-org/parsec@0.1.0")

Usage

(let [p (Parser.seq (Parser.byte \h) (Parser.byte \i))]
  (match (Parser.parse p "hi")
    (Result.Success _) (println "matched")
    (Result.Error e)   (IO.errorln &(Parser.format-error &e))))

Lexer

module

Module

Lexer is a submodule of common-token parsers: whitespace handling, integers, identifiers, and literal symbols.

UTF8

module

Module

UTF8 is a submodule of codepoint-level parsers layered on the byte-level core. Each codepoint advances the cursor by one column, regardless of byte width.

alt

defn

(Fn [(Parser a), (Parser a)] (Parser a))

(alt p q)

if p fails without consuming, runs q from the same cursor. If p consumes anything before failing, alt does not backtrack; wrap p in try to allow backtracking.
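The no-backtrack rule is easiest to see with two branches that share a prefix. The following is a minimal sketch, assuming the load line from the Installation section; le-or-la and main are hypothetical names used only for illustration.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

;; Hypothetical helper: accepts "le" or "la". Both branches start with \l.
(defn le-or-la []
  (Parser.alt
    ;; Without try, failing on \e after consuming \l would be a consumed
    ;; failure, and alt would never run the second branch.
    (Parser.try (Parser.seq (Parser.byte \l) (Parser.byte \e)))
    (Parser.seq (Parser.byte \l) (Parser.byte \a))))

(defn main []
  (match (Parser.parse (le-or-la) "la")
    (Result.Success _) (IO.println "matched")
    (Result.Error e)   (IO.errorln &(Parser.format-error &e))))
```

Only the first branch needs try here, because nothing follows the second branch that could still succeed.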

any-byte

defn

(Fn [] (Parser Char))

(any-byte)

consumes any byte; fails only at end of input.

between

defn

(Fn [(Parser a), (Parser b), (Parser c)] (Parser c))

(between open close p)

runs open, then p, then close, returning p's value. Once open consumes, all subsequent failures are consumed failures.
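As an illustration of the argument order (open, close, then the inner parser), here is a small sketch that extracts the text between angle brackets. The name bracketed is hypothetical, and the load line is taken from the Installation section.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

;; Hypothetical helper: "<", then everything up to ">", then ">".
(defn bracketed []
  (Parser.between (Parser.byte \<)
                  (Parser.byte \>)
                  ;; The inner parser stops at the closing bracket.
                  (Parser.take-while (fn [c] (not (= c \>))))))

(defn main []
  (match (Parser.parse (bracketed) "<abc>")
    ;; s is the inner parser's value; the brackets are discarded.
    (Result.Success s) (IO.println &s)
    (Result.Error e)   (IO.errorln &(Parser.format-error &e))))
```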

bind

defn

(Fn [(Parser a), (Fn [a] (Parser b))] (Parser b))

(bind p f)

bind is the monadic bind. Runs p, then runs (f v) where v is p's value. The continuation parser is computed from the value.
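A context-sensitive parser shows why the continuation is computed from the value. This sketch, with the hypothetical name doubled, reads any byte and then requires the same byte again; it assumes the load line from the Installation section.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

;; Hypothetical helper: matches "aa", "zz", and so on, but not "ab".
(defn doubled []
  (Parser.bind (Parser.any-byte)
               ;; c is the byte just parsed; the next parser depends on it.
               (fn [c] (Parser.byte c))))

(defn main []
  (match (Parser.parse (doubled) "aa")
    (Result.Success _) (IO.println "matched")
    (Result.Error e)   (IO.errorln &(Parser.format-error &e))))
```

A grammar like this cannot be expressed with seq alone, because seq fixes both parsers before any input is read.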

byte

defn

(Fn [Char] (Parser Char))

(byte b)

consumes a specific byte (as Char). Fails empty if the input doesn't match or is at EOF.

choice

defn

(Fn [(Array (Parser a))] (Parser a))

(choice ps)

tries each parser in turn; returns the first success. If all parsers fail empty, the resulting failure merges their expected sets. A consumed failure of any parser short-circuits (use try inside if needed).
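For reference, a minimal sketch of choice over an Array literal of single-byte parsers. The name abc is hypothetical, and the load line comes from the Installation section.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

;; Hypothetical helper: accepts exactly one of the bytes a, b, or c.
(defn abc []
  (Parser.choice [(Parser.byte \a) (Parser.byte \b) (Parser.byte \c)]))

(defn main []
  (match (Parser.parse (abc) "b")
    (Result.Success _) (IO.println "matched")
    ;; If every alternative fails empty, the description above says the
    ;; error merges the expected sets of all three parsers.
    (Result.Error e)   (IO.errorln &(Parser.format-error &e))))
```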

copy

template

(Fn [(Ref (Parser a) b)] (Parser a))

copies a Parser.

count

defn

(Fn [Int, (Parser a)] (Parser (Array a)))

(count n p)

runs p exactly n times, collecting results into an Array. If any iteration fails, count fails: the failure is empty if no input has been consumed so far, and consumed otherwise.

delete

template

(Fn [(Parser a)] ())

deletes a Parser. Should usually not be called manually.

eof

defn

(Fn [] (Parser ()))

(eof)

succeeds without consuming iff the cursor is at end of input.

fail

defn

(Fn [String] (Parser a))

(fail lbl)

fails empty, with the given label as the expected token.

format-error

defn

(Fn [(Ref ParseErr a)] String)

(format-error e)

renders a ParseErr as a human-readable String. Format: parse error at LINE:COL: unexpected X; expected: Y, Z, W. Uses StringBuf internally.

init

template

(Fn [(Fn [(Ref String a), Int, (Ref Cursor b)] (Reply c))] (Parser c))

creates a Parser.

label

defn

(Fn [String, (Parser a)] (Parser a))

(label name p)

replaces the expected-set on empty failure with name. Has no effect on consumed failures or successes.
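A sketch of label wrapping a satisfy parser, so that an empty failure reports a friendlier name. The helper name bit and both label strings are hypothetical; the load line comes from the Installation section.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

;; Hypothetical helper: one binary digit, reported as "binary digit"
;; when it fails without consuming.
(defn bit []
  (Parser.label @"binary digit"
    (Parser.satisfy (fn [c] (or (= c \0) (= c \1))) @"0 or 1")))

(defn main []
  ;; "2" is not a binary digit, so this takes the error branch.
  (match (Parser.parse (bit) "2")
    (Result.Success _) (IO.println "matched")
    (Result.Error e)   (IO.errorln &(Parser.format-error &e))))
```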

lazy

defn

(Fn [(Fn [] (Parser a))] (Parser a))

(lazy thunk)

defers parser construction until run time. Use this to break eager-construction infinite recursion in recursive grammars.

The thunk must call a sibling function, not the immediately enclosing one. Pattern: split a recursive grammar into two functions, with the lazy thunk in one calling the other.

(sig sexp (Fn [] (Parser SExp)))
(defn list-p []
  (Parser.between open close
    (Parser.many (Parser.lazy (fn [] (sexp))))))
(defn sexp [] (Parser.alt (atom-p) (list-p)))

Each invocation rebuilds the deferred parser. For deep recursion, prefer Parser.recurse against a pre-built grammar in a top-level def.

lookahead

defn

(Fn [(Parser a)] (Parser a))

(lookahead p)

runs p, then restores the cursor. p's value is kept, but no input is consumed. p's failures propagate as-is.

many

defn

(Fn [(Parser a)] (Parser (Array a)))

(many p)

runs p zero or more times, collecting results. Stops on empty failure of p. If p succeeds without consuming, the loop stops to avoid infinite recursion (do not use many with an empty-success parser).

many1

defn

(Fn [(Parser a)] (Parser (Array a)))

(many1 p)

runs p one or more times. Fails (with same consumption as p's first failure) if p doesn't match at least once.

map

defn

(Fn [(Parser a), (Fn [a] b)] (Parser b))

(map p f)

applies f to the value produced by p.
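As a small illustration, this sketch maps a run of a bytes to its length. The name run-length is hypothetical; String.length and Int.str are Carp core functions, and the load line comes from the Installation section.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

;; Hypothetical helper: parses one or more \a bytes and yields how many.
(defn run-length []
  (Parser.map (Parser.take-while1 (fn [c] (= c \a)))
              ;; s is the consumed slice; map turns it into an Int.
              (fn [s] (String.length &s))))

(defn main []
  (match (Parser.parse (run-length) "aaa")
    (Result.Success n) (IO.println &(Int.str n))
    (Result.Error e)   (IO.errorln &(Parser.format-error &e))))
```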

not-followed-by

defn

(Fn [(Parser a)] (Parser ()))

(not-followed-by p)

succeeds (without consuming) if p fails; fails if p succeeds. Useful for negative assertions.

optional

defn

(Fn [(Parser a)] (Parser (Maybe a)))

(optional p)

if p succeeds, wraps the value in Just; if p fails empty, succeeds with Nothing without consuming. Consumed failures propagate.
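A typical use is an optional prefix. The sketch below, with the hypothetical name signed-bits, accepts an optional minus sign followed by binary digits; the load line comes from the Installation section.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

;; Hypothetical helper: optional \- followed by one or more 0/1 bytes.
;; The value is a Pair of (Maybe Char) and the digit String.
(defn signed-bits []
  (Parser.seq (Parser.optional (Parser.byte \-))
              (Parser.take-while1 (fn [c] (or (= c \0) (= c \1))))))

(defn main []
  ;; No sign is present, so the optional part should yield Nothing
  ;; without consuming, and the digits are parsed from the start.
  (match (Parser.parse (signed-bits) "101")
    (Result.Success _) (IO.println "matched")
    (Result.Error e)   (IO.errorln &(Parser.format-error &e))))
```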

parse

defn

(Fn [(Parser a), (Ref String b)] (Result a ParseErr))

(parse p src)

runs p over the entire input. Strict: fails if p doesn't consume all of src. Returns (Result a ParseErr).

placeholder

defn

(Fn [] (Parser a))

(placeholder)

placeholder is an uninitialized parser that always fails. Use as the initial value for a top-level def that will be set! to the real parser before parsing. Companion to recurse.

prn

template

(Fn [(Ref (Parser a) b)] String)

converts a Parser to a string.

pure

defn

(Fn [a] (Parser a))

(pure x)

succeeds without consuming, producing the given value.

recurse

defn

(Fn [(Ref (Parser a) StaticLifetime)] (Parser a))

(recurse pref)

runs a parser referenced by a stable static reference, without rebuilding it per call. Use for deeply recursive grammars where lazy would rebuild the parser tree at every level.

Pattern: declare the recursive grammar in a top-level def, set! it once at startup, and reference it from sub-parsers via recurse.

(def *sexp* (the (Parser SExp) (Parser.placeholder)))

(defn list-p []
  (Parser.between open close
    (Parser.many (Parser.recurse &*sexp*))))

(defn init-grammar []
  (set! *sexp* (Parser.alt (atom-p) (list-p))))

The grammar must be initialized via set! before parsing.

run

instantiate

(Fn [(Ref (Parser a) b)] (Ref (Fn [(Ref String c), Int, (Ref Cursor d)] (Reply a)) b))

gets the run property of a Parser.

satisfy

defn

(Fn [(Fn [Char] Bool a), String] (Parser Char))

(satisfy pred lbl)

consumes one byte if pred holds, otherwise fails empty.

sep-by

defn

(Fn [(Parser a), (Parser b)] (Parser (Array a)))

(sep-by p sep)

parses zero or more occurrences of p separated by sep. Returns an Array of p values (possibly empty).
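To illustrate sep-by, here is a sketch that splits colon-separated fields. The name fields is hypothetical; Array.length and Int.str are Carp core functions, and the load line comes from the Installation section.

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

;; Hypothetical helper: non-empty fields separated by \: bytes.
(defn fields []
  (Parser.sep-by (Parser.take-while1 (fn [c] (not (= c \:))))
                 (Parser.byte \:)))

(defn main []
  (match (Parser.parse (fields) "a:b:c")
    ;; xs holds the field values only; separators are not collected.
    (Result.Success xs) (IO.println &(Int.str (Array.length &xs)))
    (Result.Error e)    (IO.errorln &(Parser.format-error &e))))
```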

sep-by1

defn

(Fn [(Parser a), (Parser b)] (Parser (Array a)))

(sep-by1 p sep)

parses one or more occurrences of p separated by sep. A trailing sep is not allowed (each sep must be followed by another p). Returns an Array of p values.

seq

defn

(Fn [(Parser a), (Parser b)] (Parser (Pair a b)))

(seq p q)

runs p then q, returning a Pair of their values. A failure of either parser propagates: it is reported as consumed if any input was consumed before the failure, and as empty otherwise.

set-run

template

(Fn [(Parser a), (Fn [(Ref String b), Int, (Ref Cursor c)] (Reply a))] (Parser a))

sets the run property of a Parser.

set-run!

template

(Fn [(Ref (Parser a) b), (Fn [(Ref String c), Int, (Ref Cursor d)] (Reply a))] ())

sets the run property of a Parser in place.

slice-of

defn

(Fn [(Parser a)] (Parser String))

(slice-of p)

runs p; on success, replaces p's value with the slice of input that p consumed (as a String). On failure, propagates as-is.
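slice-of is handy when the structure of p's value is irrelevant and only the matched text matters. A minimal sketch, with the hypothetical name a-run and the load line from the Installation section:

```carp
(load "git@github.com:carpentry-org/parsec@0.1.0")

;; Hypothetical helper: many1 alone would yield an Array of Char;
;; slice-of replaces that with the consumed input as a String.
(defn a-run []
  (Parser.slice-of (Parser.many1 (Parser.byte \a))))

(defn main []
  (match (Parser.parse (a-run) "aaa")
    (Result.Success s) (IO.println &s)
    (Result.Error e)   (IO.errorln &(Parser.format-error &e))))
```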

str

template

(Fn [(Ref (Parser a) b)] String)

converts a Parser to a string.

string

defn

(Fn [String] (Parser String))

(string s)

matches the literal string s byte-for-byte. Atomic: on partial mismatch, fails empty (no consumed failure mid-string).

take

defn

(Fn [Int] (Parser String))

(take n)

consumes exactly n bytes, returning them as a String. Fails empty if fewer than n bytes remain.

take-while

defn

(Fn [(Fn [Char] Bool a)] (Parser String))

(take-while pred)

consumes bytes while pred holds. Always succeeds (possibly with empty result). Returns the consumed slice as a String.

take-while1

defn

(Fn [(Fn [Char] Bool a)] (Parser String))

(take-while1 pred)

like take-while, but requires at least one byte to match.

try

defn

(Fn [(Parser a)] (Parser a))

(try p)

if p fails after consuming, restores the cursor and reports the failure as empty so alt can backtrack.

update-run

instantiate

(Fn [(Parser a), (Ref (Fn [(Fn [(Ref String b), Int, (Ref Cursor c)] (Reply a))] (Fn [(Ref String b), Int, (Ref Cursor c)] (Reply a)) d) e)] (Parser a))

updates the run property of a (Parser a) using a function f.