
Introduction

In the context of a rapidly evolving research effort on LLM-based formal reasoning and theorem proving, we aim to contribute a novel approach that complements existing methods with a focus on formal guarantees of correctness.

In the blog post Proposition 7: Truth by Construction, we introduced the core idea of our approach: guiding LLMs to produce only tokens that are guaranteed to be part of a correct expression in a language. Such approaches already exist in various forms, but our contribution lies in generalizing this idea to a larger and extensible class of formal languages, along with a concrete algorithm for achieving it.

This document serves as a formal specification of the theoretical foundations, as documentation of the algorithm, and as a reference for future work building upon this approach.

The Aufbau Engine

Aufbau is the grammar-driven completion engine powering the Proposition 7 project. Given a grammar and a partial expression, it computes the set of next tokens that can extend the expression toward a complete, well-typed program. The central guarantee is that every suggested token keeps the expression on a valid trajectory: one that can still be extended into a complete expression (a word in the language defined by the grammar).

The name comes from the German word for “construction” or “build-up”, after Carnap’s Der logische Aufbau der Welt.

Structure

The specification follows the layered structure of the implementation:

  1. Basic Concepts: alphabets, Kleene closure, formal languages, and the completability set $\mathcal{C}_L(s)$ at the heart of the engine.

  2. Theoretical Foundation: grammars, partial trees and forests, binding resolution, and the typing judgment.

  3. Parsing: the chart parser, the segment model, the span cache and its invalidation strategy, and the meta-parser that reads grammar specifications.

  4. Typing: the type system, the inference rules encoded in grammar specs, context propagation, and the typed AST overlay that is built on top of the partial parse forest.

  5. Completion: how the engine synthesizes and scores completion candidates, and how the beam synthesizer explores the space of completions.

Invariant

The key correctness property that runs through every layer of the specification:

Theorem: Completion Soundness

For any partial expression $s$ and any token $a$ returned by the completion engine,

$$s \cdot a \text{ is completable}$$

That is, every suggested token keeps the expression on a path toward at least one complete, well-typed program. A more precise definition of completability is given later in this specification.

Basic Concepts and Notations

First, we establish some basic concepts of formal language theory, so that we can refer to them later in the specification.

Definition: Alphabet

An alphabet, usually denoted $\Sigma$ (sigma), is a finite set of symbols.

We are going to use shorthand notation for ranges of symbols in alphabets. For example, a-z denotes all lowercase letters from a to z, and 0-9 denotes all digits from 0 to 9.

Example: Alphabet Example

For example, the alphabet for arithmetic expressions could be defined as: $$\Sigma_\text{Arithmetic} = \{0\text{-}9, +, -, *, /\}$$

An alphabet can be subjected to various operations to create different kinds of sets. We usually call those sets languages. The Kleene star operation, denoted $X^*$, creates the set of all possible strings that can be formed by concatenating zero or more elements from $X$. It’s similar to the power set operation, but instead of creating subsets, it creates strings.

Definition: Kleene Closure

The Kleene closure of an alphabet $\Sigma$, denoted $\Sigma^\ast$, is the set of all possible strings that can be formed by concatenating elements of $\Sigma$.

Example: Kleene Closure Example

For example, the Kleene closure of the alphabet $\Sigma = \{a, b\}$ is: $$\Sigma^\ast = \{\epsilon, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, \ldots\}$$
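To make the definition concrete, a finite slice of the Kleene closure can be enumerated by breadth-first concatenation. The sketch below is purely illustrative and not part of the engine:

```rust
/// Enumerate all strings over `sigma` of length at most `max_len`:
/// a finite slice of the Kleene closure Σ*.
fn kleene_up_to(sigma: &[char], max_len: usize) -> Vec<String> {
    let mut out = vec![String::new()]; // start with ε
    let mut frontier = vec![String::new()];
    for _ in 0..max_len {
        let mut next = Vec::new();
        for s in &frontier {
            for &c in sigma {
                let mut t = s.clone();
                t.push(c);
                next.push(t);
            }
        }
        out.extend(next.iter().cloned());
        frontier = next;
    }
    out
}
```

For $\Sigma = \{a, b\}$ and length bound 2 this yields the seven strings $\epsilon, a, b, aa, ab, ba, bb$.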

A language is defined as a set of strings over an alphabet. A string is a finite sequence of symbols from the alphabet. A string in a language is called a word.

Lemma: Language Closure Lemma

For all languages $L$ over an alphabet $\Sigma$, $$L \subseteq \Sigma^\ast$$

The concatenation operation is written as $s \cdot t$ or the shorthand $st$ for strings $s, t \in \Sigma^\ast$. It is defined as the string obtained by appending $t$ to $s$.

Remark: Notation

The empty input is denoted by $\epsilon$.

Definitions and notations

Let $L$ be a formal language over an alphabet $\Sigma$.

An expression $s \in \Sigma^\ast$ is said to be completable in $L$ if there exists some $s' \in \Sigma^\ast$ such that $ss' \in L$.

Definition: Completability Set

For any expression $s \in \Sigma^\ast$, the completability set in $L$ is: $$\mathcal{C}_L(s) = \{a \in \Sigma : \exists s' \in \Sigma^\ast : sas' \in L\}$$

The completability set is central to our formalism. We can use it to state an important equivalence:

Theorem: Completability Equivalence

$ \forall s \in \Sigma^\ast$,

$s \text{ completable in } L \iff \mathcal{C}_L(s) \neq \emptyset$

Completability of $s$ in $L$ thus reduces to checking whether its completability set is non-empty.
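For a finite language, the completability set can be computed by brute force directly from the definition. The sketch below is illustrative only; the engine derives this set from a grammar rather than by enumeration:

```rust
use std::collections::HashSet;

/// Brute-force C_L(s) for a finite language: keep each symbol `a` such
/// that some word of L starts with s·a (equivalently, ∃ s' : s·a·s' ∈ L).
fn completability_set(l: &HashSet<String>, sigma: &[char], s: &str) -> HashSet<char> {
    sigma
        .iter()
        .copied()
        .filter(|&a| {
            let prefix = format!("{}{}", s, a);
            l.iter().any(|w| w.starts_with(&prefix))
        })
        .collect()
}
```

With $L = \{ab, abc, b\}$, the prefix $a$ has completability set $\{b\}$, and a prefix outside the language, such as $x$, has the empty set, matching the equivalence above.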

Remark: Nullable Completability

As an interesting edge case, if $s \in L$, then $s$ is complete in $L$ and trivially completable (take $s' = \epsilon$). To record completeness, we adopt the convention of including $\epsilon$ in the completability set of complete expressions: $$ s \in L \iff \epsilon \in \mathcal{C}_L(s) $$

Theoretical Foundation

This chapter defines the formal structures underlying the Aufbau system: grammars, parse forests, partial trees, typing judgments, and the completability guarantee.

Sources: src/logic/grammar/mod.rs, src/logic/partial/structure.rs, src/logic/typing/mod.rs

Grammar

Definition: Grammar

A grammar $G$ is a tuple $(N, T, P, S, \Theta, A)$ where:

  • $N$: finite set of non-terminals (strings).
  • $T$: finite set of terminals, each a derivative regex over the input alphabet.
  • $P$: set of productions. Each production is a pair $(\text{lhs} \in N,\ \text{rhs} \in (T \cup N)^*)$ with an optional typing rule name.
  • $S \in N$: the distinguished start symbol.
  • $\Theta$: a finite map from rule names to typing rules (may be empty).
  • $A: N \to P^*$: the alternative map — for each non-terminal, the ordered list of its productions.

Symbol

Definition: Symbol

A symbol is either:

  • A terminal Symbol::Terminal { regex, binding }: matches a segment of input via a derivative regex. Literal strings 'x' and "x" compile to Regex::literal(x); /pattern/ compiles to a full regex.
  • A non-terminal Symbol::Nonterminal { name, binding }: a reference to another production rule by name.

Both variants carry an optional binding name used by the typing system to reference sub-trees.

Production

Definition: Production

A production is a sequence of symbols $\alpha_0[b_0]\,\alpha_1[b_1]\cdots\alpha_n[b_n]$ where each $\alpha_k \in T \cup N$ and each $b_k \in \mathcal{B} \cup \{\varepsilon\}$ is an optional binding name. Productions also carry an optional rule name string used to look up a typing rule in $\Theta$.

Remark: Grammar Equality

Grammar equality (used for the meta-parser cache) compares productions, special_tokens, delimiters, and start. The typing_rules field is deliberately excluded because it can contain indeterminate forms such as meta-variables (?X) or context calls.

Partial Trees and Forests

Definition: Partial Tree

A partial tree $t = (V, E,\lambda,\pi,\alpha,\text{root})$ is a structure where:

  • $V$: finite set of nodes.
  • $E \subseteq V \times \mathbb{N} \times V$: position-indexed edges.
  • $\lambda: V \to N \cup T$: node labels.
  • $\pi: V \to P$: production associated with the node.
  • $\alpha: V \to \mathbb{N}$: chosen alternative index.
  • $\text{root} \in V$: the start node.

Parse Forest

Definition: Partial Parse Forest

For a completable input $s \in \Sigma^*$ and grammar $G$, the partial parse forest $\mathcal{F}(s)$ is the set of all rooted partial parse trees that match $s$.

In the implementation, partial parse forests are stored as Shared Packed Parse Forests (SPPFs) to save space.

Partial Parser

Definition: Partial Parser

A partial parser for grammar $G$ is a function: $$\Psi_G : \Sigma^* \to \{ \mathcal{F},\text{reject} \}$$ mapping each valid (completable) input to a partial parse forest and each invalid input to $\text{reject}$.

The parser is sound in the sense that every completable prefix is accepted and produces a forest.

Tree Paths

Definition: Tree Path

The space of tree paths is $\mathcal{P} = \mathbb{N}^*$, where each coordinate is an edge index pointing to a specific child. Every node $v \in V$ has a unique path $\text{path}(v)$ from the root:

$$\text{path}(v) = \begin{cases} \varepsilon & \text{if } v \text{ is root} \\ \text{path}(v') \cdot i & \text{if } v' \text{ is the parent of } v \text{ and } v \text{ is its } i\text{-th child} \end{cases}$$

Lemma: Path Injectivity

Paths are injective: $\text{path}_T(x_1) = \text{path}_T(x_2) \implies x_1 = x_2$.

We use $x[i]$ for the child at index $i$, and extend this to relative paths $x[p]$.

Terminal Status and Extensibility

A terminal node matching input $s$ has a status:

  • Complete: Matched a full token. It may carry an extension derivative $D_s(r)$ for further matching.
  • Partial: Matched only a prefix. It carries a remainder derivative for what must follow.
Definition: Extensibility

A terminal $v$ is extensible if its derivative with respect to its matched text is non-empty: $$\text{extensible}(v) \iff D_{\text{matched}(v)}(r_v) \neq \emptyset$$

A terminal can be both complete and extensible. For example, /[a-z]+/ matching foo is complete but extensible by bar. Conversely, a terminal is incomplete and not extensible only if its remainder regex is $\emptyset$ (a parse error). This distinction is critical for binding resolution where bindings to extensible terminals on the rightmost spine are treated as Partial.

Completeness and Frontiers

A node is complete by induction:

  • A terminal is complete if it matched the full input.
  • A non-terminal $v$ is complete $\iff$ all symbols in its production $\pi(v)$ are satisfied.
Definition: Frontier

The frontier is the path to the unique incomplete node where parsing stopped. In a complete tree, there is no frontier.

Note that this is not exactly the same as extensibility: a complete tree has no frontier, but it can still have extensible nodes.

Definition: Forest Completeness

A forest is complete $\iff$ at least one tree in it is complete.

Monotonicity

For partial parses $\mathcal{F}(s)$ extending to $\mathcal{F}(s \cdot t)$, the frontier path length is monotonically non-decreasing.

Lemma: Frontier Monotonicity

$|\operatorname{front}(\mathcal{F}(s \cdot t))| \geq |\operatorname{front}(\mathcal{F}(s))|$

This property is a primary invariant for parser correctness; a decreasing frontier indicates an engine failure.

Example: Lambda Completion

Input: λx:Int.

Partial Tree:

```text
T = Expression
     └─[0]→ Abstraction(abs)  [complete = false]
            ├─[0]→ "λ"
            ├─[1]→ Identifier("x")
            ├─[2]→ ":"
            ├─[3]→ Type
            │      └─[0]→ AtomicType
            │             └─[0]→ BaseType("Int")
            ├─[4]→ "."
            └─[5]→ Expression  [Missing / End of Input]
```

Frontier: path [0, 5]. Choosing "x" as a completion yields a complete tree where the Abstraction node is satisfied.

Binding Resolution

Given grammar $G = (N, T, P, S, \Theta, A)$, we construct a binding map $\beta: \mathcal{B} \times P \to \mathcal{P}^*$ associating binding identifiers with grammar paths.

Definition: Grammar Path

A grammar path is a sequence of steps, each pairing a position in a production with the alternative index chosen there: $$ p = x_1@a_1 \to x_2@a_2 \to \dots \to x_n@a_n $$ where:

  • $x_i \in \mathbb{N}$: index in the production.
  • $a_i \in \mathbb{N}$: alternative index chosen for non-terminal $x_i$.

Grammar paths navigate grammar structure relative to a production. Alternative indices act as constraints: we only follow a path if the tree node’s chosen alternative matches the path. This allows grammar paths to convert to tree paths at runtime while ensuring the tree respects the specified structure.

Regular Grammar Paths

Definition: Regular Grammar Path

For recursive structures, Regular Grammar Paths are regular expressions over $\Sigma = \mathbb{N} \times \mathbb{N}$, where $i@k$ represents traversing the $i$-th child of a node selecting alternative $k$.

Example: binding τ through nested parentheses: $$ \beta(\tau, \text{Abstraction}) = 3@0 \cdot (0@1 \cdot 1@0)^* \cdot 0@0 \cdot 0 $$

Binding Map Construction

For each production $p \in P$ with rule $\Theta(p)$:

  1. Extraction: Let $\mathcal{B}_\theta$ be bindings mentioned in $\Theta(p)$.
  2. Traversal: For $b \in \mathcal{B}_\theta$, find $b$ in $p$ at position $k$. If $\alpha_k[b]$ is a non-terminal, recursively explore alternatives in $A(\alpha_k)$ via BFS.
  3. Invariant Extraction: Compute acyclic grammar paths $\mathcal{P}_b^p$ from rule $\theta$ to binding $b$. Paths must include alternative annotations. Use regular grammar paths for cycles.
  4. Store: $\beta(b, p) = \mathcal{P}_b^p$.

Grammar paths are static; they are fixed at load time and guide tree path generation during evaluation.

Binding Resolution

For tree $T$, binding $b$, and node $N$, we resolve nodes $V_b$ by matching tree paths against $R_b = \beta(b, N)$. A node $v$ matches if:

  1. Edge Match: The edge index sequence matches $R_b$.
  2. Alternative Verification: For every step $i@k$, the tree node has alternative $k$.

If edge match succeeds but alternative verification fails for all invariants, the parse is rejected.
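The two matching conditions can be sketched as a walk over a simplified tree structure. The names Node, Step, and resolve are hypothetical stand-ins for illustration; the actual algorithm lives in src/logic/binding/grammar.rs:

```rust
/// A simplified parse-tree node: the alternative chosen at this node
/// and its ordered children.
struct Node {
    alt: usize,
    children: Vec<Node>,
}

/// One grammar-path step i@k: descend to child i, which must have
/// chosen alternative k.
struct Step {
    child: usize,
    alt: usize,
}

/// Follow a grammar path from `root`. Returns None when an edge is
/// missing (no such child) or when alternative verification fails.
fn resolve<'a>(root: &'a Node, path: &[Step]) -> Option<&'a Node> {
    let mut cur = root;
    for step in path {
        let child = cur.children.get(step.child)?;
        if child.alt != step.alt {
            return None; // edge matched, alternative did not
        }
        cur = child;
    }
    Some(cur)
}
```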

Correctness and Constraints

Theorem: Binding Invariance

For any binding $b$ in production $p$, $\beta(b, p)$ is uniquely determined by the grammar structure and independent of input.

Assumption: Binding Uniqueness

If $|\beta(b, p)| > 1$, then $b$ must be declared as tuple-typed in $\Theta(p)$ as (b...).

Remark: Limitations

The construction algorithm enforces a limit: MAX_RECURSION_DEPTH = 16. Paths deeper than this are omitted. Consequently, $\beta(b, p)$ is an under-approximation for deeply recursive grammars, and bindings at extreme depths may fail to resolve.

Type System

This chapter defines the type language, the typed tree and forest structures, the typing judgment, and the subtyping relation. Types in Aufbau are not information carriers in the classical sense: they are constraints on completability. A type describes what extensions of a partial expression may become.

The Typing Judgment

Definition: Typing Judgment

The typing judgment

$$\Gamma \vdash_\Theta t : \mathcal{S}$$

reads “under context $\Gamma$ and typing rules $\Theta$, tree $t$ has status $\mathcal{S}$.” When no external context is needed we write $\vdash_\Theta t : \mathcal{S}$ with an empty context.

Tree Status

Definition: Tree Status

The tree status lattice $\mathcal{S}$ classifies the result of typing a tree:

$$\text{Valid}(\tau) \sqsupset \text{Partial}(\tau) \sqsupset \text{Invalid}$$

  • $\text{Valid}(\tau)$: the tree is complete and fully satisfies the constraint $\tau$.
  • $\text{Partial}(\tau)$: the constraint $\tau$ is consistent with the tree so far, but the tree is incomplete; the status may strengthen or fail as more input arrives.
  • $\text{Invalid}$: the constraint is violated; the tree cannot be completed into any well-typed expression.

This lattice is monotone: status can only move from $\text{Partial}$ toward $\text{Valid}$ or $\text{Invalid}$, never in reverse.

Definition: Forest Status

The status of a partial forest $\mathcal{F}$ under $\vdash_\Theta$ is the supremum of the statuses of its trees under the lattice $\mathcal{S}$:

$$\Gamma \vdash_\Theta \mathcal{F} \;:=\; \bigsqcup_{t \,\in\, \mathcal{F}} \;\Gamma \vdash_\Theta t$$

Concretely:

$$\Gamma \vdash_\Theta \mathcal{F} = \begin{cases} \text{Valid}(\tau) & \text{if some tree has status } \text{Valid}(\tau) \\ \text{Partial}(\tau) & \text{if no tree is valid but some is } \text{Partial}(\tau) \\ \text{Invalid} & \text{if every tree is } \text{Invalid} \end{cases}$$

The forest is $\text{Invalid}$ only when all of its trees have been rejected.
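The supremum over the lattice can be sketched as a fold over tree statuses. Status and forest_status are illustrative stand-ins, not the implementation's types:

```rust
/// Illustrative status lattice: Valid ⊐ Partial ⊐ Invalid.
/// The String payload stands in for the constraint τ.
#[derive(Clone, PartialEq, Debug)]
enum Status {
    Invalid,
    Partial(String),
    Valid(String),
}

/// Forest status as the supremum of tree statuses: any Valid tree wins,
/// otherwise any Partial tree, otherwise the forest is Invalid.
fn forest_status(trees: &[Status]) -> Status {
    let mut best = Status::Invalid;
    for t in trees {
        best = match (best, t.clone()) {
            (Status::Valid(v), _) | (_, Status::Valid(v)) => Status::Valid(v),
            (Status::Partial(p), _) | (_, Status::Partial(p)) => Status::Partial(p),
            _ => Status::Invalid,
        };
    }
    best
}
```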

Lemma: Status Monotonicity

For partial parses $\mathcal{F}(s)$ extending to $\mathcal{F}(s \cdot a)$: $$\Gamma \vdash_\Theta \mathcal{F}(s) = \text{Invalid} \implies \Gamma \vdash_\Theta \mathcal{F}(s \cdot a) = \text{Invalid} \quad \forall a$$

Monotonicity makes typed search efficient: once a state is $\text{Invalid}$, the entire subtree below it in the search space can be pruned without examination.

Typed Trees and Forests

The typing engine $\vdash_\Theta$ maps a tree to either a status or rejection. This lifts trees and forests into typed counterparts.

Definition: Typed Tree

A typed tree $\hat{t}$ is a pair $(t,\, \mathcal{T})$ where:

  • $t$ is a partial tree, and
  • $\mathcal{T} : V \to \tau \cup \{\varepsilon\}$ maps each node to its inferred type, or to $\varepsilon$ if it is not typable.

$\hat{t}$ exists whenever $\vdash_\Theta t \neq \text{Invalid}$, i.e. the tree carries status $\text{Partial}(\tau)$ or $\text{Valid}(\tau)$.

The typing engine thus acts as a constructor:

$$ \vdash_\Theta t = \begin{cases} \hat{t} & \text{if } \vdash_\Theta t : \text{Valid}(\tau) \text{ or } \text{Partial}(\tau) \text{ for some } \tau \\ \text{Invalid} & \text{if the typing constraint is violated} \end{cases} $$

Definition: Typed Forest

A typed forest $\hat{\mathcal{F}}$ is a forest in which every tree is either a typed tree $\hat{t}$ or has been individually rejected as $\text{Invalid}$.

Extending the engine to forests:

$$ \vdash_\Theta \mathcal{F} = \begin{cases} \hat{\mathcal{F}} & \text{if at least one tree in } \mathcal{F} \text{ produces a typed tree} \\ \text{Invalid} & \text{if every tree in } \mathcal{F} \text{ is } \text{Invalid} \end{cases} $$

A partial tree rejected by the typing engine carries an inconsistency in its typing rules: it is syntactically valid but semantically invalid.

Types as Constraints on Completability

Without typing, $\mathcal{C}_L(s)$ is the set of all syntactically valid extensions of a prefix $s$. The typed tree structure gives this a sharper interpretation: every type $\tau$ assigned to a node in $\hat{t}$ is a constraint asserting that any syntactic completion of that subtree must produce a value of type $\tau$, or the tree transitions to $\text{Invalid}$.

Types therefore define a refinement of the completability set:

$$ \mathcal{C}_{L,\Theta}(s) = \{\, a \in \mathcal{C}_L(s) \mid \Gamma \vdash_\Theta \mathcal{F}(sa) \neq \text{Invalid} \,\} $$

Lemma: Typed Completability Refinement

$$\mathcal{C}_{L,\Theta}(s) \subseteq \mathcal{C}_L(s)$$

Typing can only narrow the completability set, never widen it. This refinement is computed incrementally:

  1. The parser produces a partial forest $\mathcal{F}(s)$.
  2. $\vdash_\Theta$ evaluates each tree, yielding $\hat{\mathcal{F}}$ or $\text{Invalid}$ per tree.
  3. The synthesizer computes the grammar-level completion set $\mathcal{C}_L(s)$.
  4. Each candidate token $a$ is tested by extension: parse $s \cdot a$, apply $\vdash_\Theta$, and retain only those where some root remains non-$\text{Invalid}$.

The type system acts as a filter, not a generator. It cannot propose completions the grammar does not support; it only rejects those that violate typing constraints.
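Steps 1–4 amount to a filter over the grammar-level candidate set. The sketch below abstracts the parser and typing engine as boolean predicates; these signatures are hypothetical, not the engine's API:

```rust
/// Typed refinement of the completability set: keep only candidates the
/// typing engine does not reject. `parse` and `typecheck` stand in for
/// Ψ_G and ⊢_Θ applied to the extended input s·a.
fn refine<F, G>(candidates: Vec<char>, s: &str, parse: F, typecheck: G) -> Vec<char>
where
    F: Fn(&str) -> bool, // does s·a still parse to a forest?
    G: Fn(&str) -> bool, // does some tree remain non-Invalid?
{
    candidates
        .into_iter()
        .filter(|&a| {
            let extended = format!("{}{}", s, a);
            parse(&extended) && typecheck(&extended)
        })
        .collect()
}
```

Because the filter only removes candidates, the refinement property $\mathcal{C}_{L,\Theta}(s) \subseteq \mathcal{C}_L(s)$ holds by construction.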

Type Language

Definition: Type

A type $\tau$ is defined inductively:

| Form | Notation | Description |
| --- | --- | --- |
| Atom | $\tau$ | Named type from binding resolution |
| Meta | $?A$ | Unification variable (unknown type) |
| Raw | $\text{'int'}$ | Concrete type literal |
| Arrow | $\tau_1 \to \tau_2$ | Function type |
| Union | $\tau_1 \mid \tau_2$ | Union type |
| Not | $\lnot\tau$ | Negation type |
| ContextCall | $\Gamma(x)$ | Deferred context lookup |
| Any | $\top$ | Top type (universal supertype) |
| None | $\emptyset$ | Bottom type (empty type) |
| Path | $\text{Path}(i_1@a_1 \cdot i_2@a_2 \cdots i_n@a_n)$ | Internal: unresolved binding at grammar path |
| PathOf | $\text{PathOf}(\tau,\; i_1@a_1 \cdots i_n@a_n)$ | Internal: type $\tau$ at unresolved grammar path |
| Indeterminate | $\widetilde{\tau}$ | Internal: type inferred from an incomplete parse |

$\text{Path}(\cdots)$, $\text{PathOf}(\cdots)$, and $\widetilde{\tau}$ are markers of indeterminacy: the engine cannot yet produce a definitive type because the parse tree is incomplete. Their presence in a judgment forces the result to $\text{Partial}(\tau)$, propagating incompleteness upward without rejecting the tree.

Regex Engine

The regex engine underpins all terminal matching in the parser. It uses Brzozowski derivatives to match incrementally, a design that integrates naturally with partial parsing. The theoretical foundations (derivatives, two-level completability) are developed in Completability and Regular Expressions. This page covers only the implementation-specific choices.

Source: src/regex/

The Regex Type

Definition: Regex

$$r ::= \emptyset \mid \varepsilon \mid c \mid [a\text{-}b] \mid r_1 \cdot r_2 \mid (r_1 \,|\, r_2) \mid r^*$$

Empty, Epsilon, Char, Range, Concat, Union, Star. Grammar terminals written as /[a-z]+/ are parsed into this representation at grammar load time. Common shorthands (digit(), word(), alnum()) are combinators over these primitives.

Prefix Matching

Definition: PrefixStatus

prefix_match(r, s) computes $d = \text{simplify}(\partial_s r)$ and classifies the result:

| Outcome | Condition | Meaning |
| --- | --- | --- |
| NoMatch | $d = \emptyset$ | $s$ is not a prefix of any string in $\mathcal{L}(r)$ |
| Extensible(d) | $\nu(d)$ | $s \in \mathcal{L}(r)$ and $d$ accepts further characters |
| Prefix(d) | $\lnot\nu(d)$, $d \neq \emptyset$ | $s$ is a valid prefix but not yet a complete match |

The parser maps these outcomes to terminal node status: Extensible produces a Terminal::Complete with a non-null extension derivative; Prefix produces a Terminal::Partial with a remainder derivative.

Simplification (standard algebraic identities: $\emptyset \cdot r = \emptyset$, $\varepsilon \cdot r = r$, $r \mid r = r$, etc.) is applied after each derivative step to keep expression size bounded.
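A minimal derivative matcher over a subset of the regex grammar (character ranges omitted) illustrates the classification above. Re, deriv, and prefix_status are illustrative names, not the types in src/regex/; the smart constructors play the role of the simplification step:

```rust
/// Subset of the regex grammar: ∅, ε, characters, concatenation, union, star.
#[derive(Clone, PartialEq, Debug)]
enum Re {
    Empty,
    Eps,
    Chr(char),
    Cat(Box<Re>, Box<Re>),
    Alt(Box<Re>, Box<Re>),
    Star(Box<Re>),
}

/// ν(r): does r accept the empty string?
fn nullable(r: &Re) -> bool {
    match r {
        Re::Empty | Re::Chr(_) => false,
        Re::Eps | Re::Star(_) => true,
        Re::Cat(a, b) => nullable(a) && nullable(b),
        Re::Alt(a, b) => nullable(a) || nullable(b),
    }
}

// Smart constructors apply the identities (∅·r = ∅, ε·r = r, r|r = r, ...)
// as terms are built, keeping derivatives small.
fn cat(a: Re, b: Re) -> Re {
    match (a, b) {
        (Re::Empty, _) | (_, Re::Empty) => Re::Empty,
        (Re::Eps, b) => b,
        (a, Re::Eps) => a,
        (a, b) => Re::Cat(Box::new(a), Box::new(b)),
    }
}

fn alt(a: Re, b: Re) -> Re {
    match (a, b) {
        (Re::Empty, b) => b,
        (a, Re::Empty) => a,
        (a, b) if a == b => a,
        (a, b) => Re::Alt(Box::new(a), Box::new(b)),
    }
}

/// Brzozowski derivative of r with respect to character c.
fn deriv(r: &Re, c: char) -> Re {
    match r {
        Re::Empty | Re::Eps => Re::Empty,
        Re::Chr(d) => {
            if *d == c { Re::Eps } else { Re::Empty }
        }
        Re::Cat(a, b) => {
            let head = cat(deriv(a, c), (**b).clone());
            if nullable(a) { alt(head, deriv(b, c)) } else { head }
        }
        Re::Alt(a, b) => alt(deriv(a, c), deriv(b, c)),
        Re::Star(a) => cat(deriv(a, c), Re::Star(a.clone())),
    }
}

#[derive(Debug, PartialEq)]
enum PrefixStatus { NoMatch, Extensible, Prefix }

/// Classify s against r exactly as in the table above.
fn prefix_status(r: &Re, s: &str) -> PrefixStatus {
    let mut d = r.clone();
    for c in s.chars() {
        d = deriv(&d, c);
    }
    if d == Re::Empty {
        PrefixStatus::NoMatch
    } else if nullable(&d) {
        PrefixStatus::Extensible
    } else {
        PrefixStatus::Prefix
    }
}
```

For example, against $(a|b)^*$ the input abba is Extensible, abc is NoMatch, and against $a \cdot b$ the input a is Prefix.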

NFA Fallback

For full-string matching (matches) and maximum-length prefix matching (match_len), the engine compiles to an NFA via Thompson construction rather than using derivatives. This avoids the expression-growth problem derivatives can exhibit on long inputs.

DFA Intersection

has_intersection(r_1, r_2) checks $\mathcal{L}(r_1) \cap \mathcal{L}(r_2) \neq \emptyset$ by product-DFA construction. Used by the typing engine to detect overlapping terminal patterns.

String Generation

examples(n) generates up to $n$ concrete strings in $\mathcal{L}(r)$. The synthesizer uses this to find token candidates when extending a partial tree with a regex terminal.

Grammar Specification

The grammar is composed of production rules with syntactic meaning and attached typing rules conveying semantic information about the language.

The core data structures are defined in src/logic/grammar/mod.rs.

Production Rules

Production rules define the syntactic structure of the language. Each rule maps a nonterminal to one or more alternative right-hand sides.

Definition: Production Rule Syntax
```text
Nonterminal(rule_name) ::= symbol₁ symbol₂ ... | alternative₂ | ...
```
  • Nonterminal: symbol being defined.
  • (rule_name): optional typing rule name.
  • ::=: separator.
  • |: alternative separator.
  • symbols: sequence of terminals and nonterminals.

Bindings are resolved via grammar paths during type-checking. See src/logic/binding/grammar.rs for the path construction algorithm.

Epsilon Productions

An epsilon production represents an empty alternative, written as ε.

```text
// Optional tail matching zero or more Terms
ProgramTail ::= Term ProgramTail | ε
```

Typing Rules

Typing rules attach semantic information to productions using an inference-rule format.

```text
premise₁, premise₂, ...
------------------------ (rule_name)
conclusion
```

The rule name in parentheses must match a (rule_name) annotation on a production.

Premises

Premises express conditions for rule application.

| Syntax | Meaning |
| --- | --- |
| Γ ⊢ e : τ | Term e has type τ in context Γ (inference) |
| Γ ▷ e | Check that term e is well-typed in context Γ (checking) |
| x ∈ Γ | Variable x is bound in context Γ |
| τ₁ = τ₂ | Types must unify |
| τ₁ ⊆ τ₂ | Type τ₁ is a subtype of τ₂ |
| Γ[x:τ] | Extend context with binding x:τ |

Conclusions

Conclusions specify the rule’s result:

| Syntax | Meaning |
| --- | --- |
| τ | Return type τ |
| Γ(x) | Look up type of x in context |
| Γ → Γ[x:τ] ⊢ τ | Set type of x in context and return τ |

Checking mode (▷) is not implemented as a conclusion; use the check premise Γ ▷ e instead.

Meta-Variables

A meta-variable (written ?A, ?B, etc.) is a placeholder unified during type checking, allowing rules to express polymorphic constraints.

```text
// ?A must unify with both the function's domain and the argument's type.
Γ ⊢ f : ?A → ?B, Γ ⊢ e : ?A
-------------------------------- (app)
?B
```

Context Operations

The typing context Γ maps variable names to types:

  • Γ(x): Look up type of variable x.
  • Γ[x:τ]: Extend context with binding x:τ.
  • x ∈ Γ: Check if x is bound in context.

Type Syntax

Types in typing rules follow this syntax:

| Syntax | Description |
| --- | --- |
| τ | Type variable (from binding) |
| ¬τ | Negation type |
| ?A | Meta-variable (unified during checking) |
| 'int' | Raw/concrete type literal |
| τ₁ → τ₂ | Function type |
| τ₁ \| τ₂ | Union type |
| ⊤ | Top type (any) |
| ∅ | Bottom type (none) |
| Γ(x) | Context lookup |

Example: Simply-Typed Lambda Calculus

A complete specification for STLC from examples/stlc.auf:

```text
// Identifier (supports Unicode)
Identifier ::= /[A-Za-z_][A-Za-z0-9]*/

// Variables with var typing rule
Variable(var) ::= Identifier[x]

TypeName ::= /[A-Za-z0-9_τ₁₂₃₄₅₆₇₈₉₀]+/

// Base types (parentheses are literals, hence quoted)
BaseType ::= TypeName | '(' Type ')'

// Function types (right-associative)
AtomicType ::= BaseType | '(' Type ')'
FunctionType ::= AtomicType '->' Type
Type ::= AtomicType | FunctionType

// Lambda abstraction (dot is a literal)
Lambda(lambda) ::= 'λ' Identifier[a] ':' Type[τ] '.' Expression[e]

// Atomic expressions
AtomicExpression ::= Variable | '(' Expression ')' | Lambda

// Applications (left-associative)
Application(app) ::= Expression[l] AtomicExpression[r]

// Expressions (start symbol)
Expression ::= AtomicExpression | Application

// Typing Rules
x ∈ Γ
----------- (var)
Γ(x)

Γ[a:τ] ⊢ e : ?B
--------------------------- (lambda)
τ → ?B

Γ ⊢ l : ?A → ?B, Γ ⊢ r : ?A
--------------------------------- (app)
?B
```

Source Files

| Component | Path |
| --- | --- |
| Grammar structures | src/logic/grammar/mod.rs |
| Spec file loading | src/logic/grammar/load.rs |
| Typing rules | src/logic/typing/rule.rs |
| Type definitions | src/logic/typing/mod.rs |
| Binding map | src/logic/binding/grammar.rs |

Partial Parsing

The partial parser implements the theoretical framework of partial forests as a memoized recursive-descent parser with regex-derivative terminal matching. Given an input $s \in \Sigma^*$, it produces a partial forest $\mathcal{F}$ containing all valid parse trees (complete and partial) consistent with the grammar $G$.

Source: src/logic/partial/parse.rs

Segmentation

The first stage of parsing tokenizes input into a sequence of segments.

Definition: Segment

A segment is a whitespace-delimited token carrying its byte span and a flag indicating whether it is a prefix of a special token. Formally, segmentation is a function:

$$\text{tokenize}: \Sigma^* \to \text{Segment}^*$$

Segments preserve positional information (byte offsets) and are the atomic unit of the span cache.

Special tokens (multi-character operators like ->, <=) are matched longest-first. If a segment at the end of input is a strict prefix of a special token (e.g., - when -> exists in the grammar), it is marked as a partial special token. This flag disables caching for spans involving that segment, since the token’s identity is ambiguous until more input arrives.
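The segmentation pass can be sketched under simplifying assumptions: whitespace-only delimiters, and the partial-special flag computed for every segment rather than only at end of input. Names are hypothetical, not the implementation's:

```rust
/// A simplified segment: text, byte offset, and the partial-special flag.
#[derive(Debug)]
struct Segment {
    text: String,
    start: usize,
    is_partial_special: bool,
}

/// Whitespace-only tokenizer. A segment is flagged when it is a strict
/// prefix of some special token, e.g. "-" when "->" is in the grammar.
fn tokenize(input: &str, specials: &[&str]) -> Vec<Segment> {
    let mut out = Vec::new();
    let mut i = 0;
    for word in input.split_whitespace() {
        // Locate this word's byte offset, scanning from the previous end.
        let start = i + input[i..].find(word).unwrap();
        i = start + word.len();
        let is_partial_special = specials
            .iter()
            .any(|sp| sp.starts_with(word) && *sp != word);
        out.push(Segment { text: word.to_string(), start, is_partial_special });
    }
    out
}
```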

Recursive Descent with Memoization

The parser is a recursive-descent parser over segments, augmented with:

  1. A persistent span cache for cross-parse memoization
  2. Cycle detection for left-recursive grammars
  3. Depth-bounded recursion
Definition: Parse Entry

The main entry point partial(s) produces a PartialParseOutcome:

  1. Tokenize $s$ into segments.
  2. Check cache validity: if the new segments extend the previous input’s segments (prefix match), reuse the cache; otherwise clear it.
  3. Parse from the start symbol $S$ at position 0 with a fresh ParseState.
  4. Filter roots to those consuming all segments.
  5. Return success (with possibly partial trees) or failure.
Definition: Parse State

A parse state tracks per-invocation mutable data:

  • $\text{visited}: (N \times \mathbb{N}) \to \mathbb{N}$, mapping (nonterminal, position) to visit count for cycle detection.
  • $\text{hit_depth_limit}: \text{bool}$, a sticky flag set when any branch exceeds the recursion bound. When true, no results from this parse invocation are cached.

Nonterminal Parsing

Definition: Nonterminal Parse

For nonterminal $A$ at position $i$ and depth $d$:

  1. Depth guard: if $d > d_{\max}$, abort and set $\text{hit_depth_limit} \leftarrow \text{true}$.
  2. Cache lookup: if the span cache can answer for $(d_{\max}, A, i, \ell)$ for all relevant span lengths, return cached trees.
  3. Cycle detection: track $(A, i)$ in visited. Allow up to $\min(d_{\max}, |\text{segments}| + 2)$ re-entries before aborting. This handles left-recursive productions like Term ::= Term BaseTerm.
  4. Try all productions: iterate over productions for $A$ in rotated order (see below), calling $\text{parse_production}$ for each.
  5. Cache results: if no depth limit was hit and segments do not end with a partial special token, store complete subtrees in the span cache grouped by consumed length.

Alternative Ordering

Definition: Production Rotation

Productions for a nonterminal are tried in a rotated order determined by recursion depth:

$$\text{order}(n, d) = [\,d \bmod n,\ (d+1) \bmod n,\ \ldots,\ (d+n-1) \bmod n\,]$$

where $n$ is the number of productions and $d$ is the current recursion depth.

This rotation distributes search effort across recursion depths, preventing systematic bias toward the first production. It is particularly important for left-recursive grammars where always trying the recursive alternative first would waste effort at shallow depths.
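The rotation formula transcribes directly:

```rust
/// order(n, d): try production (d mod n) first, then wrap around,
/// so different recursion depths start from different alternatives.
fn rotated_order(n: usize, d: usize) -> Vec<usize> {
    (0..n).map(|i| (d + i) % n).collect()
}
```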

Symbol Sequence Parsing

A production’s right-hand side is a sequence of symbols parsed left-to-right. For each successful parse of the first symbol, the parser attempts to parse the remaining symbols with the unconsumed segments.

A critical invariant governs partial nodes: if a child node is partial (incomplete), it is accepted only if it consumed all remaining segments (i.e., it is at the frontier of the input). Partial nodes cannot appear in non-frontier positions.

Partial Terminals

Terminals are regex patterns matched via Brzozowski derivatives (see two-level language completability). Matching a segment against a terminal regex yields one of four outcomes:

Definition: Terminal Match Outcomes
| Outcome | Node Produced | Meaning |
| --- | --- | --- |
| Complete | Terminal::Complete (no extension) | Segment fully matches the regex |
| Extensible | Terminal::Complete (with extension derivative) | Segment is a complete match and a valid prefix of longer matches |
| Prefix | Terminal::Partial (with remainder derivative) | Segment is a valid prefix but not a complete match |
| NoMatch | (none) | Segment does not match |

At end-of-input (empty remaining segments), the parser produces a Terminal::Partial with the full regex as remainder. This enables downstream completion generation: the remainder derivative describes exactly which tokens could legally appear next.

The distinction between extension and remainder is central to the completability analysis:

  • An extension derivative on a complete terminal means the terminal could accept more characters (e.g., identifier x could become xy).
  • A remainder derivative on a partial terminal describes what characters are needed to complete the match.
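The four outcomes can be reproduced with a toy Brzozowski-derivative matcher. This is a minimal sketch over a six-constructor regex enum, not the engine's actual `DerivativeRegex`: a segment is Complete if the derivative is nullable and accepts nothing more, Extensible if it is nullable but longer matches exist, Prefix if it is not nullable but non-empty, and NoMatch otherwise.

```rust
/// Toy regex for illustrating the four terminal match outcomes.
#[derive(Clone, Debug, PartialEq)]
enum Re {
    Empty,                 // matches nothing
    Eps,                   // matches only the empty string
    Char(char),
    Seq(Box<Re>, Box<Re>),
    Alt(Box<Re>, Box<Re>),
    Star(Box<Re>),
}

#[derive(Debug, PartialEq)]
enum Outcome { Complete, Extensible, Prefix, NoMatch }

// Does the regex accept the empty string?
fn nullable(r: &Re) -> bool {
    match r {
        Re::Empty | Re::Char(_) => false,
        Re::Eps | Re::Star(_) => true,
        Re::Seq(a, b) => nullable(a) && nullable(b),
        Re::Alt(a, b) => nullable(a) || nullable(b),
    }
}

// Is the regex's language empty?
fn is_void(r: &Re) -> bool {
    match r {
        Re::Empty => true,
        Re::Eps | Re::Char(_) | Re::Star(_) => false,
        Re::Seq(a, b) => is_void(a) || is_void(b),
        Re::Alt(a, b) => is_void(a) && is_void(b),
    }
}

// Does the regex accept at least one non-empty string?
fn has_nonempty(r: &Re) -> bool {
    match r {
        Re::Empty | Re::Eps => false,
        Re::Char(_) => true,
        Re::Seq(a, b) => !is_void(a) && !is_void(b) && (has_nonempty(a) || has_nonempty(b)),
        Re::Alt(a, b) => has_nonempty(a) || has_nonempty(b),
        Re::Star(a) => has_nonempty(a),
    }
}

// Brzozowski derivative: the language of suffixes after consuming `c`.
fn deriv(r: &Re, c: char) -> Re {
    match r {
        Re::Empty | Re::Eps => Re::Empty,
        Re::Char(d) => if *d == c { Re::Eps } else { Re::Empty },
        Re::Alt(a, b) => Re::Alt(Box::new(deriv(a, c)), Box::new(deriv(b, c))),
        Re::Star(a) => Re::Seq(Box::new(deriv(a, c)), Box::new(r.clone())),
        Re::Seq(a, b) => {
            let da_b = Re::Seq(Box::new(deriv(a, c)), b.clone());
            if nullable(a) {
                Re::Alt(Box::new(da_b), Box::new(deriv(b, c)))
            } else {
                da_b
            }
        }
    }
}

fn classify(r: &Re, segment: &str) -> Outcome {
    let d = segment.chars().fold(r.clone(), |re, c| deriv(&re, c));
    match (nullable(&d), has_nonempty(&d)) {
        (true, true) => Outcome::Extensible, // complete match, longer matches exist
        (true, false) => Outcome::Complete,  // exact match, nothing can follow
        (false, true) => Outcome::Prefix,    // strict prefix of some match
        (false, false) => Outcome::NoMatch,
    }
}

fn main() {
    // `ab*`: "a" is a complete match that could still grow.
    let ab_star = Re::Seq(Box::new(Re::Char('a')), Box::new(Re::Star(Box::new(Re::Char('b')))));
    assert_eq!(classify(&ab_star, "a"), Outcome::Extensible);
    // `ab`: "a" is only a prefix; "ab" is complete; "b" does not match.
    let ab = Re::Seq(Box::new(Re::Char('a')), Box::new(Re::Char('b')));
    assert_eq!(classify(&ab, "a"), Outcome::Prefix);
    assert_eq!(classify(&ab, "ab"), Outcome::Complete);
    assert_eq!(classify(&ab, "b"), Outcome::NoMatch);
}
```

In this model the non-nullable derivative of a Prefix outcome plays the role of the remainder, and the residual language of an Extensible outcome plays the role of the extension.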

Epsilon Productions

An epsilon production (empty right-hand side, $\varepsilon$) matches zero segments and produces a nonterminal with no children. Such a nonterminal is considered complete.

Recursion Limit

Definition: Default Recursion Depth

The default maximum recursion depth is $d_{\max} = 15$ (DEFAULT_MAX_RECURSION_DEPTH). This is overridable per-parser and is distinct from the typing depth limit of 50.

When the depth limit is hit, the parser returns a DepthLimit error. The MetaParser wrapper in src/logic/partial/meta.rs uses iterative deepening to find the minimum sufficient depth.

Incremental Parsing

The parser supports incremental parsing across successive calls. When input $s$ is extended to $s \cdot t$:

  1. Normalize both segment sequences to position-independent keys.
  2. If the new segments start with the previous segments (prefix match), reuse the span cache.
  3. Otherwise, clear the cache entirely.

This prefix-stability property ensures that typing a single character does not require re-parsing the entire input from scratch.
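The decision reduces to a prefix test on normalized segment keys. A minimal sketch, with an illustrative function name:

```rust
/// Hypothetical sketch of the prefix check that decides whether the span
/// cache survives an input change.
fn cache_survives(prev_keys: &[String], next_keys: &[String]) -> bool {
    next_keys.len() >= prev_keys.len() && next_keys[..prev_keys.len()] == *prev_keys
}

fn main() {
    let prev: Vec<String> = ["let", "x"].iter().map(|s| s.to_string()).collect();
    let grown: Vec<String> = ["let", "x", "="].iter().map(|s| s.to_string()).collect();
    let edited: Vec<String> = ["if"].iter().map(|s| s.to_string()).collect();
    assert!(cache_survives(&prev, &grown));   // extension: reuse the span cache
    assert!(!cache_survives(&prev, &edited)); // rewrite: clear it entirely
}
```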

DONE Tokenizer

The tokenizer converts a raw input string into a sequence of segments, the atomic units that the parser operates on. It is constructed once from a grammar’s special tokens and runs in a single linear pass over the input bytes.

Segment

Definition: Segment

A segment is a contiguous slice of the input string, carrying:

  • bytes: the UTF-8 content
  • start, end: byte offsets into the original input
  • index: position in the token stream
  • is_partial_special: true when the segment is a strict prefix of some special token but is not itself a complete special token

The is_partial_special flag has a specific consequence for the parser: any span whose last segment is a partial special is excluded from the span cache. The token’s identity is not yet determined — it may grow into a complete special with more input, or may be classified as a word token.

Tokenizer Construction

Tokenizer::new(special_tokens, delimiters) sorts the special token list by length descending, then alphabetically as a tiebreak, then deduplicates. Longest-match priority is structural: the first special that matches at a given position wins, and since longer specials come first, a two-character -> takes priority over its one-character prefix -.

Delimiters (default: space, newline, tab) are consumed and discarded — they produce no segment.
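The ordering step is small enough to sketch directly (the helper name is illustrative):

```rust
/// Order special tokens longest-first, then alphabetically, then deduplicate,
/// so that `->` always wins over its one-character prefix `-`.
fn sort_specials(mut specials: Vec<&str>) -> Vec<&str> {
    specials.sort_by(|a, b| b.len().cmp(&a.len()).then(a.cmp(b)));
    specials.dedup(); // duplicates are adjacent after sorting
    specials
}

fn main() {
    assert_eq!(sort_specials(vec!["-", "->", "let", "-"]), vec!["let", "->", "-"]);
}
```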

Tokenization Algorithm

The tokenizer scans left to right, applying three cases at each position in order:

  1. If any special token matches exactly at the current position, emit it as a segment and advance by its length. Because the list is sorted longest-first, this is always the longest possible special match.

  2. If the entire remaining suffix is a strict prefix of some special token (but does not match any completely), emit it as a single segment with is_partial_special = true and stop. This case only fires when we are at the end of input, since otherwise there would be more characters to determine whether the match completes.

  3. Consume characters one by one into a word token, stopping when:

    • a delimiter is reached
    • a complete special match starts at the current character
    • a partial-special prefix starts at the current character and there is a character-class transition from the previous character (word↔non-word boundary).

The character-class condition in case 3 is the critical subtlety.

Example: Tokenization example

For the fun grammar with specials ["->", "λ", ":", ".", "=", "let", "in", "if", "then", "else", "true", "false", "(", ")"] and delimiters [' ', '\n', '\t']:

Input "λx:Int" produces four segments:

  • λ (special)
  • x (word)
  • : (special)
  • Int (word; none of the four segments is partial).

The trailing t of Int is a prefix of then and true, but since t and n are both word characters, the word-boundary rule suppresses the split.

Input "λx:Int-" produces five segments, the last being - with is_partial_special = true, because - is a strict prefix of -> and there is a word↔non-word boundary between t and -.

Input "le" at end of input produces one segment, le, with is_partial_special = true (strict prefix of let).

DONE Type Inference

As we already explored, the type inference engine evaluates partial trees against the typing rules $\Theta$ attached to grammar productions.

The central judgment is $\Gamma \vdash_\Theta t : \mathcal{S}$, where $\mathcal{S}$ is a tree status in the lattice $$\{\text{Valid}(\tau), \text{Partial}(\tau), \text{TooDeep}, \text{Malformed}\}$$

Dispatch

Type checking is dispatched top-down from the root of the partial tree.

Definition: Evaluation Dispatch

Given a tree reference $t$ at depth $d$ in context $\Gamma$:

  1. Depth guard: if $d > 50$, return $\text{TooDeep}$.
  2. Cache lookup: if the type cache contains an entry for $t$’s path, return $\text{Valid}(\text{cached})$.
  3. Terminal: if $t$ is a terminal node:
    • Complete terminal: $\text{Valid}(\top)$.
    • Partial terminal: $\text{Partial}(\top)$.
  4. Non-terminal with rule: if $t$’s production has a typing rule $\theta \in \Theta$, apply the rule.
  5. Non-terminal without rule: apply the drilling wrapper.

Completeness Promotion

Lemma: Partial-to-Malformed Promotion

If a non-terminal $t$ is complete (all children present and complete) and not extensible, but the typing result is $\text{Partial}$, the result is promoted to $\text{Malformed}$: a finished subtree can no longer acquire the missing information.

This prevents permanently indeterminate states on finished subtrees. A complete, non-extensible subtree that cannot be assigned a definite type has definitively failed type checking.

Rule Application

When a non-terminal has an associated typing rule $\theta = (\text{premises}, \text{conclusion})$, it is resolved with the following algorithm.

  1. Resolve bindings: compute $B = \text{resolve}(t, \theta)$ via runtime binding. Return $\text{Partial}$ if the target is at the frontier; return $\text{Malformed}$ if the structure is invalid.

  2. Create unifier: initialize a unifier $U$ for meta-variable resolution.

  3. Initialize context lanes: create named context maps. The primary lane is set by the conclusion’s context input, the first premise’s setting, or defaults to $\Gamma$.

  4. Evaluate premises (in order): for each $p_i$, compute $\text{check_premise}(p_i)$:

    • $\text{Ok}(\Gamma')$ updates the relevant context lane
    • $\text{Fail}$ immediately returns $\text{Malformed}$
    • $\text{Partial}$ sets a flag, but evaluation continues since a later premise may still fail
  5. If any premise was Partial: return $\text{Partial}(\top)$.

  6. Evaluate conclusion: resolve the output type via the unifier and bindings, return $\text{Valid}(\tau)$.

  7. Extract context transform: if the conclusion specifies $\Gamma \to \Gamma[x:\tau]$, build the extended context for sibling propagation.

The non-short-circuiting behavior on Partial premises (step 4) is deliberate: a Partial followed by a Fail should yield Malformed, not Partial.
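That deliberate asymmetry between Fail (short-circuit) and Partial (flag and continue) can be sketched as a small fold over premise results:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Premise { Ok, Fail, Partial }

/// Sketch of the non-short-circuiting premise loop: Fail returns immediately,
/// but Partial only sets a flag so that a later Fail can still win.
fn evaluate(premises: &[Premise]) -> Premise {
    let mut saw_partial = false;
    for p in premises {
        match p {
            Premise::Fail => return Premise::Fail,  // immediate Malformed
            Premise::Partial => saw_partial = true, // defer, keep evaluating
            Premise::Ok => {}
        }
    }
    if saw_partial { Premise::Partial } else { Premise::Ok }
}

fn main() {
    // A Partial followed by a Fail must yield Fail (Malformed), not Partial.
    assert_eq!(evaluate(&[Premise::Partial, Premise::Fail]), Premise::Fail);
    assert_eq!(evaluate(&[Premise::Ok, Premise::Partial]), Premise::Partial);
    assert_eq!(evaluate(&[Premise::Ok, Premise::Ok]), Premise::Ok);
}
```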

Transparent Wrapper

Productions without a typing rule are “transparent”: they delegate to their single non-terminal child, and fail if they have more than one. This is a convenience for grammar authors.

Definition: Transparent Wrapper

For a non-terminal without a typing rule:

  • 1 non-terminal child: recurse into that child. The parent inherits the child’s type and output context unchanged.
  • 0 non-terminal children (terminals only): if at the frontier, $\text{Partial}(\top)$; otherwise $\text{Valid}(\top)$.
  • 2+ non-terminal children: $\text{Malformed}$

Type Cache

Definition: Type Cache

The type cache is a map $\text{cache}: \mathcal{P} \to \tau$ from tree paths to types, created fresh per check_tree invocation and threaded through all recursive calls. Only $\text{Valid}$ results are cached. $\text{Partial}$, $\text{Malformed}$, and $\text{TooDeep}$ results are not cached because they may change as the tree evolves.

The cache is controlled by a thread-local flag (TYPE_CACHE_ENABLED, default true). Disabling it is useful for debugging but has significant performance cost on deep trees.

Meta-Variable Resolution

Meta-variables ($?A$, $?B$, etc.) are resolved via Hindley-Milner unification. The unifier maintains a substitution $\sigma: \text{MetaVar} \to \tau$ with the following invariants:

  • Occurs check: $?X$ must not appear in $\tau$ before binding $?X := \tau$.
  • Idempotent substitution: $\sigma(\sigma(\tau)) = \sigma(\tau)$.
Definition: Unification

Unification of types $\tau_1$ and $\tau_2$ produces a three-valued result:

  • Ok: types are structurally compatible; meta-variable bindings are recorded.
  • Fail: types are structurally incompatible (e.g., $\text{Raw}(\text{int})$ vs $\text{Raw}(\text{bool})$).
  • Indeterminate: one or both sides contain unresolved components ($\top$, $\text{Path}(i_1@a_1 \cdots i_n@a_n)$, $\Gamma(x)$); cannot determine compatibility yet.

$\top$ is treated as indeterminate in unification (not unified with concrete types), which is consistent with its role as “not yet determined” rather than “accepts everything.”
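A toy version of the unifier makes the three-valued result, the occurs check, and the treatment of $\top$ concrete. This is a minimal sketch over a four-constructor type language; the engine's real type language is larger:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Raw(&'static str),       // concrete base type
    Arrow(Box<Ty>, Box<Ty>), // function type
    Meta(u32),               // ?A, ?B, ...
    Top,                     // ⊤: not yet determined
}

#[derive(Debug, PartialEq)]
enum Unif { Ok, Fail, Indeterminate }

// Apply the substitution until a non-meta head (idempotence).
fn resolve(t: &Ty, s: &HashMap<u32, Ty>) -> Ty {
    match t {
        Ty::Meta(v) => s.get(v).map(|u| resolve(u, s)).unwrap_or_else(|| t.clone()),
        Ty::Arrow(a, b) => Ty::Arrow(Box::new(resolve(a, s)), Box::new(resolve(b, s))),
        _ => t.clone(),
    }
}

// Occurs check: ?X must not appear in the type it is bound to.
fn occurs(v: u32, t: &Ty) -> bool {
    match t {
        Ty::Meta(w) => v == *w,
        Ty::Arrow(a, b) => occurs(v, a) || occurs(v, b),
        _ => false,
    }
}

fn unify(a: &Ty, b: &Ty, s: &mut HashMap<u32, Ty>) -> Unif {
    let (a, b) = (resolve(a, s), resolve(b, s));
    match (&a, &b) {
        (Ty::Top, _) | (_, Ty::Top) => Unif::Indeterminate, // ⊤ defers the decision
        (Ty::Meta(v), Ty::Meta(w)) if v == w => Unif::Ok,
        (Ty::Meta(v), t) | (t, Ty::Meta(v)) => {
            if occurs(*v, t) {
                Unif::Fail // occurs check failure: would build an infinite type
            } else {
                s.insert(*v, t.clone());
                Unif::Ok
            }
        }
        (Ty::Raw(x), Ty::Raw(y)) => if x == y { Unif::Ok } else { Unif::Fail },
        (Ty::Arrow(a1, a2), Ty::Arrow(b1, b2)) => {
            match (unify(a1, b1, s), unify(a2, b2, s)) {
                (Unif::Fail, _) | (_, Unif::Fail) => Unif::Fail,
                (Unif::Indeterminate, _) | (_, Unif::Indeterminate) => Unif::Indeterminate,
                _ => Unif::Ok,
            }
        }
        _ => Unif::Fail,
    }
}

fn main() {
    let mut s = HashMap::new();
    // ?A unifies with int, and the binding is recorded...
    assert_eq!(unify(&Ty::Meta(0), &Ty::Raw("int"), &mut s), Unif::Ok);
    // ...so a later attempt to make ?A = bool fails.
    assert_eq!(unify(&Ty::Meta(0), &Ty::Raw("bool"), &mut s), Unif::Fail);
    // ⊤ is indeterminate, not "accepts everything".
    assert_eq!(unify(&Ty::Top, &Ty::Raw("int"), &mut s), Unif::Indeterminate);
    // Occurs check: ?B := ?B -> int is rejected.
    let rec = Ty::Arrow(Box::new(Ty::Meta(1)), Box::new(Ty::Raw("int")));
    assert_eq!(unify(&Ty::Meta(1), &rec, &mut s), Unif::Fail);
}
```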

Resolution Pipeline

Type expressions undergo a two-stage resolution before use:

  1. Meta resolution ($\text{solve_meta}$): substitute all bound meta-variables from the unifier’s substitution map.
  2. Binding resolution ($\text{solve_binding}$): substitute all $\text{Atom}(x)$ variables with values read from the tree via the runtime binding system.

Subtyping

Definition: Subtyping Rules

The subtyping relation $\tau_1 \subseteq \tau_2$ is defined by:

  • $\bot \subseteq \tau$ for all $\tau$ (bottom is subtype of everything).
  • $\tau \subseteq \top$ for all $\tau$ (everything is subtype of top).
  • $\tau \subseteq \tau$ (reflexivity, via structural equality).
  • $(\tau_1 \to \tau_2) \subseteq (\tau_3 \to \tau_4)$ iff $\tau_3 \subseteq \tau_1$ and $\tau_2 \subseteq \tau_4$ (contravariant in domain, covariant in codomain).
  • $(\tau_1 \mid \ldots \mid \tau_n) \subseteq \tau$ iff $\tau_i \subseteq \tau$ for all $i$.
  • $\tau \subseteq (\tau_1 \mid \ldots \mid \tau_n)$ iff $\tau \subseteq \tau_i$ for some $i$.

Equality is three-valued: $\text{equal}(\tau_1, \tau_2) \in \{\text{true}, \text{false}, \text{indeterminate}\}$. Indeterminate arises when either side contains $\top$, $\text{Path}(\cdots)$, or $\Gamma(x)$.
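The subtyping rules translate almost line-for-line into code. The sketch below is a two-valued simplification (it omits the indeterminate cases for $\top$, paths, and context lookups) over an illustrative type enum:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Bot,
    Top,
    Raw(&'static str),
    Arrow(Box<Ty>, Box<Ty>),
    Union(Vec<Ty>),
}

fn subtype(a: &Ty, b: &Ty) -> bool {
    match (a, b) {
        (Ty::Bot, _) => true,                                   // ⊥ ⊆ τ
        (_, Ty::Top) => true,                                   // τ ⊆ ⊤
        (Ty::Union(ts), _) => ts.iter().all(|t| subtype(t, b)), // every branch fits
        (_, Ty::Union(ts)) => ts.iter().any(|t| subtype(a, t)), // some branch fits
        (Ty::Raw(x), Ty::Raw(y)) => x == y,                     // reflexivity on bases
        (Ty::Arrow(a1, a2), Ty::Arrow(b1, b2)) =>
            subtype(b1, a1) && subtype(a2, b2), // contravariant domain, covariant codomain
        _ => false,
    }
}

fn main() {
    let int = Ty::Raw("int");
    let num = Ty::Union(vec![Ty::Raw("int"), Ty::Raw("bool")]);
    assert!(subtype(&int, &num));
    assert!(!subtype(&num, &int)); // bool branch is not a subtype of int
    // (num -> int) ⊆ (int -> num): domain contravariant, codomain covariant.
    let f = Ty::Arrow(Box::new(num.clone()), Box::new(int.clone()));
    let g = Ty::Arrow(Box::new(int.clone()), Box::new(num.clone()));
    assert!(subtype(&f, &g));
    assert!(!subtype(&g, &f));
}
```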

Depth Limit

The typing depth limit of 50 prevents infinite recursion on grammars with cyclic typing dependencies. It is a blunt safeguard rather than a considered design choice.

Context

WIP Premises

Premises are conditions above the line in a typing rule, evaluated left-to-right to thread the typing context.

Premise Result

Evaluating a premise yields a three-valued result:

  • $\text{Ok}(\Gamma')$: the premise holds. $\Gamma'$ is the modified context.
  • $\text{Fail}$: the premise is definitively violated.
  • $\text{Partial}$: the premise is undecided due to incomplete information.

This mirrors the tree status lattice. Typing constraints are deferred when information is missing.

Premise Types

Definition: Ascription (Inference)

$\Gamma \vdash e : \tau$ infers the type of $e$ in context $\Gamma$ and unifies with $\tau$.

  1. Resolve binding $e$ to a tree node.
  2. Recursively type-check the subtree.
  3. Unify the inferred type with $\tau$.
Definition: Check

$\Gamma \triangleright e$ asserts that $e$ is well-typed in $\Gamma$ without constraining its type.

  1. Resolve binding $e$.
  2. Type-check the subtree.
  3. Succeed if status is $\text{Valid}$ or $\text{Partial}$; fail if $\text{Malformed}$.
Definition: Membership

$x \in \Gamma$ asserts that $x$ is bound in context $\Gamma$.

  1. Resolve binding $x$ to its text value.
  2. Look up text in $\Gamma$.
  3. Succeed if found; fail if not; return $\text{Partial}$ if $x$ is incomplete.
Definition: Context Extension

$\Gamma[x:\tau]$ extends the context with binding $x:\tau$.

  1. Resolve $x$ and $\tau$.
  2. Call $\Gamma.\text{extend}(x, \tau)$.
  3. Succeed with $\Gamma' = \Gamma[x:\tau]$; fail if $x$ already exists in $\Gamma$.

Operation (Equality / Subtyping)

$\tau_1\ \text{op}\ \tau_2$ where $\text{op} \in \{=, \subseteq\}$.

  • $\tau_1 = \tau_2$: unify the two types.
  • $\tau_1 \subseteq \tau_2$: check the subtype relation.

Both use three-valued logic, returning $\text{Ok}$, $\text{Fail}$, or $\text{Partial}$.

Evaluation Order

Premises evaluate strictly left-to-right. The context from premise $i$ is input to $i+1$, unless in no-propagate mode. Premise ordering is semantically significant: reordering changes binding availability.

WIP Conclusion

The conclusion determines the rule’s result type and any modifications to the context $\Gamma$.

Source: src/logic/typing/eval.rs, src/logic/typing/rule.rs

Conclusion Kinds

Definition: Conclusion Kinds

A conclusion has one of the following forms:

| Form | Syntax | Result |
|---|---|---|
| Type | $\tau$ | Return type $\tau$ |
| Lookup | $\Gamma(x)$ | Type of $x$ in $\Gamma$ |
| Transform | $\Gamma \to \Gamma[x:\tau] \vdash \tau'$ | Extend $\Gamma$ with $x:\tau$, return type $\tau'$ |

Checking mode (void) is initiated by the Check premise $\Gamma \triangleright e$, not by a conclusion. All conclusions must return a type.

Type Conclusion

Definition: Type Conclusion

A conclusion $\tau$ evaluates directly to type $\tau$. Any meta-variables in $\tau$ must be bound by premises; unbound meta-variables yield an indeterminate type.

Context Lookup

Definition: Context Lookup

A conclusion $\Gamma(x)$ resolves the binding $x$ in context $\Gamma$.

  • If $x$ is not fully parsed: $\text{Partial}$.
  • If $x$ is not in $\Gamma$: $\text{Malformed}$.
  • Otherwise: the type associated with $x$.

Context Transform

Definition: Context Transform

A conclusion $\Gamma \to \Gamma[x:\tau] \vdash \tau'$ extends context $\Gamma$ for sibling nodes.

  1. Evaluate $x$ and $\tau$ to concrete values.
  2. $\Gamma' = \Gamma \cup \{x:\tau\}$.
  3. The result type is $\tau'$.
  4. The output context is $\Gamma'$.

DONE Completion

The completion pipeline takes a partial input string $s$ and produces a complete, well-typed expression in the language $L$ defined by grammar $G$. It is the core synthesis loop of Aufbau, the mechanism by which partial expressions are extended into valid programs.

The theoretical foundation for completability over two-level languages (where the grammar alphabet is itself a regular language over bytes) is developed in Completability and Regular Expressions.

This page is the roadmap for the completion subsection of the book. It is meant to be read linearly with the next three chapters in order: Synthesizer, Search, then Scoring. To avoid duplication, those chapters focus on one stage each instead of re-explaining the whole pipeline.

Pipeline Overview

The completion pipeline composes three components:

  1. Synthesizer $\Sigma_s$: maintains incremental parsing state and exposes typed extension operations. Given input $s$ and a typing context $\Gamma$, it produces the set of type-valid next tokens.

  2. Search: explores the space of token sequences that extend $s$ to a complete expression. The current implementation tries a greedy fast path first, then falls back to a priority-queue frontier.

  3. Scoring: a heuristic function $|\sigma|: Q \to \mathbb{R}$ that ranks search states by estimated proximity to a valid completion. Guides the search toward promising branches.

The composition is:

$$\text{complete}(G, s, \Gamma) = \text{Search}(\Sigma_s(G, s), |\sigma|, \Gamma)$$

where $\Gamma$ is an optional external typing context.

Typed Completability

Recall from Basic Concepts that the completability set $\mathcal{C}_L(s) = \{a \in \Sigma : \exists s' \in \Sigma^*,\ sas' \in L\}$ is the set of valid next symbols. The completion pipeline computes a refined version:

$$\mathcal{C}_{L,\Theta}(s) = \{a \in \mathcal{C}_L(s) : \Gamma \vdash_\Theta \mathcal{F}(sa) \neq \text{Invalid}\}$$

This is the typed completability set, the set of next tokens that are both syntactically valid and consistent with all typing constraints. The type system explains why this refinement is always a subset of the syntactic completability set.

Data Flow

Input s ──→ Synthesizer ──→ completions_ctx ──→ Search loop
                │                                      │
                │            extend_with_regex ←───────┘
                │                   │
                └───────────────────┘
                        │
                  find_valid_completion ──→ Success / Exhausted

This diagram is intentionally high-level. The current implementation’s greedy fast path, beam-style child truncation, and local grounding preference are search-policy details described in Search.

Each iteration of the search loop:

  1. Pops the highest-scored state from the priority queue
  2. Checks whether the current typed forest already contains a complete root
  3. If not, asks the synthesizer for completions_ctx at the current state
  4. Extends the state with candidate tokens, scores and prunes the children, and pushes the survivors onto the queue

For the formal definition of completability over two-level languages (where terminals include regex patterns over $\Sigma_\text{vocab}$), including the Brzozowski derivative mechanism for partial token validation, see Completability and Regular Expressions.

Source Files

DONE Synthesizer

The synthesizer is the stateful interface between the parser and the search engine. It wraps incremental parsing with type-filtered extension operations, providing the search with a clean abstraction: given the current input, what are the valid next tokens?

Source: src/logic/partial/synth.rs

This chapter is meant to be read before Search. It only covers token generation and input extension; frontier management is described in Search, and numeric ranking is described in Scoring.

Definition

Definition: Synthesizer

A synthesizer is a stateful triple:

$$\Sigma_s = (G, \mathcal{M}, s)$$

where:

  • $G$ is a grammar
  • $\mathcal{M}$ is a MetaParser instance layered on top of the partial parser, with cached depth state
  • $s \in \Sigma^*$ is the current input string

The synthesizer exposes the following operations:

| Operation | Signature | Description |
|---|---|---|
| partial | `partial(&mut self) -> Result<PartialAST, String>` | Parse current input into a partial forest |
| completions | `completions(&mut self) -> CompletionSet` | Type-filtered completion set under the empty context |
| completions_ctx | `completions_ctx(&mut self, ctx: &Context) -> CompletionSet` | Type-filtered completion set in context $\Gamma$ |
| try_extend | `try_extend(&mut self, token: &str, ctx: &Context) -> Result<(TypedAST, String), String>` | Parse $s \cdot t$, return the typed result without mutation |
| extend | `extend(&mut self, token: &str, ctx: &Context) -> Result<TypedAST, String>` | Parse $s \cdot t$ and commit to the new input |
| extend_with_regex | `extend_with_regex(&mut self, token: &DerivativeRegex, ctx: &Context) -> Option<(TypedAST, String)>` | Extend with a regex token, returning the first valid candidate |
| extend_all_with_regex | `extend_all_with_regex(&mut self, token: &DerivativeRegex, ctx: &Context, max_examples: usize) -> Vec<(TypedAST, String)>` | Extend with a regex token, returning up to max_examples valid candidates |
| complete | `complete(&mut self) -> Option<TypedNode>` | Return a complete typed root when one exists |

Extension Semantics

Token Extension

Definition: Parse Extension

Given input $s$ and candidate token $t$, the parse extension function attempts three concatenations in order:

  1. $s \cdot_\text{auto} t$: with automatic separator insertion (see below)
  2. $s \cdot t$: direct concatenation
  3. $s \cdot \text{’ ’} \cdot t$: with explicit space

The first successful parse is returned.

Definition: Automatic Separator

The automatic separator heuristic inserts a space between $s$ and $t$ when:

$$\text{last}(s) \in \mathcal{W} \land \text{first}(t) \in \mathcal{W}$$

where $\mathcal{W} = \{a\text{-}z, A\text{-}Z, 0\text{-}9, \_\}$ is the set of word characters.

The auto-separator is a syntactic heuristic, not a grammatical guarantee. It handles the common case where two adjacent identifiers or keywords require whitespace separation. The fallback to direct concatenation and explicit space ensures no valid extension is missed; at worst, the heuristic wastes one parse attempt.
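The heuristic is a one-liner in practice; a minimal sketch with illustrative names:

```rust
/// Word characters per the auto-separator definition: [a-zA-Z0-9_].
fn is_word(c: char) -> bool { c.is_ascii_alphanumeric() || c == '_' }

/// Insert a space only when two word characters would otherwise fuse.
fn join(s: &str, t: &str) -> String {
    match (s.chars().last(), t.chars().next()) {
        (Some(a), Some(b)) if is_word(a) && is_word(b) => format!("{} {}", s, t),
        _ => format!("{}{}", s, t),
    }
}

fn main() {
    assert_eq!(join("let", "x"), "let x");    // word/word: separator inserted
    assert_eq!(join("let x", "="), "let x="); // '=' is not a word char
    assert_eq!(join("", "let"), "let");       // empty input: nothing to separate
}
```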

Regex Extension

Definition: Regex Extension

When the next token is a DerivativeRegex $r$ (a regex that can still match more input), the synthesizer cannot directly concatenate $r$ as a string. Instead, it gathers concrete candidate strings and tries each:

  1. Grammar seeds: word-like special tokens and raw types collected from the grammar
  2. Primary example: $r.\text{example}()$, the canonical example string of the regex
  3. Tree terminals: collect terminal text from the current typed tree

Candidates are tried in this order via try_extend. extend_with_regex returns the first valid extension; extend_all_with_regex keeps distinct successful extensions up to the requested bound.

The candidate ordering is intentional: reusing grammar-derived literals and already-seen terminals tends to produce more coherent completions than inventing arbitrary new text. This is a heuristic for quality, not correctness; any valid candidate would produce a correct completion.

Step-by-step: building an expression token by token

This section traces exactly what the synthesizer does when the search engine calls extend repeatedly to build the expression let x = 1 from an empty input. The grammar assumed is a simple expression language where a let-binding has the form let <ident> = <expr>.

Why this example? `let x = 1` exercises all the interesting cases: a keyword (no separator needed before `let`), an identifier (`x`, chosen from `completions_ctx`), punctuation (`=`), and a literal. Four tokens, four calls to `extend`.

Initial state. The synthesizer starts with $s = \varepsilon$ (the empty string). partial() returns an empty forest; every nonterminal in $G$ is a candidate root. completions_ctx with any context will return the set of tokens that can begin a valid expression — including let.

Step 1: extend with let. The search engine calls extend("let", $\Gamma$). The auto-separator check sees that $\text{last}(\varepsilon)$ is undefined, so no separator is inserted. The MetaParser parses "let" and returns a partial forest with one active parse: a let-binding with three remaining holes (<ident>, =, <expr>). The synthesizer commits: $s \leftarrow \texttt{"let"}$.

After this step, completions_ctx returns only tokens that can extend "let" to a syntactically and type-valid string. In a well-designed grammar, that set consists entirely of identifiers matching [a-z][a-z0-9_]*.

Step 2: extend with x. The auto-separator check: $\text{last}(\texttt{"let"}) = \texttt{t} \in \mathcal{W}$ and $\text{first}(\texttt{"x"}) = \texttt{x} \in \mathcal{W}$, so a space is automatically inserted. The MetaParser parses "let x". The partial forest now has the <ident> hole filled; the remaining holes are = and <expr>. The synthesizer commits: $s \leftarrow \texttt{"let x"}$.

completions_ctx now returns only =, since that is the only token that can advance the let-binding parse.

Step 3: extend with =. $\text{last}(\texttt{"let x"}) = \texttt{x} \in \mathcal{W}$, $\text{first}(\texttt{"="}) = \texttt{=} \notin \mathcal{W}$, so no auto-separator. The MetaParser parses "let x =". The partial forest has one hole remaining: <expr>. The synthesizer commits: $s \leftarrow \texttt{"let x ="}$.

completions_ctx with a typing context $\Gamma$ now filters: any token that begins a valid expression and whose eventual type is compatible with the binding context. Numeric literals (1, 2, …) and any bound variable in $\Gamma$ are valid.

Step 4: extend with 1. $\text{last}(\texttt{"let x ="}) = \texttt{=} \notin \mathcal{W}$, no auto-separator. The MetaParser parses "let x = 1". The partial forest now contains a complete parse rooted at the let-binding nonterminal. complete() returns Some(root). The search engine can now stop on this state.


Depth Retry

The bounded retry is intentionally local: nearby states should have nearby depths (e.g., a parse at depth 10 retries at 11 or 12, not 41).
Remark: Depth-Differential Retry

When completion filtering fails on the first pass, the synthesizer retries with a fresh MetaParser in a bounded local window:

$$d_\text{start} = d + 1, \quad d_\text{max} = d + 2$$

where $d$ is the depth used by the initial parse.

This ensures the parse gathers enough trees to produce a typed alternative. It is a heuristic and therefore fallible: a blunt mechanism, but effective in practice.

DONE Search

The search engine explores the space of token-level extensions to find a complete, well-typed expression. It uses a priority queue ordered by a heuristic scoring function.

Source: src/logic/search/mod.rs

This chapter is meant to be read after the completion overview and synthesizer chapters. It only describes search-side control flow: frontier management, pruning, and stopping conditions. Token generation lives in Synthesizer; the numeric heuristic itself lives in Scoring.

Operational Shape

Definition: Search Shape

The current implementation has two phases:

  1. A greedy fast path (try_greedy_complete) that repeatedly takes the locally best child and returns immediately if it reaches a complete root.
  2. A best-first frontier search (search_complete) that uses a BinaryHeap once greedy completion stalls.

Both phases use the same synthesizer operations and the same SearchConfig budgets.

Search Space

Definition: Search Space

The search space is operationally an exploration over typed states $(T, \pi, d)$ where:

  • $Q$ is the set of all such typed states
  • $T$ is the current typed parse forest (TypedAST)
  • $\pi \in \Sigma_t^*$ is the sequence of regex completion tokens chosen so far
  • $d \in \mathbb{N}$ is the search depth
  • $q_0 = (T_0, [], 0)$ is the initial state, where $T_0$ is the typed parse of the original input
  • $F \subseteq Q$ is the set of states whose typed forest contains a complete root

Let $\Sigma_t$ be the set of regex completion tokens produced by completions_ctx. For a state $q$ and token $t \in \Sigma_t$, the search asks the synthesizer to try up to $n_{\max}$ concrete witnesses for $t$, producing a finite child set $\Delta(q, t) \subseteq Q$.

Definition: Search State

A search state $q = (T, \pi, d)$ consists of:

  • $T$: a typed parse forest (the current TypedAST)
  • $\pi$: the regex-token completion path. This records completion tokens, not the concrete witness strings tried for regex expansion.
  • $d$: the number of successful extensions applied since $q_0$

Algorithm

Definition: Search Algorithm

The current implementation uses two phases over the same state representation:

  1. Build a Synthesizer for $s_0$ and parse the initial typed forest $T_0 = \text{partial_typed_ctx}(s_0, \Gamma)$. Reject with Invalid if this fails.
  2. Seed the visited set with T_0.text().
  3. Run a greedy fast path (try_greedy_complete) for up to $d_{\max}$ steps:
    • ask the synthesizer for completions_ctx(\Gamma)
    • expand each regex token using up to $n_{\max}$ concrete witnesses
    • keep the locally best child (grounding first, then open slots, then score)
    • return immediately if a complete root is reached
  4. If greedy completion stalls, initialize a max-heap $H$ with $q_0 = (T_0, [], 0)$ scored by Scoring.
  5. Loop: pop the highest-scored state $q = (T, \pi, d)$ from $H$.
    • If $T$ has a complete root, return Success.
    • If $d \geq d_{\max}$, skip expansion.
    • If the total explored-state budget $s_{\max}$ has been hit, stop and return Exhausted.
    • Otherwise, set the synthesizer input to T.text(), compute completions_ctx(\Gamma), and expand each token via extend_all_with_regex_candidates(..., n_{\max}).
    • Discard children whose typed forest is empty or whose text() has already been visited.
    • If any child has a grounded root (ty != Any), discard children whose roots are all Any.
    • Sort remaining children by grounded-root count, then score, keep only the top $b_{\max}$, and push them onto $H$.
  6. If $H$ becomes empty, return Exhausted.
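The heap-driven second phase condenses to a short loop. In this sketch, toy states carry only a score, a surface text, and a depth, standing in for full typed forests; the expansion and completion checks are passed in as closures:

```rust
use std::cmp::Ordering;
use std::collections::{BinaryHeap, HashSet};

struct State { score: f64, text: String, depth: usize }

impl PartialEq for State {
    fn eq(&self, other: &Self) -> bool { self.score == other.score }
}
impl Eq for State {}
impl PartialOrd for State {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { Some(self.cmp(other)) }
}
impl Ord for State {
    fn cmp(&self, other: &Self) -> Ordering {
        self.score.partial_cmp(&other.score).unwrap_or(Ordering::Equal)
    }
}

fn best_first(
    start: State,
    mut expand: impl FnMut(&State) -> Vec<State>,
    is_complete: impl Fn(&State) -> bool,
    s_max: usize,
    d_max: usize,
) -> Option<String> {
    let mut heap = BinaryHeap::new();
    let mut visited = HashSet::new();
    visited.insert(start.text.clone()); // dedup by reconstructed surface text
    heap.push(start);
    let mut explored = 0;
    while let Some(q) = heap.pop() {
        if is_complete(&q) { return Some(q.text); }
        explored += 1;
        if explored >= s_max { return None; } // Exhausted: state budget hit
        if q.depth >= d_max { continue; }     // safety cutoff, not a stack discipline
        for child in expand(&q) {
            if visited.insert(child.text.clone()) {
                heap.push(child);
            }
        }
    }
    None // Exhausted: frontier emptied
}

fn main() {
    // Toy search: grow a string toward "ab", scoring matched prefixes higher.
    let found = best_first(
        State { score: 0.0, text: String::new(), depth: 0 },
        |q| ["a", "b"].iter().map(|t| {
            let text = format!("{}{}", q.text, t);
            let score = if "ab".starts_with(text.as_str()) { text.len() as f64 } else { -1.0 };
            State { score, text, depth: q.depth + 1 }
        }).collect(),
        |q| q.text == "ab",
        96,
        10,
    );
    assert_eq!(found, Some("ab".to_string()));
}
```

The toy loop preserves the structural points made above: the next state popped is always the globally highest-scored one, so exploration freely interleaves across branches.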

The module is documented as “Priority-guided DFS search.” This name is historical and relative: the predecessor algorithm was pure BFS, expanding every completion at every depth level before proceeding to the next. The current algorithm replaced that with a BinaryHeap (max-heap), making it structurally more depth-oriented in practice since high-scoring deep states are preferred over low-scoring shallow ones. This label applies to the heap-driven second phase; the greedy fast path runs before any frontier is built.

However, it is not true DFS. True DFS uses a stack (LIFO) and commits to fully exploring one branch before backtracking. This algorithm uses a priority queue: the next state popped is always the globally highest-scored state, regardless of which branch produced it. It freely interleaves exploration across branches whenever a state from a different branch scores higher than the current branch’s frontier. The depth bound $d_{\max}$ is a safety cutoff, not a stack discipline.

Concretely: if branch $A$ at depth 3 scores $2.1$ and branch $B$ at depth 1 scores $2.3$, the algorithm explores $B$ next. True DFS would continue with $A$. The algorithm makes no commitment to a single branch.

Formally, this is best-first search (BeFS) with scoring function $|\sigma|$. The “DFS” label reflects the intent (explore depth before breadth, relative to the predecessor) not the mechanism.

Configuration

Definition: Search Configuration

The search configuration $\kappa = (d_{\max}, n_{\max}, s_{\max}, b_{\max})$ in SearchConfig::default() currently consists of:

  • $d_{\max} = 10$: maximum search depth (number of token extensions from the initial state)
  • $n_{\max} = 1$: maximum number of concrete examples to try per regex token during regex expansion
  • $s_{\max} = 96$: total explored-state budget before returning Exhausted
  • $b_{\max} = 12$: beam width per expanded state after local sorting
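As a plain struct, with field names chosen here for illustration (the documented default values are from SearchConfig::default()):

```rust
struct SearchConfig {
    d_max: usize, // maximum search depth
    n_max: usize, // concrete examples per regex token
    s_max: usize, // explored-state budget
    b_max: usize, // beam width per expanded state
}

impl Default for SearchConfig {
    fn default() -> Self {
        Self { d_max: 10, n_max: 1, s_max: 96, b_max: 12 }
    }
}

fn main() {
    let k = SearchConfig::default();
    assert_eq!((k.d_max, k.n_max, k.s_max, k.b_max), (10, 1, 96, 12));
}
```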

Result

Definition: Search Result

The search returns one of three outcomes:

  • $\text{Success}(s', T, \pi, d)$: a complete input $s'$, the first complete typed root $T$, the regex-token completion path $\pi$, and the depth $d$
  • $\text{Exhausted}(d_{\max}, n, V)$: the frontier emptied or the state budget was hit after exploring $n$ states; $V$ is the list of deduplicated reconstructed inputs visited so far
  • $\text{Invalid}(m)$: the initial input $s_0$ could not be partially typed; $m$ is a diagnostic message

Search-Side Checks

The search code itself performs only lightweight checks. The meaning of the attached types comes from earlier chapters and from partial_typed_ctx; this chapter only documents what the search layer does with those typed forests.

Definition: Expansion Filter

A child produced during expansion is kept only if:

  1. regex expansion produced a non-empty TypedAST
  2. its reconstructed input text() has not been visited before
  3. after the local grounding preference, it survives the beam-width cutoff

Search does not run a separate Valid/Partial/Invalid predicate here; it relies on the synthesizer to produce typed child forests in the first place.

Lemma: Grounding Preference

During one expansion step, if any child has at least one root whose type is not Any, the implementation drops children whose roots are all Any before score ordering. This is a local search policy, not part of the numeric scoring function.

Definition: Success Condition

A state is accepted as soon as its TypedAST contains a complete root (TypedNode::is_complete()). In normal use this corresponds to a complete parse from the grammar start symbol, because the parser roots are generated from the grammar entry point.

State Deduplication

The search maintains a HashSet<String> keyed by the reconstructed surface input (tree.text()) of each state. Two states with identical reconstructed inputs are considered equivalent, even if they arrived via different token paths. This prevents re-exploration of states that parse the same surface string.
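
The mechanism reduces to HashSet::insert, which reports whether the key was new. A minimal sketch, with the reconstructed input stood in for by a plain String:

```rust
use std::collections::HashSet;

// Minimal sketch of surface-string deduplication. The reconstructed input
// (tree.text() in the implementation) is stood in for by a stored String.
struct State {
    text: String, // reconstructed surface input of this state
}

fn dedup(states: Vec<State>) -> Vec<State> {
    let mut visited: HashSet<String> = HashSet::new();
    states
        .into_iter()
        // HashSet::insert returns false if the key was already present,
        // so states with repeated surface strings are dropped.
        .filter(|s| visited.insert(s.text.clone()))
        .collect()
}

fn main() {
    let states = vec![
        State { text: "let x :".into() },
        State { text: "let x :".into() }, // same surface input, different token path
        State { text: "let y :".into() },
    ];
    assert_eq!(dedup(states).len(), 2);
}
```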

For how the surviving states are numerically ranked, continue to Scoring.

Scoring

The scoring function $|\sigma|$ assigns a real-valued score to each search state, guiding the best-first search toward promising completions. This chapter documents only the numeric heuristic in calculate_score(). Deduplication, grounding preference, greedy completion, and beam truncation are search-policy details described in Search.

Source: src/logic/search/scoring.rs

This chapter is meant to be read after Search.

Score Tuple

Definition: State Score

The state score is a 6-tuple:

$$\sigma = (c, p, \ell, o, s, r) \in \mathbb{R}^6$$

with overall score:

$$|\sigma| = c + p + \ell + o + s + r$$

| Component | Name | Range | Direction |
|-----------|------|-------|-----------|
| $c$ | Completeness | $[0, 2]$ | higher = more matched structure |
| $p$ | Production fullness | $[0, 1]$ | higher = more RHS positions filled |
| $\ell$ | Token length bonus | slow-growing, non-negative | higher = more concrete terminals |
| $o$ | Open slots penalty | $\{0, -0.3, -0.6, \ldots\}$ | closer to 0 = fewer missing children |
| $s$ | Simplicity | $[0, 0.3]$ | higher = shallower search depth |
| $r$ | Recursion penalty | $[-0.5, 0]$ | closer to 0 = shallower tree |

Several components choose the best or shallowest root among alternatives rather than averaging over all roots. This is intentional: one promising parse alternative should be able to dominate the score early instead of being drowned out by worse siblings.

Sub-Score Definitions

Completeness

Definition: Completeness Score

For each root $u$, walk all nodes in the subtree. A complete terminal contributes $1$. A partial terminal whose current surface text has length $m$ contributes $\frac{0.5}{m+1}$. An expression node with no children contributes $0$.

Let $S_u$ be the sum of those contributions and $N_u$ the total node count visited. The implementation computes

$$c(u) = \min\left(2,\ 2 \cdot \frac{S_u}{N_u}\right), \qquad c = \max_u c(u).$$

Expression nodes with children add no direct bonus here; they contribute only through their descendants while still increasing the denominator. The score therefore prefers states with more concrete material and fewer empty placeholders.
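
To make the walk concrete, here is a sketch over a hypothetical node type (the real TypedAST is richer; only the contribution rules above are modeled):

```rust
// Hypothetical node shape, just enough to illustrate the completeness walk.
enum Node {
    CompleteTerminal,
    PartialTerminal { surface_len: usize }, // m = current surface text length
    Expr { children: Vec<Node> },
}

// Returns (contribution sum S_u, visited node count N_u) for a subtree.
fn walk(n: &Node) -> (f64, usize) {
    match n {
        Node::CompleteTerminal => (1.0, 1),
        Node::PartialTerminal { surface_len } => (0.5 / (*surface_len as f64 + 1.0), 1),
        Node::Expr { children } => {
            let mut sum = 0.0;
            let mut count = 1; // the expression node itself increases the denominator
            for child in children {
                let (s, c) = walk(child);
                sum += s;
                count += c;
            }
            (sum, count)
        }
    }
}

// c = max over roots of min(2, 2 * S_u / N_u).
fn completeness(roots: &[Node]) -> f64 {
    roots
        .iter()
        .map(|u| {
            let (s, n) = walk(u);
            (2.0 * s / n as f64).min(2.0)
        })
        .fold(0.0, f64::max)
}

fn main() {
    // One expression node with two complete terminal children:
    // S = 2, N = 3, so c = min(2, 2 * 2/3) = 4/3.
    let root = Node::Expr { children: vec![Node::CompleteTerminal, Node::CompleteTerminal] };
    assert!((completeness(&[root]) - 4.0 / 3.0).abs() < 1e-9);
}
```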

Production Fullness

Definition: Production Fullness Score

For each expression node $v$ with expected right-hand side length $n_v > 0$ and currently filled child count $k_v > 0$, define its local fill ratio as $\frac{k_v}{n_v}$. For a root $u$, let $E_u$ be the set of such expression nodes.

The implementation computes the RMS fill ratio

$$p(u) = \sqrt{\frac{1}{|E_u|}\sum_{v \in E_u}\left(\frac{k_v}{n_v}\right)^2}, \qquad p = \max_u p(u),$$

with $p = 0$ if $E_u$ is empty.

This rewards roots whose productions are already mostly filled, even if the tree is not yet complete.
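The RMS formula is a direct transcription. A sketch, assuming the $(k_v, n_v)$ pairs have already been collected and filtered to $k_v > 0$, $n_v > 0$:

```rust
// RMS fill ratio over a root's expression nodes; each pair is
// (filled children k_v, expected RHS length n_v).
fn production_fullness(fill: &[(usize, usize)]) -> f64 {
    if fill.is_empty() {
        return 0.0; // p = 0 when E_u is empty
    }
    let sum_sq: f64 = fill
        .iter()
        .map(|&(k, n)| {
            let r = k as f64 / n as f64;
            r * r
        })
        .sum();
    (sum_sq / fill.len() as f64).sqrt()
}

fn main() {
    // Two expression nodes: one fully filled (2/2), one half filled (1/2).
    // RMS = sqrt((1.0 + 0.25) / 2) = sqrt(0.625) ≈ 0.79.
    let p = production_fullness(&[(2, 2), (1, 2)]);
    assert!((p - 0.625f64.sqrt()).abs() < 1e-9);
}
```

Note that the RMS (rather than the mean) slightly favors roots containing at least one nearly full production over roots whose productions are uniformly half-filled.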

Token Length Bonus

Definition: Token Length Bonus

Let $L_u$ be the number of leaf terminals in root $u$. The implementation computes

$$\ell = 0.25 \cdot \sqrt{\max_u L_u}.$$

This is a small bonus for states that already contain more concrete terminal material. The square root keeps the signal mild.

Open Slots Penalty

Definition: Open Slots Penalty

For each root $u$, count open slots recursively:

  • an expression node with no children contributes $1$
  • an expression node with expected RHS length $n_v$ and $k_v$ filled children contributes $n_v - k_v$
  • children are then counted recursively as well

Let $O_u$ be the resulting count. The implementation uses the best (smallest) root:

$$o = -0.3 \cdot \min_u O_u.$$

This is the dominant negative signal in practice: trees with fewer remaining holes tend to outrank structurally larger but more incomplete alternatives.

Simplicity

Definition: Simplicity Score

The simplicity score rewards shallower search paths:

$$s = 0.3 \cdot \left(1 - \frac{d}{d_{\max}}\right)$$

where $d$ is the current search depth and $d_{\max}$ is the configured maximum depth.

Recursion Penalty

Definition: Recursion Penalty

Let $D_u$ be the maximum node depth of root $u$. The implementation takes the shallowest root among alternatives and applies a light quadratic penalty:

$$r = -0.5 \cdot \left(\min\left(1, \frac{\min_u D_u}{d_{\max} + 1}\right)\right)^2.$$

This discourages unnecessarily deep trees, but much more gently than the open-slots penalty discourages incomplete ones.
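
The remaining components ($\ell$, $o$, $s$, $r$) are closed-form. A direct transcription of the four formulas above, assuming the counts (leaf terminals, open slots, depths) have already been computed:

```rust
// Direct transcription of the closed-form score components.
// d_max is the configured maximum search depth.

fn token_length_bonus(max_leaf_terminals: usize) -> f64 {
    0.25 * (max_leaf_terminals as f64).sqrt()
}

fn open_slots_penalty(min_open_slots: usize) -> f64 {
    -0.3 * min_open_slots as f64
}

fn simplicity(depth: usize, d_max: usize) -> f64 {
    0.3 * (1.0 - depth as f64 / d_max as f64)
}

fn recursion_penalty(min_tree_depth: usize, d_max: usize) -> f64 {
    let x = (min_tree_depth as f64 / (d_max as f64 + 1.0)).min(1.0);
    -0.5 * x * x // light quadratic penalty, capped at -0.5
}

fn main() {
    assert!((token_length_bonus(16) - 1.0).abs() < 1e-9);    // 0.25 * sqrt(16)
    assert!((open_slots_penalty(2) + 0.6).abs() < 1e-9);     // -0.3 * 2
    assert!((simplicity(5, 10) - 0.15).abs() < 1e-9);        // 0.3 * (1 - 0.5)
    assert!((recursion_penalty(11, 10) + 0.5).abs() < 1e-9); // capped at x = 1
}
```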

Ordering

Definition: Score Use

Search uses the numeric score in two places:

  1. heap priority for frontier states
  2. secondary ordering among children after expansion

Before score comparison, Search may prefer children with grounded roots (ty != Any) and may truncate to a beam width. So score is important, but it is not the whole search policy.

Completability

This chapter formalizes the completability properties that Aufbau guarantees, connecting the theoretical definitions from Basic Concepts to the algorithmic implementation.

For the treatment of completability over two-level languages, where the grammar alphabet $\Sigma$ is itself a regular language over a byte alphabet $\Sigma_\text{vocab}$, see Completability and Regular Expressions.

Source: src/validation/completability.rs

Typed Completability

Recall from Basic Concepts that a string $s$ is completable in $L$ if $\mathcal{C}_L(s) \neq \emptyset$. The typing engine (see Type System) refines this:

Definition: Typed Completability

A string $s$ is typed-completable in $(L, \Theta, \Gamma)$ if:

$$\mathcal{C}_{L,\Theta}(s) = \{a \in \mathcal{C}_L(s) : \Gamma \vdash_\Theta \mathcal{F}(sa) \neq \text{Invalid}\} \neq \emptyset$$

That is, there exists at least one next token that is both syntactically valid and does not violate any typing constraint.

Completion

Definition: Completion

$\text{complete}(G, s, d_{\max}, \Gamma)$ performs a depth-bounded search for a complete, well-typed expression extending input $s$:

$$\text{complete}(G, s, d_{\max}, \Gamma) = \begin{cases} \text{Success}(s', T) & \text{if the search finds a valid completion } s' \text{ with tree } T \\ \text{Exhausted} & \text{if all states within } d_{\max} \text{ are explored without success} \\ \text{Invalid} & \text{if } s \text{ cannot be parsed} \end{cases}$$

Prefix Soundness

Definition: Prefix Soundness

A string $s$ is prefix-sound in $(G, \Theta, \Gamma)$ if every prefix of $s$ is typed-completable:

$$\text{sound}(G, s, \Gamma) \iff \forall i \in [0, |s|] : \text{prefix_ok}(G, s[0..i], \Gamma)$$

where whitespace-only prefixes (except the empty prefix) are excluded from checking.

Definition: Prefix OK

A prefix $s’$ satisfies $\text{prefix_ok}(G, s’, \Gamma)$ if either:

  1. $\mathcal{C}_{L,\Theta}(s’) \neq \emptyset$: there exist typed completions (the input is not a dead end), or
  2. $\exists r \in \mathcal{F}(s’) : \Gamma \vdash_\Theta r \in \{\text{Valid}(\tau), \text{Partial}(\tau)\}$: some parse tree root is currently valid or indeterminate

Condition 1 alone is insufficient. Consider an input like let x : int in. The body expression has not started, so the typed completion set may depend on the body’s grammar structure, which is complex to compute from scratch. But the partial tree itself is still consistent: the let-binding is well-formed so far, and the body is simply missing. Condition 2 catches these cases by checking the current tree status rather than demanding that forward extensions exist.
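
The soundness check itself is a loop over prefixes. A sketch, with the prefix_ok predicate abstracted as an oracle and whitespace-only prefixes (except the empty one) skipped per the definition:

```rust
// Sketch of the soundness loop; prefix_ok is treated as an oracle supplied by
// the caller, standing in for the real typed-completability check.
fn sound(s: &str, prefix_ok: impl Fn(&str) -> bool) -> bool {
    (0..=s.len())
        .filter(|&i| s.is_char_boundary(i))
        .map(|i| &s[..i])
        // Skip whitespace-only prefixes, except the empty prefix.
        .filter(|p| p.is_empty() || !p.chars().all(char::is_whitespace))
        .all(|p| prefix_ok(p))
}

fn main() {
    // Toy oracle: a prefix is a dead end once it contains "!!".
    let ok = |p: &str| !p.contains("!!");
    assert!(sound("let x : int in", &ok));
    assert!(!sound("let x !! int", &ok));
}
```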

Lemma: Prefix Soundness as a Safety Guarantee

If $\text{sound}(G, s, \Gamma)$ holds, then for any prefix $s’$ of $s$, the system is never in a state where no forward progress is possible. The user (or LLM) could have stopped producing tokens at any intermediate point, and the system would still have been able to find a valid completion.

This is the key usability property: the system never traps the input.

Theorem: Sound Completion

$\text{sound_complete}(G, s, \Gamma)$ holds if and only if:

  1. $\text{sound}(G, s, \Gamma)$: all prefixes are OK
  2. $\text{complete}(G, s, d_{\max}, \Gamma) = \text{Success}(s’, T)$: the full input has a valid completion

Together, these ensure that the input $s$ can be completed into a valid expression and that the system was never stuck at any intermediate point during the production of $s$.

Two-Level Completability

The grammar alphabet $\Sigma$ contains elements that are themselves languages over a finer alphabet. String literals match single strings, but regex patterns like /[a-z][a-zA-Z0-9]*/ match sets of strings. This creates a two-level language structure:

$$L \subseteq \Sigma^* \quad \text{where} \quad \Sigma \subseteq \Sigma_\text{vocab}^*$$

The partial parser $\Phi_L$ operates at the $\Sigma$ level, but the actual input arrives at the $\Sigma_\text{vocab}$ level (bytes, characters, or model vocabulary tokens). A partial input may be in the middle of a token: the string bi is a prefix of the token /b(u|o|i)p/ but is not itself a token.

The Brzozowski derivative $D_s(r)$ resolves this: given a regex $r$ and a partial string $s$, the derivative $D_s(r)$ is a new regex matching exactly those suffixes $t$ for which $st \in L(r)$. If $L(D_s(r)) \neq \emptyset$, then $s$ is a valid prefix of some string in $L(r)$.

This is formalized in detail in Completability and Regular Expressions. The implementation lives in the custom regex engine at src/regex/.
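
The derivative idea fits in a few lines over a toy regex AST (no Kleene star or character classes here; the engine in src/regex/ is considerably more complete):

```rust
// Toy Brzozowski-derivative sketch; illustrative only.
#[derive(Clone, Debug)]
enum Re {
    Empty, // matches nothing (∅)
    Eps,   // matches only the empty string
    Ch(char),
    Alt(Box<Re>, Box<Re>),
    Seq(Box<Re>, Box<Re>),
}

fn nullable(r: &Re) -> bool {
    match r {
        Re::Empty | Re::Ch(_) => false,
        Re::Eps => true,
        Re::Alt(a, b) => nullable(a) || nullable(b),
        Re::Seq(a, b) => nullable(a) && nullable(b),
    }
}

// D_c(r): regex matching exactly the strings t such that ct ∈ L(r).
fn deriv(r: &Re, c: char) -> Re {
    match r {
        Re::Empty | Re::Eps => Re::Empty,
        Re::Ch(d) => if *d == c { Re::Eps } else { Re::Empty },
        Re::Alt(a, b) => Re::Alt(Box::new(deriv(a, c)), Box::new(deriv(b, c))),
        Re::Seq(a, b) => {
            let left = Re::Seq(Box::new(deriv(a, c)), b.clone());
            if nullable(a) {
                Re::Alt(Box::new(left), Box::new(deriv(b, c)))
            } else {
                left
            }
        }
    }
}

// Without star, L(r) ≠ ∅ is a simple structural check.
fn nonempty(r: &Re) -> bool {
    match r {
        Re::Empty => false,
        Re::Eps | Re::Ch(_) => true,
        Re::Alt(a, b) => nonempty(a) || nonempty(b),
        Re::Seq(a, b) => nonempty(a) && nonempty(b),
    }
}

fn main() {
    // r = b(u|o|i)p, the /b(u|o|i)p/ example from the text.
    let vowel = Re::Alt(
        Box::new(Re::Ch('u')),
        Box::new(Re::Alt(Box::new(Re::Ch('o')), Box::new(Re::Ch('i')))),
    );
    let r = Re::Seq(
        Box::new(Re::Ch('b')),
        Box::new(Re::Seq(Box::new(vowel), Box::new(Re::Ch('p')))),
    );
    let d = "bi".chars().fold(r.clone(), |acc, c| deriv(&acc, c));
    assert!(nonempty(&d)); // "bi" is a valid token prefix
    let dead = "bx".chars().fold(r, |acc, c| deriv(&acc, c));
    assert!(!nonempty(&dead)); // "bx" is a dead end
}
```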

Validation

The validation system verifies that Aufbau’s engines produce correct results. It is the empirical counterpart to the formal properties defined in Completability: where the spec defines what must hold, the validation system checks that it holds on concrete grammars and inputs.

Source: src/validation/

Architecture

The validation system has three layers:

Layer 3: Test Suites (per-grammar: arithmetic, stlc, fun, imp, ...)
    │
    ▼
Layer 2: Test Harness (typed completion test runner, timeout, profiling)
    │
    ▼
Layer 1: Core Algorithms (completability.rs → search → scoring)

Layer 1 is the completion pipeline and the completability checker, i.e., the same algorithms used in production. The validation system does not use separate “test” algorithms; it validates the real pipeline.

Layer 2 is a test harness that wraps Layer 1 with:

  • Per-test configuration (depth budget, initial context, expected pass/fail)
  • Timeout enforcement (default: 300 seconds per test)
  • Prefix soundness checking (verifying all prefixes, not just the final input)
  • Profiling metadata (timing, states explored, visited states)

Layer 3 is a collection of test suites, each targeting a specific grammar from examples/*.auf.

What is Validated

Prefix Soundness

The primary property validated is prefix soundness: for a given input $s$ and grammar $G$, every prefix of $s$ must be typed-completable. This is checked by sound_complete, which:

  1. Enumerates all non-whitespace prefixes of $s$
  2. For each proper prefix: parses it, computes typed completions, and calls prefix_ok
  3. For the full input: runs the complete search to find a valid completion
  4. Reports the first failing prefix (if any)

A completion system that can find completions for let x : int in x but not for let x : int in is broken: the system would accept a token that leads to a dead end. Prefix soundness catches this class of bugs by testing every intermediate state.

Expected Failures

Tests can be marked xfail (expected to fail completion). This is used for inputs that are intentionally outside the grammar’s typed language. An xfail test passes if completion fails, and fails if completion unexpectedly succeeds.

Timeout Enforcement

Each test runs in a dedicated thread with a configurable timeout. This catches performance regressions and infinite loops in the search: if the search cannot find a completion within the time budget, the test fails with a timeout diagnostic.
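
One conventional way to get this shape in Rust is a worker thread plus a channel with recv_timeout; the harness's exact mechanism may differ, but this is the basic pattern:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Sketch of per-test timeout enforcement: run the test in a dedicated thread
// and wait on a channel with a deadline. A timed-out worker keeps running in
// the background until the process exits.
fn run_with_timeout<T: Send + 'static>(
    timeout: Duration,
    test: impl FnOnce() -> T + Send + 'static,
) -> Result<T, &'static str> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(test()); // receiver may be gone after a timeout
    });
    rx.recv_timeout(timeout).map_err(|_| "timeout")
}

fn main() {
    let fast = run_with_timeout(Duration::from_millis(200), || 42);
    assert_eq!(fast, Ok(42));

    let slow = run_with_timeout(Duration::from_millis(50), || {
        thread::sleep(Duration::from_millis(500));
        0
    });
    assert_eq!(slow, Err("timeout"));
}
```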

Test Structure

Each grammar has a dedicated test module:

| Grammar | Description | Key Properties Tested |
|---------|-------------|-----------------------|
| arithmetic | Integer arithmetic with +, *, parentheses | Basic completability, operator precedence |
| stlc | Simply-typed lambda calculus | Variable binding, function application, type inference |
| fun | Let-bindings, if-then-else, functions | Context extension, nested scopes, keyword separators |
| imp | Imperative language (assignments, while loops) | Statement sequencing, mutable state |

Parseability Checking

A separate, faster validation layer checks parseability: whether all prefixes of an input can be parsed at all (without completion search). This runs in $O(n^2)$ time (one parse per prefix) and catches parser bugs independently of the type system.

Complexity Analysis

The validation system includes an empirical complexity measurement module that determines the parse-time exponent $k$ in $O(n^k)$ via log-log linear regression over increasing input sizes. This is used to detect performance regressions and to empirically verify that the parser operates within expected complexity bounds.
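
The exponent estimate is the slope of an ordinary least-squares fit on $(\ln n, \ln t)$ pairs. A sketch of that regression, not the actual measurement module:

```rust
// Estimate the exponent k in t ≈ a * n^k by least squares on (ln n, ln t):
// the slope of the fitted line in log-log space is k.
fn fit_exponent(samples: &[(f64, f64)]) -> f64 {
    let n = samples.len() as f64;
    let (xs, ys): (Vec<f64>, Vec<f64>) = samples
        .iter()
        .map(|&(size, time)| (size.ln(), time.ln()))
        .unzip();
    let mx = xs.iter().sum::<f64>() / n;
    let my = ys.iter().sum::<f64>() / n;
    let cov: f64 = xs.iter().zip(&ys).map(|(x, y)| (x - mx) * (y - my)).sum();
    let var: f64 = xs.iter().map(|x| (x - mx) * (x - mx)).sum();
    cov / var // slope = estimated exponent k
}

fn main() {
    // Synthetic cubic timings t = n^3: the estimated exponent should be ~3.
    let samples: Vec<(f64, f64)> = [10.0, 20.0, 40.0, 80.0]
        .iter()
        .map(|&n| (n, n * n * n))
        .collect();
    let k = fit_exponent(&samples);
    assert!((k - 3.0).abs() < 1e-6);
}
```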

Implementation Map

This page maps the specification sections to source code modules.

Module Structure

| Spec Section | Source Module | Description |
|--------------|---------------|-------------|
| Grammar Specification | src/logic/grammar/ | Grammar data structures and .auf file loading |
| Partial Parsing | src/logic/partial/ | Chart parser, segment handling, partial trees |
| Span Cache | src/logic/partial/cache.rs | 4D memoization table for parse results |
| MetaParser (see Partial Parsing) | src/logic/partial/meta.rs | Adaptive depth search, grammar-scope caching |
| Binding Resolution | src/logic/binding/ | Grammar path construction (compile-time) |
| Runtime Binding | src/logic/typing/binding.rs | Tree path resolution (runtime) |
| Type Inference | src/logic/typing/ | Typing engine, rule application, type cache |
| Context | src/logic/typing/core.rs | Typing context, lanes, no-propagate |
| Premises | src/logic/typing/eval.rs | Premise evaluation |
| Conclusions | src/logic/typing/eval.rs | Conclusion evaluation |
| Synthesizer | src/logic/partial/synth.rs | Incremental typed extension interface |
| Search | src/logic/search/mod.rs | Best-first completion search |
| Scoring | src/logic/search/scoring.rs | Heuristic scoring for search states |
| Completability | src/validation/completability.rs | Prefix soundness, completion checking |
| Validation | src/validation/ | Test harness, per-grammar test suites |

Auxiliary Modules

| Module | Description |
|--------|-------------|
| src/regex/ | Custom regex engine with Brzozowski derivatives for two-level completability |
| src/logic/typing/ops.rs | Type operations: unification, subtyping |
| src/logic/typing/syntax.rs | Type parsing from grammar spec strings |
| src/logic/typing/rule.rs | Typing rule data structures |