Synthesizer
The synthesizer is the stateful interface between the parser and the search engine. It wraps incremental parsing with type-filtered extension operations, providing the search with a clean abstraction: given the current input, what are the valid next tokens?
Source: src/logic/partial/synth.rs
This chapter is meant to be read before Search. It only covers token generation and input extension; frontier management is described in Search, and numeric ranking is described in Scoring.
Definition
A synthesizer is a stateful triple:
$$\Sigma_s = (G, \mathcal{M}, s)$$
where:
- $G$ is a grammar
- $\mathcal{M}$ is a MetaParser instance layered on top of the partial parser, with cached depth state
- $s \in \Sigma^*$ is the current input string
The synthesizer exposes the following operations:
| Operation | Signature | Description |
|---|---|---|
| `partial` | `partial(&mut self) -> Result<PartialAST, String>` | Parse current input into a partial forest |
| `completions` | `completions(&mut self) -> CompletionSet` | Type-filtered completion set under the empty context |
| `completions_ctx` | `completions_ctx(&mut self, ctx: &Context) -> CompletionSet` | Type-filtered completion set in context $\Gamma$ |
| `try_extend` | `try_extend(&mut self, token: &str, ctx: &Context) -> Result<(TypedAST, String), String>` | Parse $s \cdot t$, return the typed result without mutation |
| `extend` | `extend(&mut self, token: &str, ctx: &Context) -> Result<TypedAST, String>` | Parse $s \cdot t$ and commit to the new input |
| `extend_with_regex` | `extend_with_regex(&mut self, token: &DerivativeRegex, ctx: &Context) -> Option<(TypedAST, String)>` | Extend with a regex token, returning the first valid candidate |
| `extend_all_with_regex` | `extend_all_with_regex(&mut self, token: &DerivativeRegex, ctx: &Context, max_examples: usize) -> Vec<(TypedAST, String)>` | Extend with a regex token, returning up to `max_examples` valid candidates |
| `complete` | `complete(&mut self) -> Option<TypedNode>` | Return a complete typed root when one exists |
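The key contract in this table is that try_extend inspects an extension without committing it, while extend commits on success. The following is a minimal sketch of that distinction; `Synth` and its `parse_ok` check are illustrative stand-ins, not the real types in src/logic/partial/synth.rs, and the real methods return typed ASTs rather than strings.

```rust
// Illustrative sketch only: `Synth` and `parse_ok` stand in for the real
// synthesizer and MetaParser. For brevity, tokens are always joined with a
// space here; the real concatenation rules are covered under
// "Extension Semantics" below.
struct Synth {
    s: String, // current committed input
}

impl Synth {
    // Hypothetical validity check standing in for a full parse + type filter.
    fn parse_ok(candidate: &str) -> bool {
        !candidate.is_empty()
    }

    // Parse s · t and return the extended string WITHOUT mutating state.
    fn try_extend(&self, token: &str) -> Result<String, String> {
        let candidate = format!("{} {}", self.s, token).trim().to_string();
        if Self::parse_ok(&candidate) {
            Ok(candidate)
        } else {
            Err(format!("no valid parse for {candidate:?}"))
        }
    }

    // Same parse, but commit the new input on success.
    fn extend(&mut self, token: &str) -> Result<String, String> {
        let next = self.try_extend(token)?;
        self.s = next.clone();
        Ok(next)
    }
}

fn main() {
    let mut synth = Synth { s: String::new() };
    // try_extend leaves the committed input untouched...
    assert_eq!(synth.try_extend("let").unwrap(), "let");
    assert_eq!(synth.s, "");
    // ...while extend commits it.
    synth.extend("let").unwrap();
    assert_eq!(synth.s, "let");
}
```

This non-mutating/mutating split is what lets the search engine probe many candidate tokens from one state without cloning the whole synthesizer per probe.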
Extension Semantics
Token Extension
Given input $s$ and candidate token $t$, the parse extension function attempts three concatenations in order:
- $s \cdot_\text{auto} t$: with automatic separator insertion (see below)
- $s \cdot t$: direct concatenation
- $s \cdot \text{' '} \cdot t$: with explicit space
The first successful parse is returned.
The automatic separator heuristic inserts a space between $s$ and $t$ when:
$$\text{last}(s) \in \mathcal{W} \land \text{first}(t) \in \mathcal{W}$$
where $\mathcal{W} = \{\texttt{a-z}, \texttt{A-Z}, \texttt{0-9}, \texttt{\_}\}$ is the set of word characters.
The auto-separator is a syntactic heuristic, not a grammatical guarantee. It handles the common case where two adjacent identifiers or keywords require whitespace separation. The fallback to direct concatenation and explicit space ensures no valid extension is missed; at worst, the heuristic wastes one parse attempt.
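The heuristic and the three-way candidate ordering can be sketched as follows; the function names are illustrative, not the real synth.rs API.

```rust
// Sketch of the auto-separator heuristic and candidate ordering
// (names are illustrative, not the real synth.rs API).
fn is_word_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// True iff last(s) and first(t) are both word characters.
fn needs_auto_sep(s: &str, t: &str) -> bool {
    match (s.chars().last(), t.chars().next()) {
        (Some(a), Some(b)) => is_word_char(a) && is_word_char(b),
        _ => false,
    }
}

/// The concatenations tried in order; the first that parses wins.
fn extension_candidates(s: &str, t: &str) -> Vec<String> {
    let mut out = Vec::new();
    if needs_auto_sep(s, t) {
        out.push(format!("{s} {t}")); // s ·_auto t
    }
    out.push(format!("{s}{t}")); // s · t
    let spaced = format!("{s} {t}"); // s · ' ' · t
    if !out.contains(&spaced) {
        out.push(spaced);
    }
    out
}

fn main() {
    // Word chars on both sides: the auto-separated form is tried first.
    assert_eq!(extension_candidates("let", "x"), vec!["let x", "letx"]);
    // '=' is not a word char: direct concatenation first, then explicit space.
    assert_eq!(extension_candidates("let x", "="), vec!["let x=", "let x ="]);
}
```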
Regex Extension
When the next token is a DerivativeRegex $r$ (a regex that can still match more input), the synthesizer cannot directly concatenate $r$ as a string. Instead, it gathers concrete candidate strings and tries each:
- Grammar seeds: word-like special tokens and raw types collected from the grammar
- Primary example: $r.\text{example}()$, the canonical example string of the regex
- Tree terminals: collect terminal text from the current typed tree
Candidates are tried in this order via try_extend. extend_with_regex returns the first valid extension; extend_all_with_regex keeps distinct successful extensions up to the requested bound.
The candidate ordering is intentional: reusing grammar-derived literals and already-seen terminals tends to produce more coherent completions than inventing arbitrary new text. This is a heuristic for quality, not correctness; any valid candidate would produce a correct completion.
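The gathering order can be sketched as one deduplicating iterator chain; `regex_candidates` is a hypothetical helper, and a closure stands in for DerivativeRegex matching.

```rust
use std::collections::HashSet;

// Sketch of candidate gathering for regex extension (names illustrative).
// `matches` is a stand-in for DerivativeRegex matching.
fn regex_candidates(
    grammar_seeds: &[&str],
    primary_example: &str,
    tree_terminals: &[&str],
    matches: impl Fn(&str) -> bool,
) -> Vec<String> {
    let mut seen = HashSet::new();
    grammar_seeds
        .iter()
        .copied()
        .chain(std::iter::once(primary_example)) // then the regex's example
        .chain(tree_terminals.iter().copied())   // then already-seen terminals
        .filter(|c| matches(c))                  // keep only accepted strings
        .filter(|c| seen.insert(c.to_string()))  // dedupe, preserving order
        .map(str::to_string)
        .collect()
}

fn main() {
    let is_ident = |s: &str| s.chars().all(|c| c.is_ascii_lowercase());
    let cands = regex_candidates(&["let", "if"], "abc", &["x", "let"], is_ident);
    // Grammar seeds first, then the example, then tree terminals,
    // with the duplicate "let" dropped.
    assert_eq!(cands, vec!["let", "if", "abc", "x"]);
}
```

In the real synthesizer each surviving candidate is then passed through try_extend, so the filter here corresponds only to the regex check, not to type validity.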
Step-by-step: building an expression token by token
This section traces exactly what the synthesizer does when the search engine calls extend repeatedly to build the expression `let x = 1` from an empty input. The grammar assumed is a simple expression language where a let-binding has the form `let <ident> = <expr>`.
Initial state. The synthesizer starts with $s = \varepsilon$ (the empty string). partial() returns an empty forest; every nonterminal in $G$ is a candidate root. completions_ctx with any context will return the set of tokens that can begin a valid expression — including let.
Step 1: extend with let. The search engine calls extend with token "let" and context $\Gamma$. The auto-separator check finds $\text{last}(\varepsilon)$ undefined, so no separator is inserted. The MetaParser parses "let" and returns a partial forest with one active parse: a let-binding with three remaining holes (<ident>, =, <expr>). The synthesizer commits: $s \leftarrow \texttt{"let"}$.
After this step, completions_ctx returns only tokens that can extend "let" to a syntactically and type-valid string. In a well-designed grammar, that set consists entirely of identifiers matching `[a-z][a-z0-9_]*`.
Step 2: extend with x. The auto-separator check: $\text{last}(\texttt{"let"}) = \texttt{t} \in \mathcal{W}$ and $\text{first}(\texttt{"x"}) = \texttt{x} \in \mathcal{W}$, so a space is automatically inserted. The MetaParser parses "let x". The partial forest now has the <ident> hole filled; the remaining holes are = and <expr>. The synthesizer commits: $s \leftarrow \texttt{"let x"}$.
completions_ctx now returns only =, since that is the only token that can advance the let-binding parse.
Step 3: extend with =. $\text{last}(\texttt{"let x"}) = \texttt{x} \in \mathcal{W}$, $\text{first}(\texttt{"="}) = \texttt{=} \notin \mathcal{W}$, so no auto-separator is inserted; the explicit-space fallback produces the successful parse. The MetaParser parses "let x =". The partial forest has one hole remaining: <expr>. The synthesizer commits: $s \leftarrow \texttt{"let x ="}$.
completions_ctx with a typing context $\Gamma$ now filters further: any token that begins a valid expression and whose eventual type is compatible with the binding context. Numeric literals (1, 2, …) and any bound variable in $\Gamma$ are valid.
Step 4: extend with 1. $\text{last}(\texttt{"let x ="}) = \texttt{=} \notin \mathcal{W}$, so again no auto-separator, and again the explicit-space fallback succeeds. The MetaParser parses "let x = 1". The partial forest now contains a complete parse rooted at the let-binding nonterminal. complete() returns Some(root). The search engine can now stop on this state.
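The four steps above can be replayed with a toy driver. Here a `parse_ok` stand-in, accepting exactly the prefixes of the one target program, plays the role of the MetaParser; everything else follows the three-way candidate ordering from Extension Semantics.

```rust
fn is_word_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

// Toy stand-in for "does s·t still parse": accept exactly the prefixes of
// one target program. The real synthesizer asks the MetaParser instead.
fn parse_ok(candidate: &str) -> bool {
    "let x = 1".starts_with(candidate)
}

// Try the three concatenations in order; commit the first that parses.
fn extend(s: &str, token: &str) -> Option<String> {
    let auto = match (s.chars().last(), token.chars().next()) {
        (Some(a), Some(b)) if is_word_char(a) && is_word_char(b) => {
            Some(format!("{s} {token}")) // s ·_auto t
        }
        _ => None,
    };
    auto.into_iter()
        .chain([format!("{s}{token}"), format!("{s} {token}")])
        .find(|c| parse_ok(c))
}

fn main() {
    let mut s = String::new(); // s = ε
    for token in ["let", "x", "=", "1"] {
        s = extend(&s, token).expect("no valid extension");
    }
    // "x" arrives via the auto-separator; "=" and "1" arrive via the
    // explicit-space fallback, exactly as in the trace above.
    assert_eq!(s, "let x = 1");
}
```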
Depth Retry
When completion filtering fails on the first pass, the synthesizer retries with a fresh MetaParser in a bounded local window:
$$d_\text{start} = d + 1, \quad d_\text{max} = d + 2$$
where $d$ is the depth used by the initial parse.
This ensures the parse gathers enough trees to produce a typed alternative. It is a heuristic and is fallible: a bounded window keeps the retry cheap, at the cost of occasionally missing completions that a deeper parse would find. The design is admittedly ad hoc, but it works well enough in practice.
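The retry window can be sketched as follows; `parse_at_depth` is a hypothetical callback standing in for constructing a fresh MetaParser at a given depth.

```rust
// Sketch of the bounded depth-retry window (names are illustrative).
fn retry_window(d: usize) -> std::ops::RangeInclusive<usize> {
    d + 1..=d + 2 // d_start = d + 1, d_max = d + 2
}

// Retry at each depth in the window until one parse yields a result.
// `parse_at_depth` stands in for building a fresh MetaParser at that depth.
fn retry<T>(d: usize, parse_at_depth: impl Fn(usize) -> Option<T>) -> Option<T> {
    retry_window(d).find_map(parse_at_depth)
}

fn main() {
    // If the initial parse used depth 3, the retry window is depths 4..=5.
    assert_eq!(retry_window(3), 4..=5);
    // A parse that only succeeds at depth 5 is still found by the retry.
    let found = retry(3, |depth| (depth == 5).then_some("typed tree"));
    assert_eq!(found, Some("typed tree"));
}
```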