Search
The search engine explores the space of token-level extensions to find a complete, well-typed expression. It uses a priority queue ordered by a heuristic scoring function.
Source: src/logic/search/mod.rs
This chapter is meant to be read after the completion overview and synthesizer chapters. It only describes search-side control flow: frontier management, pruning, and stopping conditions. Token generation lives in Synthesizer; the numeric heuristic itself lives in Scoring.
Operational Shape
The current implementation has two phases:
- A greedy fast path (try_greedy_complete) that repeatedly takes the locally best child and returns immediately if it reaches a complete root.
- A best-first frontier search (search_complete) that uses a BinaryHeap once greedy completion stalls.
Both phases use the same synthesizer operations and the same SearchConfig budgets.
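The two-phase control flow can be sketched as follows. This is a hypothetical skeleton, not the real implementation: the stub bodies of try_greedy_complete and search_complete, the Outcome enum, and the max_depth field name are all illustrative assumptions; only the phase ordering comes from the text above.

```rust
// Hypothetical sketch of the two-phase entry point. The stubs below exist
// only so the skeleton compiles on its own; they do not model the real logic.
struct SearchConfig {
    max_depth: usize, // d_max; field name is an assumption
}

enum Outcome {
    Success(String),
    Exhausted,
}

fn try_greedy_complete(input: &str, _cfg: &SearchConfig) -> Option<String> {
    // Stub: pretend greedy completion succeeds only on already-closed input.
    input.ends_with(')').then(|| input.to_string())
}

fn search_complete(_input: &str, _cfg: &SearchConfig) -> Outcome {
    // Stub standing in for the BinaryHeap-driven frontier search.
    Outcome::Exhausted
}

fn run(input: &str, cfg: &SearchConfig) -> Outcome {
    // Phase 1: greedy fast path returns immediately on a complete root.
    if let Some(done) = try_greedy_complete(input, cfg) {
        return Outcome::Success(done);
    }
    // Phase 2: best-first frontier search once greedy completion stalls.
    search_complete(input, cfg)
}

fn main() {
    let cfg = SearchConfig { max_depth: 10 };
    assert!(matches!(run("f(x)", &cfg), Outcome::Success(_)));
    assert!(matches!(run("f(x", &cfg), Outcome::Exhausted));
}
```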
Search Space
The search space is operationally an exploration over typed states $(T, \pi, d)$ where:
- $Q$ is the set of all such typed states
- $T$ is the current typed parse forest (TypedAST)
- $\pi \in \Sigma_t^*$ is the sequence of regex completion tokens chosen so far
- $d \in \mathbb{N}$ is the search depth
- $q_0 = (T_0, [], 0)$ is the initial state, where $T_0$ is the typed parse of the original input
- $F \subseteq Q$ is the set of states whose typed forest contains a complete root
Let $\Sigma_t$ be the set of regex completion tokens produced by completions_ctx. For a state $q$ and token $t \in \Sigma_t$, the search asks the synthesizer to try up to $n_{\max}$ concrete witnesses for $t$, producing a finite child set $\Delta(q, t) \subseteq Q$.
A search state $q = (T, \pi, d)$ consists of:
- $T$: a typed parse forest (the current TypedAST)
- $\pi$: the regex-token completion path. This records completion tokens, not the concrete witness strings tried for regex expansion.
- $d$: the number of successful extensions applied since $q_0$
Algorithm
The current implementation uses two phases over the same state representation:
- Build a Synthesizer for $s_0$ and parse the initial typed forest $T_0 = \text{partial_typed_ctx}(s_0, \Gamma)$. Reject with Invalid if this fails.
- Seed the visited set with T_0.text().
- Run a greedy fast path (try_greedy_complete) for up to $d_{\max}$ steps:
  - ask the synthesizer for completions_ctx($\Gamma$)
  - expand each regex token using up to $n_{\max}$ concrete witnesses
  - keep the locally best child (grounding first, then open slots, then score)
  - return immediately if a complete root is reached
- If greedy completion stalls, initialize a max-heap $H$ with $q_0 = (T_0, [], 0)$ scored by Scoring.
- Loop: pop the highest-scored state $q = (T, \pi, d)$ from $H$.
  - If $T$ has a complete root, return Success.
  - If $d \geq d_{\max}$, skip expansion.
  - If the total explored-state budget $s_{\max}$ has been hit, stop and return Exhausted.
  - Otherwise, set the synthesizer input to T.text(), compute completions_ctx($\Gamma$), and expand each token via extend_all_with_regex_candidates(..., $n_{\max}$).
  - Discard children whose typed forest is empty or whose text() has already been visited.
  - If any child has a grounded root (ty != Any), discard children whose roots are all Any.
  - Sort remaining children by grounded-root count, then score, keep only the top $b_{\max}$, and push them onto $H$.
- If $H$ becomes empty, return Exhausted.
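The frontier loop above can be condensed into a runnable sketch. This is a simplification under stated assumptions: the State type stubs the TypedAST (only its reconstructed text, depth, score, and completeness are modeled), the expand closure stands in for the synthesizer, and the grounding-preference step is folded into score ordering.

```rust
use std::cmp::Ordering;
use std::collections::{BinaryHeap, HashSet};

// Stub state: the real state carries a TypedAST; here only the fields the
// loop inspects are modeled.
#[derive(Clone)]
struct State { text: String, depth: usize, score: i64, complete: bool }

impl PartialEq for State { fn eq(&self, other: &Self) -> bool { self.score == other.score } }
impl Eq for State {}
impl Ord for State {
    // BinaryHeap is a max-heap, so ordering by score pops the best state first.
    fn cmp(&self, other: &Self) -> Ordering { self.score.cmp(&other.score) }
}
impl PartialOrd for State {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { Some(self.cmp(other)) }
}

fn search(q0: State, d_max: usize, s_max: usize, b_max: usize,
          expand: impl Fn(&State) -> Vec<State>) -> Option<State> {
    let mut visited: HashSet<String> = HashSet::from([q0.text.clone()]);
    let mut heap = BinaryHeap::from([q0]);
    let mut explored = 0usize;
    while let Some(q) = heap.pop() {
        if q.complete { return Some(q); }    // complete root -> Success
        if q.depth >= d_max { continue; }    // depth cutoff: skip expansion
        explored += 1;
        if explored > s_max { return None; } // state budget -> Exhausted
        let mut children: Vec<State> = expand(&q)
            .into_iter()
            .filter(|c| visited.insert(c.text.clone())) // dedup on text()
            .collect();
        // Simplification: the real code sorts by grounded-root count first,
        // then score; only the score is modeled here.
        children.sort_by(|a, b| b.score.cmp(&a.score));
        children.truncate(b_max); // beam width
        heap.extend(children);
    }
    None // frontier emptied -> Exhausted
}

fn main() {
    // Toy expansion: each state grows by one 'a' and is complete at depth 2.
    let expand = |q: &State| vec![State {
        text: format!("{}a", q.text),
        depth: q.depth + 1,
        score: q.score + 1,
        complete: q.depth + 1 == 2,
    }];
    let q0 = State { text: String::new(), depth: 0, score: 0, complete: false };
    let found = search(q0, 10, 96, 12, expand).expect("should complete");
    assert_eq!(found.text, "aa");
    assert_eq!(found.depth, 2);
}
```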
The module is documented as “Priority-guided DFS search.” This name is historical and relative: the predecessor algorithm was pure BFS, expanding every completion at every depth level before proceeding to the next. The current algorithm replaced that with a BinaryHeap (max-heap), making it structurally more depth-oriented in practice since high-scoring deep states are preferred over low-scoring shallow ones. This label applies to the heap-driven second phase; the greedy fast path runs before any frontier is built.
However, it is not true DFS. True DFS uses a stack (LIFO) and commits to fully exploring one branch before backtracking. This algorithm uses a priority queue: the next state popped is always the globally highest-scored state, regardless of which branch produced it. It freely interleaves exploration across branches whenever a state from a different branch scores higher than the current branch’s frontier. The depth bound $d_{\max}$ is a safety cutoff, not a stack discipline.
Concretely: if branch $A$ at depth 3 scores $2.1$ and branch $B$ at depth 1 scores $2.3$, the algorithm explores $B$ next. True DFS would continue with $A$. The algorithm makes no commitment to a single branch.
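The pop order described above can be checked directly against the standard-library heap. The (score, depth, branch) tuple encoding is only for this example; tuple ordering compares the score component first.

```rust
use std::collections::BinaryHeap;

// Concrete illustration: a max-heap pops the globally highest score,
// regardless of which branch or depth produced it.
fn main() {
    let mut heap = BinaryHeap::new();
    heap.push((21u32, 3usize, "A")); // branch A at depth 3, score 2.1 (x10)
    heap.push((23u32, 1usize, "B")); // branch B at depth 1, score 2.3 (x10)
    let (_, _, branch) = heap.pop().unwrap();
    assert_eq!(branch, "B"); // best-first explores B next; true DFS would stay on A
}
```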
Formally, this is best-first search (BeFS): the frontier is always popped in order of the Scoring heuristic. The “DFS” label reflects the intent (explore depth before breadth, relative to the predecessor), not the mechanism.
Configuration
The search configuration $\kappa = (d_{\max}, n_{\max}, s_{\max}, b_{\max})$ in SearchConfig::default() currently consists of:
- $d_{\max} = 10$: maximum search depth (number of token extensions from the initial state)
- $n_{\max} = 1$: maximum number of concrete examples to try per regex token during regex expansion
- $s_{\max} = 96$: total explored-state budget before returning Exhausted
- $b_{\max} = 12$: beam width per expanded state after local sorting
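As a sketch, the budgets above could look like the following config struct. The field names are assumptions for illustration; only the default values come from the list above.

```rust
// Hypothetical shape of the search configuration kappa = (d_max, n_max, s_max, b_max).
#[derive(Debug, Clone, Copy)]
struct SearchConfig {
    max_depth: usize,          // d_max: token extensions from the initial state
    max_regex_examples: usize, // n_max: concrete witnesses per regex token
    max_states: usize,         // s_max: explored-state budget
    beam_width: usize,         // b_max: children kept per expanded state
}

impl Default for SearchConfig {
    fn default() -> Self {
        SearchConfig {
            max_depth: 10,
            max_regex_examples: 1,
            max_states: 96,
            beam_width: 12,
        }
    }
}

fn main() {
    let cfg = SearchConfig::default();
    assert_eq!(
        (cfg.max_depth, cfg.max_regex_examples, cfg.max_states, cfg.beam_width),
        (10, 1, 96, 12)
    );
}
```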
Result
The search returns one of three outcomes:
- $\text{Success}(s', T, \pi, d)$: a complete input $s'$, the first complete typed root $T$, the regex-token completion path $\pi$, and the depth $d$
- $\text{Exhausted}(d_{\max}, n, V)$: the frontier emptied or the state budget was hit after exploring $n$ states; $V$ is the list of deduplicated reconstructed inputs visited so far
- $\text{Invalid}(m)$: the initial input $s_0$ could not be partially typed; $m$ is a diagnostic message
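The three outcomes can be sketched as an enum. This is a hypothetical shape, not the real type: the real Success also carries the typed root T, which is elided here so the sketch stays self-contained.

```rust
// Hypothetical result type mirroring the three outcomes listed above.
enum SearchResult {
    Success { input: String, path: Vec<String>, depth: usize }, // typed root elided
    Exhausted { explored: usize, visited: Vec<String> },
    Invalid { message: String },
}

fn describe(r: &SearchResult) -> String {
    match r {
        SearchResult::Success { input, depth, .. } =>
            format!("complete after {depth} steps: {input}"),
        SearchResult::Exhausted { explored, .. } =>
            format!("exhausted after exploring {explored} states"),
        SearchResult::Invalid { message } =>
            format!("invalid input: {message}"),
    }
}

fn main() {
    let r = SearchResult::Success { input: "f(x)".into(), path: vec![], depth: 2 };
    assert_eq!(describe(&r), "complete after 2 steps: f(x)");
}
```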
Search-Side Checks
The search code itself performs only lightweight checks. The meaning of the attached types comes from earlier chapters and from partial_typed_ctx; this chapter only documents what the search layer does with those typed forests.
A child produced during expansion is kept only if:
- regex expansion produced a non-empty TypedAST
- its reconstructed input text() has not been visited before
- after the local grounding preference, it survives the beam-width cutoff
Search does not run a separate Valid/Partial/Invalid predicate here; it relies on the synthesizer to produce typed child forests in the first place.
During one expansion step, if any child has at least one root whose type is not Any, the implementation drops children whose roots are all Any before score ordering. This is a local search policy, not part of the numeric scoring function.
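The grounding policy can be sketched as a filter. The Child type here is a stand-in that models only whether any root is grounded (ty != Any) plus a score; the name prefer_grounded is illustrative.

```rust
// Stand-in child: grounded = at least one root whose type is not Any.
#[derive(Debug, Clone, PartialEq)]
struct Child { grounded: bool, score: i64 }

fn prefer_grounded(children: Vec<Child>) -> Vec<Child> {
    if children.iter().any(|c| c.grounded) {
        // At least one grounded child exists: drop the all-Any children,
        // even if they score higher.
        children.into_iter().filter(|c| c.grounded).collect()
    } else {
        // No grounded child: keep everyone and let scoring decide.
        children
    }
}

fn main() {
    let mixed = vec![Child { grounded: true, score: 1 }, Child { grounded: false, score: 5 }];
    assert_eq!(prefer_grounded(mixed), vec![Child { grounded: true, score: 1 }]);

    let all_any = vec![Child { grounded: false, score: 2 }];
    assert_eq!(prefer_grounded(all_any.clone()), all_any);
}
```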
A state is accepted as soon as its TypedAST contains a complete root (TypedNode::is_complete()). In normal use this corresponds to a complete parse from the grammar start symbol, because the parser roots are generated from the grammar entry point.
State Deduplication
The search maintains a HashSet<String> keyed by the reconstructed surface input tree.text() of each state. Two states with identical reconstructed inputs are considered equivalent, even if they arrived via different token paths. This prevents re-exploration of states that parse the same surface string.
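The dedup mechanism rests on a standard-library detail: HashSet::insert returns false when the key was already present. A minimal sketch (the reconstructed strings below are made up for illustration):

```rust
use std::collections::HashSet;

// Visited set keyed by reconstructed surface text: a child whose text() was
// seen before is skipped, even if it arrived via a different token path.
fn main() {
    let mut visited: HashSet<String> = HashSet::new();
    let reconstructed = ["f(x", "f(x)", "f(x"]; // two states share "f(x"
    let kept: Vec<&str> = reconstructed
        .iter()
        .copied()
        .filter(|t| visited.insert(t.to_string())) // false on repeat -> filtered out
        .collect();
    assert_eq!(kept, ["f(x", "f(x)"]);
}
```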
For how the surviving states are numerically ranked, continue to Scoring.