DONE Partial Trees and Forests
A partial tree $t = (V, E,\lambda,\pi,\alpha,\text{root})$ is a structure where:
- $V$: finite set of nodes.
- $E \subseteq V \times \mathbb{N} \times V$: position-indexed edges.
- $\lambda: V \to N \cup T$: node labels.
- $\pi: V \to P$: production associated with the node.
- $\alpha: V \to \mathbb{N}$: chosen alternative index.
- $\text{root} \in V$: the start node.
Parse Forest
For a completable input $s \in \Sigma^*$ and grammar $G$, the partial parse forest $\mathcal{F}(s)$ is the set of all rooted partial parse trees that match $s$.
In the implementation, partial parse forests are stored as Shared Packed Parse Forrects (SPPF) to save up space.
Partial Parser
A partial parser for grammar $G$ is a function: $$\Psi_G : \Sigma^* \to \{ \mathcal{F},\text{reject} \}$$ Mapping each valid (completable) input to a partial parse forest and invalid inputs to $\text{reject}$.
The parser is sound in the sense that every prefix of $s$ will be accepted by it and produce a forest.
Tree Paths
The space of tree paths is $\mathcal{P} = \mathbb{N}^*$, where each coordinate is an edge index pointing to a specific child. Every node $v \in V$ has a unique path $\text{path}(v)$ from the root:
$$\text{path}(v) = \begin{cases} \varepsilon & \text{if } v \text{ is root} \\ \text{path}(v’) \cdot i & \text{if } v’ \text{ is parent of } v \text{ and } v \text{ is its } i\text{-th child} \end{cases}$$
Paths are injective: $\text{path}_T(x_1) = \text{path}_T(x_2) \implies x_1 = x_2$.
We use $x[i]$ for the child at index $i$, and extend this to relative paths $x[p]$.
Terminal Status and Extensibility
A terminal node matching input $s$ has a status:
- Complete: Matched a full token. It may carry an extension derivative $D_s(r)$ for further matching.
- Partial: Matched only a prefix. It carries a remainder derivative for what must follow.
A terminal $v$ is extensible if its derivative with respect to its matched text is non-empty: $$\text{extensible}(v) \iff D_{\text{matched}(v)}(r_v) \neq \emptyset$$
A terminal can be both complete and extensible. For example, /[a-z]+/ matching foo is complete but extensible by bar. Conversely, a terminal is incomplete and not extensible only if its remainder regex is $\emptyset$ (a parse error). This distinction is critical for binding resolution where bindings to extensible terminals on the rightmost spine are treated as Partial.
Completeness and Frontiers
A node is complete by induction:
- A terminal is complete if it matched the full input.
- A non-terminal $v$ is complete $\iff$ all symbols in its production $\pi(v)$ are satisfied.
The frontier is the path to the unique incomplete node where parsing stopped. In a complete tree, there is no frontier.
Note that this is not extacly the same as extensibility. A (complete) tree will have no frontier, but can have extensible nodes.
A forest is complete $\iff$ at least one tree in it is complete.
Monotonicity
For partial parses $\mathcal{F}(s)$ extending to $\mathcal{F}(s \cdot t)$, the frontier path length is monotonically non-decreasing.
$\operatorname{front}(\mathcal{F}(s \cdot t)) \geq \operatorname{front}(\mathcal{F}(s))$
This property is a primary invariant for parser correctness; a decreasing frontier indicates an engine failure.
Input: λx:Int.
Partial Tree:
T = Expression
└─[0]→ Abstraction(abs) [complete = false]
├─[0]→ "λ"
├─[1]→ Identifier("x")
├─[2]→ ":"
├─[3]→ Type
│ └─[0]→ AtomicType
│ └─[0]→ BaseType("Int")
├─[4]→ "."
└─[5]→ Expression [Missing / End of Input]
Frontier: path [0, 5]. Choosing "x" as a completion yields a complete tree where the Abstraction node is satisfied.