Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

DONE Partial Trees and Forests

Definition: Partial Tree

A partial tree $t = (V, E,\lambda,\pi,\alpha,\text{root})$ is a structure where:

  • $V$: finite set of nodes.
  • $E \subseteq V \times \mathbb{N} \times V$: position-indexed edges.
  • $\lambda: V \to N \cup T$: node labels.
  • $\pi: V \to P$: production associated with the node.
  • $\alpha: V \to \mathbb{N}$: chosen alternative index.
  • $\text{root} \in V$: the start node.

Parse Forest

Definition: Partial Parse Forest

For a completable input $s \in \Sigma^*$ and grammar $G$, the partial parse forest $\mathcal{F}(s)$ is the set of all rooted partial parse trees that match $s$.

In the implementation, partial parse forests are stored as Shared Packed Parse Forrects (SPPF) to save up space.

Partial Parser

Definition: Partial Parser

A partial parser for grammar $G$ is a function: $$\Psi_G : \Sigma^* \to \{ \mathcal{F},\text{reject} \}$$ Mapping each valid (completable) input to a partial parse forest and invalid inputs to $\text{reject}$.

The parser is sound in the sense that every prefix of $s$ will be accepted by it and produce a forest.

Tree Paths

Definition: Tree Path

The space of tree paths is $\mathcal{P} = \mathbb{N}^*$, where each coordinate is an edge index pointing to a specific child. Every node $v \in V$ has a unique path $\text{path}(v)$ from the root:

$$\text{path}(v) = \begin{cases} \varepsilon & \text{if } v \text{ is root} \\ \text{path}(v’) \cdot i & \text{if } v’ \text{ is parent of } v \text{ and } v \text{ is its } i\text{-th child} \end{cases}$$

Lemma: Path Injectivity

Paths are injective: $\text{path}_T(x_1) = \text{path}_T(x_2) \implies x_1 = x_2$.

We use $x[i]$ for the child at index $i$, and extend this to relative paths $x[p]$.

Terminal Status and Extensibility

A terminal node matching input $s$ has a status:

  • Complete: Matched a full token. It may carry an extension derivative $D_s(r)$ for further matching.
  • Partial: Matched only a prefix. It carries a remainder derivative for what must follow.
Definition: Extensibility

A terminal $v$ is extensible if its derivative with respect to its matched text is non-empty: $$\text{extensible}(v) \iff D_{\text{matched}(v)}(r_v) \neq \emptyset$$

A terminal can be both complete and extensible. For example, /[a-z]+/ matching foo is complete but extensible by bar. Conversely, a terminal is incomplete and not extensible only if its remainder regex is $\emptyset$ (a parse error). This distinction is critical for binding resolution where bindings to extensible terminals on the rightmost spine are treated as Partial.

Completeness and Frontiers

A node is complete by induction:

  • A terminal is complete if it matched the full input.
  • A non-terminal $v$ is complete $\iff$ all symbols in its production $\pi(v)$ are satisfied.
Definition: Frontier

The frontier is the path to the unique incomplete node where parsing stopped. In a complete tree, there is no frontier.

Note that this is not extacly the same as extensibility. A (complete) tree will have no frontier, but can have extensible nodes.

Definition: Forest Completeness

A forest is complete $\iff$ at least one tree in it is complete.

Monotonicity

For partial parses $\mathcal{F}(s)$ extending to $\mathcal{F}(s \cdot t)$, the frontier path length is monotonically non-decreasing.

Lemma: Frontier Monotonicity

$\operatorname{front}(\mathcal{F}(s \cdot t)) \geq \operatorname{front}(\mathcal{F}(s))$

This property is a primary invariant for parser correctness; a decreasing frontier indicates an engine failure.

Example: Lambda Completion

Input: λx:Int.

Partial Tree:

T = Expression
     └─[0]→ Abstraction(abs)  [complete = false]
            ├─[0]→ "λ"
            ├─[1]→ Identifier("x")
            ├─[2]→ ":"
            ├─[3]→ Type
            │      └─[0]→ AtomicType
            │             └─[0]→ BaseType("Int")
            ├─[4]→ "."
            └─[5]→ Expression  [Missing / End of Input]

Frontier: path [0, 5]. Choosing "x" as a completion yields a complete tree where the Abstraction node is satisfied.