Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

DONE Validation

The validation system verifies that Aufbau’s engines produce correct results. It is the empirical counterpart to the formal properties defined in Completability: where the spec defines what must hold, the validation system checks that it holds on concrete grammars and inputs.

Source: src/validation/

Architecture

The validation system has three layers:

Layer 3: Test Suites (per-grammar: arithmetic, stlc, fun, imp, ...)
    │
    ▼
Layer 2: Test Harness (typed completion test runner, timeout, profiling)
    │
    ▼
Layer 1: Core Algorithms (completability.rs → search → scoring)

Layer 1 is the completion pipeline and the completability checker, i.e., the same algorithms used in production. The validation system does not use separate “test” algorithms; it validates the real pipeline.

Layer 2 is a test harness that wraps Layer 1 with:

  • Per-test configuration (depth budget, initial context, expected pass/fail)
  • Timeout enforcement (default: 300 seconds per test)
  • Prefix soundness checking (verifying all prefixes, not just the final input)
  • Profiling metadata (timing, states explored, visited states)

Layer 3 is a collection of test suites, each targeting a specific grammar from examples/*.auf.

What is Validated

Prefix Soundness

The primary property validated is prefix soundness: for a given input $s$ and grammar $G$, every prefix of $s$ must be typed-completable. This is checked by sound_complete, which:

  1. Enumerates all non-whitespace prefixes of $s$
  2. For each proper prefix: parses it, computes typed completions, and calls prefix_ok
  3. For the full input: runs the complete search to find a valid completion
  4. Reports the first failing prefix (if any)

A completion system that can find completions for let x : int in x but not for let x : int in is broken: the system would accept a token that leads to a dead end. Prefix soundness catches this class of bugs by testing every intermediate state.

Expected Failures

Tests can be marked xfail (expected to fail completion). This is used for inputs that are intentionally outside the grammar’s typed language. An xfail test passes if completion fails, and fails if completion unexpectedly succeeds.

Timeout Enforcement

Each test runs in a dedicated thread with a configurable timeout. This catches performance regressions and infinite loops in the search: if the search cannot find a completion within the time budget, the test fails with a timeout diagnostic.

Test Structure

Each grammar has a dedicated test module:

GrammarDescriptionKey Properties Tested
arithmeticInteger arithmetic with +, *, parenthesesBasic completability, operator precedence
stlcSimply-typed lambda calculusVariable binding, function application, type inference
funLet-bindings, if-then-else, functionsContext extension, nested scopes, keyword separators
impImperative language (assignments, while loops)Statement sequencing, mutable state

Parseability Checking

A separate, faster validation layer checks parseability: whether all prefixes of an input can be parsed at all (without completion search). This runs in $O(n^2)$ time (one parse per prefix) and catches parser bugs independently of the type system.

Complexity Analysis

The validation system includes an empirical complexity measurement module that determines the parse-time exponent $k$ in $O(n^k)$ via log-log linear regression over increasing input sizes. This is used to detect performance regressions and to empirically verify that the parser operates within expected complexity bounds.