Architecture Overview
Audience: Developer contributing to the library; architect evaluating it Purpose: Explain the internal design so a new contributor can navigate the codebase.
Compilation Pipeline
The compilation pipeline transforms a regular expression into a matcher. It runs five analysis passes before compilation.
flowchart TD
A[Source Pattern] --> B[Parser]
B --> C[AST]
C --> D[HIR]
D --> E[HIR Analysis Passes]
E --> F[Prefilter Selection]
F --> G[Compiler]
G --> H[Prog + Metadata]
H --> J[ProgOptimiser]
J --> I[Pattern object]
style A fill:#f9f,stroke:#333
style I fill:#f9f,stroke:#333
- Parser — hand-written recursive-descent; produces
ExprAST - HIR — builds High-Level Intermediate Representation from the AST
- HIR Analysis Passes — five passes over the HIR tree:
- Pass 1: HIR tree construction
- Pass 2: literal prefix/suffix extraction →
LiteralSet - Pass 3: one-pass safety check
- Pass 4: output acyclicity and bounded-length analysis
- Pass 5: engine classification →
EngineHint; prefilter selection →Prefilter
- Compiler — Thompson construction to
Prog(immutable instruction array) - ProgOptimiser — post-compile pass;
foldEpsilonChainsrewrites every PC field to skipEpsilonJumpchains; recurses into lookahead/lookbehind sub-programs; runs once insidePattern.buildCompileResultbefore the finalProgis sealed - Pattern object — immutable, thread-safe; holds
Prog,Metadata,Prefilter
Metadata carries: hint, prefilter, groupCount, minLength, maxLength, startAnchored, endAnchored.
Transducer Compilation Path
Transducer expressions follow the same pipeline with two modifications injected at steps 3 and 5. Parser.parse produces a Pair(inputExpr, outputExpr, weight) AST node instead of a plain Expr; AnalysisVisitor Pass 7 validates output acyclicity and Pass 8 assigns EngineHint.PIKEVM_ONLY to the pattern. In step 4, Pattern.buildProg emits input-matching instructions for the left side of the Pair and then emits TransOutput instructions for the right side; each TransOutput carries a delta string (a literal or a backreference token such as "$1" or "${name}"). ProgOptimiser.foldEpsilonChains folds epsilon chains across TransOutput instructions as it does for all other instruction types. The resulting Transducer object holds the Prog, the original Pair AST (retained for invert()), and the flags.
flowchart TD
A[Source Transducer Expression] --> B[Parser]
B --> C["Pair(inputExpr, outputExpr, weight) AST"]
C --> D[HIR]
D --> E["HIR Analysis Passes
Pass 7: acyclicity check
Pass 8: PIKEVM_ONLY hint"]
E --> F[Prefilter Selection]
F --> G["Compiler
input side → match instructions
output side → TransOutput instructions"]
G --> H["Prog + Metadata (PIKEVM_ONLY)"]
H --> J[ProgOptimiser]
J --> I["Transducer object
(Prog + Pair AST + flags)"]
style A fill:#f9f,stroke:#333
style I fill:#f9f,stroke:#333
At match time, PikeVmEngine executes the Prog and appends each TransOutput delta to an output buffer, resolving backreferences against the current capture state. On Accept, the buffer becomes MatchResult.output(). See docs/transducer-api-reference.md for the full method contracts.
Meta-Engine Dispatch
MetaEngine runs in two phases: prefilter, then engine.
flowchart TD
A[match request] --> B{prefilter trivial?}
B -- yes --> D
B -- no --> C[prefilter.findFirst]
C --> C2{candidate found?}
C2 -- no --> Z[return no-match]
C2 -- yes --> D[read EngineHint]
D --> E{EngineHint}
E -- NEEDS_BACKTRACKER --> F[BoundedBacktrackEngine]
E -- all others --> G[PikeVmEngine]
F --> H[MatchResult]
G --> H
DFA_SAFE routes to LazyDfaEngine. ONE_PASS_SAFE routes to OnePassDfaEngine. NEEDS_BACKTRACKER routes to BoundedBacktrackEngine. All other hints route to PikeVmEngine.
Module Dependency Diagram
graph TD
A[orbit-core] --> B[HIR Analysis]
A --> C[Prefilter Module]
A --> D[Engine Implementations]
A --> E[Public API]
F[orbit-grammar] --> G[Grammar Compiler]
I[orbit-benchmarks] --> J[JMH Benchmarks]
I --> A
- orbit-core — regex engine, parser, HIR, prefilter, API; no external dependencies
- orbit-grammar — optional grammar layer; builds on orbit-core
- orbit-benchmarks — JMH benchmark harness; depends on orbit-core
Package Structure
com.orbit
├── api # Public API: Pattern, Matcher, Transducer, Token, ErrorToken, MatchToken
├── engine # Engine implementations: PikeVmEngine, BoundedBacktrackEngine,
│ # LazyDfaEngine (stub routing), OnePassDfaEngine (stub routing)
├── hir # HIR tree nodes and AnalysisVisitor (five passes)
├── prefilter # Prefilter implementations: NoopPrefilter, VectorLiteralPrefilter,
│ # AhoCorasickPrefilter, LiteralIndexOfPrefilter
├── prog # Compiled program: Prog, Instr subtypes, Metadata, ProgOptimiser
├── util # EngineHint, PatternFlag
└── parse # Parser and AST (Expr hierarchy)
Thread-Safety Model
| Type | Thread-safe? | Notes |
|---|---|---|
Pattern | Yes | Immutable after construction |
Matcher | No | One per thread; create from shared Pattern |
Transducer | Yes | Immutable; applyUp, applyDown, tokenize, invert, compose all implemented |
LazyDfaEngine | Yes | DfaStateCache uses ConcurrentHashMap |
PikeVmEngine | No | Matcher layer provides isolation |
BoundedBacktrackEngine | No | Matcher layer provides isolation |
Prog | Yes | Immutable after construction |
Prefilter | Yes | Immutable records |
Extension Points
Adding a new engine
- Implement the
Engineinterface:execute(Prog, String, int, int) → MatchResult - Add a corresponding
EngineHintvalue - Wire the routing in
MetaEngine.getEngine(EngineHint)
Adding new Instr types
- Define a new
Instrsubclass - Update
Compilerto emit it - Handle the new instruction in
PikeVmEngineandBoundedBacktrackEngine
Known Limitations
Matcheris not thread-safe — one per thread.Transducercomposed results from non-graph-eligible transducers (those with backreferences on the output side) are one-shot —invert()and furthercompose()on them throwNonInvertibleTransducerException. Composed results from graph-eligible (literal-output) transducers useTransducerGraphfor structural composition and are fully invertible.MatchTimeoutExceptionis thrown byBoundedBacktrackEnginewhen the budget is exceeded — callers onNEEDS_BACKTRACKERpatterns must catch it.LazyDfaEngineandOnePassDfaEngineare wired intoMetaEngine.getEngineforDFA_SAFEandONE_PASS_SAFEpatterns respectively.
Glossary
| Type | Package | Description |
|---|---|---|
Pattern | com.orbit.api | Compiles patterns; creates Matcher instances |
Matcher | com.orbit.api | Executes matching; holds per-match state |
Transducer | com.orbit.api | Immutable compiled transducer; applyUp, applyDown, tokenize, invert, compose |
EngineHint | com.orbit.util | Classifies patterns for engine selection |
Prog | com.orbit.prog | Immutable instruction array |
Metadata | com.orbit.prog | Per-Prog analysis results: hint, prefilter, lengths, anchors |
Prefilter | com.orbit.prefilter | Fast-path literal scan before engine dispatch |
Instr | com.orbit.prog | Primitive operation in a compiled program |