Bounded Backtracking Engine
BoundedBacktrackEngine handles patterns classified NEEDS_BACKTRACKER — those containing constructs that require backtracking state: Backref nodes (backreferences) and contextual transducer -> rules.
Current status
BoundedBacktrackEngine is fully implemented as an iterative DFS engine with an explicit heap stack. The DEFAULT_BACKTRACK_BUDGET (1_000_000) is enforced: when the budget is exceeded the engine throws MatchTimeoutException. Possessive quantifiers and atomic groups are handled via PossessiveSplit and AtomicCommit instructions.
MatchResult.backtrackCount and instructionsExecuted are populated after each call.
Why these patterns need a separate engine
Two instruction types require backtracking that neither a DFA nor the Pike/Thompson parallel simulation can handle correctly on their own:
Backreferences (BackrefCheck). A backreference such as \1 matches the same text that group 1 captured earlier. The text is not known at compile time, so no DFA transition can be precomputed. Depth-first search allows re-trying paths after a backreference fails to match, which the stateless parallel NFA simulation does not support.
Contextual -> rules. Transducer rules that depend on surrounding context can require the engine to backtrack to an earlier choice point after producing output, which the stateless parallel NFA simulation does not support.
Patterns containing either construct receive NEEDS_BACKTRACKER from AnalysisVisitor.
How bounded backtracking works
The engine explores the NFA depth-first using an explicit stack of choice points — not recursive calls, so stack depth is bounded regardless of input length.
Each time the engine pushes a split alternative onto the stack it increments an operation counter. When the counter reaches the budget, the engine throws MatchTimeoutException rather than returning a wrong result or looping indefinitely.
Possessive quantifiers and atomic groups are handled specially:
PossessiveSplitforks the NFA like a normal split but marks the current choice point as non-restorable — once the possessive branch succeeds, the alternative is discarded.AtomicCommitdrains all choice points pushed by the body of an atomic group from the stack, implementing the commit-on-success semantics of(?>...).
Budget bound
The engine enforces an operation count limit (DEFAULT_BACKTRACK_BUDGET = 1_000_000). Every consuming step and every split increments the counter. When the counter reaches the budget, MatchTimeoutException is thrown rather than looping indefinitely.
The budget provides a hard wall against ReDoS but does not guarantee O(|NFA| × n) termination — a future VisitedSet layer tracking (pc, pos) pairs would add that guarantee. See optimise/regex-session-prompts/session-3-bounded-backtracker.md for the planned design.
Decision diagram
flowchart TD
A([execute called]) --> B{hint ==\nNEEDS_BACKTRACKER?}
B -- yes --> C[BoundedBacktrackEngine.execute]
B -- no --> Z([other engine])
C --> P1[push initial state\nonto backtrack stack]
P1 --> P2[pop state]
P2 --> P3{instruction\ntype?}
P3 -- CharMatch/AnyChar --> P4[consuming match:\nadvance pos, push\nno-match continuation]
P3 -- Split --> P5[push next2 alternative\nfall through to next1]
P3 -- PossessiveSplit --> P5P[push non-restorable alternative\nfall through to next1]
P3 -- AtomicCommit --> P5A[drain body choice points\nfrom stack]
P3 -- Accept --> P6[record match\nreturn MatchResult]
P3 -- BackrefCheck --> P9[read captured text\nmatch at pos\nadvance pos]
P4 --> P7{op count\n> budget?}
P5 --> P7
P9 --> P7
P7 -- yes --> P8([throw MatchTimeoutException])
P7 -- no --> P2
P6 --> H([MatchResult with\nbacktrackCount,\ninstructionsExecuted])
Trade-offs vs unbounded backtracking
| Property | Unbounded backtracking (java.util.regex) | Bounded backtracking (Orbit) |
|---|---|---|
| Correct results | Yes | Yes, up to budget |
| ReDoS risk | Yes — O(2^n) possible | No — budget enforced |
| Result on budget exceeded | N/A | MatchTimeoutException |
| Backreference support | Yes | Yes |
| Possessive quantifiers | No | Yes |
| Atomic groups | No | Yes |
| Capture semantics | Perl/leftmost | Perl/leftmost |
| Time complexity | O(2^n) worst case | O(budget) hard limit; O(|NFA|×n) with VisitedSet (planned) |
Thread safety
BoundedBacktrackEngine is not thread-safe. MetaEngine holds one shared BACKTRACKER instance; the Matcher layer ensures only one match runs on that instance at a time.