
perf: accelerate ordered string scans with a regex byte-class prefilter #51

Merged
JeanExtreme002 merged 2 commits into main from jeanextreme002/fast-string-range-scan
Jun 5, 2026

@JeanExtreme002
Owner

@JeanExtreme002 JeanExtreme002 commented Jun 5, 2026

Problem

Ordered string comparisons — BIGGER_THAN, SMALLER_THAN, *_OR_EXACT_VALUE, and VALUE_BETWEEN — had no C-level shortcut. Unlike the EXACT_VALUE path (which uses bytes.find), they fell into the pure-Python fallback in scan_memory, stepping every byte (step = 1 for strings) and decoding a window with int.from_bytes per offset.

Over a real process's address space that's ~1 billion Python iterations, so a single search_by_value_between(str, …) took minutes. In the test suite, test_search_by_string_between alone was ~300s and gated the whole macOS run (xdist --dist=loadfile pins test_editor.py to one worker).

Fix

Strings compare big-endian, so a fixed-width window can only satisfy an ordered comparison when its first byte lies in a known range. The fast path:

  1. Builds a regex byte class [lo-hi] for the candidate first bytes and locates them with re.finditer — whose C engine skips the long NUL runs of reserved/zeroed memory, exactly like bytes.find does for EXACT.
  2. Accepts a candidate outright when its first byte is strictly inside the bound (the order is already decided), decoding the full window only on the rare boundary tie.
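
The two steps above can be sketched roughly as follows. This is a minimal illustration for the inclusive between case, not the code from this PR: `scan_between` and its signature are hypothetical, and it assumes both bounds are already padded to the same width.

```python
import re


def scan_between(buf: bytes, start: bytes, end: bytes) -> list[int]:
    """Offsets of len(start)-wide windows w with start <= w <= end.

    Hypothetical helper for illustration; assumes equal-width bounds.
    """
    size = len(start)
    if size == 0 or len(end) != size or len(buf) < size:
        return []
    lo, hi = start[0], end[0]
    if lo > hi:
        # Reversed range: nothing can match, and "[hi-lo]" would not compile.
        return []
    # Step 1: byte class for the candidate first bytes, compiled with no flags.
    first_byte = re.compile(
        b"[" + re.escape(bytes([lo])) + b"-" + re.escape(bytes([hi])) + b"]"
    )
    hits = []
    # endpos keeps only offsets where a full window still fits in the buffer.
    for match in first_byte.finditer(buf, 0, len(buf) - size + 1):
        off = match.start()
        if lo < buf[off] < hi:
            # Step 2: first byte strictly inside the bounds, order already decided.
            hits.append(off)
        elif start <= buf[off:off + size] <= end:
            # Boundary tie: compare the full window (bytes compare lexicographically).
            hits.append(off)
    return hits
```

Because the class matches exactly one byte, non-overlapping `finditer` matches still visit every candidate offset; the regex engine only saves the work of stepping over bytes outside the class.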

NOT_EXACT_VALUE / NOT_VALUE_BETWEEN keep the byte-by-byte loop (their match set is dense, so a prefilter wouldn't help), and numerics with unusual sizes (3/6/7 bytes, little-endian) are untouched.

Results

| scenario | before | after | speedup |
|---|---|---|---|
| sparse 150 MB (mostly-NUL, realistic) | 22.4 s | 0.71 s | 31x |
| dense 40 MB, full-range (matches every byte) | 7.3 s | 5.9 s | 1.24x (no regression) |
| test_search_by_string_between | ~300 s | ~13 s | ~24x |

Even the pathological full-range case is faster, because the "accept outright" path skips the per-window int.from_bytes the old loop always paid — so no density guard is needed.

Correctness review

The first-byte-dominance shortcut is only valid under three preconditions, all verified:

  • Equal width. value_to_bytes builds a ctypes buffer of exactly bufflength bytes and NUL-pads, so the target reaching scan_memory is always exactly target_value_size wide (searching "AB" with bufflength=20 yields b"AB" + b"\x00"*18). With equal width, a strictly greater or smaller first byte determines the full comparison, so there are no false positives or negatives.
  • lo <= hi. A reversed VALUE_BETWEEN (start > end) would compile to a [hi-lo] class and raise re.error: bad character range. The byte-by-byte loop returns [] for start > end, so the fast path now guards lo_byte > hi_byte and returns empty to match. (Caught during review; covered by a regression test.)
  • No flags. The pattern is compiled with no flags, in particular never re.IGNORECASE, which folds ASCII case inside a class and would over-match a range overlapping A-Z/a-z. Documented inline so it isn't introduced later.
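
Each precondition can be reproduced in isolation with the standard library. The helpers below are hypothetical illustrations, not functions from this PR; in particular, `padded` uses `ctypes.create_string_buffer` as an assumed stand-in for what value_to_bytes does.

```python
import ctypes
import re


def padded(value: str, bufflength: int) -> bytes:
    # Assumed stand-in for value_to_bytes: a fixed-size, NUL-padded buffer.
    return ctypes.create_string_buffer(value.encode("utf-8"), bufflength).raw


def byte_class(lo: int, hi: int) -> bytes:
    # Escape both endpoints so regex-special bytes are taken literally.
    return b"[" + re.escape(bytes([lo])) + b"-" + re.escape(bytes([hi])) + b"]"


def reversed_range_raises(lo: int, hi: int) -> bool:
    # True when compiling the class fails, as it does for lo > hi.
    try:
        re.compile(byte_class(lo, hi))
    except re.error:
        return True
    return False


def overmatches_with_ignorecase() -> bool:
    # b"b" is outside [A-C] by ordinal, but IGNORECASE folds it onto b"B".
    plain = re.compile(byte_class(ord("A"), ord("C")))
    folded = re.compile(byte_class(ord("A"), ord("C")), re.IGNORECASE)
    return plain.match(b"b") is None and folded.match(b"b") is not None
```

A caller that guards `lo > hi` before compiling never reaches the `re.error` branch, which is the behavior the second bullet describes.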

Output is byte-for-byte identical to the previous loop:

  • Verified against a byte-by-byte reference over 20k randomized cases (sizes 1–20, empty buffers, 0x00/0xff extremes, boundary ties, reversed ranges, and regex-special bytes [ ] ^ - \ & ~ |), with FutureWarning treated as an error.
  • re.escape confirmed to yield an exact inclusive ordinal class for all 256 byte values.
  • New deterministic tests in test_scan.py (incl. reversed-range and regex-special bounds) + hypothesis property tests in test_scan_properties.py.
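
A reduced version of the re.escape check might look like the sketch below. It is not the PR's test code: `class_members` and `check_escaped_classes` are hypothetical names, and instead of every (lo, hi) pair it covers only singleton classes and ranges anchored at 0x00 or 0xff.

```python
import re
import warnings

ALL_BYTES = bytes(range(256))


def class_members(lo: int, hi: int) -> bytes:
    # Bytes matched by the escaped class [lo-hi], in ascending ordinal order.
    pattern = re.compile(
        b"[" + re.escape(bytes([lo])) + b"-" + re.escape(bytes([hi])) + b"]"
    )
    return b"".join(pattern.findall(ALL_BYTES))


def check_escaped_classes() -> bool:
    with warnings.catch_warnings():
        # Turn the nested-set FutureWarning into a failure for this check.
        warnings.simplefilter("error", FutureWarning)
        for b in range(256):
            assert class_members(b, b) == bytes([b])
            assert class_members(0, b) == ALL_BYTES[: b + 1]
            assert class_members(b, 255) == ALL_BYTES[b:]
    return True
```

Extending the loop to all pairs with lo <= hi is straightforward but compiles roughly 33k patterns, so the reduced form is usually enough for a quick sanity check.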

References consulted: Python re docs (bytes patterns are matched by ordinal and are locale-independent without re.LOCALE; IGNORECASE ASCII-folds inside classes; DOTALL/MULTILINE don't affect class membership), CPython issue #74534 / bpo-30349 (the nested-set FutureWarning — not scheduled to become an error, and avoided entirely by escaping endpoints).

348 passed, 13 skipped; flake8 and mypy clean. Independent of the dyld-shared-cache fix (#50) — different files, composes cleanly.

@github-actions github-actions Bot added lib Library changes (PyMemoryEditor/) tests Test changes (tests/) labels Jun 5, 2026
…refilter

Ordered string comparisons (>, <, >=, <=, between) had no C-level shortcut like
the EXACT path's bytes.find — they stepped every byte in Python and decoded a
window per offset, so a single search_by_value_between over a process address
space took minutes.

A string window can only satisfy an ordered comparison when its first byte
falls in a known range (strings compare big-endian, so the first byte dominates
the order). Locate those candidates with a regex byte class, whose C engine
skips the long NUL runs of reserved/zeroed memory; accept a candidate outright
unless its first byte ties a bound, where the full window is decoded and checked.

Measured: 31x on a sparse 150 MB region, and still 1.24x (no regression) on a
pathological full-range scan that matches every byte. test_search_by_string_between
drops from ~300s to ~13s.

Correctness:
- Width is guaranteed: value_to_bytes pads the target to exactly bufflength
  bytes, so the first-byte-dominance shortcut is sound.
- Reversed VALUE_BETWEEN (start > end) would compile to a '[hi-lo]' class and
  raise re.error; guard it to return empty, matching the byte-by-byte loop.
- Compiled with no flags (notably never re.IGNORECASE, which folds ASCII case
  inside a class and would over-match).
- Output verified byte-for-byte against the reference over 20k randomized cases
  (incl. reversed ranges, regex-special bytes, sizes 1-20), plus deterministic
  and hypothesis property tests.
@JeanExtreme002 JeanExtreme002 force-pushed the jeanextreme002/fast-string-range-scan branch from 5c604ec to ea432f6 Compare June 5, 2026 04:33
The string optimization made two claims in docs/guide/searching.md stale:
- the ordered-comparison loop is no longer uniformly pure-Python — ordered
  string scans (>, <, between) now run through the regex byte-class prefilter
  in C, independent of the NumPy speed extra;
- the 'str/bytes scans: pure-Python loop' table row over-generalized.

Also document the string ordering semantics that the fast path relies on:
str compares UTF-8 bytes lexicographically (big-endian), the shorter value is
NUL-padded to bufflength, and a reversed VALUE_BETWEEN range matches nothing.
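
As an illustration of these ordering semantics, the sketch below shows that comparing equal-width, NUL-padded buffers lexicographically gives the same answer as comparing them as big-endian integers. `pad` and `same_order` are hypothetical helpers, and `pad` assumes the encoded value fits within bufflength.

```python
def pad(value: str, bufflength: int) -> bytes:
    # Hypothetical stand-in for the library's padding; does not truncate.
    return value.encode("utf-8").ljust(bufflength, b"\x00")


def same_order(a: str, b: str, bufflength: int = 8) -> bool:
    # Lexicographic bytes order vs. big-endian integer order at equal width.
    x, y = pad(a, bufflength), pad(b, bufflength)
    return (x < y) == (int.from_bytes(x, "big") < int.from_bytes(y, "big"))
```

For example, "AB" sorts before "B" at any width because the first byte 0x41 is smaller than 0x42, regardless of what follows.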
@github-actions github-actions Bot added the docs Documentation changes (docs/) label Jun 5, 2026
@JeanExtreme002 JeanExtreme002 merged commit e28177f into main Jun 5, 2026
17 checks passed
@github-actions github-actions Bot deleted the jeanextreme002/fast-string-range-scan branch June 5, 2026 05:02