Feature or enhancement
Proposal:
I noticed that for regexes of the form
regex = re.compile("^foo", re.MULTILINE)
regex.search(...)
there's a character-by-character loop in `SRE(search)` that calls `SRE(match)` at every position. That's significantly slower than the regex
regex = re.compile("foo...")
regex.search(...)
which does a special prefix scan for "foo" without having to call `SRE(match)` at each position (cpython/Modules/_sre/sre_lib.h at 42d645a):
Line 1747:
/* pattern starts with a literal character */
Lines 1775 to 1776:
/* pattern starts with a known prefix.  use the overlap
   table to skip forward as fast as we possibly can */
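The overlap table mentioned in the second comment serves the same purpose as the Knuth-Morris-Pratt failure function. The following is a simplified pure-Python model of that kind of prefix scan. It assumes the standard KMP definition of the table, and the names `overlap_table` and `find_prefix` are hypothetical. This is not the C implementation:

```python
def overlap_table(prefix: str) -> list[int]:
    # table[i] = length of the longest proper prefix of prefix[:i + 1]
    # that is also a suffix of it (the classic KMP failure function).
    table = [0] * len(prefix)
    k = 0
    for i in range(1, len(prefix)):
        while k > 0 and prefix[i] != prefix[k]:
            k = table[k - 1]
        if prefix[i] == prefix[k]:
            k += 1
        table[i] = k
    return table


def find_prefix(text: str, prefix: str, start: int = 0) -> int:
    """Return the index of the first occurrence of prefix in text at or
    after start, or -1. The scan never moves backwards in text."""
    if not prefix:
        return start if start <= len(text) else -1
    table = overlap_table(prefix)
    k = 0  # number of prefix characters matched so far
    for i in range(start, len(text)):
        while k > 0 and text[i] != prefix[k]:
            k = table[k - 1]  # fall back using the overlap table
        if text[i] == prefix[k]:
            k += 1
            if k == len(prefix):
                return i - k + 1
    return -1
```

On a mismatch only the matched length `k` shrinks, so each subject character is visited once. No per-position "try a full match here" step is needed, which is why this path is cheap compared with the generic loop.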
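I would expect `^foo` (compiled with `re.MULTILINE`) and `foo` to have more or less identical performance. A simple patch like this one, which fast-forwards to the next newline before retrying the match, fixes that: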
diff --git a/Modules/_sre/sre_lib.h b/Modules/_sre/sre_lib.h
index df377905bfa..70de4cccefd 100644
--- a/Modules/_sre/sre_lib.h
+++ b/Modules/_sre/sre_lib.h
@@ -1855,6 +1855,18 @@ SRE(search)(SRE_STATE* state, SRE_CODE* pattern)
             return 0;
         }
         while (status == 0 && ptr < end) {
+            if (pattern[0] == SRE_OP_AT &&
+                pattern[1] == SRE_AT_BEGINNING_LINE &&
+                !SRE_IS_LINEBREAK((int) *ptr))
+            {
+                /* fast-forward to the next newline character */
+                while (ptr < end && !SRE_IS_LINEBREAK((int) *ptr)) {
+                    ptr++;
+                }
+                if (ptr >= end) {
+                    return 0;
+                }
+            }
             ptr++;
             RESET_CAPTURE_GROUP();
             TRACE(("|%p|%p|SEARCH\n", pattern, ptr));
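The fast-forward idea in the patch can be modelled in pure Python for illustration. The function `search_line_anchored` and its attempt counter are hypothetical and not part of `re`. The model assumes the pattern begins with `^` and was compiled with `re.MULTILINE`, and relies on `Pattern.match(text, pos)` honouring `^` only at the start of the string or just after a newline:

```python
import re


def search_line_anchored(regex: re.Pattern, text: str, pos: int = 0):
    """Model of a search for a '^'-anchored MULTILINE pattern that only
    retries match() at line starts. Returns (match_or_None, attempts)."""
    end = len(text)
    attempts = 1
    m = regex.match(text, pos)  # first attempt at the starting position
    while m is None and pos < end:
        if text[pos] != "\n":
            # pos + 1 cannot be a line start: jump to the next newline.
            nl = text.find("\n", pos)
            if nl == -1:
                return None, attempts  # no line starts left, no match
            pos = nl
        pos += 1  # step past the newline, so pos is a line start
        attempts += 1
        m = regex.match(text, pos)
    return m, attempts


rx = re.compile("^foo", re.MULTILINE)
m, attempts = search_line_anchored(rx, "bar\nbaz\nfoo\n")
print(m.span(), attempts)
```

On `"bar\nbaz\nfoo\n"` the model calls `match()` 3 times (once per line start up to the hit at index 8), whereas a position-by-position scan would try all 9 positions 0 through 8.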
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response
Linked PRs