Do not create RegExpContext each time #1849

Merged: kingthorin merged 2 commits into datafaker-net:main from snuyanzin:regexp on Jun 22, 2026

@snuyanzin (Collaborator) commented Jun 21, 2026

A RegExpContext is created on every call, and each creation is an allocation.
This can be avoided.

Moreover, a small benchmark of methods that evaluate expressions

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void one(Blackhole blackhole) {
        blackhole.consume(DATA_FAKER.expression("#{Name.firstName}"));
    }

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void two(Blackhole blackhole) {
        blackhole.consume(DATA_FAKER.expression("#{Name.firstName} #{Address.city}"));
    }

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void three(Blackhole blackhole) {
        blackhole.consume(DATA_FAKER.expression("#{Name.firstName} #{Address.city} #{Name.lastName}"));
    }

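shows that the change can improve throughput by about 10% or more (roughly 12% to 16% on the scores below, before taking the error margins into account).

before: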

Benchmark                    Mode  Cnt     Score     Error   Units
DatafakerExpressions.one    thrpt   10  7778.462 ± 272.241  ops/ms
DatafakerExpressions.three  thrpt   10  2447.599 ±  69.891  ops/ms
DatafakerExpressions.two    thrpt   10  2910.791 ±  58.837  ops/ms

after:

Benchmark                    Mode  Cnt     Score     Error   Units
DatafakerExpressions.one    thrpt   10  9001.782 ± 608.820  ops/ms
DatafakerExpressions.three  thrpt   10  2746.053 ± 136.003  ops/ms
DatafakerExpressions.two    thrpt   10  3260.148 ±  93.052  ops/ms
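
The relative gain per benchmark can be checked directly from the two score tables. The following is a small sketch of that arithmetic; the class and method names are made up for illustration, and only the scores are taken from the JMH output above.

```java
/** Illustrative helper (hypothetical name): relative throughput gain from two JMH scores. */
final class ThroughputGain {

    /** Relative gain in percent: (after / before - 1) * 100. */
    static double gainPercent(double before, double after) {
        return (after / before - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Scores in ops/ms, copied from the "before" and "after" tables.
        String[] names = {"one", "two", "three"};
        double[] before = {7778.462, 2910.791, 2447.599};
        double[] after = {9001.782, 3260.148, 2746.053};

        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-5s %+.1f%%%n", names[i], gainPercent(before[i], after[i]));
        }
    }
}
```

On these scores the gain works out to about 15.7% for `one` and about 12% for `two` and `three`. The reported error margins (for example ± 608.820 on the "after" score of `one`) are wide, so the exact percentages should be read loosely.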

@what-the-diff (Bot) commented Jun 21, 2026

PR Summary

  • Changed Key Type of REGEXP2SUPPLIER_MAP Map
    The key of REGEXP2SUPPLIER_MAP that was previously a RegExpContext is now modified to a simpler String expression type.

  • Logic Update in resolveExpression Method
    The method resolveExpression now fetches RegExpContext using the expression string instead of the earlier context object, simplifying the process.

  • Signature Update of resExp Method
    The resExp method now accepts a string expression rather than a RegExpContext, making the method easier to use.

  • Improved Logging in resExp Method
    The logging in resExp method is now adjusted to include the string expression, enhancing the trackability of operations.

  • Parameter Order Update in RegExpContext
    The order of parameters in RegExpContext has been updated. It now uses ProviderRegistration, FakerContext, and ValueResolver making the function more intuitive to use.
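
The idea behind the first two summary points can be sketched as follows. This is a minimal illustration and not Datafaker's code: `ExpressionCacheSketch`, `resolve`, and `prepare` are hypothetical names, and the "prepare" step merely stands in for whatever work is cached per expression. The point it shows is that a map keyed by the expression string can be queried with the string the caller already has, so no key object has to be built per lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for illustration; not Datafaker's FakeValuesService. */
final class ExpressionCacheSketch {

    // Keyed by the raw expression string, so a lookup needs no freshly built key object.
    private final Map<String, Supplier<String>> suppliersByExpression = new ConcurrentHashMap<>();

    // Counts how often the (assumed expensive) prepare step actually runs.
    private final AtomicInteger prepareCount = new AtomicInteger();

    String resolve(String expression) {
        // The supplier is created on the first lookup and reused afterwards.
        return suppliersByExpression.computeIfAbsent(expression, this::prepare).get();
    }

    private Supplier<String> prepare(String expression) {
        prepareCount.incrementAndGet();
        String placeholder = "<" + expression + ">"; // placeholder result, not a real fake value
        return () -> placeholder;
    }

    int prepareCount() {
        return prepareCount.get();
    }

    public static void main(String[] args) {
        ExpressionCacheSketch cache = new ExpressionCacheSketch();
        cache.resolve("#{Name.firstName}");
        cache.resolve("#{Name.firstName}");
        System.out.println("prepare calls: " + cache.prepareCount());
    }
}
```

Resolving the same expression twice runs the prepare step only once in this sketch; a second, different expression triggers it again.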

@codecov-commenter commented Jun 21, 2026

Codecov Report

❌ Patch coverage is 77.77778% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.39%. Comparing base (7d76586) to head (a0a77fb).
⚠️ Report is 1 commits behind head on main.

Files with missing lines                                Patch %   Lines
.../java/net/datafaker/service/FakeValuesService.java   77.77%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1849      +/-   ##
============================================
- Coverage     92.42%   92.39%   -0.03%     
+ Complexity     3519     3516       -3     
============================================
  Files           344      344              
  Lines          6969     6968       -1     
  Branches        684      684              
============================================
- Hits           6441     6438       -3     
- Misses          362      366       +4     
+ Partials        166      164       -2

@asolntsev (Collaborator)

I don't understand how a simple record RegExpContext can have a significant effect on performance.
It's just a Java record.
Java can easily create and remove records. Should not be a problem.

Also, is the new code effectively the same?

  • Previously, RegExpContext contained different records for different Locales and RandomServices.
  • Now, RegExpContext contains only one record per regex. Wouldn't it mix different locales and RegExpContexts?

@kingthorin kingthorin requested review from Copilot and removed request for Copilot June 21, 2026 21:35
@kingthorin (Collaborator)

Seems like these benchmarks could be added to our test suite and that they should be expanded for multi-locale scenarios.

@snuyanzin (Collaborator, Author)

> Also, is the new code effectively the same?
>
>   • Previously, RegExpContext contained different records for different Locales and RandomServices.
>   • Now, RegExpContext contains only one record per regex. Wouldn't it mix different locales and RegExpContexts?

There is at least one missing piece here: previously it was static, and in that case these would have been valid points.

Now it is non-static, which means that every instance of Faker has its own RegExpContext.
The only place where the Locale or seed might play a role is doWith; however, both are forcibly set there and reset afterwards.

Also, FakerTest#testDeterministicAndNonDeterministicProvidersReturnValues checks that the output is different.

Another test

        System.out.println(faker.expression("#{Name.name}#{Name.name}"));
        System.out.println(faker.expression("#{Name.name}"));
        System.out.println(faker.doWith(() -> faker.expression("#{Name.name}"), Locale.JAPANESE));
        System.out.println(faker.expression("#{Name.name}#{Name.name}"));
        Faker faker2 = new Faker();
        System.out.println(faker2.expression("#{Name.name}#{Name.name}"));
        System.out.println(faker2.expression("#{Name.name}"));
        System.out.println(faker2.doWith(() -> faker2.expression("#{Name.name}"), Locale.JAPANESE));
        System.out.println(faker2.expression("#{Name.name}#{Name.name}"));

shows

Eusebio HickleBetsy Block
Hal Wilderman DVM
西沢 恵子
Brian HansenAraceli Haley
Roscoe BorerShon Senger
Valentine Franecki
関口 萌
Mr. Garfield WillmsWalker Little
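
The static versus non-static argument above can be illustrated with a small sketch. Everything here is an assumption made for illustration: `PerInstanceCacheSketch` is a hypothetical class, its `doWith` only mimics the shape of the call used in the test above, and the cached function reads the current locale at call time rather than capturing it. It is not how Datafaker implements this.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical illustration of a per-instance cache keyed by expression only. */
final class PerInstanceCacheSketch {

    // Non-static: each instance owns its cache, so instances cannot mix entries.
    private final Map<String, Function<Locale, String>> cache = new ConcurrentHashMap<>();

    private Locale locale;

    PerInstanceCacheSketch(Locale locale) {
        this.locale = locale;
    }

    String expression(String expr) {
        // The cached function receives the locale that is current at call time.
        return cache.computeIfAbsent(expr, PerInstanceCacheSketch::prepare).apply(locale);
    }

    private static Function<Locale, String> prepare(String expr) {
        return loc -> expr + "@" + loc.toLanguageTag(); // placeholder result
    }

    /** Temporarily switches the locale, then restores the previous one. */
    <T> T doWith(Supplier<T> action, Locale temporary) {
        Locale previous = locale;
        locale = temporary;
        try {
            return action.get();
        } finally {
            locale = previous;
        }
    }

    public static void main(String[] args) {
        PerInstanceCacheSketch en = new PerInstanceCacheSketch(Locale.ENGLISH);
        System.out.println(en.expression("name"));
        System.out.println(en.doWith(() -> en.expression("name"), Locale.JAPANESE));
        System.out.println(en.expression("name"));
    }
}
```

In this sketch the locale is set for the duration of `doWith` and restored in a `finally` block, so a cache keyed only by the expression string still yields locale-specific results before, during, and after the temporary switch.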

@snuyanzin (Collaborator, Author)

> Java can easily create and remove records. Should not be a problem.

Even if it is fast, that doesn't mean it is free.

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void one(Blackhole blackhole) {
        // Baseline: no allocation, only the Blackhole call.
        blackhole.consume(0);
    }

vs

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void one(Blackhole blackhole) {
        // Same as the baseline, plus one record creation per call.
        blackhole.consume(new myRecord(reg, context, resolver)); // all arguments are constants here
    }

shows

Benchmark                  Mode  Cnt        Score       Error   Units
DatafakerExpressions.one  thrpt   10  4508344.705 ± 96119.272  ops/ms

vs

Benchmark                  Mode  Cnt       Score      Error   Units
DatafakerExpressions.one  thrpt   10  709334.454 ± 9181.791  ops/ms
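
One way to read these two scores is to convert them from throughput to time per operation. The sketch below does only that conversion; the class and method names are hypothetical, and the inputs are the two scores reported above.

```java
/** Illustrative helper (hypothetical name): converts JMH throughput to time per operation. */
final class OpsToNanos {

    /** Converts a throughput score in ops/ms to nanoseconds per operation. */
    static double nanosPerOp(double opsPerMs) {
        return 1_000_000.0 / opsPerMs; // 1 ms = 1,000,000 ns
    }

    public static void main(String[] args) {
        double baseline = nanosPerOp(4508344.705);  // blackhole.consume(0)
        double withRecord = nanosPerOp(709334.454); // blackhole.consume(new record)

        System.out.printf("baseline:    %.3f ns/op%n", baseline);
        System.out.printf("with record: %.3f ns/op%n", withRecord);
        System.out.printf("difference:  %.3f ns/op%n", withRecord - baseline);
    }
}
```

With these inputs the baseline comes out at roughly 0.22 ns per call and the record-creating variant at roughly 1.41 ns per call, so the record creation accounts for a bit over one nanosecond per call in this micro-benchmark.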

@snuyanzin (Collaborator, Author)

> Seems like these benchmarks could be added to our test suite and that they should be expanded for multi-locale scenarios.

IIRC the license might be an issue, since JMH is under a different license.

That's why we have a separate project: https://github.com/datafaker-net/datafaker-benchmark

Comment thread src/main/java/net/datafaker/service/FakeValuesService.java Outdated
@asolntsev asolntsev added this to the 3.0.0 milestone Jun 22, 2026
@kingthorin kingthorin merged commit a179fdd into datafaker-net:main Jun 22, 2026
17 checks passed