[generator.c] Use RB_ENC_CODERANGE to check the cached coderange before calling rb_enc_str_coderange #974

Merged
byroot merged 1 commit into ruby:master from samyron:sm/coderange
Apr 19, 2026

@samyron
Contributor

@samyron samyron commented Apr 19, 2026

This PR introduces a function json_str_coderange to first check the cached coderange via RB_ENC_CODERANGE before calling rb_enc_str_coderange.

According to the samply profiler, when profiling activitypub.json we spend 3.1% of the time in rb_enc_str_coderange.

image

The code is spending almost all of the time in the early exit path + returning to the caller:

image

It's possible this is just an artifact of the benchmark. We load activitypub.json once, then constantly generate JSON. After the first scan, every string should have its coderange known. I'm not sure if this holds in general though.

I did find pretty much the same code in re.c - str_coderange.
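
To make the reasoning above concrete, here is a standalone sketch of the pattern. It does not use Ruby's headers: `struct model_str`, the `MODEL_CR_*` states, and both functions are hypothetical stand-ins, and the scan only distinguishes ASCII from non-ASCII bytes. It assumes, as the profile suggests, that the out-of-line function already returns early on a cached value, so what the inline check saves is the call itself.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for coderange states; names are hypothetical. */
enum model_coderange {
    MODEL_CR_UNKNOWN = 0,
    MODEL_CR_7BIT,
    MODEL_CR_VALID   /* has non-ASCII bytes; this model does not validate them */
};

struct model_str {
    const unsigned char *ptr;
    size_t len;
    enum model_coderange cr;   /* cached result of the last scan */
};

/* Counts calls to the out-of-line function so the saving is observable. */
static unsigned long model_slow_calls = 0;

/* Out-of-line path: early exit when cached, otherwise scan and cache. */
enum model_coderange model_str_coderange(struct model_str *s)
{
    model_slow_calls++;
    if (s->cr != MODEL_CR_UNKNOWN) {
        return s->cr;
    }
    enum model_coderange cr = MODEL_CR_7BIT;
    for (size_t i = 0; i < s->len; i++) {
        if (s->ptr[i] & 0x80) {
            cr = MODEL_CR_VALID;
            break;
        }
    }
    s->cr = cr;
    return cr;
}

/* Inline fast path: read the cached value and only call out on a miss. */
static inline enum model_coderange model_json_str_coderange(struct model_str *s)
{
    assert(s != NULL);
    enum model_coderange cr = s->cr;
    if (cr == MODEL_CR_UNKNOWN) {
        cr = model_str_coderange(s);
    }
    return cr;
}
```

In this model, repeated calls on the same string reach `model_str_coderange` only once; every later call is a field read and a compare, which is the situation the benchmark creates after its first pass.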

Benchmarks

Benchmarks were run on an M1 MacBook Air.

== Encoding activitypub.json (52595 bytes)
ruby 3.4.8 (2025-12-17 revision 995b59f666) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     2.840k i/100ms
Calculating -------------------------------------
               after     28.777k (± 1.7%) i/s   (34.75 μs/i) -    144.840k in   5.034576s

Comparison:
              before:    27075.2 i/s
               after:    28777.2 i/s - 1.06x  faster


== Encoding citm_catalog.json (500298 bytes)
ruby 3.4.8 (2025-12-17 revision 995b59f666) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   141.000 i/100ms
Calculating -------------------------------------
               after      1.419k (± 1.1%) i/s  (704.61 μs/i) -      7.191k in   5.067439s

Comparison:
              before:     1360.1 i/s
               after:     1419.2 i/s - 1.04x  faster


== Encoding twitter.json (466906 bytes)
ruby 3.4.8 (2025-12-17 revision 995b59f666) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   299.000 i/100ms
Calculating -------------------------------------
               after      2.954k (± 1.3%) i/s  (338.51 μs/i) -     14.950k in   5.061485s

Comparison:
              before:     2780.3 i/s
               after:     2954.2 i/s - 1.06x  faster


== Encoding ohai.json (20145 bytes)
ruby 3.4.8 (2025-12-17 revision 995b59f666) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     3.783k i/100ms
Calculating -------------------------------------
               after     37.564k (± 3.7%) i/s   (26.62 μs/i) -    189.150k in   5.044004s

Comparison:
              before:    34472.6 i/s
               after:    37564.0 i/s - 1.09x  faster
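
The "x faster" figures in the comparisons are the ratio of the after and before iterations per second. A small sketch for rechecking them, using only the numbers reported above; the `bench_row` type and `speedup` helper are hypothetical names for illustration, not part of the benchmark harness:

```c
#include <assert.h>

/* One row per benchmark: reported iterations per second before and after. */
struct bench_row {
    const char *name;
    double before_ips;
    double after_ips;
};

static const struct bench_row bench_rows[] = {
    { "activitypub.json", 27075.2, 28777.2 },
    { "citm_catalog.json", 1360.1,  1419.2 },
    { "twitter.json",      2780.3,  2954.2 },
    { "ohai.json",        34472.6, 37564.0 },
};

/* Speedup factor: how many times more iterations per second after the change. */
double speedup(double before_ips, double after_ips)
{
    assert(before_ips > 0.0);
    return after_ips / before_ips;
}
```

Calling `speedup` on each row gives values that round to the 1.06x, 1.04x, 1.06x, and 1.09x shown in the output.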

…ng rb_enc_str_coderange if the coderange is unknown.
@byroot
Member

byroot commented Apr 19, 2026

> We load activitypub.json once, then constantly generate JSON. After the first scan, every string should have its coderange known. I'm not sure if this holds in general though.

Probably not for all strings, but for some of them, particularly keys, it should, yes. Also, even for brand new strings, some other code might have calculated the coderange first, given that many string methods do, so I don't think the benchmark is too unrealistic (though it would be nice if some misses were exercised too, tricky as that would be).

Comment on lines +822 to +828
static inline int json_str_coderange(VALUE str) {
    int coderange = RB_ENC_CODERANGE(str);
    if (coderange == RUBY_ENC_CODERANGE_UNKNOWN) {
        coderange = rb_enc_str_coderange(str);
    }
    return coderange;
}
Member

rb_enc_str_coderange should probably be this inline helper. I'll see about making the change upstream.

Member

> I'll see about making the change upstream.

ruby/ruby#16771

@byroot byroot merged commit 4ef7a45 into ruby:master Apr 19, 2026
41 checks passed
byroot added a commit to byroot/ruby that referenced this pull request Apr 19, 2026
This is a generalization of the optimization done in re.c
as part of d0fbdb0.

Code that deals with coderange can benefit significantly from
avoiding that function call, assuming coderange is often already
known.

Ref: ruby/json#974