[generator.c] Use RB_ENC_CODERANGE to check the cached coderange before calling rb_enc_str_coderange #974
…ng rb_enc_str_coderange if the coderange is unknown.
Probably not for all strings, but for some of them, particularly keys, it should, yes. Also, even for brand-new strings, some other code might have calculated the coderange first, given that many string methods do, so I don't think the benchmark is too unrealistic (it would be nice if some misses were exercised too, but that would be tricky).
    static inline int json_str_coderange(VALUE str) {
        int coderange = RB_ENC_CODERANGE(str);
        if (coderange == RUBY_ENC_CODERANGE_UNKNOWN) {
            coderange = rb_enc_str_coderange(str);
        }
        return coderange;
    }
rb_enc_str_coderange should probably be this inline helper. I'll see about making the change upstream.
This is a generalization of the optimization done in re.c as part of d0fbdb0. Code that deals with coderange can benefit significantly from avoiding that function call, assuming the coderange is often already known. Ref: ruby/json#974
This PR introduces a function json_str_coderange to first check the cached coderange via RB_ENC_CODERANGE before calling rb_enc_str_coderange.

As per the samply profiler when profiling activitypub.json, we can see we're spending 3.1% of the time in rb_enc_str_coderange.

The code is spending almost all of the time in the early exit path + returning to the caller.
It is possible that this is just an artifact of the benchmark. We load activitypub.json once, then constantly generate JSON. After the first scan, every string should have its coderange known. I'm not sure if this holds in general though.

I did find pretty much the same code in re.c - str_coderange.
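The cache-hit reasoning above can be illustrated with a standalone sketch that does not depend on Ruby's headers. Everything in it is a hypothetical stand-in: mock_str, the MOCK_CR_* constants, mock_scan_coderange, and the slow_calls counter are invented for illustration and are not part of the Ruby C API. The scan is also simplified (any byte with the high bit set is treated as non-7-bit, with no real encoding validation). The point is only that the inline wrapper skips the function call entirely once the cached value is known.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Ruby's coderange states (not the real constants). */
enum mock_coderange { MOCK_CR_UNKNOWN = 0, MOCK_CR_7BIT, MOCK_CR_VALID };

/* Hypothetical string object that carries a cached coderange. */
typedef struct {
    const unsigned char *ptr;
    size_t len;
    enum mock_coderange cached;
} mock_str;

/* Counts how often the slow function is entered. */
static unsigned long slow_calls = 0;

/* Stand-in for the out-of-line function: returns early when the value is
 * cached, otherwise scans the bytes and stores the result. Even the early
 * exit costs a function call in the real, non-inlined case. */
static enum mock_coderange mock_scan_coderange(mock_str *s) {
    slow_calls++;
    if (s->cached != MOCK_CR_UNKNOWN) {
        return s->cached;
    }
    enum mock_coderange cr = MOCK_CR_7BIT;
    for (size_t i = 0; i < s->len; i++) {
        if (s->ptr[i] & 0x80) { /* simplified: no encoding validation */
            cr = MOCK_CR_VALID;
            break;
        }
    }
    s->cached = cr;
    return cr;
}

/* Inline fast path: read the cached value first and only fall back to the
 * slow function when the coderange is still unknown. */
static inline enum mock_coderange mock_str_coderange(mock_str *s) {
    enum mock_coderange cr = s->cached;
    if (cr == MOCK_CR_UNKNOWN) {
        cr = mock_scan_coderange(s);
    }
    return cr;
}
```

In this sketch, calling mock_str_coderange repeatedly on the same string enters mock_scan_coderange only the first time, which mirrors the benchmark situation where strings are scanned once and then reused.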
Benchmarks
Benchmarks were run on an M1 MacBook Air.