perf: result path memory allocation optimizations (mostly memory savings, 10's of ns improvements) #805
mykaul wants to merge 11 commits
Follow-up commit: cache column_names/column_types on PreparedStatement
Follow-up: Reorder RESULT_KIND dispatch and replace getattr with direct access
Commit: 5bbbc90
What changed: two micro-optimizations.
Benchmark results (Python 3.14, 2M iterations)
Testing
v3: CI fix
The previous push caused 11 CI failures. Root cause: the direct attribute access optimization. All 611 unit tests pass. Benchmarks unchanged.
v4: Added session.cluster local caching.
Benchmark (5M iters):
All 611 unit tests pass.
I kinda remember some collision between slots and something... Unsure now.
Remove 'results = None' (never assigned or read in production code), duplicate 'kind = None' (declared twice), and duplicate 'paging_state = None' (declared at lines 663 and 681). The canonical declarations at lines 672-685 are kept.
When trace_id, custom_payload, and warnings are None (the common case for non-traced, non-warned messages), skip the instance attribute assignment. The class-level defaults on _MessageType already provide None, so reading these attributes returns the correct value without the per-instance __dict__ write.
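The effect of skipping the assignment can be sketched with a small stand-in class. This is a minimal illustration, not the driver's code: `Message` and `decode` are hypothetical names, and only the three attribute names come from the commit message.

```python
class Message:
    # Class-level defaults: every instance reads None without a per-instance write.
    trace_id = None
    custom_payload = None
    warnings = None


def decode(trace_id=None, custom_payload=None, warnings=None):
    """Hypothetical decoder: assign only the attributes that carry a value."""
    msg = Message()
    if trace_id is not None:
        msg.trace_id = trace_id
    if custom_payload is not None:
        msg.custom_payload = custom_payload
    if warnings is not None:
        msg.warnings = warnings
    return msg


plain = decode()
traced = decode(trace_id="abc")
print(plain.__dict__)   # {}
print(traced.__dict__)  # {'trace_id': 'abc'}
print(plain.warnings)   # None, served by the class-level default
```

The instance dictionary stays empty in the common case, while attribute reads still return None through the class.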
Extract _wait_for_result() from result() to return the raw result without wrapping in a ResultSet. Use it in fetch_next_page() to avoid creating a full ResultSet object just to extract _current_rows. For paged queries with N pages, this eliminates N-1 throwaway ResultSet allocations (each with 6 attributes).
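The shape of this refactoring can be sketched as follows. It is a simplified model under stated assumptions: the `Future` and `ResultSet` classes below are stand-ins, `fetch_next_page` is written as a free function for brevity, and the real method would also block on completion and re-raise stored errors.

```python
class ResultSet:
    """Stand-in wrapper; counts how many wrappers get allocated."""
    created = 0

    def __init__(self, future, rows):
        ResultSet.created += 1
        self.response_future = future
        self._current_rows = rows


class Future:
    """Stand-in future that serves pre-baked pages of rows."""

    def __init__(self, pages):
        self._pages = iter(pages)
        self._final_result = next(self._pages)

    def _wait_for_result(self):
        # Real code would wait for the response and raise on failure here.
        return self._final_result

    def result(self):
        # Public entry point: wrap the raw result for the caller.
        return ResultSet(self, self._wait_for_result())

    def start_fetching_next_page(self):
        self._final_result = next(self._pages)


def fetch_next_page(future):
    future.start_fetching_next_page()
    # Raw rows only: no throwaway ResultSet is built for the extra pages.
    return future._wait_for_result()


future = Future([[1, 2], [3, 4], [5]])
first = future.result()
rows = list(first._current_rows)
rows += fetch_next_page(future)
rows += fetch_next_page(future)
print(rows, ResultSet.created)
```

With three pages, only the initial `result()` call allocates a wrapper in this sketch.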
…sage
Add __slots__ to eliminate per-instance __dict__ for ResultMessage, the most common response type. ResultMessage has 19 instance attributes; eliminating the __dict__ saves ~200-400 bytes per decoded result message.
Changes:
- _MessageType: __slots__ = () to signal slots-awareness to subclasses
- ResultMessage: full __slots__ with all 19 instance attributes initialized in __init__ (replaces class-level defaults that are incompatible with slots)
- FastResultMessage: __slots__ = () to inherit parent slots without __dict__
- _get_params: handle slotted objects gracefully for __repr__
- Remove dead 'response.results' assignment from test mock
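The slots layout described above can be illustrated with a reduced hierarchy. The class names and the two attributes are placeholders chosen for the example; the real ResultMessage declares 19 attributes.

```python
class BaseMessage:
    __slots__ = ()  # no slots of its own, but keeps subclasses free of __dict__


class DictResult:
    kind = None  # class-level default; instances still carry a __dict__

    def __init__(self, kind, rows):
        self.kind = kind
        self.rows = rows


class SlottedResult(BaseMessage):
    __slots__ = ("kind", "rows")

    def __init__(self, kind=None, rows=None):
        # Slots cannot have class-level defaults, so initialise everything here.
        self.kind = kind
        self.rows = rows


class FastSlottedResult(SlottedResult):
    __slots__ = ()  # inherit the parent's slots without regaining a __dict__


with_dict = DictResult(2, [])
slotted = FastSlottedResult(2, [])
print(hasattr(with_dict, "__dict__"))  # True
print(hasattr(slotted, "__dict__"))    # False

# A class-level default with the same name as a slot is rejected by Python.
conflict = None
try:
    class Broken:
        __slots__ = ("kind",)
        kind = None
except ValueError as exc:
    conflict = exc
print(conflict)
```

The final block shows why the class-level defaults had to move into `__init__`: Python raises ValueError when a slot name collides with a class variable.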
- Replace double dict lookup ('in' + .get()) with single .get() + None check
- Use tuple unpacking instead of three separate indexing operations
- Guard add_tablet with 'if tablet is not None' (from_row can return None
for empty replicas — previously this was silently passed through)
- Inline single-use locals (protocol, keyspace, table)
Saves ~13 ns on the tablet-hit path (noise-level, but cleaner code).
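A minimal sketch of the three changes listed above, using hypothetical names throughout (`make_tablet`, `record_tablet`, and the `"routing"` key are invented for the example and are not the driver's identifiers):

```python
def make_tablet(first_token, last_token, replicas):
    """Hypothetical factory: returns None when there are no replicas."""
    if not replicas:
        return None
    return (first_token, last_token, tuple(replicas))


def record_tablet(payload, store):
    # Single .get() plus None check instead of 'in' followed by .get().
    info = payload.get("routing") if payload else None
    if info is None:
        return False
    # Tuple unpacking instead of info[0], info[1], info[2].
    first_token, last_token, replicas = info
    tablet = make_tablet(first_token, last_token, replicas)
    # Guard: a tablet with empty replicas is not stored.
    if tablet is not None:
        store.append(tablet)
        return True
    return False


store = []
print(record_tablet({"routing": (0, 10, ["node-a"])}, store))  # True
print(record_tablet({"routing": (0, 10, [])}, store))          # False
print(record_tablet({}, store))                                # False
print(store)
```

Note that `.get()` with a None check treats a key stored with the value None the same as a missing key, which is only equivalent to the `in` test when None is never a valid value.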
… attr access
Move the isinstance(response, ResultMessage) check to the top of
_set_result so the hot path (successful query results) uses direct
attribute access instead of getattr() with defaults.
ResultMessage has trace_id, warnings, and custom_payload in __slots__,
always initialised in __init__, so direct access is safe and avoids the
overhead of 3 getattr() calls + 1 isinstance() that was previously done
unconditionally before the type dispatch.
For the cold ErrorMessage path, getattr() is still used because
ErrorMessage does not declare trace_id in __slots__ or __init__.
ConnectionException and generic Exception branches explicitly clear
_warnings/_custom_payload to avoid stale values from prior retries.
Benchmark (best-of-7, 500k iterations, Python 3.14, pure-Python):
_set_result ROWS hot path (no tablets, no tracing):
Before: 326 ns
After: 271 ns (-55 ns, 1.20x)
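The type-first dispatch can be sketched as below. The two classes are reduced stand-ins and `extract_metadata` is a hypothetical helper; the point is only that the slotted type is read with plain attribute access while other types keep the defensive getattr().

```python
class ResultMessage:
    __slots__ = ("trace_id", "warnings", "custom_payload")

    def __init__(self, trace_id=None, warnings=None, custom_payload=None):
        # Always initialised, so direct attribute access cannot fail.
        self.trace_id = trace_id
        self.warnings = warnings
        self.custom_payload = custom_payload


class ErrorMessage:
    """Does not define trace_id and friends, so getattr() stays on this path."""

    def __init__(self, summary):
        self.summary = summary


def extract_metadata(response):
    if isinstance(response, ResultMessage):
        # Hot path: one isinstance() check, then three direct reads.
        return response.trace_id, response.warnings, response.custom_payload
    # Cold path: attributes may be missing, so fall back to defaults.
    return (
        getattr(response, "trace_id", None),
        getattr(response, "warnings", None),
        getattr(response, "custom_payload", None),
    )


print(extract_metadata(ResultMessage(warnings=["slow query"])))
print(extract_metadata(ErrorMessage("boom")))
```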
…er-result list comprehensions
For prepared statements with skip_meta=True (the common case), column_metadata is the same object every time, yet column_names and column_types lists are rebuilt via list comprehension on every result set.
Pre-compute and cache these lists on PreparedStatement at prepare time. In _set_result, use the cached lists directly instead of the per-response lists. The cache is invalidated when result_metadata is updated during re-prepare.
Benchmark (column_names + column_types extraction):
5 cols: 226 ns -> 30 ns (7.4x)
10 cols: 340 ns -> 28 ns (12.2x)
20 cols: 589 ns -> 31 ns (18.9x)
50 cols: 1160 ns -> 29 ns (39.6x)
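One possible way to keep such a cache in step with the metadata is a property setter, sketched below. This is an assumption for illustration: the commit does not say how invalidation is implemented, and the metadata is modelled here as simple (name, type) pairs rather than the driver's column tuples.

```python
class PreparedStatement:
    """Stand-in class; only the caching behaviour is modelled."""

    def __init__(self, result_metadata):
        self._result_metadata = None
        self.column_names = None
        self.column_types = None
        self.result_metadata = result_metadata  # goes through the setter

    @property
    def result_metadata(self):
        return self._result_metadata

    @result_metadata.setter
    def result_metadata(self, metadata):
        # Rebuild the cached lists whenever the metadata is replaced,
        # for example after a re-prepare.
        self._result_metadata = metadata
        if metadata is None:
            self.column_names = None
            self.column_types = None
        else:
            self.column_names = [col[0] for col in metadata]
            self.column_types = [col[1] for col in metadata]


stmt = PreparedStatement([("id", "int"), ("name", "text")])
names_first = stmt.column_names
names_second = stmt.column_names      # same list object, nothing rebuilt
print(names_first is names_second)    # True

stmt.result_metadata = [("id", "bigint")]  # simulated re-prepare
print(stmt.column_names, stmt.column_types)
```

Because the cached lists are shared across result sets, callers should treat them as read-only.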
…cess
Two micro-optimizations in _set_result() hot path:
1. Reorder the RESULT_KIND if/elif chain to check ROWS first (was third), since it is by far the most common result type. VOID is second. SET_KEYSPACE and SCHEMA_CHANGE (rare) are now last.
2. Add continuous_paging_options = None class attribute to _QueryMessage, allowing direct attribute access instead of getattr(self.message, 'continuous_paging_options', None).
Benchmark (2M iters, Python 3.14):
RESULT_KIND reorder: 35.5 -> 24.3 ns (1.46x, -11.2 ns/dispatch)
getattr -> direct: 32.0 -> 18.3 ns (1.75x, -13.7 ns/access)
Combined: ~25 ns saved per query
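Both ideas can be tried in isolation with a small harness like the one below. The old branch order is an assumption (the commit only states that ROWS was third), the class names are stand-ins, and timings will vary by machine and interpreter, so no expected numbers are given.

```python
import timeit

# Result kinds as numbered in the CQL native protocol; names are illustrative.
VOID, ROWS, SET_KEYSPACE, PREPARED, SCHEMA_CHANGE = 1, 2, 3, 4, 5


def dispatch_old(kind):
    # Assumed original order, with ROWS reached on the third comparison.
    if kind == SET_KEYSPACE:
        return "set_keyspace"
    elif kind == SCHEMA_CHANGE:
        return "schema_change"
    elif kind == ROWS:
        return "rows"
    elif kind == VOID:
        return "void"
    return "other"


def dispatch_new(kind):
    # Most common kind first, rare kinds last.
    if kind == ROWS:
        return "rows"
    elif kind == VOID:
        return "void"
    elif kind == SET_KEYSPACE:
        return "set_keyspace"
    elif kind == SCHEMA_CHANGE:
        return "schema_change"
    return "other"


class QueryMessage:
    # Class-level default makes plain attribute access safe on every instance.
    continuous_paging_options = None


class LegacyMessage:
    pass  # no default, so callers need getattr(..., None)


direct = QueryMessage().continuous_paging_options
via_getattr = getattr(LegacyMessage(), "continuous_paging_options", None)

runs = 100_000
old_s = timeit.timeit(lambda: dispatch_old(ROWS), number=runs)
new_s = timeit.timeit(lambda: dispatch_new(ROWS), number=runs)
print(f"old: {old_s * 1e9 / runs:.1f} ns/call, new: {new_s * 1e9 / runs:.1f} ns/call")
```

The lambda wrapper adds its own call overhead to both measurements, so only the difference between the two figures is meaningful.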
… double-lookup
In the ResultMessage hot path, self.session.cluster was accessed 3 times in the tablet routing block plus additional times in SET_KEYSPACE and SCHEMA_CHANGE branches. Cache session = self.session and cluster = session.cluster once at entry to eliminate redundant attribute-chain lookups.
Also reuse the cached 'session' local for the SET_KEYSPACE and SCHEMA_CHANGE branches instead of re-reading self.session.
Benchmark (5M iters):
3x self.session.cluster (old): 66.2 ns
1x local + 3x local (new): 39.9 ns
Saving: 26.3 ns (1.66x)
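The local-caching pattern looks roughly like this. The classes are minimal stand-ins and the method names are hypothetical; the sketch only contrasts repeated attribute-chain reads with a single read into locals.

```python
class Cluster:
    pass


class Session:
    def __init__(self):
        self.cluster = Cluster()


class Future:
    def __init__(self, session):
        self.session = session

    def chained(self):
        # Each line performs two attribute lookups: self.session, then .cluster.
        a = self.session.cluster
        b = self.session.cluster
        c = self.session.cluster
        return a, b, c

    def cached(self):
        # Resolve the chain once, then reuse the locals.
        session = self.session
        cluster = session.cluster
        return cluster, cluster, cluster


future = Future(Session())
same = all(x is y for x, y in zip(future.chained(), future.cached()))
print(same)  # True
```

The two forms are interchangeable only when neither `self.session` nor `session.cluster` is reassigned while the method runs.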
…lidation to wait_for_schema_agreement
Summary
Optimize the result message processing hot path (_set_result, ResultMessage) for memory and speed. Each commit is an independent, focused optimization.
Commits
1. 7df233481: … ResultMessage
2. 3185a0bdd: … None attribute assignments in decode_message
3. 7b0191c09: … ResultSet creation in fetch_next_page
4. 3256340b9: … __slots__ to _MessageType, ResultMessage, and FastResultMessage
5. cacc76328: … _set_result
6. 4f7c4103b: … isinstance(ResultMessage) first for direct attribute access
7. bd4cb8f16: Cache column_names/column_types on PreparedStatement
8. 4104aa6a1: Reorder RESULT_KIND dispatch and replace getattr with direct access
9. e76de060a: … session.cluster as local in _set_result
10. 6106d61: … __repr__ to ResultMessage for slotted attribute display
11. 9aa01fc: … _set_keyspace_for_all_pools errors dict, wait_for_schema_agreement scope validation
Details
Commits 1-9 (original PR)
See individual commit messages for details.
Commit 10: __repr__ for ResultMessage
ResultMessage.__slots__ eliminates __dict__, which broke the inherited _MessageType.__repr__ (it only iterates __dict__). Added a __repr__ override that walks type(self).__mro__ to collect all __slots__ from the full class hierarchy (handles FastResultMessage, which has __slots__ = ()). Only displays non-None, non-False attributes for clean debug output.
Commit 11: Pre-existing bug fixes
Two bugs found on origin/master:
- _set_keyspace_for_all_pools: callback(host_errors) passed only the last pool's errors instead of the accumulated errors dict
- wait_for_schema_agreement: Missing scope validation despite the docstring promising ValueError
Benchmarks
All benchmarks: Python 3.14, pure-Python .py (not Cython .so).
column_names / column_types extraction
_set_result dispatch (2M iters)
session.cluster caching (5M iters)
Net impact per query
… __dict__ eliminated)
Total per-query savings on the hot path: ~65 ns latency + ~200 bytes memory.
Test results
- ResultSet.__slots__ removed from the PR to avoid API breakage for users who subclass (no memory regression vs origin/master; __slots__ wasn't on origin/master either)
- origin/master bugs fixed (see commit 11)
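For reference, the slots-aware __repr__ described under commit 10 could look roughly like the sketch below. The class names, slot names, and output format are assumptions made for the example; only the approach (walk type(self).__mro__, collect __slots__, hide None and False values) comes from the description.

```python
class BaseMessage:
    __slots__ = ()

    def __repr__(self):
        parts = []
        seen = set()
        # Walk the whole hierarchy so slots declared on parents are included.
        for klass in type(self).__mro__:
            slots = getattr(klass, "__slots__", ())
            if isinstance(slots, str):
                slots = (slots,)  # a bare string declares a single slot
            for name in slots:
                if name in seen:
                    continue
                seen.add(name)
                value = getattr(self, name, None)
                # Hide unset and False attributes to keep debug output short.
                if value is not None and value is not False:
                    parts.append(f"{name}={value!r}")
        return f"<{type(self).__name__}({', '.join(parts)})>"


class Result(BaseMessage):
    __slots__ = ("kind", "rows", "paging_state", "has_more")

    def __init__(self, kind=None, rows=None, paging_state=None, has_more=False):
        self.kind = kind
        self.rows = rows
        self.paging_state = paging_state
        self.has_more = has_more


class FastResult(Result):
    __slots__ = ()  # contributes no slots; the parent's are found via the MRO


print(repr(FastResult(kind=2, rows=[1])))  # <FastResult(kind=2, rows=[1])>
```

Iterating only the instance's own class would miss every attribute for `FastResult`, since its `__slots__` is empty.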