Skip to content

perf: result path memory allocation optimizations (mostly memory savings, 10's of ns improvements)#805

Draft
mykaul wants to merge 11 commits into
scylladb:masterfrom
mykaul:perf/result-path-cleanup
Draft

perf: result path memory allocation optimizations (mostly memory savings, 10's of ns improvements)#805
mykaul wants to merge 11 commits into
scylladb:masterfrom
mykaul:perf/result-path-cleanup

Conversation

@mykaul

@mykaul mykaul commented Apr 7, 2026

Copy link
Copy Markdown

Summary

Optimize the result message processing hot path (_set_result, ResultMessage) for memory and speed. Each commit is an independent, focused optimization.

Commits

# Commit Description
1 7df233481 Remove dead and duplicate class attributes in ResultMessage
2 3185a0bdd Skip redundant None attribute assignments in decode_message
3 7b0191c09 Avoid throwaway ResultSet creation in fetch_next_page
4 3256340b9 Add __slots__ to _MessageType, ResultMessage, and FastResultMessage
5 cacc76328 Streamline tablet payload parsing in _set_result
6 4f7c4103b Check isinstance(ResultMessage) first for direct attribute access
7 bd4cb8f16 Cache column_names/column_types on PreparedStatement
8 4104aa6a1 Reorder RESULT_KIND dispatch and replace getattr with direct access
9 e76de060a Cache session.cluster as local in _set_result
10 6106d61 Add __repr__ to ResultMessage for slotted attribute display
11 9aa01fc Fix pre-existing bugs: _set_keyspace_for_all_pools errors dict, wait_for_schema_agreement scope validation

Details

Commits 1-9 (original PR)

See individual commit messages for details.

Commit 10: __repr__ for ResultMessage

ResultMessage.__slots__ eliminates __dict__, which broke the inherited _MessageType.__repr__ (it only iterates __dict__). Added a __repr__ override that walks type(self).__mro__ to collect all __slots__ from the full class hierarchy (handles FastResultMessage which has __slots__ = ()). Only displays non-None, non-False attributes for clean debug output.

Commit 11: Pre-existing bug fixes

Two bugs found on origin/master:

  • _set_keyspace_for_all_pools: callback(host_errors) passed only the last pool's errors instead of the accumulated errors dict
  • wait_for_schema_agreement: Missing scope validation despite the docstring promising ValueError

Benchmarks

All benchmarks: Python 3.14, pure-Python .py (not Cython .so).

column_names / column_types extraction

Columns Uncached Cached (PreparedStatement) Saving
5 223 ns 31 ns 7.1x
10 341 ns 32 ns 10.6x
20 567 ns 32 ns 18.0x
50 1281 ns 31 ns 41.5x

_set_result dispatch (2M iters)

Optimization Before After Saving
RESULT_KIND reorder (ROWS=1st) 39.8 ns 27.8 ns −11.9 ns (1.43x)
getattr → direct attribute 38.6 ns 20.3 ns −18.3 ns (1.90x)

session.cluster caching (5M iters)

Access pattern Time Saving
3× self.session.cluster 62.3 ns
1× local + 3× local 37.5 ns −24.8 ns (1.66x)

Net impact per query

  • decode_message (commit 2): ~15 ns saved (skip 3 redundant None assignments)
  • _set_result dispatch (commits 5-6): ~49 ns saved (streamlined tablet parsing + isinstance-first)
  • ResultSet allocation (commit 3): ~300 ns saved per page fetch (avoids throwaway ResultSet)
  • Memory (commit 4): ~120–200 bytes/ResultMessage saved (per-instance __dict__ eliminated)

Total per-query savings on the hot path: ~65 ns latency + ~200 bytes memory.

Test results

  • 671 unit tests pass, 8 skipped — all green
  • ResultSet.__slots__ removed from the PR to avoid API breakage for users who subclass (no memory regression vs origin/master__slots__ wasn't on origin/master either)
  • All benchmarks verified against claimed numbers
  • 2 pre-existing origin/master bugs fixed (see commit 11)

@mykaul mykaul changed the title perf: result path memory allocation optimizations perf: result path memory allocation optimizations (mostly memory savings) Apr 7, 2026
@mykaul mykaul force-pushed the perf/result-path-cleanup branch from 709beae to 2818aef Compare April 9, 2026 12:50
@mykaul mykaul changed the title perf: result path memory allocation optimizations (mostly memory savings) perf: result path memory allocation optimizations (mostly memory savings, 10's of ns improvements) Apr 9, 2026
@mykaul

mykaul commented Apr 10, 2026

Copy link
Copy Markdown
Author

Follow-up commit: cache column_names/column_types on PreparedStatement

Commit: 580b4556cperf: cache column_names/column_types on PreparedStatement to avoid per-result list comprehensions

Change

For prepared statements with skip_meta=True (the common case), column_metadata is the same object every time, yet column_names and column_types lists are rebuilt via list comprehension on every result set in recv_results_rows.

This commit:

  1. Pre-computes _result_col_names and _result_col_types on PreparedStatement at prepare time
  2. In _set_result, uses the cached lists directly instead of the per-response lists
  3. Invalidates the cache when result_metadata is updated during re-prepare

Benchmark results (benchmarks/bench_col_names_cache.py)

Python 3.14.3
  5 cols:  226 ns -> 30 ns (7.4x)
  10 cols: 340 ns -> 28 ns (12.2x)
  20 cols: 589 ns -> 31 ns (18.9x)
  50 cols: 1160 ns -> 29 ns (39.6x)

Tests

611 unit tests passed, 0 failures.

@mykaul

mykaul commented Apr 10, 2026

Copy link
Copy Markdown
Author

Follow-up: Reorder RESULT_KIND dispatch and replace getattr with direct access

Commit: 5bbbc90

What changed

Two micro-optimizations in _set_result() hot path:

  1. Reorder RESULT_KIND dispatch: RESULT_KIND_ROWS is now checked first (was third). RESULT_KIND_VOID is second. SET_KEYSPACE and SCHEMA_CHANGE (rare) are last. Since ROWS is by far the most common result kind, this avoids 2 unnecessary comparisons per query.

  2. Replace getattr with direct access: Added continuous_paging_options = None class attribute to _QueryMessage in protocol.py, enabling self.message.continuous_paging_options instead of getattr(self.message, 'continuous_paging_options', None).

Benchmark results (Python 3.14, 2M iterations)

Optimization Before After Saving Speedup
RESULT_KIND reorder (ROWS hit) 35.5 ns 24.3 ns 11.2 ns 1.46x
getattr → direct attribute 32.0 ns 18.3 ns 13.7 ns 1.75x
Combined per-query ~25 ns

Testing

  • 611 unit tests passed (10.0s)
  • No regressions (pre-existing test_eq failure due to stale Cython resolved by rebuild)

@mykaul mykaul force-pushed the perf/result-path-cleanup branch from 5bbbc90 to 30e01b8 Compare April 11, 2026 10:30
@mykaul

mykaul commented Apr 11, 2026

Copy link
Copy Markdown
Author

v3: CI fix — added continuous_paging_options = None to BatchMessage

The previous push caused 11 CI failures with:

AttributeError: 'BatchMessage' object has no attribute 'continuous_paging_options'

Root cause: The direct attribute access optimization (self.message.continuous_paging_options instead of getattr(...)) was added in cluster.py, and the class attribute was declared on _QueryMessage — but BatchMessage inherits from _MessageType directly, not from _QueryMessage.

Fix: Added continuous_paging_options = None class attribute to BatchMessage in protocol.py, matching the interface of _QueryMessage.

All 611 unit tests pass. Benchmarks unchanged:

  • RESULT_KIND dispatch reorder: 1.62x (25.6 ns saving)
  • Direct attr access vs getattr: 1.93x (21.7 ns saving)

@mykaul

mykaul commented Apr 11, 2026

Copy link
Copy Markdown
Author

v4: Added session.cluster local caching in _set_result

New commit: cache session = self.session and cluster = session.cluster once at the top of the ResultMessage branch, eliminating repeated double-lookups.

Changes:

  • self.session.cluster accessed 3 times in the tablet routing block → now uses cluster local
  • self.session in SET_KEYSPACE path → reuses session local (also removes redundant getattr(self, 'session', None))
  • self.session.submit + self.session.cluster in SCHEMA_CHANGE path → uses session/cluster locals

Benchmark (5M iters):

  • 3x self.session.cluster (old): 66.2 ns
  • 1x local + 3x local (new): 39.9 ns
  • Saving: 26.3 ns (1.66x)

All 611 unit tests pass.

@mykaul mykaul force-pushed the perf/result-path-cleanup branch 3 times, most recently from d73e202 to 48fe3a3 Compare April 11, 2026 16:28
@mykaul

mykaul commented Jun 29, 2026

Copy link
Copy Markdown
Author

I kinda remember some collision between slots and something... Unsure now.

mykaul added 11 commits June 29, 2026 21:23
Remove 'results = None' (never assigned or read in production code),
duplicate 'kind = None' (declared twice), and duplicate 'paging_state = None'
(declared at lines 663 and 681). The canonical declarations at lines 672-685
are kept.
When trace_id, custom_payload, and warnings are None (the common case
for non-traced, non-warned messages), skip the instance attribute
assignment. The class-level defaults on _MessageType already provide
None, so reading these attributes returns the correct value without
the per-instance __dict__ write.
Extract _wait_for_result() from result() to return the raw result
without wrapping in a ResultSet. Use it in fetch_next_page() to
avoid creating a full ResultSet object just to extract _current_rows.

For paged queries with N pages, this eliminates N-1 throwaway
ResultSet allocations (each with 6 attributes).
…sage

Add __slots__ to eliminate per-instance __dict__ for ResultMessage,
the most common response type. ResultMessage has 19 instance attributes;
eliminating the __dict__ saves ~200-400 bytes per decoded result message.

Changes:
- _MessageType: __slots__ = () to signal slots-awareness to subclasses
- ResultMessage: full __slots__ with all 19 instance attributes initialized
  in __init__ (replaces class-level defaults that are incompatible with slots)
- FastResultMessage: __slots__ = () to inherit parent slots without __dict__
- _get_params: handle slotted objects gracefully for __repr__
- Remove dead 'response.results' assignment from test mock
- Replace double dict lookup ('in' + .get()) with single .get() + None check
- Use tuple unpacking instead of three separate indexing operations
- Guard add_tablet with 'if tablet is not None' (from_row can return None
  for empty replicas — previously this was silently passed through)
- Inline single-use locals (protocol, keyspace, table)

Saves ~13 ns on the tablet-hit path (noise-level, but cleaner code).
… attr access

Move the isinstance(response, ResultMessage) check to the top of
_set_result so the hot path (successful query results) uses direct
attribute access instead of getattr() with defaults.

ResultMessage has trace_id, warnings, and custom_payload in __slots__,
always initialised in __init__, so direct access is safe and avoids the
overhead of 3 getattr() calls + 1 isinstance() that was previously done
unconditionally before the type dispatch.

For the cold ErrorMessage path, getattr() is still used because
ErrorMessage does not declare trace_id in __slots__ or __init__.
ConnectionException and generic Exception branches explicitly clear
_warnings/_custom_payload to avoid stale values from prior retries.

Benchmark (best-of-7, 500k iterations, Python 3.14, pure-Python):
  _set_result ROWS hot path (no tablets, no tracing):
    Before: 326 ns
    After:  271 ns  (-55 ns, 1.20x)
…er-result list comprehensions

For prepared statements with skip_meta=True (the common case),
column_metadata is the same object every time, yet column_names and
column_types lists are rebuilt via list comprehension on every result set.

Pre-compute and cache these lists on PreparedStatement at prepare time.
In _set_result, use the cached lists directly instead of the per-response
lists. The cache is invalidated when result_metadata is updated during
re-prepare.

Benchmark (column_names + column_types extraction):
  5 cols:  226 ns -> 30 ns (7.4x)
  10 cols: 340 ns -> 28 ns (12.2x)
  20 cols: 589 ns -> 31 ns (18.9x)
  50 cols: 1160 ns -> 29 ns (39.6x)
…cess

Two micro-optimizations in _set_result() hot path:

1. Reorder the RESULT_KIND if/elif chain to check ROWS first (was third),
   since it is by far the most common result type.  VOID is second.
   SET_KEYSPACE and SCHEMA_CHANGE (rare) are now last.

2. Add continuous_paging_options = None class attribute to _QueryMessage,
   allowing direct attribute access instead of
   getattr(self.message, 'continuous_paging_options', None).

Benchmark (2M iters, Python 3.14):
  RESULT_KIND reorder: 35.5 -> 24.3 ns (1.46x, -11.2 ns/dispatch)
  getattr -> direct:   32.0 -> 18.3 ns (1.75x, -13.7 ns/access)
  Combined: ~25 ns saved per query
… double-lookup

In the ResultMessage hot path, self.session.cluster was accessed 3 times
in the tablet routing block plus additional times in SET_KEYSPACE and
SCHEMA_CHANGE branches.  Cache session = self.session and
cluster = session.cluster once at entry to eliminate redundant
attribute-chain lookups.

Also reuse the cached 'session' local for the SET_KEYSPACE and
SCHEMA_CHANGE branches instead of re-reading self.session.

Benchmark (5M iters):
  3x self.session.cluster (old): 66.2 ns
  1x local + 3x local (new):     39.9 ns
  Saving: 26.3 ns (1.66x)
@mykaul mykaul force-pushed the perf/result-path-cleanup branch from 48fe3a3 to 9aa01fc Compare June 29, 2026 19:56
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 6aa82a1e-eaee-44fc-8225-e4c48b34f18e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant