perf: result path memory allocation optimizations (mostly memory savings, 10's of ns improvements) by mykaul · Pull Request #805 · scylladb/python-driver

mykaul · 2026-04-07T06:16:44Z

Summary

Optimize the result message processing hot path (_set_result, ResultMessage) for memory and speed. Each commit is an independent, focused optimization.

Commits

#	Commit	Description
1	`7df233481`	Remove dead and duplicate class attributes in `ResultMessage`
2	`3185a0bdd`	Skip redundant `None` attribute assignments in `decode_message`
3	`7b0191c09`	Avoid throwaway `ResultSet` creation in `fetch_next_page`
4	`3256340b9`	Add `__slots__` to `_MessageType`, `ResultMessage`, and `FastResultMessage`
5	`cacc76328`	Streamline tablet payload parsing in `_set_result`
6	`4f7c4103b`	Check `isinstance(ResultMessage)` first for direct attribute access
7	`bd4cb8f16`	Cache `column_names`/`column_types` on `PreparedStatement`
8	`4104aa6a1`	Reorder `RESULT_KIND` dispatch and replace `getattr` with direct access
9	`e76de060a`	Cache `session.cluster` as local in `_set_result`
10	`6106d61`	Add `__repr__` to `ResultMessage` for slotted attribute display
11	`9aa01fc`	Fix pre-existing bugs: `_set_keyspace_for_all_pools` errors dict, `wait_for_schema_agreement` scope validation

Details

Commits 1-9 (original PR)

See individual commit messages for details.

Commit 10: `repr` for `ResultMessage`

ResultMessage.__slots__ eliminates __dict__, which broke the inherited _MessageType.__repr__ (it only iterates __dict__). Added a __repr__ override that walks type(self).__mro__ to collect all __slots__ from the full class hierarchy (handles FastResultMessage which has __slots__ = ()). Only displays non-None, non-False attributes for clean debug output.

Commit 11: Pre-existing bug fixes

Two bugs found on origin/master:

_set_keyspace_for_all_pools: callback(host_errors) passed only the last pool's errors instead of the accumulated errors dict
wait_for_schema_agreement: Missing scope validation despite the docstring promising ValueError

Benchmarks

All benchmarks: Python 3.14, pure-Python .py (not Cython .so).

column_names / column_types extraction

Columns	Uncached	Cached (PreparedStatement)	Saving
5	223 ns	31 ns	7.1x
10	341 ns	32 ns	10.6x
20	567 ns	32 ns	18.0x
50	1281 ns	31 ns	41.5x

_set_result dispatch (2M iters)

Optimization	Before	After	Saving
RESULT_KIND reorder (ROWS=1st)	39.8 ns	27.8 ns	−11.9 ns (1.43x)
getattr → direct attribute	38.6 ns	20.3 ns	−18.3 ns (1.90x)

session.cluster caching (5M iters)

Access pattern	Time	Saving
3× self.session.cluster	62.3 ns	—
1× local + 3× local	37.5 ns	−24.8 ns (1.66x)

Net impact per query

decode_message (commit 2): ~15 ns saved (skip 3 redundant None assignments)
_set_result dispatch (commits 5-6): ~49 ns saved (streamlined tablet parsing + isinstance-first)
ResultSet allocation (commit 3): ~300 ns saved per page fetch (avoids throwaway ResultSet)
Memory (commit 4): ~120–200 bytes/ResultMessage saved (per-instance __dict__ eliminated)

Total per-query savings on the hot path: ~65 ns latency + ~200 bytes memory.

Test results

671 unit tests pass, 8 skipped — all green
ResultSet.__slots__ removed from the PR to avoid API breakage for users who subclass (no memory regression vs origin/master — __slots__ wasn't on origin/master either)
All benchmarks verified against claimed numbers
2 pre-existing origin/master bugs fixed (see commit 11)

mykaul · 2026-04-10T21:02:37Z

Follow-up commit: cache column_names/column_types on PreparedStatement

Commit: 580b4556c — perf: cache column_names/column_types on PreparedStatement to avoid per-result list comprehensions

Change

For prepared statements with skip_meta=True (the common case), column_metadata is the same object every time, yet column_names and column_types lists are rebuilt via list comprehension on every result set in recv_results_rows.

This commit:

Pre-computes _result_col_names and _result_col_types on PreparedStatement at prepare time
In _set_result, uses the cached lists directly instead of the per-response lists
Invalidates the cache when result_metadata is updated during re-prepare

Benchmark results (`benchmarks/bench_col_names_cache.py`)

Python 3.14.3
  5 cols:  226 ns -> 30 ns (7.4x)
  10 cols: 340 ns -> 28 ns (12.2x)
  20 cols: 589 ns -> 31 ns (18.9x)
  50 cols: 1160 ns -> 29 ns (39.6x)

Tests

611 unit tests passed, 0 failures.

mykaul · 2026-04-10T21:28:41Z

Follow-up: Reorder RESULT_KIND dispatch and replace getattr with direct access

Commit: 5bbbc90

What changed

Two micro-optimizations in _set_result() hot path:

Reorder RESULT_KIND dispatch: RESULT_KIND_ROWS is now checked first (was third). RESULT_KIND_VOID is second. SET_KEYSPACE and SCHEMA_CHANGE (rare) are last. Since ROWS is by far the most common result kind, this avoids 2 unnecessary comparisons per query.
Replace getattr with direct access: Added continuous_paging_options = None class attribute to _QueryMessage in protocol.py, enabling self.message.continuous_paging_options instead of getattr(self.message, 'continuous_paging_options', None).

Benchmark results (Python 3.14, 2M iterations)

Optimization	Before	After	Saving	Speedup
RESULT_KIND reorder (ROWS hit)	35.5 ns	24.3 ns	11.2 ns	1.46x
getattr → direct attribute	32.0 ns	18.3 ns	13.7 ns	1.75x
Combined per-query			~25 ns

Testing

611 unit tests passed (10.0s)
No regressions (pre-existing test_eq failure due to stale Cython resolved by rebuild)

mykaul · 2026-04-11T10:30:49Z

v3: CI fix — added continuous_paging_options = None to BatchMessage

The previous push caused 11 CI failures with:

AttributeError: 'BatchMessage' object has no attribute 'continuous_paging_options'

Root cause: The direct attribute access optimization (self.message.continuous_paging_options instead of getattr(...)) was added in cluster.py, and the class attribute was declared on _QueryMessage — but BatchMessage inherits from _MessageType directly, not from _QueryMessage.

Fix: Added continuous_paging_options = None class attribute to BatchMessage in protocol.py, matching the interface of _QueryMessage.

All 611 unit tests pass. Benchmarks unchanged:

RESULT_KIND dispatch reorder: 1.62x (25.6 ns saving)
Direct attr access vs getattr: 1.93x (21.7 ns saving)

mykaul · 2026-04-11T10:44:13Z

v4: Added session.cluster local caching in _set_result

New commit: cache session = self.session and cluster = session.cluster once at the top of the ResultMessage branch, eliminating repeated double-lookups.

Changes:

self.session.cluster accessed 3 times in the tablet routing block → now uses cluster local
self.session in SET_KEYSPACE path → reuses session local (also removes redundant getattr(self, 'session', None))
self.session.submit + self.session.cluster in SCHEMA_CHANGE path → uses session/cluster locals

Benchmark (5M iters):

3x self.session.cluster (old): 66.2 ns
1x local + 3x local (new): 39.9 ns
Saving: 26.3 ns (1.66x)

All 611 unit tests pass.

mykaul · 2026-06-29T18:21:54Z

I kinda remember some collision between slots and something... Unsure now.

Remove 'results = None' (never assigned or read in production code), duplicate 'kind = None' (declared twice), and duplicate 'paging_state = None' (declared at lines 663 and 681). The canonical declarations at lines 672-685 are kept.

When trace_id, custom_payload, and warnings are None (the common case for non-traced, non-warned messages), skip the instance attribute assignment. The class-level defaults on _MessageType already provide None, so reading these attributes returns the correct value without the per-instance __dict__ write.

Extract _wait_for_result() from result() to return the raw result without wrapping in a ResultSet. Use it in fetch_next_page() to avoid creating a full ResultSet object just to extract _current_rows. For paged queries with N pages, this eliminates N-1 throwaway ResultSet allocations (each with 6 attributes).

…sage Add __slots__ to eliminate per-instance __dict__ for ResultMessage, the most common response type. ResultMessage has 19 instance attributes; eliminating the __dict__ saves ~200-400 bytes per decoded result message. Changes: - _MessageType: __slots__ = () to signal slots-awareness to subclasses - ResultMessage: full __slots__ with all 19 instance attributes initialized in __init__ (replaces class-level defaults that are incompatible with slots) - FastResultMessage: __slots__ = () to inherit parent slots without __dict__ - _get_params: handle slotted objects gracefully for __repr__ - Remove dead 'response.results' assignment from test mock

- Replace double dict lookup ('in' + .get()) with single .get() + None check - Use tuple unpacking instead of three separate indexing operations - Guard add_tablet with 'if tablet is not None' (from_row can return None for empty replicas — previously this was silently passed through) - Inline single-use locals (protocol, keyspace, table) Saves ~13 ns on the tablet-hit path (noise-level, but cleaner code).

… attr access Move the isinstance(response, ResultMessage) check to the top of _set_result so the hot path (successful query results) uses direct attribute access instead of getattr() with defaults. ResultMessage has trace_id, warnings, and custom_payload in __slots__, always initialised in __init__, so direct access is safe and avoids the overhead of 3 getattr() calls + 1 isinstance() that was previously done unconditionally before the type dispatch. For the cold ErrorMessage path, getattr() is still used because ErrorMessage does not declare trace_id in __slots__ or __init__. ConnectionException and generic Exception branches explicitly clear _warnings/_custom_payload to avoid stale values from prior retries. Benchmark (best-of-7, 500k iterations, Python 3.14, pure-Python): _set_result ROWS hot path (no tablets, no tracing): Before: 326 ns After: 271 ns (-55 ns, 1.20x)

…er-result list comprehensions For prepared statements with skip_meta=True (the common case), column_metadata is the same object every time, yet column_names and column_types lists are rebuilt via list comprehension on every result set. Pre-compute and cache these lists on PreparedStatement at prepare time. In _set_result, use the cached lists directly instead of the per-response lists. The cache is invalidated when result_metadata is updated during re-prepare. Benchmark (column_names + column_types extraction): 5 cols: 226 ns -> 30 ns (7.4x) 10 cols: 340 ns -> 28 ns (12.2x) 20 cols: 589 ns -> 31 ns (18.9x) 50 cols: 1160 ns -> 29 ns (39.6x)

…cess Two micro-optimizations in _set_result() hot path: 1. Reorder the RESULT_KIND if/elif chain to check ROWS first (was third), since it is by far the most common result type. VOID is second. SET_KEYSPACE and SCHEMA_CHANGE (rare) are now last. 2. Add continuous_paging_options = None class attribute to _QueryMessage, allowing direct attribute access instead of getattr(self.message, 'continuous_paging_options', None). Benchmark (2M iters, Python 3.14): RESULT_KIND reorder: 35.5 -> 24.3 ns (1.46x, -11.2 ns/dispatch) getattr -> direct: 32.0 -> 18.3 ns (1.75x, -13.7 ns/access) Combined: ~25 ns saved per query

… double-lookup In the ResultMessage hot path, self.session.cluster was accessed 3 times in the tablet routing block plus additional times in SET_KEYSPACE and SCHEMA_CHANGE branches. Cache session = self.session and cluster = session.cluster once at entry to eliminate redundant attribute-chain lookups. Also reuse the cached 'session' local for the SET_KEYSPACE and SCHEMA_CHANGE branches instead of re-reading self.session. Benchmark (5M iters): 3x self.session.cluster (old): 66.2 ns 1x local + 3x local (new): 39.9 ns Saving: 26.3 ns (1.66x)

…lidation to wait_for_schema_agreement

coderabbitai · 2026-06-29T19:56:07Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 6aa82a1e-eaee-44fc-8225-e4c48b34f18e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

mykaul changed the title ~~perf: result path memory allocation optimizations~~ perf: result path memory allocation optimizations (mostly memory savings) Apr 7, 2026

mykaul force-pushed the perf/result-path-cleanup branch from 709beae to 2818aef Compare April 9, 2026 12:50

mykaul changed the title ~~perf: result path memory allocation optimizations (mostly memory savings)~~ perf: result path memory allocation optimizations (mostly memory savings, 10's of ns improvements) Apr 9, 2026

mykaul force-pushed the perf/result-path-cleanup branch from 5bbbc90 to 30e01b8 Compare April 11, 2026 10:30

mykaul force-pushed the perf/result-path-cleanup branch 3 times, most recently from d73e202 to 48fe3a3 Compare April 11, 2026 16:28

Lorak-mmk force-pushed the master branch from f2a9e87 to 763af09 Compare June 15, 2026 10:57

mykaul added 11 commits June 29, 2026 21:23

perf: add __repr__ to ResultMessage for slotted attribute display

6106d61

fix: pre-existing bug in _set_keyspace_for_all_pools and add scope va…

9aa01fc

…lidation to wait_for_schema_agreement

mykaul force-pushed the perf/result-path-cleanup branch from 48fe3a3 to 9aa01fc Compare June 29, 2026 19:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf: result path memory allocation optimizations (mostly memory savings, 10's of ns improvements)#805

perf: result path memory allocation optimizations (mostly memory savings, 10's of ns improvements)#805
mykaul wants to merge 11 commits into
scylladb:masterfrom
mykaul:perf/result-path-cleanup

mykaul commented Apr 7, 2026 •

edited

Loading

Uh oh!

mykaul commented Apr 10, 2026

Uh oh!

mykaul commented Apr 10, 2026

Uh oh!

mykaul commented Apr 11, 2026

Uh oh!

mykaul commented Apr 11, 2026

Uh oh!

mykaul commented Jun 29, 2026

Uh oh!

coderabbitai Bot commented Jun 29, 2026

Review skipped

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

mykaul commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Commits

Details

Commits 1-9 (original PR)

Commit 10: __repr__ for ResultMessage

Commit 11: Pre-existing bug fixes

Benchmarks

column_names / column_types extraction

_set_result dispatch (2M iters)

session.cluster caching (5M iters)

Net impact per query

Test results

Uh oh!

mykaul commented Apr 10, 2026

Follow-up commit: cache column_names/column_types on PreparedStatement

Change

Benchmark results (benchmarks/bench_col_names_cache.py)

Tests

Uh oh!

mykaul commented Apr 10, 2026

Follow-up: Reorder RESULT_KIND dispatch and replace getattr with direct access

What changed

Benchmark results (Python 3.14, 2M iterations)

Testing

Uh oh!

mykaul commented Apr 11, 2026

Uh oh!

mykaul commented Apr 11, 2026

Uh oh!

mykaul commented Jun 29, 2026

Uh oh!

coderabbitai Bot commented Jun 29, 2026

Review skipped

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mykaul commented Apr 7, 2026 •

edited

Loading

Commit 10: `repr` for `ResultMessage`

Benchmark results (`benchmarks/bench_col_names_cache.py`)