(Improvement) reduced memory for metadata schema parsing (main improvement - Select only needed columns from system_schema.column ) by mykaul · Pull Request #745 · scylladb/python-driver

mykaul · 2026-03-14T09:10:36Z

Reduce the time and memory required to parse schema metadata when refreshing. The key changes are:

Select only needed columns from system_schema.columns — the biggest network/memory win for large schemas
Replace dict_factory with _RowView — lightweight tuple-backed Mapping that shares a single index map across all rows (5.7x memory reduction per row)
Pure __slots__ on all metadata classes — no __dict__ overhead, saving ~264 bytes per ColumnMetadata instance (~516 KB for 2000-column schemas)
Single-pass column classification in _build_table_columns — avoids repeated list-filtering passes
Replace OrderedDict with dict — Python 3.7+ guarantees insertion order

Performance results (Python 3.14, rebased on master)

Metric	_RowView (new)	dict_factory (old)	Improvement
Row creation + access	651 ns/row	886 ns/row	1.36x faster
Full pipeline (100 tables × 20 cols)	3.37 ms	4.26 ms	1.26x faster
Memory per row	48 bytes	272 bytes	5.7x reduction
Bulk memory (2000 rows)	113 KB	561 KB	80% reduction
ColumnMetadata instance	80 bytes	344 bytes	264 bytes saved

_RowView now implements the full collections.abc.Mapping protocol (keys(), values(), items(), len(), iteration).

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
I have adjusted the documentation in ./docs/source/.
I added appropriate Fixes: annotations to PR description.

Copilot

Pull request overview

This PR aims to reduce CPU time and memory usage during schema metadata refresh by avoiding per-row dict allocations and trimming schema column queries, while also shrinking metadata object overhead.

Changes:

Introduces an internal _RowView + _row_factory and routes schema query result handling through it to reduce per-row allocations.
Adds __slots__ to several metadata model classes and replaces some OrderedDict usages with plain dict to reduce memory overhead.
Narrows the system_schema.columns query in SchemaParserV3 to only the fields needed by the parser and refactors _build_table_columns to classify rows in a single pass.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

This PR optimizes schema metadata refresh by reducing per-row allocations during schema parsing and narrowing the system_schema.columns select list to only the fields needed for building table/column metadata.

Changes:

Introduces an internal lightweight row representation (_RowView + _row_factory) and uses it in schema parser result handling to reduce time/memory overhead.
Reduces the system_schema.columns query to fetch only required columns and refactors _build_table_columns to classify rows in a single pass.
Adds unit tests for _RowView and _row_factory behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
cassandra/metadata.py	Adds `_RowView`/`_row_factory`, switches schema parser row handling away from per-row dicts, tightens `system_schema.columns` query, and updates metadata classes/docstrings/slots.
tests/unit/test_metadata.py	Adds unit tests validating `_RowView` and `_row_factory` semantics (getitem/get/contains/read-only/shared index map).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

coderabbitai · 2026-06-29T15:40:28Z

Warning

Review limit reached

@mykaul, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 58 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 395dda60-fdc2-4838-9b17-8dda5f355cef

📥 Commits

Reviewing files that changed from the base of the PR and between 576c932 and ef455ae.

📒 Files selected for processing (2)

cassandra/metadata.py
tests/unit/test_metadata.py

📝 Walkthrough

Walkthrough

The change introduces _RowView, a read-only Mapping backed by a shared name-to-index map over a row tuple, and _row_factory to construct these views. _SchemaParser._handle_results and get_column_from_system_local are switched from dict_factory to _row_factory. SchemaParserV3._SELECT_COLUMNS is narrowed to an explicit column projection, and _build_table_columns is refactored to a single-pass partition/clustering/other classification. SchemaParserV4 and SchemaParserDSE68 keyspace builders are updated to use row.get instead of mutation. TableMetadata, MaterializedViewMetadata column/trigger fields are changed from OrderedDict to plain dict.

Possibly related issues

Tracking: General (non-vector) performance improvement PRs #747: Directly related — both target schema parsing optimizations in cassandra/metadata.py, including row handling and parser refactors.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 13.16% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title is related to the main change, but it is wordy and less concise than ideal.
Description check	✅ Passed	The description follows the template, summarizes the optimization work, and includes the checklist and performance rationale.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

🧹 Nitpick comments (2)

cassandra/metadata.py (2)
61-61: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Sort __slots__ to satisfy Ruff RUF023.
Proposed fix
-    __slots__ = ("_row", "_index_map")
+    __slots__ = ("_index_map", "_row")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cassandra/metadata.py` at line 61, The __slots__ definition in the metadata
class needs to be sorted to satisfy Ruff RUF023. Update the __slots__ tuple in
the class that defines _row and _index_map so the slot names are in the expected
sorted order and keep the declaration consistent with the rest of the file.
Source: Linters/SAST tools

56-58: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Drop the custom values()/items() generators Mapping already provides reusable view objects here; these overrides turn them into one-shot iterators and lose the usual size/reuse semantics. cassandra/metadata.py:82-85
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cassandra/metadata.py` around lines 56 - 58, The custom values() and items()
generators in the Mapping implementation should be removed so the class can use
the default Mapping view behavior. Update the metadata mapping class in
cassandra/metadata.py by dropping the overridden values() and items() methods,
and rely on the inherited reusable view objects from collections.abc.Mapping
instead.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@cassandra/metadata.py`:
- Line 61: The __slots__ definition in the metadata class needs to be sorted to
satisfy Ruff RUF023. Update the __slots__ tuple in the class that defines _row
and _index_map so the slot names are in the expected sorted order and keep the
declaration consistent with the rest of the file.
- Around line 56-58: The custom values() and items() generators in the Mapping
implementation should be removed so the class can use the default Mapping view
behavior. Update the metadata mapping class in cassandra/metadata.py by dropping
the overridden values() and items() methods, and rely on the inherited reusable
view objects from collections.abc.Mapping instead.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: db48b541-ca15-4faa-92a1-9a2e6fd2a764

📥 Commits

Reviewing files that changed from the base of the PR and between c1bfd54 and 576c932.

📒 Files selected for processing (2)

cassandra/metadata.py
tests/unit/test_metadata.py

Introduce _RowView, a __slots__-based read-only row wrapper that stores data as tuples with a shared column-name-to-index map, and _row_factory that creates these views. _RowView inherits from collections.abc.Mapping, providing a complete dict-like read interface. This eliminates per-row dict allocation during schema parsing. All rows from the same result set share a single index map object.

Python 3.7+ guarantees dict preserves insertion order, making OrderedDict unnecessary. Replace OrderedDict() with {} in TableMetadata.columns, TableMetadata.triggers, and MaterializedViewMetadata.columns. Remove the now-unused OrderedDict import.

….columns Replace SELECT * with an explicit column list for the system_schema.columns query in SchemaParserV3 (inherited by V4). Only the 7 columns actually consumed by the parser are fetched: keyspace_name, table_name, column_name, clustering_order, kind, position, type. This reduces network transfer and deserialization overhead during schema refresh.

Replace dict_factory in _SchemaParser._handle_results and get_column_from_system_local with _row_factory, eliminating per-row dict allocation during schema parsing. Also refactor SchemaParserV4._build_keyspace_metadata_internal to read from the row without mutating it, since _RowView is read-only. Note: V22-only dict_factory call sites are left unchanged as they do not affect the V3/V4 code path (V3 and V4 fully override _query_all).

Replace three list comprehension passes over col_rows with a single classification loop that sorts columns into partition, clustering, and other buckets. Also use in-place sort() instead of sorted() and reuse the already-built column_meta instead of a redundant dict lookup.

Cover __getitem__, get(), __contains__, __repr__, shared index map, read-only enforcement, empty input, single-column, and multi-row scenarios.

mykaul marked this pull request as draft March 14, 2026 09:10

mykaul changed the title ~~(Improvement) metadata schema parsing~~ (Improvement) faster metadata schema parsing Mar 14, 2026

mykaul requested a review from Copilot March 14, 2026 09:11

Copilot started reviewing on behalf of mykaul March 14, 2026 09:11 View session

Copilot AI reviewed Mar 14, 2026

View reviewed changes

Comment thread cassandra/metadata.py Outdated

Comment thread cassandra/metadata.py

Comment thread cassandra/metadata.py Outdated

This was referenced Mar 14, 2026

Tracking: Vector search (VectorType) performance improvement PRs #746

Open

Tracking: General (non-vector) performance improvement PRs #747

Open

mykaul force-pushed the improvement/metadata-schema-parsing branch from 3a7f4bc to 4f9e4f5 Compare March 14, 2026 21:42

mykaul requested a review from Copilot March 15, 2026 15:40

Copilot started reviewing on behalf of mykaul March 15, 2026 15:41 View session

Copilot AI reviewed Mar 15, 2026

View reviewed changes

mykaul self-assigned this Mar 16, 2026

mykaul force-pushed the improvement/metadata-schema-parsing branch 2 times, most recently from 069ce0f to eda1086 Compare April 3, 2026 15:39

mykaul changed the title ~~(Improvement) faster metadata schema parsing~~ (Improvement) reduced memory for metadata schema parsing Apr 3, 2026

mykaul changed the title ~~(Improvement) reduced memory for metadata schema parsing~~ (Improvement) reduced memory for metadata schema parsing (main improvement - Select only needed columns from system_schema.column ) Apr 3, 2026

Lorak-mmk force-pushed the master branch from f2a9e87 to 763af09 Compare June 15, 2026 10:57

mykaul force-pushed the improvement/metadata-schema-parsing branch from 8b0a29f to 576c932 Compare June 29, 2026 15:42

mykaul marked this pull request as ready for review June 29, 2026 17:35

coderabbitai Bot reviewed Jun 29, 2026

View reviewed changes

mykaul added 6 commits June 29, 2026 23:21

tests: add _RowView and _row_factory unit tests

ef455ae

Cover __getitem__, get(), __contains__, __repr__, shared index map, read-only enforcement, empty input, single-column, and multi-row scenarios.

mykaul force-pushed the improvement/metadata-schema-parsing branch from 576c932 to ef455ae Compare June 29, 2026 20:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

(Improvement) reduced memory for metadata schema parsing (main improvement - Select only needed columns from system_schema.column )#745

(Improvement) reduced memory for metadata schema parsing (main improvement - Select only needed columns from system_schema.column )#745
mykaul wants to merge 6 commits into
scylladb:masterfrom
mykaul:improvement/metadata-schema-parsing

mykaul commented Mar 14, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

coderabbitai Bot commented Jun 29, 2026 •

edited

Loading

Review limit reached

Walkthrough

Possibly related issues

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

mykaul commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance results (Python 3.14, rebased on master)

Pre-review checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Possibly related issues

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mykaul commented Mar 14, 2026 •

edited

Loading

coderabbitai Bot commented Jun 29, 2026 •

edited

Loading