Support BigQuery nested STRUCT fields in anomaly tests by tlangton3 · Pull Request #1012 · elementary-data/dbt-data-reliability

tlangton3 · 2026-05-22T10:35:56Z

Allows column_anomalies and dimension_anomalies to reference nested STRUCT leaves on BigQuery (e.g. user.address.city) instead of only top-level columns.

A single column-discovery wrapper segment-quotes nested references (`a`.`b`.`c`) and projects the monitored column with a dot-free CTE alias so the path survives into downstream aggregates. Non-nested columns and non-BigQuery adapters are byte-equivalent to today's behaviour. REPEATED ancestors are out of scope (would require UNNEST). test_all_columns_anomalies is unchanged — users opt in by passing column_name=user.address.city explicitly to avoid ballooning the test surface on wide STRUCT schemas.

What changes

- get_column_obj_and_monitors flattens BigQuery STRUCT columns via BigQueryColumn.flatten() and wraps each discovered column with a dict carrying .name (dotted display form), .quoted (segment-quoted SQL ref), and .safe_alias (dot-free identifier). Top-level STRUCTs are kept alongside their leaves so existing column_name=user behaviour is preserved.
- column_monitoring_query projects the monitored column as `<quoted> as <safe_alias>` and references the alias in metric aggregates. select_dimensions_columns applies the same pattern to nested dimensions.
- dimension_monitoring_query segment-quotes dimension expressions before they are concatenated into dimension_value.
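
For readers who want a concrete picture of the wrapper described in the list above, here is a rough Python sketch. The real implementation is a Jinja macro inside the dbt package; `FakeColumn` and `wrap_column` are hypothetical names used only for illustration, and the `getattr` default for `fields` is an assumption about how a missing attribute could be tolerated on adapters whose column objects lack it.

```python
from dataclasses import dataclass


@dataclass
class FakeColumn:
    """Hypothetical stand-in for an adapter column object (not a dbt class)."""
    name: str
    dtype: str


def wrap_column(column_obj):
    """Build a dict with the three representations named in the PR description."""
    parts = column_obj.name.split(".")
    return {
        # Dotted display form, kept for alerts and stored metrics.
        "name": column_obj.name,
        # Segment-quoted SQL reference: each path segment gets its own backticks.
        "quoted": ".".join(f"`{part}`" for part in parts),
        # Dot-free identifier that is safe to use as a CTE column alias.
        "safe_alias": "__".join(parts),
        "data_type": column_obj.dtype,
        # Assumption: fall back to an empty list when the object has no `fields`.
        "fields": getattr(column_obj, "fields", []),
    }


wrapped = wrap_column(FakeColumn(name="user.address.city", dtype="STRING"))
top_level = wrap_column(FakeColumn(name="id", dtype="INT64"))
```

Because the result is a plain dict, downstream code can read it by subscript (`wrapped["quoted"]`), which matches the subscript access mentioned under Testing.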

Why two representations

BigQueryColumn.quoted wraps the whole string in one set of backticks, so a flattened nested column's .quoted is `user.address.city` — which BigQuery treats as a single column literally named user.address.city. Even with correct segment-quoting, projecting select user.address.city from t into a CTE without an alias names the resulting column city, losing the path. The wrapper exposes both .quoted (segment-quoted source ref) and .safe_alias (dot-free CTE alias) so the projection-alias pattern composes cleanly and downstream macros stay nesting-agnostic.
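
The difference between the two quoting styles is easy to see side by side. The sketch below is in Python purely for illustration (the PR itself uses Jinja macros); the function names are hypothetical, and the double-underscore alias convention is taken from the `user__address__city` example under Testing. It does not handle segments that themselves contain a backtick.

```python
def whole_quote(dotted_name: str) -> str:
    # One pair of backticks around the entire string, which the description
    # above says BigQuery reads as a single column whose name contains dots.
    return f"`{dotted_name}`"


def segment_quote(dotted_name: str) -> str:
    # One pair of backticks per path segment, so the reference still
    # navigates into the STRUCT.
    return ".".join(f"`{part}`" for part in dotted_name.split("."))


def safe_alias(dotted_name: str) -> str:
    # Dot-free identifier used as the projected column name in the CTE.
    return dotted_name.replace(".", "__")


path = "user.address.city"
print(whole_quote(path))    # `user.address.city`
print(segment_quote(path))  # `user`.`address`.`city`
print(safe_alias(path))     # user__address__city
```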

Testing

Local validation via dbt parse and a run-operation harness confirmed every SQL fingerprint:

- Segment-quoting: user.address.city → `user`.`address`.`city`
- Projection: select `user`.`address`.`city` as user__address__city from t
- Downstream aggregate: coalesce(sum(case when user__address__city is null then 1 else 0 end), 0) as null_count
- Stored column_name: user.address.city (dotted display preserved for alerts)
- get_column_data_type BigQuery dispatch works on the wrapped dict via subscript access
- Non-nested columns / non-BigQuery: byte-equivalent compiled SQL to current behaviour
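
The projection and aggregate fingerprints listed above can be reproduced with a few lines of string building. This is a Python sketch under the assumption that the compiled SQL follows exactly the shapes shown in the list; `projection_sql` and `null_count_sql` are hypothetical helpers, not macros from the package.

```python
def segment_quote(dotted_name: str) -> str:
    return ".".join(f"`{part}`" for part in dotted_name.split("."))


def safe_alias(dotted_name: str) -> str:
    return dotted_name.replace(".", "__")


def projection_sql(column_name: str, relation: str) -> str:
    # Project the nested source reference under a dot-free alias so the
    # full path is not collapsed to the leaf name in the CTE.
    return (
        f"select {segment_quote(column_name)} "
        f"as {safe_alias(column_name)} from {relation}"
    )


def null_count_sql(column_name: str) -> str:
    # Downstream aggregates only ever see the alias, never the dotted path.
    alias = safe_alias(column_name)
    return (
        f"coalesce(sum(case when {alias} is null then 1 else 0 end), 0) "
        "as null_count"
    )


projection = projection_sql("user.address.city", "t")
aggregate = null_count_sql("user.address.city")
```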

End-to-end execution against BigQuery to follow.

Summary by CodeRabbit

Bug Fixes
- Better handling of nested/struct fields in BigQuery so monitors correctly detect and report on dotted/nested column leaf values.
- Safer column and dimension aliasing to avoid invalid identifiers in monitoring outputs.
Refactor
- Reworked monitor selection and dimension concatenation logic for more reliable results with structured data types and complex naming.

coderabbitai · 2026-05-22T10:36:05Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fef8e3dc-7d12-4c67-a11f-6a460ae68a44

📥 Commits

Reviewing files that changed from the base of the PR and between d45a775 and 8c7b36e.

📒 Files selected for processing (2)

macros/edr/data_monitoring/data_monitors_configuration/get_column_monitors.sql
macros/edr/data_monitoring/monitors_query/column_monitoring_query.sql

📝 Walkthrough

Adds BigQuery-safe segment quoting, dot-free aliasing, and struct-wrapping helpers, then applies them across column monitor selection, the column monitoring query (projections and metric expressions), and dimension concatenation/bucketing logic.

Changes

BigQuery Nested Field Support via Safe Aliasing

| Layer / File(s) | Summary |
| --- | --- |
| Helper macros for safe BigQuery column handling `macros/edr/data_monitoring/monitors_query/column_monitoring_query.sql` | Adds `bq_segment_quote`, `bq_safe_alias`, `wrap_column_for_struct_support`, plus `bq_safe_leaf_names` and `_bq_walk_collect` for STRUCT leaf discovery; overhauls `select_dimensions_columns` to segment-quote sources and generate dot-free alias suffixes for nested fields. |
| Column monitoring query integration `macros/edr/data_monitoring/monitors_query/column_monitoring_query.sql` | `column_monitoring_query` now projects monitored columns using `column_obj.safe_alias` and uses that alias for metric expressions; `prefixed_dimensions` builds `"dimension_*"` aliases with `bq_safe_alias()`. |
| Column monitor configuration wrapping `macros/edr/data_monitoring/data_monitors_configuration/get_column_monitors.sql` | `get_column_obj_and_monitors` and `get_all_column_obj_and_monitors` wrap `column_obj` via `wrap_column_for_struct_support` before deriving data types and selecting monitors; returned column values are the wrapped objects. |
| Dimension monitoring query updates `macros/edr/data_monitoring/monitors_query/dimension_monitoring_query.sql` | Builds concatenated dimension expressions using `bq_segment_quote` per segment; enforces `having sum(metric_value) > 0` for `training_set_dimensions`; adjusts `dimensions_buckets` join and row-count hydration to the new structure and removes several inline comments. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Support BigQuery nested STRUCT fields in anomaly tests' clearly and directly summarizes the main change: enabling nested STRUCT field support in anomaly tests. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions · 2026-05-22T10:36:06Z

👋 @tlangton3
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

tlangton3 · 2026-05-22T13:50:48Z

End-to-end validated against a real BigQuery dataset.

- column_anomalies on a three-level nested STRUCT field (<parent>.<intermediate>.<leaf>) compiles with segment-quoted SQL, executes against real data, and writes a row to data_monitoring_metrics with the dotted column_name preserved.
- Discovery layer correctly flattens parent STRUCTs via BigQueryColumn.flatten(); the wrapper exposes .name (dotted display), .quoted (segment-quoted SQL ref), and .safe_alias (dot-free CTE alias) as designed.
- Ran the new nested test alongside 10+ existing non-nested column_anomalies tests in a single dbt test invocation: all 15 PASS with no interference, confirming the projection-alias pattern is backwards-compatible.
- Re-ran with --defer --favor-state against a prod manifest so the non-nested tests had data and history; metrics for nested and non-nested columns land in data_monitoring_metrics and elementary_test_results with identical schema. The dotted column_name is just a longer string in an otherwise unchanged structure.
- elementary.on_run_end upload hook works unchanged with the override; metric history persists correctly.

Tested against:

- dbt-core 1.11.8 / dbt-bigquery 1.11.1
- elementary package version 0.23.x (this branch)

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@macros/edr/data_monitoring/data_monitors_configuration/get_column_monitors.sql`:
- Around line 10-13: The loop currently excludes only leaves whose own leaf.mode
== 'REPEATED', but needs to exclude any leaf that has a REPEATED ancestor so
downstream UNNESTs aren't missed; change the logic around the col.flatten()
iteration to skip a leaf if any ancestor in its flattened path is REPEATED
(e.g., inspect the leaf's ancestry/path metadata returned by col.flatten() or
augment flatten to return ancestor modes), and only do expanded.append(leaf)
when no ancestor mode == 'REPEATED' (retain the existing reference to
col.flatten(), leaf.mode, and expanded.append in your change).

In `@macros/edr/data_monitoring/monitors_query/column_monitoring_query.sql`:
- Around line 402-423: The macro wrap_column_for_struct_support currently always
includes 'fields': column_obj.fields which breaks non-BigQuery adapters because
dbt's base Column lacks a fields attribute; update the macro to only set the
'fields' key when the attribute exists (e.g. when target.type == 'bigquery' and
column_obj.fields is defined) or use a defined-check (column_obj.fields is
defined) and otherwise omit or set fields to null/empty, ensuring all references
inside the returned dict (name, column, quoted, safe_alias, dtype, data_type,
fields) remain valid for non-BigQuery Column objects.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 281a98fa-e3f9-47ef-b12d-ec7d113d1681

📥 Commits

Reviewing files that changed from the base of the PR and between ab1a10b and d45a775.

📒 Files selected for processing (3)

macros/edr/data_monitoring/data_monitors_configuration/get_column_monitors.sql
macros/edr/data_monitoring/monitors_query/column_monitoring_query.sql
macros/edr/data_monitoring/monitors_query/dimension_monitoring_query.sql

Address CodeRabbit findings:

1. `BigQueryColumn.flatten()` discards ancestor modes, so a NULLABLE leaf under a REPEATED ancestor still satisfied the previous `leaf.mode != 'REPEATED'` filter. Add `bq_safe_leaf_names` + `_bq_walk_collect`, an ancestor-aware walker that returns only leaves with no REPEATED ancestor in their path. Filter `flatten()` output against this set.
2. `wrap_column_for_struct_support` unconditionally read `column_obj.fields`, which raised on non-BigQuery adapters (base `Column` lacks `fields`). Guard with `column_obj.fields is defined` and default to an empty list, so the wrapper is safe on Snowflake, Postgres, Redshift, etc.
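
The ancestor-aware walk from the first fix can be sketched as follows. This is a Python illustration only: `SchemaField` and `safe_leaf_names` are hypothetical stand-ins for the schema objects and the `bq_safe_leaf_names` / `_bq_walk_collect` macros, and the choice to also exclude a leaf that is itself REPEATED is an assumption carried over from the earlier `leaf.mode != 'REPEATED'` filter.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SchemaField:
    """Hypothetical, simplified stand-in for a BigQuery schema field."""
    name: str
    mode: str = "NULLABLE"
    fields: List["SchemaField"] = field(default_factory=list)


def safe_leaf_names(columns: List[SchemaField]) -> Set[str]:
    """Return dotted paths of leaves with no REPEATED node anywhere in the path."""
    safe: Set[str] = set()

    def walk(node: SchemaField, prefix: str, under_repeated: bool) -> None:
        path = f"{prefix}.{node.name}" if prefix else node.name
        # Carry the REPEATED flag down the tree so descendants inherit it.
        repeated = under_repeated or node.mode == "REPEATED"
        if node.fields:
            for child in node.fields:
                walk(child, path, repeated)
        elif not repeated:
            safe.add(path)

    for column in columns:
        walk(column, "", False)
    return safe


schema = [
    SchemaField("id"),
    SchemaField("user", fields=[
        SchemaField("address", fields=[SchemaField("city")]),
        # `sku` is NULLABLE itself, but its parent `orders` is REPEATED.
        SchemaField("orders", mode="REPEATED", fields=[SchemaField("sku")]),
    ]),
]

# Pretend this is the flattened leaf list that lost ancestor modes.
flattened = ["id", "user.address.city", "user.orders.sku"]
safe = safe_leaf_names(schema)
kept = [name for name in flattened if name in safe]
```

Here `user.orders.sku` is dropped even though its own mode is NULLABLE, which is the case the leaf-only filter missed.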

tlangton3 requested a deployment to elementary_test_env May 22, 2026 10:36 — with GitHub Actions Waiting

tlangton3 marked this pull request as ready for review May 22, 2026 13:51

coderabbitai Bot reviewed May 22, 2026

tlangton3 requested a deployment to elementary_test_env May 22, 2026 14:14 — with GitHub Actions Waiting

tlangton3 wants to merge 2 commits into elementary-data:master from tlangton3:bigquery-nested-struct-support
