APP-1086: fix BigQuery nanosecond timestamp cast failure in edr_cast_as_timestamp #1003

Open
MikaKerman wants to merge 1 commit into master from app-1086-fix-bigquery-nanosecond-timestamp-cast-failure-in

@MikaKerman MikaKerman commented May 6, 2026

Summary

Adds a BigQuery-specific override for edr_cast_as_timestamp that truncates sub-microsecond fractional digits before casting to TIMESTAMP.

BigQuery's TIMESTAMP only supports microsecond precision (6 fractional digits). Some runtimes (e.g. dbt-fusion) write nanosecond-precision strings like 2026-04-03T10:50:50.961498756Z into Elementary's dbt_run_results columns (execute_started_at, execute_completed_at, compile_started_at, compile_completed_at). The default cast(field as timestamp) then fails with Database Error: Invalid timestamp: '...' and blocks edr send-report / edr report until the bad rows age out.

The new bigquery__edr_cast_as_timestamp macro wraps the value in regexp_replace(..., r'(\.\d{6})\d+', r'\1') before the cast, truncating any extra fractional digits to exactly 6. Strings with ≤ 6 fractional digits are unaffected. This follows the same dispatch pattern as the existing dremio__edr_cast_as_timestamp truncation. bigquery__edr_cast_as_date chains through edr_cast_as_timestamp, so the fix applies there transparently as well.
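
The truncation rule can be sanity-checked outside BigQuery. The following is a minimal sketch, assuming Python's `re` module as a stand-in for the regex behavior of this particular pattern; the helper name `truncate_to_micros` is hypothetical and is not part of the package.

```python
import re

# Same pattern as in the macro: capture the dot plus six digits,
# then match (and drop) any further fractional digits.
_SUB_MICRO = re.compile(r"(\.\d{6})\d+")


def truncate_to_micros(ts: str) -> str:
    """Hypothetical helper: keep at most 6 fractional-second digits."""
    return _SUB_MICRO.sub(r"\1", ts)


# Nine fractional digits are cut down to six.
print(truncate_to_micros("2026-04-03T10:50:50.961498756Z"))
# Strings with six or fewer fractional digits do not match and pass through.
print(truncate_to_micros("2026-04-03T10:50:50.961498Z"))
print(truncate_to_micros("2026-04-03T10:50:50.961Z"))
print(truncate_to_micros("2026-04-03T10:50:50Z"))
```

Because the pattern requires at least one digit after the sixth, values that are already within microsecond precision are returned unchanged rather than rewritten.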

Linear: APP-1086

Review & Testing Checklist for Human

  • On a BigQuery target, seed dbt_run_results with at least one row where execute_completed_at has 9 fractional digits (e.g. 2026-04-03T10:50:50.961498756Z) and confirm edr send-report / edr report no longer fails on the days_back filter.
  • Confirm existing rows with ≤ 6 fractional digits (or no fractional part) still cast correctly and produce identical results to before this change.
  • Run BigQuery integration tests to verify no regression in anomaly detection / monitoring queries that go through edr_cast_as_timestamp.

Notes

  • No version bump or release work is included here per the ticket assignee.
  • Postgres/Snowflake/Databricks/Spark are unaffected (native nanosecond support or string parsing); Athena/Trino already use try_cast which handles this; Dremio already truncates to millisecond precision.

Link to Devin session: https://app.devin.ai/sessions/3a3eac19b8924571820addc7d6b6a877
Requested by: @MikaKerman

Summary by CodeRabbit

  • New Features
    • Improved BigQuery timestamp handling: input timestamps now truncate sub-microsecond fractional digits before casting, preserving microsecond precision and preventing incorrect conversions. This enhances reliability when importing or transforming timestamp data.

@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@github-actions

github-actions Bot commented May 6, 2026

👋 @MikaKerman
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai

coderabbitai Bot commented May 6, 2026

📝 Walkthrough

A BigQuery-specific macro bigquery__edr_cast_as_timestamp(timestamp_field) is added; it converts the input to string, uses a regex to truncate fractional digits beyond microseconds, then casts back to the project timestamp type.

Changes

Timestamp Casting Macro

| Layer / File(s) | Summary |
| --- | --- |
| BigQuery Timestamp Precision: macros/utils/data_types/cast_column.sql | New macro bigquery__edr_cast_as_timestamp(timestamp_field) casts timestamp to string, truncates sub-microsecond precision using regexp_replace, and casts back to the internal edr_type_timestamp(). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A macro hops in, swift and clean,
BigQuery timestamps, now pristine!
Sub-microseconds trimmed with care,
Regex magic through the air! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: fixing BigQuery's nanosecond timestamp cast failure by adding a BigQuery-specific override macro that truncates sub-microsecond fractional digits. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

…cond precision

Co-Authored-By: mika@elementary-data.com <mika.kerman@gmail.com>
@devin-ai-integration devin-ai-integration Bot force-pushed the app-1086-fix-bigquery-nanosecond-timestamp-cast-failure-in branch from 6b65aa5 to 7df3be9 on May 24, 2026 08:43

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@macros/utils/data_types/cast_column.sql`:
- Around line 19-27: The regexp replacement in macro
bigquery__edr_cast_as_timestamp is using a Python/PCRE-style backreference (\1)
which BigQuery expects as $1; update the replacement argument in regexp_replace
from r'\1' (or '\1') to r'$1' (or '$1') so the captured fractional-second group
is preserved, leaving the rest of the expression (the pattern r'(\.\d{6})\d+'
and the surrounding cast to elementary.edr_type_timestamp()) unchanged.

ℹ️ Review info
📥 Commits

Reviewing files that changed from the base of the PR and between 6b65aa5 and 7df3be9.

📒 Files selected for processing (1)
  • macros/utils/data_types/cast_column.sql

Comment on lines +19 to +27
{%- macro bigquery__edr_cast_as_timestamp(timestamp_field) -%}
    cast(
        regexp_replace(
            cast({{ timestamp_field }} as {{ elementary.edr_type_string() }}),
            r'(\.\d{6})\d+',
            r'\1'
        ) as {{ elementary.edr_type_timestamp() }}
    )
{%- endmacro -%}

@coderabbitai coderabbitai Bot May 24, 2026

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

BigQuery REGEXP_REPLACE backreference syntax \1 or $1

💡 Result:

In BigQuery (GoogleSQL), REGEXP_REPLACE uses capture-group backreferences in the replacement string with the $n form (e.g., $1, $2), not \1-style backreferences. [1][2]

However, when writing the replacement as a SQL string literal, you must escape the $ if BigQuery (or your surrounding tool) interprets $n as something else (for example, a parameter). In those cases, use an escaped dollar like '\2' inside the literal so that the replacement engine sees \2 -> $2. [3]

Example pattern with 3 capture groups:

SELECT note, REGEXP_REPLACE(note, r'(.+)(£\ *[\d.,]+)(.+)', '\2') AS note2 FROM project.dataset.table; [3]

So:

  • Prefer: '$1' / '$2' in the replacement expression (conceptually the syntax is $n). [1][2]
  • If $n is being misinterpreted in your environment: escape so the engine receives the backreference token; one workaround shown is using '\2' to yield group 2. [3]

Fix BigQuery REGEXP_REPLACE backreference syntax ($1 instead of \1)

bigquery__edr_cast_as_timestamp uses regexp_replace(..., r'(\.\d{6})\d+', r'\1'), but BigQuery capture-group backreferences in the replacement expression use the $n form (e.g., $1), not \1. Update the replacement to '$1' (or r'$1') so the truncated fractional seconds are applied correctly.

@coderabbitai BigQuery's official docs for REGEXP_REPLACE state: "You can use backslashed-escaped digits (\1 to \9) within the replacement argument to insert text matching the corresponding parenthesized group." $N is not supported. The Stack Overflow answer your web query cited also uses \\2 (i.e. \2 after string-literal escaping), not $2.

Since I'm using a raw string r'\1', that's exactly the literal \1 token BigQuery expects. The Dremio macro in this file uses $1 because Dremio runs Gandiva/Arrow, which is a different engine.

Keeping the code as-is.
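
For illustration, here is what the two replacement styles do in an engine that uses backslash-style group references. This is a small sketch that assumes Python's `re` module purely as an example; it is not BigQuery's engine, and it does not show how BigQuery itself treats `$1`.

```python
import re

PATTERN = r"(\.\d{6})\d+"
VALUE = "2026-04-03T10:50:50.961498756Z"

# Backslash-style reference: capture group 1 is re-inserted,
# so the first six fractional digits survive.
with_backslash = re.sub(PATTERN, r"\1", VALUE)

# Dollar-style reference: Python's re has no $n syntax, so "$1"
# is inserted as literal text and the fractional part is lost.
with_dollar = re.sub(PATTERN, "$1", VALUE)

print(with_backslash)
print(with_dollar)
```

In such an engine the wrong backreference style does not raise an error; it silently produces a corrupted string, which is why the choice has to match the target engine's documented syntax.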

Skipped: comment is from another GitHub bot.
