APP-1086: fix BigQuery nanosecond timestamp cast failure in edr_cast_as_timestamp#1003
MikaKerman wants to merge 1 commit into
…cond precision Co-Authored-By: mika@elementary-data.com <mika.kerman@gmail.com>
6b65aa5 to 7df3be9
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@macros/utils/data_types/cast_column.sql`:
- Around line 19-27: The regexp replacement in macro bigquery__edr_cast_as_timestamp is using a Python/PCRE-style backreference (\1) which BigQuery expects as $1; update the replacement argument in regexp_replace from r'\1' (or '\1') to r'$1' (or '$1') so the captured fractional-second group is preserved, leaving the rest of the expression (the pattern r'(\.\d{6})\d+' and the surrounding cast to elementary.edr_type_timestamp()) unchanged.
📒 Files selected for processing (1)
macros/utils/data_types/cast_column.sql
    {%- macro bigquery__edr_cast_as_timestamp(timestamp_field) -%}
        cast(
            regexp_replace(
                cast({{ timestamp_field }} as {{ elementary.edr_type_string() }}),
                r'(\.\d{6})\d+',
                r'\1'
            ) as {{ elementary.edr_type_timestamp() }}
        )
    {%- endmacro -%}
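The string transformation this macro performs can be mirrored in Python as a quick local sanity check. This is a sketch, not the dbt macro itself. It assumes Python's `re` behaves like BigQuery's regex engine for this specific pattern and replacement; both treat `\1` in the replacement as capture group 1. `truncate_to_micros` is a hypothetical helper name.

```python
import re

# Same pattern as the macro: a literal '.', exactly six digits (captured as
# group 1), then one or more further digits (matched and dropped).
_EXTRA_FRACTION = re.compile(r"(\.\d{6})\d+")


def truncate_to_micros(ts: str) -> str:
    """Keep at most six fractional-second digits; leave everything else as-is."""
    # r"\1" re-inserts the captured '.dddddd' group, like the macro's r'\1'.
    return _EXTRA_FRACTION.sub(r"\1", ts)


for raw in [
    "2026-04-03T10:50:50.961498756Z",  # nanoseconds: trimmed to 6 digits
    "2026-04-03T10:50:50.961498Z",     # exactly 6 digits: no match, unchanged
    "2026-04-03T10:50:50.961Z",        # fewer than 6 digits: unchanged
    "2026-04-03T10:50:50Z",            # no fraction at all: unchanged
]:
    print(raw, "->", truncate_to_micros(raw))
```

The regex truncates rather than rounds, so `.961498756` becomes `.961498`. At most 999 ns are dropped, which is finer than `TIMESTAMP` can store anyway. Strings with six or fewer fractional digits never match the pattern and pass through untouched.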
🧩 Analysis chain
🌐 Web query:
BigQuery REGEXP_REPLACE backreference syntax \1 or $1
💡 Result:
In BigQuery (GoogleSQL), REGEXP_REPLACE uses capture-group backreferences in the replacement string with the $n form (e.g., $1, $2), not \1-style backreferences. [1][2] However, when writing the replacement as a SQL string literal, you must escape the $ if BigQuery (or your surrounding tool) interprets $n as something else (for example, a parameter). In those cases, use an escaped dollar like '\\2' inside the literal so that the replacement engine sees \2 -> $2. [3]
Example pattern with 3 capture groups: SELECT note, REGEXP_REPLACE(note, r'(.+)(£\ *[\d.,]+)(.+)', '\\2') AS note2 FROM project.dataset.table; [3]
So:
- Prefer: '$1' / '$2' in the replacement expression (conceptually the syntax is $n). [1][2]
- If $n is being misinterpreted in your environment: escape so the engine receives the backreference token; one workaround shown is using '\\2' to yield group 2. [3]
Citations:
- 1: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators
- 2: https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/string_functions
- 3: https://stackoverflow.com/questions/67487886/bigquery-regexp-replace-referencing-capture-group-in-the-replacement-expression
Fix BigQuery REGEXP_REPLACE backreference syntax ($1 instead of \1)
bigquery__edr_cast_as_timestamp uses regexp_replace(..., r'(\.\d{6})\d+', r'\1'), but BigQuery capture-group backreferences in the replacement expression use the $n form (e.g., $1), not \1. Update the replacement to '$1' (or r'$1') so the truncated fractional seconds are applied correctly.
@coderabbitai BigQuery's official docs for REGEXP_REPLACE state: "You can use backslashed-escaped digits (\1 to \9) within the replacement argument to insert text matching the corresponding parenthesized group" — $N is not supported. The Stack Overflow answer your web query cited also uses \\2 (i.e. \2 after string-literal escaping), not $2.
Since I'm using a raw string r'\1', that's exactly the literal \1 token BigQuery expects. The Dremio macro in this file uses $1 because Dremio runs Gandiva/Arrow, which is a different engine.
Keeping the code as-is.
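The disagreement comes down to replacement-string syntax. Python's `re.sub` uses the same backslash-digit convention the author quotes from the BigQuery docs, so it can illustrate the difference locally. This is an analogy run in Python, not in BigQuery. The assumption that BigQuery would likewise copy `$1` literally rests on the quoted documentation and was not tested here.

```python
import re

PATTERN = r"(\.\d{6})\d+"
RAW = "2026-04-03T10:50:50.961498756Z"

# Backslash-digit backreference: group 1 ('.961498') is re-inserted.
with_backslash = re.sub(PATTERN, r"\1", RAW)

# In Python's re, '$1' is not a backreference; it is copied literally,
# so the whole fractional part is replaced by the two characters '$1'.
with_dollar = re.sub(PATTERN, "$1", RAW)

print(with_backslash)  # 2026-04-03T10:50:50.961498Z
print(with_dollar)     # 2026-04-03T10:50:50$1Z
```

In the `$1` case the result is no longer a valid timestamp string, so the subsequent cast would fail instead of being fixed.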
Skipped: comment is from another GitHub bot.
Summary
Adds a BigQuery-specific override for `edr_cast_as_timestamp` that truncates sub-microsecond fractional digits before casting to `TIMESTAMP`.
BigQuery's `TIMESTAMP` only supports microsecond precision (6 fractional digits). Some runtimes (e.g. dbt-fusion) write nanosecond-precision strings like `2026-04-03T10:50:50.961498756Z` into Elementary's `dbt_run_results` columns (`execute_started_at`, `execute_completed_at`, `compile_started_at`, `compile_completed_at`). The default `cast(field as timestamp)` then fails with `Database Error: Invalid timestamp: '...'` and blocks `edr send-report` / `edr report` until the bad rows age out.
The new `bigquery__edr_cast_as_timestamp` macro wraps the value in `regexp_replace(..., r'(\.\d{6})\d+', r'\1')` before the cast, truncating any extra fractional digits to exactly 6. Strings with ≤ 6 fractional digits are unaffected. This follows the same dispatch pattern as the existing `dremio__edr_cast_as_timestamp` truncation. `bigquery__edr_cast_as_date` chains through `edr_cast_as_timestamp`, so the fix applies there transparently as well.
Linear: APP-1086
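To see why truncating to six digits is enough, the sketch below uses Python's `datetime.strptime` as a stand-in for a microsecond-only parser. Its `%f` directive accepts at most six digits, so it rejects the nanosecond value (analogous to the `Invalid timestamp` error described above) and accepts the truncated one. This is an analogy, not BigQuery's cast. `parse_micros`, `truncate_to_micros`, and the fixed UTC `Z`-suffixed format string are assumptions made for the example.

```python
import re
from datetime import datetime, timezone

RAW = "2026-04-03T10:50:50.961498756Z"
# %f accepts one to six digits, so this format is microsecond-only,
# analogous to BigQuery's TIMESTAMP precision.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_micros(ts: str) -> datetime:
    """Parse a UTC timestamp string; raises ValueError on >6 fractional digits."""
    return datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)


def truncate_to_micros(ts: str) -> str:
    # Same regex and backreference as bigquery__edr_cast_as_timestamp.
    return re.sub(r"(\.\d{6})\d+", r"\1", ts)


try:
    parse_micros(RAW)
except ValueError as exc:
    print("raw value rejected:", exc)

parsed = parse_micros(truncate_to_micros(RAW))
print(parsed.isoformat())  # 2026-04-03T10:50:50.961498+00:00
```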
Review & Testing Checklist for Human
- … `dbt_run_results` with at least one row where `execute_completed_at` has 9 fractional digits (e.g. `2026-04-03T10:50:50.961498756Z`) and confirm `edr send-report` / `edr report` no longer fails on the `days_back` filter.
- … `edr_cast_as_timestamp`.
Notes
- … `try_cast` which handles this; Dremio already truncates to millisecond precision.
Link to Devin session: https://app.devin.ai/sessions/3a3eac19b8924571820addc7d6b6a877
Requested by: @MikaKerman