[WIP] Replace failed-tests matviews with direct partitioned query#3647
[WIP] Replace failed-tests matviews with direct partitioned query#3647mstaeble wants to merge 1 commit into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
WalkthroughRemoves ChangesFailed-Tests Matview Removal and Live Query Migration
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 18 | ❌ 3❌ Failed checks (3 warnings)
✅ Passed checks (18 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mstaeble The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
d553fc3 to
20e1d35
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/api/job_analysis.go (1)
112-114: 💤 Low valueConsider moving period validation earlier to ensure both queries use the same value.
This validation only affects the test-failure query (lines 116-127). The earlier job-runs aggregation (lines 51-67) passes
perioddirectly to GORM. If an invalid period reaches this function, the two queries could behave inconsistently. Moving the validation before line 51 would ensure uniform behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/api/job_analysis.go` around lines 112 - 114, The period validation logic that checks if period is not equal to PeriodDay and PeriodHour (and defaults to PeriodDay if invalid) is currently positioned after the job-runs aggregation query. Move this validation block to the beginning of the function, before the aggregation query that uses the period parameter, so that both the early aggregation query and the later test-failure query operate with the same validated period value and ensure consistent behavior throughout the function.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/api/job_analysis.go`:
- Line 124: The GROUP BY clause at line 124 includes prow_job_id, which causes
the query to return separate counts for each job. However, when aggregating
results later (line 132), the code uses assignment rather than addition, causing
per-job counts to be overwritten and discarded. To count total test failures
across all selected jobs per period, remove pjrt.prow_job_id from the Group()
method call in the query construction so the results are grouped only by test
name and date truncation without the job ID dimension.
---
Nitpick comments:
In `@pkg/api/job_analysis.go`:
- Around line 112-114: The period validation logic that checks if period is not
equal to PeriodDay and PeriodHour (and defaults to PeriodDay if invalid) is
currently positioned after the job-runs aggregation query. Move this validation
block to the beginning of the function, before the aggregation query that uses
the period parameter, so that both the early aggregation query and the later
test-failure query operate with the same validated period value and ensure
consistent behavior throughout the function.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: cf4f597c-12b0-43d8-bba6-3422519a9f0c
📒 Files selected for processing (3)
pkg/api/job_analysis.gopkg/db/views.gopkg/flags/postgres_benchmarking_test.go
💤 Files with no reviewable changes (2)
- pkg/db/views.go
- pkg/flags/postgres_benchmarking_test.go
20e1d35 to
7f1bea2
Compare
The prow_job_failed_tests_by_day and prow_job_failed_tests_by_hour materialized views scanned all ~3,500 partitions of prow_job_run_tests with no time or release filter, taking ~3 minutes per refresh. A direct query with partition pruning keys (prow_job_run_release and prow_job_run_timestamp BETWEEN start AND end) runs in ~13ms — faster than reading the matview itself (46ms) — making the matviews unnecessary overhead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
7f1bea2 to
935b968
Compare
Summary
prow_job_failed_tests_by_dayandprow_job_failed_tests_by_hourmaterialized views scanned all ~3,500 partitions ofprow_job_run_testswith no time or release filter, taking ~3 minutes per refresh.prow_job_run_release+prow_job_run_timestamp BETWEEN start AND end) eliminates the matview refresh entirely and queries the partitioned table directly.PrintJobAnalysisJSONFromDBwith a direct query using the release and time range already available from the caller.Benchmarks (staging, release 4.22 with ~1,000 job IDs,
enable_partitionwise_join = on)The direct query is slower than reading the pre-aggregated matview (249ms vs 46ms), but eliminates the 3-minute background refresh cycle. End-to-end page load times observed: ~2.5s cold after server restart, <1.5s warm.
Test plan
go build ./...go vetclean🤖 Generated with Claude Code