[WIP] Replace failed-tests matviews with direct partitioned query by mstaeble · Pull Request #3647 · openshift/sippy

mstaeble · 2026-06-17T19:42:30Z

Summary

The prow_job_failed_tests_by_day and prow_job_failed_tests_by_hour materialized views scanned all ~3,500 partitions of prow_job_run_tests with no time or release filter, taking ~3 minutes per refresh.
A direct query with partition pruning keys (prow_job_run_release + prow_job_run_timestamp BETWEEN start AND end) eliminates the matview refresh entirely and queries the partitioned table directly.
Removes the matview definitions and replaces the matview read in PrintJobAnalysisJSONFromDB with a direct query using the release and time range already available from the caller.
Adds error checking on the query result.
Tested against staging database via the local UI — job analysis page renders test failure counts correctly.

Benchmarks (staging, release 4.22 with ~1,000 job IDs, `enable_partitionwise_join = on`)

Approach	Cold	Warm
Matview refresh (no filters)	~3 minutes	~3 minutes
Matview read	53ms	46ms
Direct query (release + 14-day range)	257ms	249ms

The direct query is slower than reading the pre-aggregated matview (249ms vs 46ms), but eliminates the 3-minute background refresh cycle. End-to-end page load times observed: ~2.5s cold after server restart, <1.5s warm.

Test plan

Full project compiles with go build ./...
go vet clean
Manually tested job analysis page against staging database
CI passes

🤖 Generated with Claude Code

openshift-merge-bot · 2026-06-17T19:42:32Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

coderabbitai · 2026-06-17T19:42:40Z

Walkthrough

Removes prow_job_failed_tests_by_day_matview and prow_job_failed_tests_by_hour_matview from PostgresMatViews registry and their SQL definition. PrintJobAnalysisJSONFromDB is updated to query prow_job_run_tests directly with date_trunc-based period grouping. Benchmark cases for the removed matviews are also removed.

Changes

Failed-Tests Matview Removal and Live Query Migration

Layer / File(s)	Summary
Remove matview definitions from database registry `pkg/db/views.go`	Removes `prow_job_failed_tests_by_day_matview` and `prow_job_failed_tests_by_hour_matview` entries from `PostgresMatViews` and deletes the `prowJobFailedTestsMatView` SQL constant.
Replace matview query with live join and validation `pkg/api/job_analysis.go`	`PrintJobAnalysisJSONFromDB` now queries `prow_job_run_tests` joined to `tests` directly, computing period via `date_trunc` on `prow_job_run_timestamp`, filtering by failure status and release, constraining by timestamp window, and adding explicit `Scan` error handling.
Remove matview benchmark cases `pkg/flags/postgres_benchmarking_test.go`	Removes `MatviewFailedTestsByDay` and `MatviewFailedTestsByHour` benchmark entries from `getMatviewBenchmarkCases` as those matviews no longer exist.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

ready-for-human-review

Suggested reviewers

deepsm007
xueqzhan

🚥 Pre-merge checks | ✅ 18 | ❌ 3

❌ Failed checks (3 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Go Error Handling	⚠️ Warning	Line 69 in pkg/api/job_analysis.go contains `sumResults.Scan(&sums)` which ignores the returned error without checking or handling it, violating the custom check requirement to never ignore returne...	Check the error from `sumResults.Scan(&sums)` and return it if non-nil, similar to the pattern used in lines 112-122 for the test failures query: `if err := sumResults.Scan(&sums).Error; err != nil { return result, err }`
Test Coverage For New Features	⚠️ Warning	PR modifies PrintJobAnalysisJSONFromDB (an existing function handling DB queries) to replace matview reads with direct partitioned queries, including error handling addition. No unit tests were add...	Add unit test for PrintJobAnalysisJSONFromDB covering both the successful query path and error scenarios, or alternatively document why existing integration tests are sufficient.

✅ Passed checks (18 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Sql Injection Prevention	✅ Passed	All SQL queries in the modified code use GORM's parameterized query API with proper placeholders (?). No direct string concatenation or unsafe SQL construction found. All user inputs (release, star...
Excessive Css In React Should Use Styles	✅ Passed	PR contains only Go backend code changes (database queries, materialized views, benchmarks). No React components or inline CSS are modified, making the check inapplicable.
Single Responsibility And Clear Naming	✅ Passed	PR improves SRP by removing unnecessary materialized views, consolidating test failure logic, and adding explicit error checking. Naming remains consistent with codebase conventions; no new naming...
Feature Documentation	✅ Passed	Changes are database infrastructure optimizations (materialized view removal) in job_analysis.go and views.go. Feature documentation (docs/features/job-analysis-symptoms.md) covers artifact pattern...
Stable And Deterministic Test Names	✅ Passed	No Ginkgo tests are used in this codebase or this PR. All benchmark test names are static and deterministic with no dynamic values, timestamps, or generated identifiers.
Test Structure And Quality	✅ Passed	The custom check for Ginkgo test code is not applicable to this PR. The repository uses standard Go testing (testing package), not Ginkgo. No Ginkgo tests exist in the codebase or in the PR changes.
Microshift Test Compatibility	✅ Passed	This PR does not add any new Ginkgo e2e tests. The changes are database query optimizations and benchmark test removals, not e2e test additions. The custom check is not applicable.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	PR does not add any Ginkgo e2e tests. It only modifies database queries and schema (job_analysis.go, views.go) and Go standard benchmark tests (postgres_benchmarking_test.go). SNO compatibility che...
Topology-Aware Scheduling Compatibility	✅ Passed	PR contains only database/Go library changes (query optimization, materialized view removal, benchmark updates) with no Kubernetes manifests, operator code, controllers, or scheduling constraints t...
Ote Binary Stdout Contract	✅ Passed	All modified files maintain OTE Binary Stdout Contract: library packages contain no process-level stdout writes, and test file writes are confined to test function bodies (allowed per spec).
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	This PR does not add any new Ginkgo e2e tests. It modifies API logic, removes DB view definitions, and removes benchmark cases from standard Go tests. The custom check only applies when new Ginkgo...
No-Weak-Crypto	✅ Passed	PR modifies database optimization only (replaces materialized views with direct queries). No crypto imports, weak algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementation...
Container-Privileges	✅ Passed	PR modifies only Go source files (pkg/api/job_analysis.go, pkg/db/views.go, pkg/flags/postgres_benchmarking_test.go) for database query optimization. No container or Kubernetes manifest files were...
No-Sensitive-Data-In-Logs	✅ Passed	No logging statements exposing sensitive data (passwords, tokens, API keys, PII, session IDs, or hostnames) were added. Error handling returns errors without logging them.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and specifically describes the main change: replacing failed-tests materialized views with direct partitioned queries, which is the core objective of this PR.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-06-17T19:42:53Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci · 2026-06-17T19:44:03Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mstaeble
Once this PR has been reviewed and has the lgtm label, please assign stbenjam for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

pkg/api/job_analysis.go (1)
112-114: 💤 Low value

Consider moving period validation earlier to ensure both queries use the same value.

This validation only affects the test-failure query (lines 116-127). The earlier job-runs aggregation (lines 51-67) passes period directly to GORM. If an invalid period reaches this function, the two queries could behave inconsistently. Moving the validation before line 51 would ensure uniform behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/job_analysis.go` around lines 112 - 114, The period validation logic
that checks if period is not equal to PeriodDay and PeriodHour (and defaults to
PeriodDay if invalid) is currently positioned after the job-runs aggregation
query. Move this validation block to the beginning of the function, before the
aggregation query that uses the period parameter, so that both the early
aggregation query and the later test-failure query operate with the same
validated period value and ensure consistent behavior throughout the function.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/api/job_analysis.go`:
- Line 124: The GROUP BY clause at line 124 includes prow_job_id, which causes
the query to return separate counts for each job. However, when aggregating
results later (line 132), the code uses assignment rather than addition, causing
per-job counts to be overwritten and discarded. To count total test failures
across all selected jobs per period, remove pjrt.prow_job_id from the Group()
method call in the query construction so the results are grouped only by test
name and date truncation without the job ID dimension.

---

Nitpick comments:
In `@pkg/api/job_analysis.go`:
- Around line 112-114: The period validation logic that checks if period is not
equal to PeriodDay and PeriodHour (and defaults to PeriodDay if invalid) is
currently positioned after the job-runs aggregation query. Move this validation
block to the beginning of the function, before the aggregation query that uses
the period parameter, so that both the early aggregation query and the later
test-failure query operate with the same validated period value and ensure
consistent behavior throughout the function.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: cf4f597c-12b0-43d8-bba6-3422519a9f0c

📥 Commits

Reviewing files that changed from the base of the PR and between 0ecf781 and d553fc3.

📒 Files selected for processing (3)

pkg/api/job_analysis.go
pkg/db/views.go
pkg/flags/postgres_benchmarking_test.go

💤 Files with no reviewable changes (2)

pkg/db/views.go
pkg/flags/postgres_benchmarking_test.go

The prow_job_failed_tests_by_day and prow_job_failed_tests_by_hour materialized views scanned all ~3,500 partitions of prow_job_run_tests with no time or release filter, taking ~3 minutes per refresh. A direct query with partition pruning keys (prow_job_run_release and prow_job_run_timestamp BETWEEN start AND end) runs in ~13ms — faster than reading the matview itself (46ms) — making the matviews unnecessary overhead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 17, 2026

mstaeble force-pushed the remove-failed-tests-matview branch 2 times, most recently from d553fc3 to 20e1d35 Compare June 17, 2026 21:38

coderabbitai Bot requested changes Jun 17, 2026

View reviewed changes

Comment thread pkg/api/job_analysis.go Outdated

mstaeble force-pushed the remove-failed-tests-matview branch from 20e1d35 to 7f1bea2 Compare June 17, 2026 21:56

mstaeble force-pushed the remove-failed-tests-matview branch from 7f1bea2 to 935b968 Compare June 18, 2026 13:48

coderabbitai Bot approved these changes Jun 18, 2026

View reviewed changes

openshift-ci Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 18, 2026

mstaeble mentioned this pull request Jun 18, 2026

[WIP] Add 14-day time bound to failed-tests matviews #3651

Draft

4 tasks

mstaeble changed the title ~~Replace failed-tests matviews with direct partitioned query~~ [WIP] Replace failed-tests matviews with direct partitioned query Jun 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Replace failed-tests matviews with direct partitioned query#3647

[WIP] Replace failed-tests matviews with direct partitioned query#3647
mstaeble wants to merge 1 commit into
openshift:mainfrom
mstaeble:remove-failed-tests-matview

mstaeble commented Jun 17, 2026 •

edited

Loading

Uh oh!

openshift-merge-bot Bot commented Jun 17, 2026

Uh oh!

coderabbitai Bot commented Jun 17, 2026 •

edited

Loading

❌ Failed checks (3 warnings)

Uh oh!

openshift-ci Bot commented Jun 17, 2026

Uh oh!

openshift-ci Bot commented Jun 17, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mstaeble commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Benchmarks (staging, release 4.22 with ~1,000 job IDs, enable_partitionwise_join = on)

Test plan

Uh oh!

openshift-merge-bot Bot commented Jun 17, 2026

Uh oh!

coderabbitai Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

❌ Failed checks (3 warnings)

Uh oh!

openshift-ci Bot commented Jun 17, 2026

Uh oh!

openshift-ci Bot commented Jun 17, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mstaeble commented Jun 17, 2026 •

edited

Loading

Benchmarks (staging, release 4.22 with ~1,000 job IDs, `enable_partitionwise_join = on`)

coderabbitai Bot commented Jun 17, 2026 •

edited

Loading