feat: compare by run #462
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit a4855d2. Configure here.
const data = query.data as WorkflowInfoResponse;
if (!data.changelogs || data.changelogs.length === 0) continue;

// Group changelog entries by the run that produced them. In the API
Skips dates lacking changelog rows
High Severity
When building comparison changelogs, a date is dropped entirely if changelogs is empty, even when runConfigs lists benchmark runs for that day. Per-run comparison and plain-date expansion never receive runConfigs, so newest data-only runs stay hidden from the changelog and chart sync logic.
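
One way to address this, sketched under assumptions: skip a date only when it has neither changelog rows nor run configs, and carry runConfigs forward so later steps can enumerate runs. The interfaces and the buildComparisonChangelogs name below are hypothetical stand-ins, not the repository's actual types.

```typescript
// Hypothetical shapes; the real WorkflowInfoResponse may differ.
interface ChangelogRow {
  runId: number;
  description: string;
}

interface RunConfig {
  runId: number;
  model: string;
}

interface WorkflowInfo {
  date: string;
  changelogs?: ChangelogRow[];
  runConfigs?: RunConfig[];
}

interface ComparisonChangelog {
  date: string;
  entries: ChangelogRow[];
  runConfigs: RunConfig[];
}

function buildComparisonChangelogs(
  responses: WorkflowInfo[],
): ComparisonChangelog[] {
  const result: ComparisonChangelog[] = [];
  for (const data of responses) {
    const entries = data.changelogs ?? [];
    const runConfigs = data.runConfigs ?? [];
    // Drop the date only when it has no changelog rows AND no runs.
    if (entries.length === 0 && runConfigs.length === 0) continue;
    // Keep runConfigs so per-run comparison can see data-only runs.
    result.push({ date: data.date, entries, runConfigs });
  }
  return result;
}
```

With this shape, a date whose changelogs array is empty but whose runConfigs lists a run still reaches the comparison and expansion logic.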
  .filter((item) => item.entries.length > 0 || pinnedDates.has(item.date))
  .toSorted((a, b) => new Date(a.date).getTime() - new Date(b.date).getTime());
- }, [changelogs, selectedGPUs, selectedPrecisions, pinnedDates]);
+ }, [changelogs, modelDbKeys, selectedGPUs, selectedPrecisions, pinnedDates]);
Filters out data-only changelog dates
Medium Severity
filteredChangelogs removes a date when every changelog entry fails the model, precision, or GPU filter, even if runConfigs still has multiple runs for the selected model. The per-run changelog UI and add-to-chart actions for that day never render.
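
A possible shape for the fix, as a sketch only: extend the filter predicate so a date also survives when runConfigs holds more than one run for the selected model. The DayItem and RunConfig shapes, the modelDbKey field, and the shouldKeepDate name are assumptions for illustration. Whether the threshold should be "more than one run" or "at least one run" is a judgment call this sketch does not settle.

```typescript
// Hypothetical shapes for illustration only.
interface RunConfig {
  runId: number;
  modelDbKey: string;
}

interface DayItem {
  date: string;
  entries: unknown[]; // changelog entries that passed the model/precision/GPU filters
  runConfigs: RunConfig[];
}

function shouldKeepDate(
  item: DayItem,
  pinnedDates: ReadonlySet<string>,
  modelDbKeys: ReadonlySet<string>,
): boolean {
  // Count distinct runs that produced data for the selected model(s).
  const runIds = new Set<number>();
  for (const rc of item.runConfigs) {
    if (modelDbKeys.has(rc.modelDbKey)) runIds.add(rc.runId);
  }
  return (
    item.entries.length > 0 ||
    pinnedDates.has(item.date) ||
    runIds.size > 1 // assumption: only multi-run days need the per-run UI
  );
}
```

The predicate could then replace the inline condition in the `.filter(...)` call quoted above.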


Note
Medium Risk
Touches benchmark API routing, new DB queries, and a wide inference comparison UI path (selection state, parallel fetches, legend/changelog sync); behavior changes for multi-run same-day dates but is covered by new unit tests.
Overview
Adds per-run GPU comparison so multiple workflow runs on the same day can appear as separate chart series instead of collapsing to “latest for that date.”
API / data: GET /api/v1/benchmarks gains exactRun=true with a numeric runId, routed to new getBenchmarksForRun (separate cache prefix benchmarks-run). GET /api/v1/workflow-info now includes runConfigs via getRunConfigsByDate so the UI can list runs that produced data even without changelog rows.
Client model: Comparison selections use encoded entries like 2026-06-14~r27489075807 (helpers in comparisonEntry / runEnumeration). useChartData issues exact-run benchmark queries for those entries; fetchBenchmarks and React Query keys distinguish as-of vs exact-run with the same runId.
UI: The comparison changelog is run-aware (per-run add/remove, model-scoped config filtering, “Add all” at run granularity). ChartDisplay auto-expands a plain date into one entry per run when multiple runs exist, shares stable run numbering with GPUGraph legends, and allows comparison with a date range or individually added runs. Legend removal updates selectedDates; roofline CSS selectors escape ~ in series ids.
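
To make the encoded entry format concrete, here is a minimal sketch of how an entry such as 2026-06-14~r27489075807 might be encoded and parsed, and how a query key could keep as-of and exact-run fetches apart. The function names, the ComparisonEntry shape, and the key layout are assumptions and are not taken from the comparisonEntry helpers in this PR.

```typescript
// Hypothetical entry shape: a plain date, optionally pinned to one run.
interface ComparisonEntry {
  date: string;
  runId?: string; // kept as a string of digits in this sketch
}

const RUN_SEPARATOR = "~r";

function encodeEntry(entry: ComparisonEntry): string {
  return entry.runId === undefined
    ? entry.date
    : `${entry.date}${RUN_SEPARATOR}${entry.runId}`;
}

function parseEntry(encoded: string): ComparisonEntry {
  const idx = encoded.indexOf(RUN_SEPARATOR);
  if (idx === -1) return { date: encoded }; // plain date, no run suffix
  const runId = encoded.slice(idx + RUN_SEPARATOR.length);
  if (!/^\d+$/.test(runId)) {
    throw new Error(`invalid run id in "${encoded}"`);
  }
  return { date: encoded.slice(0, idx), runId };
}

type FetchMode = "as-of" | "exact-run";

// Including the mode in the key keeps as-of and exact-run results for the
// same runId in separate cache entries.
function benchmarkQueryKey(encoded: string, mode: FetchMode): readonly unknown[] {
  const { date, runId } = parseEntry(encoded);
  return ["benchmarks", mode, date, runId ?? null] as const;
}
```

Because the fetch mode is part of the key, two queries that share a date and runId but differ in mode would not overwrite each other in a key-based cache.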