
feat(workflows): add label-driven bug-test workflow (#3239) #3257

Merged
mnriem merged 2 commits into main from benbtg-feat-3239-bug-test-workflow
Jul 1, 2026

Conversation

@BenBtg BenBtg commented Jun 30, 2026

Contributor

Summary

Adds the third stage (assess → fix → test) of the semi-automated, human-gated bug pipeline, closing #3239. A new gh-aw agentic workflow bug-test triggers when a maintainer applies the bug-test label, runs the relevant tests in isolation against the fix, compiles a readable pass/fail report, and posts it back as a single comment on the originating issue.

Modeled on the existing bug-assess workflow for safety and trigger parity, and decoupled from Spec Kit specifics so other projects can reuse it.

What's included

  • .github/workflows/bug-test.md — hand-authored agentic workflow source.
  • .github/workflows/bug-test.lock.yml — compiled with gh aw compile v0.79.8 (do not hand-edit).

Behavior

  • Label-driven trigger: runs on the issues labeled event, gated to the bug-test label; same bot-skip behavior as bug-assess.
  • Locates the fix under test: linked PR → named fix branch → current-checkout fallback, only ever from origin (untrusted references are recorded, never fetched/executed).
  • Stack-agnostic test detection: uv+pytest (default for this repo), npm/pnpm/yarn, go, make — no hardcoded ecosystem.
  • Isolated execution: tests run inside the firewalled runner, wrapped in a timeout, treated as untrusted code; raw logs kept in $RUNNER_TEMP, never written to the working tree.
  • Compiles a report: structured test-report.md with a one-line verdict, counts table, failures, and caveats.
  • Verification mode: compares a generated fix against the historical fix for old/closed bugs to surface discrepancies and improve pipeline reliability.
  • Posts back: one comment (≤65k chars) + one optional result label (tests-passing / tests-failing / tests-inconclusive).
  • Safety parity: scoped read-only permissions (contents, issues, pull-requests), identical URL-safety / untrusted-input guardrails, maintainer remains the gatekeeper.
  • Action consistency: pinned to actions/checkout@v7.0.0 to align with other workflows in the repo.
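
The stack-agnostic detection step above can be pictured as a lookup from marker files to test commands. The following is a minimal sketch only: the marker files, their order, and the commands are assumptions made for illustration and are not taken from bug-test.md, and `detect_test_command` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical marker-file -> test-command table, checked in order.
# Neither the markers nor the commands come from bug-test.md.
DETECTORS = [
    ("uv.lock", ["uv", "run", "pytest"]),
    ("pnpm-lock.yaml", ["pnpm", "test"]),
    ("yarn.lock", ["yarn", "test"]),
    ("package.json", ["npm", "test"]),
    ("go.mod", ["go", "test", "./..."]),
    ("Makefile", ["make", "test"]),
]


def detect_test_command(repo_root: Path) -> Optional[List[str]]:
    """Return the command for the first marker file found, or None."""
    for marker, command in DETECTORS:
        if (repo_root / marker).is_file():
            return list(command)  # copy so callers cannot mutate the table
    return None
```

Lockfiles are listed before package.json so that a pnpm or yarn project is not misread as plain npm; a real workflow would likely need further rules than this table captures.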

Acceptance criteria

  • bug-test markdown workflow added under .github/workflows/ and compiled to its .lock.yml.
  • Triggered by applying the bug-test label; runs tests in isolation against the fix.
  • Compiles and posts the test outcome back to the issue.
  • Supports validation against an old/closed bug to compare generated vs. historical fix.
  • Maintainer remains the gatekeeper; consistent safety model with the other stages.
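
As a rough illustration of the "run under a timeout, then report" criteria above, the sketch below maps a test command's outcome to one of the three result names and clips a report to the comment size limit. It is an assumption-laden sketch, not the workflow's implementation: `run_tests` and `clip_report` are hypothetical names, and treating every non-zero exit code as a failure is a simplification (some runners use distinct codes for "no tests collected").

```python
import subprocess
import sys
from typing import List

MAX_COMMENT_CHARS = 65_000  # assumed cap, based on the "≤65k chars" limit


def run_tests(command: List[str], timeout_s: float) -> str:
    """Run a test command under a timeout and map the outcome to a verdict."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return "tests-inconclusive"  # the run did not finish in time
    except OSError:
        return "tests-inconclusive"  # runner missing or not executable
    return "tests-passing" if result.returncode == 0 else "tests-failing"


def clip_report(report: str, limit: int = MAX_COMMENT_CHARS) -> str:
    """Trim a report so that it fits within a single comment."""
    notice = "\n\n[report truncated]"
    if len(report) <= limit:
        return report
    return report[: limit - len(notice)] + notice


if __name__ == "__main__":
    # Usage example: a trivially passing "test" command.
    print(run_tests([sys.executable, "-c", "raise SystemExit(0)"], 30))
```

Timeouts and launch errors are mapped to inconclusive rather than failing, on the assumption that an unfinished run says nothing about the fix itself.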

Notes

  • The bug-fix stage (Implement label-driven bug fix workflow #3238) is not yet merged; bug-test consumes its output (PR/branch) but degrades gracefully when no fix artifact is found, reporting inconclusive.
  • Result labels (tests-passing / tests-failing / tests-inconclusive) are applied only if they exist in the repo; a missing label is a soft no-op and does not block the comment.
  • Compiled with gh aw v0.79.8. actions/checkout was manually pinned to v7.0.0 to align with repo standards (similar to dependabot PR chore(deps): bump actions/checkout from 6.0.3 to 7.0.0 #3064).
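
The "soft no-op" rule for result labels in the notes above could look roughly like this. It is a sketch under stated assumptions: `choose_result_label` is a hypothetical helper, and the list of existing repository labels is assumed to have been fetched elsewhere.

```python
from typing import Iterable, Optional

RESULT_LABELS = ("tests-passing", "tests-failing", "tests-inconclusive")


def choose_result_label(verdict: str, repo_labels: Iterable[str]) -> Optional[str]:
    """Return the label to apply, or None when it should be skipped.

    A verdict is only turned into a label if it is one of the three
    result labels AND that label already exists in the repository.
    """
    if verdict in RESULT_LABELS and verdict in set(repo_labels):
        return verdict
    return None  # soft no-op: the comment is still posted without a label
```

Returning None instead of raising keeps a missing label from blocking the comment, which matches the behavior the note describes.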

🤖 This PR was authored autonomously by GitHub Copilot (model: Claude Opus 4.8) on behalf of @BenBtg. Each commit carries an Assisted-by: trailer.

Add the third stage (assess → fix → test) of the semi-automated, human-gated
bug pipeline. The `bug-test` agentic workflow triggers when a maintainer applies
the `bug-test` label, runs the relevant tests in isolation against the fix,
compiles a readable pass/fail report, and posts it back as a single issue
comment.

- Locates the fix under test: linked PR → named fix branch → current checkout
  fallback, only ever from origin.
- Stack-agnostic test detection (uv+pytest, npm/pnpm/yarn, go, make) so it is
  decoupled from Spec Kit specifics and reusable by other projects.
- Runs tests under a timeout as untrusted code; scoped read-only permissions;
  same URL-safety / untrusted-input guardrails as bug-assess.
- Verification mode compares a generated fix against the historical fix for
  old/closed bugs to surface discrepancies.
- Optional single result label (tests-passing / tests-failing /
  tests-inconclusive).

Compiled bug-test.lock.yml with `gh aw compile`.

Assisted-by: GitHub Copilot (model: Claude Opus 4.8, autonomous)
Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 30, 2026 14:05

Copilot AI left a comment

Contributor

Pull request overview

Adds a new label-driven gh-aw agentic workflow stage (bug-test) to run relevant tests against a proposed bug fix and post a single compiled test report back to the originating issue, completing the assess → fix → test pipeline.

Changes:

  • Introduces a hand-authored .github/workflows/bug-test.md workflow prompt/source for the test stage.
  • Adds the compiled .github/workflows/bug-test.lock.yml generated by gh aw compile for execution in GitHub Actions.

| File | Description |
| --- | --- |
| .github/workflows/bug-test.md | Defines the label-triggered “bug-test” agent behavior (locate fix artifact, detect test stack, run with timeout, compile report, post comment/label). |
| .github/workflows/bug-test.lock.yml | Compiled, pinned GitHub Actions workflow generated from bug-test.md for actual execution. |

Review details

  • Files reviewed: 1/2 changed files
  • Comments generated: 2
  • Review effort level: Low

Comment thread .github/workflows/bug-test.lock.yml
Comment thread .github/workflows/bug-test.lock.yml
@BenBtg BenBtg marked this pull request as ready for review June 30, 2026 14:19
@BenBtg BenBtg requested a review from mnriem as a code owner June 30, 2026 14:19
@BenBtg BenBtg marked this pull request as draft June 30, 2026 14:24
@BenBtg BenBtg marked this pull request as ready for review June 30, 2026 14:27
@BenBtg BenBtg self-assigned this Jun 30, 2026
@BenBtg BenBtg marked this pull request as draft June 30, 2026 14:28
… workflow

Align with repo standards (e.g. dependabot PR #3064, other workflows).
Manually pinned in the compiled lock file for consistency.

Assisted-by: GitHub Copilot (model: Claude Opus 4.8, autonomous)
Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
@BenBtg BenBtg marked this pull request as ready for review July 1, 2026 16:22
Copilot AI review requested due to automatic review settings July 1, 2026 16:22

Copilot AI left a comment

Contributor

Review details

  • Files reviewed: 1/2 changed files
  • Comments generated: 0 new
  • Review effort level: Low

@mnriem mnriem merged commit ac6eef4 into main Jul 1, 2026
14 checks passed
@mnriem

mnriem commented Jul 1, 2026

Collaborator

Thank you!
