Add opt-in Intent discovery eval suite#173
Conversation
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (7)
📝 WalkthroughWalkthroughAdds a new ChangesIntent Discovery Eval Suite
Sequence Diagram(s)sequenceDiagram
participant EvalTest
participant liveCopilotHarness
participant prepareFixtureWorkspace
participant applyIntentCondition
participant runCopilotTask
participant copilot-cli-adapter
rect rgba(100, 149, 237, 0.5)
note over EvalTest,liveCopilotHarness: Live Copilot Eval Run
EvalTest->>liveCopilotHarness: run(task, context)
liveCopilotHarness->>prepareFixtureWorkspace: fixture id
prepareFixtureWorkspace-->>liveCopilotHarness: workspacePath + cleanup()
liveCopilotHarness->>applyIntentCondition: condition + expectedSkillAreas + workspacePath
applyIntentCondition-->>liveCopilotHarness: AppliedIntentCondition (filesWritten)
liveCopilotHarness->>runCopilotTask: RunCopilotTaskInput
runCopilotTask->>copilot-cli-adapter: spawn with task env vars
copilot-cli-adapter-->>runCopilotTask: stdout transcript + TRANSCRIPT_PATH
runCopilotTask->>runCopilotTask: parseIntentCommand lines → tool call records
runCopilotTask->>runCopilotTask: write transcript to runs/latest/transcripts/
runCopilotTask->>runCopilotTask: collectFileDiff (source vs workspace)
runCopilotTask-->>liveCopilotHarness: CopilotTaskRun (messages, toolCalls, loadedSkills, diff)
liveCopilotHarness-->>EvalTest: HarnessRun with artifacts + traces
end
rect rgba(152, 251, 152, 0.5)
note over EvalTest,liveCopilotHarness: Grading
EvalTest->>EvalTest: strictIntentInvocation(run)
EvalTest->>EvalTest: correctSkillLoaded(run, expectedSkillAreas)
EvalTest->>EvalTest: classifyFailure(run, expectedSkillAreas)
EvalTest->>EvalTest: attachEvalMetadata(task, scores, run)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related issues
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🟡 Minor comments (16)
evals/intent-discovery/harness/prepare-fixture.ts-47-51 (1)
47-51:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMake the copy filter path-separator safe.
Line 50 uses a hardcoded
/path fragment. On Windows this won’t match\, sorunsartifacts may be copied into workspaces.Suggested diff
import { basename, dirname, join } from 'node:path' +import { sep } from 'node:path' @@ cpSync(sourcePath, workspacePath, { recursive: true, verbatimSymlinks: true, - filter: (source) => !source.includes(`${fixturesDir}/runs/`), + filter: (source) => + !source.includes(`${fixturesDir}${sep}runs${sep}`), })🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/harness/prepare-fixture.ts` around lines 47 - 51, The filter function in the cpSync call uses a hardcoded forward slash in the path pattern `${fixturesDir}/runs/`, which is not cross-platform compatible and will fail to match paths on Windows that use backslashes instead. Update the filter logic in the cpSync function to use a path-separator-safe approach, such as using path.sep from the path module or normalizing the path so that the pattern correctly excludes the runs directory on both Windows and Unix systems regardless of the operating system's path separator.evals/intent-discovery/harness/setup-intent-condition.ts-7-13 (1)
7-13:⚠️ Potential issue | 🟡 MinorReorder imports to satisfy
import/order.The type-only imports (lines 7–9) should come after the regular sibling imports (lines 10–13), as the
import/orderrule places thetypegroup last.Suggested diff
import { buildIntentSkillGuidanceBlock, buildIntentSkillsBlock, } from '../../../packages/intent/src/commands/install-writer.js' +import { + expectedSkillUseByArea, + packageAllowlistByArea, +} from '../corpus/skill-uses' import type { IntentDiscoveryCondition } from '../corpus/conditions' import type { ExpectedSkillArea } from '../corpus/tasks' import type { ScanResult } from '../../../packages/intent/src/types.js' -import { - expectedSkillUseByArea, - packageAllowlistByArea, -} from '../corpus/skill-uses'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/harness/setup-intent-condition.ts` around lines 7 - 13, The imports violate the `import/order` rule which requires type-only imports to come after regular sibling imports. Reorder the import statements so that the regular imports from '../corpus/skill-uses' containing expectedSkillUseByArea and packageAllowlistByArea appear first, followed by the type-only imports (type { IntentDiscoveryCondition }, type { ExpectedSkillArea }, and type { ScanResult }).Source: Linters/SAST tools
evals/intent-discovery/fixtures/saved-transcripts.ts-1-4 (1)
1-4:⚠️ Potential issue | 🟡 MinorFix import order to satisfy lint.
Line 3 violates the configured
import/orderrule. The parent value import must come before type imports.Suggested diff
-import type { NormalizedMessage, ToolCallRecord } from 'vitest-evals' -import type { IntentDiscoveryTask } from '../corpus/tasks' import { tasks } from '../corpus/tasks' +import type { NormalizedMessage, ToolCallRecord } from 'vitest-evals' +import type { IntentDiscoveryTask } from '../corpus/tasks'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/fixtures/saved-transcripts.ts` around lines 1 - 4, The import statements in the saved-transcripts.ts file violate the import/order linting rule. Currently, type imports from 'vitest-evals' and '../corpus/tasks' appear before the value import from '../corpus/tasks'. Reorder the imports so that the value import `import { tasks } from '../corpus/tasks'` comes before the type imports `import type { NormalizedMessage, ToolCallRecord } from 'vitest-evals'` and `import type { IntentDiscoveryTask } from '../corpus/tasks'`. This ensures parent value imports are placed before type imports as required by the lint configuration.Source: Linters/SAST tools
evals/intent-discovery/graders/correct-skill-loaded.ts-1-4 (1)
1-4:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winResolve import-order lint errors in this module.
The current import order violates
import/orderand can fail lint gates.Suggested patch
-import type { HarnessRun } from 'vitest-evals' -import type { ExpectedSkillArea } from '../corpus/tasks' import { loadedSkillUsesFromRun } from '../harness/parse-intent-commands' import { listIncludesExpectedSkillArea } from './skill-areas' +import type { HarnessRun } from 'vitest-evals' +import type { ExpectedSkillArea } from '../corpus/tasks'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/graders/correct-skill-loaded.ts` around lines 1 - 4, The imports in this module violate the import/order linting rule. Reorganize the four import statements by grouping external/third-party imports before relative imports, and within each group maintain alphabetical order. The type imports from vitest-evals and ../corpus/tasks should come first, followed by the regular imports from ../harness/parse-intent-commands and ./skill-areas. Ensure that all type imports are grouped together at the top, and then relative imports are ordered alphabetically (with .. paths before . paths if they exist in the same group).Source: Linters/SAST tools
evals/intent-discovery/graders/failure-classifier.ts-1-8 (1)
1-8:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdjust import order to match lint configuration.
Type imports on Line 1-5 should be moved after value imports to satisfy the configured
import/orderrule.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/graders/failure-classifier.ts` around lines 1 - 8, The import statements violate the configured import/order rule because type imports are positioned before value imports. Reorder the imports in the failure-classifier.ts file so that the value imports from './correct-skill-loaded', './reference-only', and './strict-invocation' appear first, followed by the type imports from 'vitest-evals' and '../corpus/tasks' (the imports of HarnessRun, ExpectedSkillArea, and IntentDiscoveryFailureClass).Source: Linters/SAST tools
evals/intent-discovery/graders/strict-invocation.ts-1-2 (1)
1-2:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix import ordering to satisfy configured ESLint rules.
Line 2 should be ordered before the type-only
vitest-evalsimport per the activeimport/orderrule.Proposed fix
-import type { HarnessRun } from 'vitest-evals' import { intentCommandsFromRun } from '../harness/parse-intent-commands' +import type { HarnessRun } from 'vitest-evals'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/graders/strict-invocation.ts` around lines 1 - 2, The import statement order does not comply with the configured ESLint import/order rule. Reorder the imports at the top of the strict-invocation.ts file so that the regular import from '../harness/parse-intent-commands' (the intentCommandsFromRun import) comes before the type-only import from 'vitest-evals' (the HarnessRun type import).Source: Linters/SAST tools
evals/intent-discovery/graders/eval-metadata.ts-36-37 (1)
36-37:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winGuard against empty score arrays when computing
avgScore.If
scores.lengthis0, Line 37 yieldsNaN, which can pollute downstream report data.Proposed fix
- const avgScore = - scores.reduce((total, item) => total + (item.score ?? 0), 0) / scores.length + const avgScore = + scores.length === 0 + ? 0 + : scores.reduce((total, item) => total + (item.score ?? 0), 0) / + scores.length🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/graders/eval-metadata.ts` around lines 36 - 37, The avgScore calculation divides by scores.length without checking if the array is empty, which results in NaN when scores.length is 0. Add a guard condition before computing avgScore to check if the scores array is empty and return an appropriate default value (such as 0) in that case, otherwise proceed with the existing reduce operation and division.evals/intent-discovery/graders/reference-only.ts-1-4 (1)
1-4:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReorder imports to clear
import/ordererrors.Line 3 and Line 4 value imports should be placed before type-only imports from
vitest-evals.Proposed fix
-import type { HarnessRun } from 'vitest-evals' import type { ExpectedSkillArea } from '../corpus/tasks' import { jsonToSearchableText, textMatchesSkillArea } from './skill-areas' import { strictIntentInvocation } from './strict-invocation' +import type { HarnessRun } from 'vitest-evals'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/graders/reference-only.ts` around lines 1 - 4, The import statements in this file violate the import/order rule. Reorder the imports so that all value imports (the ones without the 'type' keyword) are placed before type-only imports. Specifically, move the value imports from './skill-areas' and './strict-invocation' to appear before the type imports from 'vitest-evals' and '../corpus/tasks'.Source: Linters/SAST tools
evals/intent-discovery/graders/eval-metadata.ts-1-2 (1)
1-2:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix import sort/order to pass configured ESLint rules.
Line 1-2 currently violate
sort-importsandimport/order.Proposed fix
-import type { HarnessRun, JudgeResult, JsonValue } from 'vitest-evals' import { toolCalls } from 'vitest-evals' +import type { HarnessRun, JsonValue, JudgeResult } from 'vitest-evals'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/graders/eval-metadata.ts` around lines 1 - 2, The imports from 'vitest-evals' on lines 1-2 are not in the correct order according to the configured ESLint rules for sort-imports and import/order. Reorganize the import statements so that type imports are properly separated from value imports and the imported names are in alphabetical order. Consider combining or reordering the imports from the same module ('vitest-evals') to ensure toolCalls and the type imports from HarnessRun, JudgeResult, JsonValue follow the ESLint import/order configuration.Source: Linters/SAST tools
evals/intent-discovery/harness/saved-transcript-harness.ts-94-99 (1)
94-99:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAvoid duplicating tool calls when assistant messages already include them.
Line 98 blindly concatenates
message.toolCallswithtoolCalls. If saved cases include both, this double-counts the same commands and can skew downstream grading signals.Suggested fix
return messages.map((message, index) => index === firstAssistantIndex ? { ...message, - toolCalls: [...(message.toolCalls ?? []), ...toolCalls], + toolCalls: (() => { + const existing = message.toolCalls ?? [] + const seen = new Set( + existing.map( + (call) => `${call.name}:${JSON.stringify(call.arguments ?? {})}`, + ), + ) + return [ + ...existing, + ...toolCalls.filter((call) => { + const key = `${call.name}:${JSON.stringify(call.arguments ?? {})}` + if (seen.has(key)) return false + seen.add(key) + return true + }), + ] + })(), } : message, )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/harness/saved-transcript-harness.ts` around lines 94 - 99, The toolCalls concatenation in the map function at index === firstAssistantIndex is duplicating tool calls without checking if they already exist in message.toolCalls. Instead of blindly spreading both arrays together, filter the toolCalls array to exclude any tool calls that already exist in message.toolCalls before concatenating. This prevents double-counting the same commands in the merged tool calls array.evals/intent-discovery/intent-discovery.eval.ts-1-14 (1)
1-14:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winResolve import lint errors in the header block.
The current import block violates
import/order,sort-imports, andimport/consistent-type-specifier-style, which will keep lint red.Suggested fix
-import type { HarnessContext, HarnessRun } from 'vitest-evals' import { describe, expect, it } from 'vitest' import { failedSpans, toolCalls } from 'vitest-evals' import { countsTowardAutonomousScore } from './corpus/conditions' import { correctSkillLoaded } from './graders/correct-skill-loaded' import { attachEvalMetadata, score } from './graders/eval-metadata' import { classifyFailure } from './graders/failure-classifier' import { referenceOnly } from './graders/reference-only' import { strictIntentInvocation } from './graders/strict-invocation' import { savedTranscriptCases } from './fixtures/saved-transcripts' -import { - savedTranscriptHarness, - type IntentDiscoveryOutput, -} from './harness/saved-transcript-harness' +import { savedTranscriptHarness } from './harness/saved-transcript-harness' +import type { IntentDiscoveryOutput } from './harness/saved-transcript-harness' +import type { HarnessContext, HarnessRun } from 'vitest-evals'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/intent-discovery.eval.ts` around lines 1 - 14, Reorder and organize the imports in the header block to comply with linting rules. Separate the type imports (those using the `type` keyword) from regular imports, grouping all type imports together using consistent type specifier syntax. Within each group, sort the imports alphabetically and organize them by module source following the pattern of external packages first (like vitest and vitest-evals), then local paths (starting with dot notation like './corpus', './graders', './fixtures', './harness'). Ensure the import statement starting with `import type { HarnessContext, HarnessRun }` is grouped with other type imports and that regular imports like those from 'vitest' are kept separate and properly sorted.Source: Linters/SAST tools
evals/intent-discovery/harness-capture.eval.ts-1-6 (1)
1-6:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winConsolidate duplicated
node:fsimports and normalize import order.
node:fsis imported twice (Lines 1–2), and the current ordering also triggers lint in this block.Suggested fix
-import { existsSync, mkdirSync, readFileSync } from 'node:fs' -import { mkdtempSync, rmSync } from 'node:fs' +import { + existsSync, + mkdirSync, + mkdtempSync, + readFileSync, + rmSync, +} from 'node:fs' import { tmpdir } from 'node:os' import { join } from 'node:path' import { describe, expect, it } from 'vitest' -import type { ToolCallRecord } from 'vitest-evals' import { fixtures } from './corpus/fixtures' import { tasks } from './corpus/tasks' import { intentCommandsFromToolCalls, parseIntentCommand, } from './harness/parse-intent-commands' import { prepareFixtureWorkspace } from './harness/prepare-fixture' +import type { ToolCallRecord } from 'vitest-evals'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/harness-capture.eval.ts` around lines 1 - 6, Consolidate the two separate imports from 'node:fs' into a single import statement. Combine existsSync, mkdirSync, readFileSync from the first import with mkdtempSync and rmSync from the second import into one import statement from 'node:fs', and ensure the combined imports follow proper alphabetical or stylistic ordering to satisfy linting rules.Source: Linters/SAST tools
evals/intent-discovery/live-copilot-harness.eval.ts-1-20 (1)
1-20:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winConsolidate and normalize imports to clear current lint failures.
There are active import lint errors here (
import/no-duplicates,import/order,sort-imports, and type-specifier style), so this file will stay lint-red until the block is normalized.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/live-copilot-harness.eval.ts` around lines 1 - 20, Consolidate the duplicate imports from node:fs by combining the existsSync and writeFileSync imports with mkdtempSync and rmSync into a single import statement. Then organize all imports in the file according to the project's lint rules: group external dependencies (node: modules and npm packages) before local imports, and place type imports separately or according to the configured import order. Verify that the import ordering follows the patterns expected by import/order and sort-imports lint rules to clear the lint failures in this file.Source: Linters/SAST tools
evals/intent-discovery/fixture-corpus.eval.ts-6-6 (1)
6-6:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdjust import declaration style/order to satisfy ESLint.
This import currently violates configured
sort-importsandimport/consistent-type-specifier-stylerules.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/fixture-corpus.eval.ts` at line 6, The import statement for tasks and ExpectedSkillArea violates ESLint's sort-imports and import/consistent-type-specifier-style rules. Separate the type import from the value import by creating two distinct import statements: one using import type for ExpectedSkillArea and another for the tasks value import, ensuring they are ordered correctly according to the configured ESLint rules.Source: Linters/SAST tools
evals/intent-discovery/harness/live-copilot-harness.ts-1-10 (1)
1-10:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix import order to satisfy the configured ESLint rules.
Imports currently violate
import/order.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/harness/live-copilot-harness.ts` around lines 1 - 10, The imports in the file are not ordered according to the ESLint `import/order` rule. Reorganize the imports by grouping them in the following order: first place external/third-party package imports (like createHarness from vitest-evals), then type imports from relative paths, and finally regular imports from relative paths ordered from parent directories (..) to same directory imports (.). Make sure the type import statement for IntentDiscoveryTask is properly separated and grouped with other type imports if any exist.Source: Linters/SAST tools
evals/intent-discovery/harness/run-copilot-task.ts-1-11 (1)
1-11:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winResolve ESLint import-order violations in this module.
The current import block violates the configured
import/orderrule and will keep lint red.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/harness/run-copilot-task.ts` around lines 1 - 11, The imports in the module are not ordered according to the configured import/order ESLint rule. Reorganize the import statements by grouping them in the correct order: first group all type imports together (vitest-evals and ../corpus/tasks), then group all Node.js built-in imports (node:fs, node:path, node:url, node:child_process), then group third-party packages, and finally group relative local imports (./parse-intent-commands). Ensure type imports are clearly separated from regular imports and follow the rule's grouping expectations.Source: Linters/SAST tools
🧹 Nitpick comments (2)
evals/intent-discovery/fixtures/saved-transcripts.ts (1)
148-151: ⚡ Quick winNarrow
taskIdto the task-id union for compile-time safety.Line 149 currently accepts any string; typos are only caught at runtime. Use
IntentDiscoveryTask['id']to fail earlier.Suggested diff
function savedTranscript( - taskId: string, + taskId: IntentDiscoveryTask['id'], transcript: Omit<SavedTranscriptCase, keyof IntentDiscoveryTask>, ): SavedTranscriptCase {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/fixtures/saved-transcripts.ts` around lines 148 - 151, The taskId parameter in the savedTranscript function is currently typed as a generic string, which allows any string value and only catches type errors at runtime. Change the taskId parameter type from string to IntentDiscoveryTask['id'] to narrow it to the specific union of valid task IDs, providing compile-time safety and catching typos and invalid values earlier in the development process.evals/intent-discovery/condition-setup.eval.ts (1)
8-90: ⚡ Quick winAdd a case for the explicit Intent-control condition.
This suite validates three setup modes, but not the explicit Intent-control path from the condition matrix. That leaves a regression gap in the scenario this PR says it supports.
Suggested test addition
describe('Intent discovery condition setup', () => { + it('writes explicit Intent control guidance', () => { + const prepared = prepareInTemp() + + try { + const result = applyIntentCondition({ + condition: 'explicit-intent', + expectedSkillAreas: ['router'], + workspacePath: prepared.workspacePath, + }) + const agents = readFileSync( + join(prepared.workspacePath, 'AGENTS.md'), + 'utf8', + ) + + expect(result.filesWritten.length).toBeGreaterThan(0) + expect(agents).toContain('intent') + expect(agents).toContain('`@tanstack/router`#routing') + } finally { + prepared.cleanup() + } + }) + it('leaves no-intent workspaces without Intent guidance', () => {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/condition-setup.eval.ts` around lines 8 - 90, The test suite in the Intent discovery condition setup describe block currently covers three condition types (no-intent, current-intent, and mapped-intent) but is missing a test case for the explicit Intent-control condition. Add a new test case using the it() function that follows the same pattern as the existing tests: call prepareInTemp() to set up a workspace, invoke applyIntentCondition with the Intent-control condition and expectedSkillAreas, validate the expected outcomes (such as files written and content assertions), and ensure cleanup is called in the finally block. This will complete the regression test coverage for all supported condition modes mentioned in the PR.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@evals/intent-discovery/bin/llm-judge.mjs`:
- Around line 68-95: The judgeCase function's fetch call to the OpenAI API lacks
a timeout mechanism, which could cause the entire evaluation run to hang
indefinitely if a network request stalls. Add a 30-second timeout by creating an
AbortController instance before the fetch call, passing the controller's signal
to the fetch options, and setting a timeout that aborts the controller after 30
seconds. Wrap the fetch call in a try-catch block to handle AbortError
exceptions that occur when the timeout is triggered.
- Around line 104-110: The JSON.parse(content) call lacks error handling, which
can cause a single malformed response from the API to throw an exception and
abort the entire batch processing loop. Even though the fallback to '{}'
provides protection for missing content, the API could return valid JSON with
invalid content inside that would still cause JSON.parse to throw. Wrap the
JSON.parse(content) call in a try-catch block and return an error object
consistent with the existing error-handling pattern at lines 97-101 so that
malformed responses are handled gracefully without terminating remaining
judgments.
In `@evals/intent-discovery/fixtures/router-basic/package.json`:
- Around line 6-8: Replace the "latest" version specifiers for the dependencies
`@tanstack/react-router`, react, and react-dom with pinned version numbers (e.g.,
specific version numbers like "18.2.0" instead of "latest"). This ensures that
the eval fixture has deterministic dependency resolution across different
installations and repeated runs, preventing non-determinism in evaluation
outcomes.
In `@evals/intent-discovery/fixtures/start-basic/package.json`:
- Around line 6-9: The package.json for the intent-discovery eval fixture uses
floating version strings "latest" for `@tanstack/react-router`, react, and
react-dom, while `@tanstack/react-start` is pinned to 1.168.26. This inconsistency
causes non-deterministic builds that can vary between runs. Replace all "latest"
version strings in the dependencies section (for `@tanstack/react-router`, react,
and react-dom) with exact pinned version numbers to ensure reproducible eval
runs across all invocations.
In `@evals/intent-discovery/fixtures/table-v9-basic/package.json`:
- Around line 7-8: The react and react-dom dependencies in the package.json file
are currently set to "latest" which causes non-deterministic behavior in the
eval fixtures as versions can change between runs. Replace the "latest" version
specifiers for both the "react" and "react-dom" dependencies with specific
pinned version numbers (e.g., a version like "18.2.0") to ensure consistent and
deterministic fixture behavior across evaluation runs.
In `@evals/intent-discovery/graders/skill-areas.ts`:
- Line 7: The regex pattern `/v9/i` in the table-v9 array is too generic and
matches "v9" anywhere in text, causing false positives for unrelated content.
Remove this overly broad pattern or replace it with a more specific one that
includes context like "table" to ensure matches are specifically about TanStack
Table v9 and not other unrelated version references.
In `@evals/intent-discovery/harness/live-copilot-harness.ts`:
- Around line 23-38: Move the initialization code that calls
prepareFixtureWorkspace, applyIntentCondition, and setCommonArtifacts into the
try block before the existing try statement. Currently if either
prepareFixtureWorkspace or applyIntentCondition throws an error, the cleanup
handler in the finally block will never execute, causing workspace state to
leak. By moving these setup calls inside the try block, the corresponding
cleanup logic will properly execute in the finally block regardless of whether
an error occurs during setup or main execution.
In `@evals/intent-discovery/harness/parse-intent-commands.ts`:
- Around line 8-15: The command string union type in the parseIntentCommands
function or related parser is missing non-@latest variants for dlx and bunx
commands. Add additional command string entries to the union that include
versions without the `@latest` suffix for commands like bunx `@tanstack/intent` (in
addition to bunx `@tanstack/intent`@latest), yarn dlx `@tanstack/intent` (in
addition to yarn dlx `@tanstack/intent`@latest), and any other dlx/bunx variants
that currently only include the `@latest` version. This ensures the parser
correctly recognizes valid intent invocations using the default latest version
behavior without explicitly specifying `@latest`.
In `@evals/intent-discovery/harness/run-copilot-task.ts`:
- Around line 120-154: The runCommand function spawns a child process without
timeout protection, which can cause the Promise to hang indefinitely if the
process stalls. Add a configurable timeout mechanism (default 5 minutes) using
setTimeout that forcibly terminates the child process via child.kill() and
rejects the Promise if the timeout expires before the process naturally closes.
Implement a settled flag to prevent race conditions between the timeout firing
and the process exiting normally, ensuring that whichever completes first
controls the final resolution or rejection of the Promise.
---
Minor comments:
In `@evals/intent-discovery/fixture-corpus.eval.ts`:
- Line 6: The import statement for tasks and ExpectedSkillArea violates ESLint's
sort-imports and import/consistent-type-specifier-style rules. Separate the type
import from the value import by creating two distinct import statements: one
using import type for ExpectedSkillArea and another for the tasks value import,
ensuring they are ordered correctly according to the configured ESLint rules.
In `@evals/intent-discovery/fixtures/saved-transcripts.ts`:
- Around line 1-4: The import statements in the saved-transcripts.ts file
violate the import/order linting rule. Currently, type imports from
'vitest-evals' and '../corpus/tasks' appear before the value import from
'../corpus/tasks'. Reorder the imports so that the value import `import { tasks
} from '../corpus/tasks'` comes before the type imports `import type {
NormalizedMessage, ToolCallRecord } from 'vitest-evals'` and `import type {
IntentDiscoveryTask } from '../corpus/tasks'`. This ensures parent value imports
are placed before type imports as required by the lint configuration.
In `@evals/intent-discovery/graders/correct-skill-loaded.ts`:
- Around line 1-4: The imports in this module violate the import/order linting
rule. Reorganize the four import statements by grouping external/third-party
imports before relative imports, and within each group maintain alphabetical
order. The type imports from vitest-evals and ../corpus/tasks should come first,
followed by the regular imports from ../harness/parse-intent-commands and
./skill-areas. Ensure that all type imports are grouped together at the top, and
then relative imports are ordered alphabetically (with .. paths before . paths
if they exist in the same group).
In `@evals/intent-discovery/graders/eval-metadata.ts`:
- Around line 36-37: The avgScore calculation divides by scores.length without
checking if the array is empty, which results in NaN when scores.length is 0.
Add a guard condition before computing avgScore to check if the scores array is
empty and return an appropriate default value (such as 0) in that case,
otherwise proceed with the existing reduce operation and division.
- Around line 1-2: The imports from 'vitest-evals' on lines 1-2 are not in the
correct order according to the configured ESLint rules for sort-imports and
import/order. Reorganize the import statements so that type imports are properly
separated from value imports and the imported names are in alphabetical order.
Consider combining or reordering the imports from the same module
('vitest-evals') to ensure toolCalls and the type imports from HarnessRun,
JudgeResult, JsonValue follow the ESLint import/order configuration.
In `@evals/intent-discovery/graders/failure-classifier.ts`:
- Around line 1-8: The import statements violate the configured import/order
rule because type imports are positioned before value imports. Reorder the
imports in the failure-classifier.ts file so that the value imports from
'./correct-skill-loaded', './reference-only', and './strict-invocation' appear
first, followed by the type imports from 'vitest-evals' and '../corpus/tasks'
(the imports of HarnessRun, ExpectedSkillArea, and IntentDiscoveryFailureClass).
In `@evals/intent-discovery/graders/reference-only.ts`:
- Around line 1-4: The import statements in this file violate the import/order
rule. Reorder the imports so that all value imports (the ones without the 'type'
keyword) are placed before type-only imports. Specifically, move the value
imports from './skill-areas' and './strict-invocation' to appear before the type
imports from 'vitest-evals' and '../corpus/tasks'.
In `@evals/intent-discovery/graders/strict-invocation.ts`:
- Around line 1-2: The import statement order does not comply with the
configured ESLint import/order rule. Reorder the imports at the top of the
strict-invocation.ts file so that the regular import from
'../harness/parse-intent-commands' (the intentCommandsFromRun import) comes
before the type-only import from 'vitest-evals' (the HarnessRun type import).
In `@evals/intent-discovery/harness-capture.eval.ts`:
- Around line 1-6: Consolidate the two separate imports from 'node:fs' into a
single import statement. Combine existsSync, mkdirSync, readFileSync from the
first import with mkdtempSync and rmSync from the second import into one import
statement from 'node:fs', and ensure the combined imports follow proper
alphabetical or stylistic ordering to satisfy linting rules.
In `@evals/intent-discovery/harness/live-copilot-harness.ts`:
- Around line 1-10: The imports in the file are not ordered according to the
ESLint `import/order` rule. Reorganize the imports by grouping them in the
following order: first place external/third-party package imports (like
createHarness from vitest-evals), then type imports from relative paths, and
finally regular imports from relative paths ordered from parent directories (..)
to same directory imports (.). Make sure the type import statement for
IntentDiscoveryTask is properly separated and grouped with other type imports if
any exist.
In `@evals/intent-discovery/harness/prepare-fixture.ts`:
- Around line 47-51: The filter function in the cpSync call uses a hardcoded
forward slash in the path pattern `${fixturesDir}/runs/`, which is not
cross-platform compatible and will fail to match paths on Windows that use
backslashes instead. Update the filter logic in the cpSync function to use a
path-separator-safe approach, such as using path.sep from the path module or
normalizing the path so that the pattern correctly excludes the runs directory
on both Windows and Unix systems regardless of the operating system's path
separator.
In `@evals/intent-discovery/harness/run-copilot-task.ts`:
- Around line 1-11: The imports in the module are not ordered according to the
configured import/order ESLint rule. Reorganize the import statements by
grouping them in the correct order: first group all type imports together
(vitest-evals and ../corpus/tasks), then group all Node.js built-in imports
(node:fs, node:path, node:url, node:child_process), then group third-party
packages, and finally group relative local imports (./parse-intent-commands).
Ensure type imports are clearly separated from regular imports and follow the
rule's grouping expectations.
In `@evals/intent-discovery/harness/saved-transcript-harness.ts`:
- Around line 94-99: The toolCalls concatenation in the map function at index
=== firstAssistantIndex is duplicating tool calls without checking if they
already exist in message.toolCalls. Instead of blindly spreading both arrays
together, filter the toolCalls array to exclude any tool calls that already
exist in message.toolCalls before concatenating. This prevents double-counting
the same commands in the merged tool calls array.
In `@evals/intent-discovery/harness/setup-intent-condition.ts`:
- Around line 7-13: The imports violate the `import/order` rule which requires
type-only imports to come after regular sibling imports. Reorder the import
statements so that the regular imports from '../corpus/skill-uses' containing
expectedSkillUseByArea and packageAllowlistByArea appear first, followed by the
type-only imports (type { IntentDiscoveryCondition }, type { ExpectedSkillArea
}, and type { ScanResult }).
In `@evals/intent-discovery/intent-discovery.eval.ts`:
- Around line 1-14: Reorder and organize the imports in the header block to
comply with linting rules. Separate the type imports (those using the `type`
keyword) from regular imports, grouping all type imports together using
consistent type specifier syntax. Within each group, sort the imports
alphabetically and organize them by module source following the pattern of
external packages first (like vitest and vitest-evals), then local paths
(starting with dot notation like './corpus', './graders', './fixtures',
'./harness'). Ensure the import statement starting with `import type {
HarnessContext, HarnessRun }` is grouped with other type imports and that
regular imports like those from 'vitest' are kept separate and properly sorted.
In `@evals/intent-discovery/live-copilot-harness.eval.ts`:
- Around line 1-20: Consolidate the duplicate imports from node:fs by combining
the existsSync and writeFileSync imports with mkdtempSync and rmSync into a
single import statement. Then organize all imports in the file according to the
project's lint rules: group external dependencies (node: modules and npm
packages) before local imports, and place type imports separately or according
to the configured import order. Verify that the import ordering follows the
patterns expected by import/order and sort-imports lint rules to clear the lint
failures in this file.
---
Nitpick comments:
In `@evals/intent-discovery/condition-setup.eval.ts`:
- Around line 8-90: The test suite in the Intent discovery condition setup
describe block currently covers three condition types (no-intent,
current-intent, and mapped-intent) but is missing a test case for the explicit
Intent-control condition. Add a new test case using the it() function that
follows the same pattern as the existing tests: call prepareInTemp() to set up a
workspace, invoke applyIntentCondition with the Intent-control condition and
expectedSkillAreas, validate the expected outcomes (such as files written and
content assertions), and ensure cleanup is called in the finally block. This
will complete the regression test coverage for all supported condition modes
mentioned in the PR.
In `@evals/intent-discovery/fixtures/saved-transcripts.ts`:
- Around line 148-151: The taskId parameter in the savedTranscript function is
currently typed as a generic string, which allows any string value and only
catches type errors at runtime. Change the taskId parameter type from string to
IntentDiscoveryTask['id'] to narrow it to the specific union of valid task IDs,
providing compile-time safety and catching typos and invalid values earlier in
the development process.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 70517ebf-574c-4e5b-a40a-5b7b82a10c23
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (41)
.gitignoreV1-RFC.mdeslint.config.jsevals/intent-discovery/README.mdevals/intent-discovery/bin/copilot-cli-adapter.mjsevals/intent-discovery/bin/llm-judge.mjsevals/intent-discovery/bin/summarize-results.mjsevals/intent-discovery/condition-setup.eval.tsevals/intent-discovery/corpus/conditions.tsevals/intent-discovery/corpus/fixtures.tsevals/intent-discovery/corpus/live-tasks.tsevals/intent-discovery/corpus/skill-uses.tsevals/intent-discovery/corpus/tasks.tsevals/intent-discovery/fixture-corpus.eval.tsevals/intent-discovery/fixtures/router-basic/package.jsonevals/intent-discovery/fixtures/router-basic/src/routes/users.$userId.tsxevals/intent-discovery/fixtures/saved-transcripts.tsevals/intent-discovery/fixtures/start-basic/package.jsonevals/intent-discovery/fixtures/start-basic/src/routes/users.tsxevals/intent-discovery/fixtures/table-v9-basic/package.jsonevals/intent-discovery/fixtures/table-v9-basic/src/user-table.tsxevals/intent-discovery/graders/correct-skill-loaded.tsevals/intent-discovery/graders/eval-metadata.tsevals/intent-discovery/graders/failure-classifier.tsevals/intent-discovery/graders/reference-only.tsevals/intent-discovery/graders/skill-areas.tsevals/intent-discovery/graders/strict-invocation.tsevals/intent-discovery/harness-capture.eval.tsevals/intent-discovery/harness/live-copilot-harness.tsevals/intent-discovery/harness/parse-intent-commands.tsevals/intent-discovery/harness/prepare-fixture.tsevals/intent-discovery/harness/run-copilot-task.tsevals/intent-discovery/harness/saved-transcript-harness.tsevals/intent-discovery/harness/setup-intent-condition.tsevals/intent-discovery/intent-discovery.eval.tsevals/intent-discovery/live-copilot-harness.eval.tsevals/intent-discovery/runs/.gitkeepevals/intent-discovery/runs/latest/.gitkeepevals/intent-discovery/tsconfig.jsonevals/intent-discovery/vitest.evals.config.tspackage.json
- Introduced new README.md for intent discovery eval with usage instructions. - Added conditions and tasks for intent discovery evaluation. - Implemented grading functions for skill loading and failure classification. - Created harness for running saved transcripts in evaluations. - Configured Vitest for running intent discovery tests. - Updated package.json and pnpm-lock.yaml to include vitest-evals dependency. - Added .gitignore entries for evaluation runs and vitest artifacts.
…ure command, skill, transcript, and diff evidence
… intent discovery
…d skill package handling
… generation, update command parsing for various package managers
77c61ee to
818d211
Compare
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
|
Warning Review the following alerts detected in dependencies. According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.
|
|
View your CI Pipeline Execution ↗ for commit 818d211
☁️ Nx Cloud last updated this comment at |
|
View your CI Pipeline Execution ↗ for commit 818d211
☁️ Nx Cloud last updated this comment at |
commit: |
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (2)
evals/intent-discovery/harness/run-copilot-task.ts (1)
120-154:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd timeout guards for spawned processes to prevent hung eval runs.
Both subprocess paths can block forever if the child never exits, which can wedge the suite.
Suggested fix
+const COMMAND_TIMEOUT_MS = Number( + process.env.INTENT_DISCOVERY_COMMAND_TIMEOUT_MS ?? '300000', +) + async function runCommand({ command, input, }: { command: string input: RunCopilotTaskInput }): Promise<CommandResult> { return new Promise((resolve, reject) => { + let settled = false const child = spawn(command, { @@ }) + const timeout = setTimeout(() => { + if (settled) return + settled = true + child.kill('SIGKILL') + reject(new Error(`Copilot command timed out after ${COMMAND_TIMEOUT_MS}ms`)) + }, COMMAND_TIMEOUT_MS) @@ - child.on('error', reject) + child.on('error', (error) => { + if (settled) return + settled = true + clearTimeout(timeout) + reject(error) + }) child.on('close', (exitCode) => { + if (settled) return + settled = true + clearTimeout(timeout) resolve({ @@ async function runDiff( sourcePath: string, workspacePath: string, ): Promise<CommandResult> { return new Promise((resolve, reject) => { + let settled = false const child = spawn('diff', ['-ruN', sourcePath, workspacePath]) + const timeout = setTimeout(() => { + if (settled) return + settled = true + child.kill('SIGKILL') + reject(new Error(`diff timed out after ${COMMAND_TIMEOUT_MS}ms`)) + }, COMMAND_TIMEOUT_MS) @@ - child.on('error', reject) + child.on('error', (error) => { + if (settled) return + settled = true + clearTimeout(timeout) + reject(error) + }) child.on('close', (exitCode) => { + if (settled) return + settled = true + clearTimeout(timeout) resolve({Also applies to: 242-262
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/harness/run-copilot-task.ts` around lines 120 - 154, The runCommand function spawns a child process and waits indefinitely for it to close, which can cause eval runs to hang if the child never exits. Implement a timeout guard by wrapping the Promise with a timeout mechanism that rejects after a reasonable duration (e.g., 30 seconds). When the timeout is triggered, explicitly kill the child process using child.kill() to ensure proper cleanup and prevent resource leaks. Apply this same timeout pattern to the other subprocess path mentioned at lines 242-262 to ensure consistent behavior across all spawned processes.evals/intent-discovery/harness/live-copilot-harness.ts (1)
23-38:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMove setup into the
tryso workspace cleanup is guaranteed.If setup throws before Line 38, Line 143 never executes and the prepared workspace can leak.
Suggested fix
run: async ({ input, setArtifact }) => { const runId = `live:${input.id}` const prepared = prepareFixtureWorkspace({ fixture: input.fixture }) - const appliedCondition = applyIntentCondition({ - condition: input.condition, - expectedSkillAreas: input.expectedSkillAreas, - workspacePath: prepared.workspacePath, - }) - - setCommonArtifacts({ - input, - runId, - setupFilesWritten: appliedCondition.filesWritten, - workspacePath: prepared.workspacePath, - setArtifact, - }) try { + const appliedCondition = applyIntentCondition({ + condition: input.condition, + expectedSkillAreas: input.expectedSkillAreas, + workspacePath: prepared.workspacePath, + }) + + setCommonArtifacts({ + input, + runId, + setupFilesWritten: appliedCondition.filesWritten, + workspacePath: prepared.workspacePath, + setArtifact, + }) + const run = await runCopilotTask({Also applies to: 142-144
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/intent-discovery/harness/live-copilot-harness.ts` around lines 23 - 38, Move the setup operations including the prepareFixtureWorkspace call, applyIntentCondition call, and setCommonArtifacts call into the try block so they are protected by the cleanup logic. Currently these operations execute before the try block starts, which means if any of them throw an error, the workspace cleanup code at line 143 will never execute and the prepared workspace will leak. Relocate these three setup function calls to be the first statements within the try block.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@evals/intent-discovery/fixture-corpus.eval.ts`:
- Around line 24-31: The test asserts fixture is defined on line 26, but lines
28-31 immediately dereference fixture properties (fixture.skillAreas and
fixture.id) without guarding against the case where fixture is undefined. If the
fixture lookup fails, the code will throw before reaching the intended assertion
message. Add a guard condition or early return after the first expect statement
to prevent the subsequent assertion block from executing if fixture is not
defined, ensuring the code does not attempt to access fixture.skillAreas or
fixture.id when fixture is null or undefined.
In `@evals/intent-discovery/graders/eval-metadata.ts`:
- Around line 36-37: The avgScore calculation divides by scores.length without
checking if the array is empty, which results in NaN when scores is an empty
array. Add a guard condition before calculating avgScore to check if
scores.length is greater than 0, and if not, set avgScore to a sensible default
value (such as 0). This ensures avgScore always has a defined numeric value
instead of NaN, preventing issues in downstream consumers that depend on
deterministic summary/report data.
In `@evals/intent-discovery/graders/reference-only.ts`:
- Around line 14-16: The `referenceOnly` grader is being biased by user prompts
because it includes all messages from the session when building the
transcriptText. Filter the run.session.messages array to only include messages
from the assistant (exclude user messages) before mapping them to searchable
text. This ensures the reference-only classification is based solely on what the
assistant said, not what the user asked, preventing false positives when users
mention skill areas in their prompts.
In `@evals/intent-discovery/harness-capture.eval.ts`:
- Around line 1-2: The file has duplicate imports from the same module node:fs
on consecutive lines. Consolidate the two import statements into a single import
that includes all the functions being imported: existsSync, mkdirSync,
readFileSync from the first import statement and mkdtempSync, rmSync from the
second import statement. Replace both lines with one combined import statement
from node:fs containing all five functions.
In `@evals/intent-discovery/live-copilot-harness.eval.ts`:
- Around line 1-2: The import statements for the node:fs module are split across
two lines, importing existsSync and writeFileSync on the first line, and
mkdtempSync and rmSync on the second line. Consolidate these into a single
import statement by combining all four functions (existsSync, writeFileSync,
mkdtempSync, rmSync) into one import from node:fs to satisfy linting rules and
maintain consistency.
---
Duplicate comments:
In `@evals/intent-discovery/harness/live-copilot-harness.ts`:
- Around line 23-38: Move the setup operations including the
prepareFixtureWorkspace call, applyIntentCondition call, and setCommonArtifacts
call into the try block so they are protected by the cleanup logic. Currently
these operations execute before the try block starts, which means if any of them
throw an error, the workspace cleanup code at line 143 will never execute and
the prepared workspace will leak. Relocate these three setup function calls to
be the first statements within the try block.
In `@evals/intent-discovery/harness/run-copilot-task.ts`:
- Around line 120-154: The runCommand function spawns a child process and waits
indefinitely for it to close, which can cause eval runs to hang if the child
never exits. Implement a timeout guard by wrapping the Promise with a timeout
mechanism that rejects after a reasonable duration (e.g., 30 seconds). When the
timeout is triggered, explicitly kill the child process using child.kill() to
ensure proper cleanup and prevent resource leaks. Apply this same timeout
pattern to the other subprocess path mentioned at lines 242-262 to ensure
consistent behavior across all spawned processes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b0ceebc7-8ab2-4d40-aefe-b8724c418e8e
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (40)
.gitignoreeslint.config.jsevals/intent-discovery/README.mdevals/intent-discovery/bin/copilot-cli-adapter.mjsevals/intent-discovery/bin/llm-judge.mjsevals/intent-discovery/bin/summarize-results.mjsevals/intent-discovery/condition-setup.eval.tsevals/intent-discovery/corpus/conditions.tsevals/intent-discovery/corpus/fixtures.tsevals/intent-discovery/corpus/live-tasks.tsevals/intent-discovery/corpus/skill-uses.tsevals/intent-discovery/corpus/tasks.tsevals/intent-discovery/fixture-corpus.eval.tsevals/intent-discovery/fixtures/router-basic/package.jsonevals/intent-discovery/fixtures/router-basic/src/routes/users.$userId.tsxevals/intent-discovery/fixtures/saved-transcripts.tsevals/intent-discovery/fixtures/start-basic/package.jsonevals/intent-discovery/fixtures/start-basic/src/routes/users.tsxevals/intent-discovery/fixtures/table-v9-basic/package.jsonevals/intent-discovery/fixtures/table-v9-basic/src/user-table.tsxevals/intent-discovery/graders/correct-skill-loaded.tsevals/intent-discovery/graders/eval-metadata.tsevals/intent-discovery/graders/failure-classifier.tsevals/intent-discovery/graders/reference-only.tsevals/intent-discovery/graders/skill-areas.tsevals/intent-discovery/graders/strict-invocation.tsevals/intent-discovery/harness-capture.eval.tsevals/intent-discovery/harness/live-copilot-harness.tsevals/intent-discovery/harness/parse-intent-commands.tsevals/intent-discovery/harness/prepare-fixture.tsevals/intent-discovery/harness/run-copilot-task.tsevals/intent-discovery/harness/saved-transcript-harness.tsevals/intent-discovery/harness/setup-intent-condition.tsevals/intent-discovery/intent-discovery.eval.tsevals/intent-discovery/live-copilot-harness.eval.tsevals/intent-discovery/runs/.gitkeepevals/intent-discovery/runs/latest/.gitkeepevals/intent-discovery/tsconfig.jsonevals/intent-discovery/vitest.evals.config.tspackage.json
✅ Files skipped from review due to trivial changes (8)
- evals/intent-discovery/fixtures/table-v9-basic/package.json
- eslint.config.js
- evals/intent-discovery/corpus/live-tasks.ts
- evals/intent-discovery/fixtures/start-basic/package.json
- evals/intent-discovery/fixtures/router-basic/package.json
- evals/intent-discovery/tsconfig.json
- evals/intent-discovery/README.md
- .gitignore
🚧 Files skipped from review as they are similar to previous changes (17)
- evals/intent-discovery/fixtures/router-basic/src/routes/users.$userId.tsx
- evals/intent-discovery/vitest.evals.config.ts
- evals/intent-discovery/fixtures/table-v9-basic/src/user-table.tsx
- evals/intent-discovery/corpus/fixtures.ts
- evals/intent-discovery/corpus/tasks.ts
- evals/intent-discovery/corpus/conditions.ts
- evals/intent-discovery/harness/saved-transcript-harness.ts
- evals/intent-discovery/graders/skill-areas.ts
- evals/intent-discovery/fixtures/start-basic/src/routes/users.tsx
- evals/intent-discovery/bin/copilot-cli-adapter.mjs
- evals/intent-discovery/corpus/skill-uses.ts
- package.json
- evals/intent-discovery/condition-setup.eval.ts
- evals/intent-discovery/harness/prepare-fixture.ts
- evals/intent-discovery/bin/llm-judge.mjs
- evals/intent-discovery/bin/summarize-results.mjs
- evals/intent-discovery/harness/parse-intent-commands.ts
…e, improve error handling, and update fixture dependencies
…ernal types, update fixture handling, and enhance reference-only logic
Adds an opt-in eval suite for measuring whether Copilot discovers and invokes Intent during normal coding tasks.
This PR introduces:
eval:intent-discoverycommand surface, separate from normal test/build/CI gates.INTENT_DISCOVERY_RUN_COUNT.vitest-evals.The suite is intentionally opt-in and report-oriented. It does not run as part of package correctness, release readiness, or normal CI.
Summary by CodeRabbit