feat(coverage): static coverage ingestion (Istanbul + LCOV) — impl of PR #56 plan (#57)
Conversation
First tracer of the static coverage ingestion plan (docs/plans/coverage-ingestion.md). Pure schema change, no engine yet. What lands:
- New `coverage` table with natural-key PK (file_path, name, line_start) per D6. Deliberately NOT a FK to symbols.id: dropAll() drops symbols on every --full reindex and the recreated rows get fresh AUTOINCREMENT ids; CASCADE would wipe coverage on every full rebuild. The natural key sidesteps the entire FK/CASCADE hazard. Orphan cleanup (file deleted from project) lives at the end of every ingest in tracer 2a — exercised by the new test using a raw DELETE so the contract is locked in before the engine code lands.
- `idx_coverage_file_name` mirrors the typical join shape (the killer recipe joins symbols ↔ coverage on file_path/name/line_start), and the (file_path, name) prefix also covers the GROUP BY file_path scan used by the bundled files-by-coverage recipe (D2 + D13).
- The `coverage` table is intentionally absent from `dropAll()` (joining the query_baselines precedent), so user rows survive --full and SCHEMA_VERSION-mismatch rebuilds.
- SCHEMA_VERSION bumped 5→6. Existing installs auto-rebuild on the next `codemap` run via the schema-mismatch path in createSchema(); the new table stays empty until the first `codemap ingest-coverage` invocation (tracer 3), which is the correct initial state per D12.
- A new db.test.ts case exercises the full contract: round-trip of a partial-coverage row set including the total_statements=0 → NULL edge (D5), survival across dropAll() + createTables(), and the orphan-cleanup DELETE that ships in tracer 2a's engine.
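A minimal sketch of the DDL this tracer describes — only the natural-key PK shape, the index, and the NULLable `coverage_pct` are stated above; the remaining column names are assumptions, and the real schema lives in src/db.ts:

```typescript
// Hedged sketch: PK and index match the commit text; other columns are assumed.
const CREATE_COVERAGE = `
  CREATE TABLE IF NOT EXISTS coverage (
    file_path          TEXT    NOT NULL,  -- project-relative path (D8)
    name               TEXT    NOT NULL,  -- enclosing symbol name
    line_start         INTEGER NOT NULL,
    covered_statements INTEGER NOT NULL,
    total_statements   INTEGER NOT NULL,
    coverage_pct       REAL,              -- NULL when total_statements = 0 (D5)
    PRIMARY KEY (file_path, name, line_start)
  );
  CREATE INDEX IF NOT EXISTS idx_coverage_file_name
    ON coverage (file_path, name, line_start);
`;
```

Because the PK is the natural key rather than a symbols.id FK, a --full reindex that regenerates AUTOINCREMENT ids leaves these rows untouched.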
Pure engine + format-agnostic write path. No CLI / FS side effects beyond
db.run on the caller-supplied SQLite handle.
`upsertCoverageRows({db, projectRoot, rows, format, sourcePath})` —
shared core consumed by both Istanbul (this commit) and LCOV (next):
- Normalises absolute paths via toProjectRelative (D8); files outside
the project root land in skipped.unmatched_files.
- Loads symbols per-file once and projects each statement onto the
innermost enclosing symbol via JS-side (line_end - line_start) ASC
tie-break (D7) — avoids the per-statement SQL round-trip the plan
flagged as the hot path.
- Aggregates per (file_path, name, line_start) bucket; total = 0 →
coverage_pct NULL (D5 edge — "untested" ≠ "no testable code").
- Single transaction: per-file DELETE + bulk INSERT for idempotent
re-ingest, then orphan-cleanup DELETE (D6 — no FK / CASCADE means
deleted-file rows accumulate without this sweep), then writes the
three coverage_last_ingested_* meta keys.
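The two pure steps above — innermost-enclosing-symbol projection with the (line_end - line_start) ASC tie-break (D7), and the D5 total = 0 → NULL edge — can be sketched as standalone functions. Names and types here are illustrative, not the real coverage-engine.ts API:

```typescript
interface Sym { name: string; line_start: number; line_end: number }

// D7: the innermost enclosing symbol wins — smallest (line_end - line_start) span.
function innermostEnclosing(syms: Sym[], line: number): Sym | undefined {
  return syms
    .filter((s) => s.line_start <= line && line <= s.line_end)
    .sort((a, b) => (a.line_end - a.line_start) - (b.line_end - b.line_start))[0];
}

// D5: zero testable statements means "no testable code", not 0% — NULL, not 0.
function coveragePct(covered: number, total: number): number | null {
  return total === 0 ? null : (covered / total) * 100;
}
```

Doing this projection JS-side over a once-per-file symbol load is what avoids the per-statement SQL round-trip flagged as the hot path.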
`ingestIstanbul({db, projectRoot, payload, sourcePath})` — Istanbul
parser front-end:
- Subset-typed `IstanbulPayload` reads only statementMap + s; ignores
fnMap / branchMap / inputSourceMap so the format can grow without
churning the signature.
- Inner `path` field takes precedence over the absolute-path key
(handles webpack-style symlinked paths).
- Tolerates malformed file entries (missing statementMap or s) — skips
silently rather than throwing, so one corrupt file doesn't poison
the whole ingest.
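A sketch of the subset-typed read described above (the `path`-over-key precedence and the skip-don't-throw tolerance); the flattening helper is hypothetical and stands in for the real ingestIstanbul plumbing:

```typescript
// Subset of the Istanbul coverage-final.json shape this ingester reads —
// fnMap / branchMap / inputSourceMap are deliberately not modelled.
interface IstanbulFileEntry {
  path?: string; // inner path wins over the object key (webpack-style symlinks)
  statementMap?: Record<string, { start: { line: number }; end: { line: number } }>;
  s?: Record<string, number>; // statement id -> hit count
}
type IstanbulPayload = Record<string, IstanbulFileEntry>;

// Illustrative flattening step: one (file, line, hits) tuple per statement.
// Malformed entries (missing statementMap or s) are skipped, not thrown.
function flattenIstanbul(payload: IstanbulPayload) {
  const out: { file: string; line: number; hits: number }[] = [];
  for (const [key, entry] of Object.entries(payload)) {
    if (!entry?.statementMap || !entry.s) continue; // tolerate corrupt files
    const file = entry.path ?? key;
    for (const [id, loc] of Object.entries(entry.statementMap)) {
      out.push({ file, line: loc.start.line, hits: entry.s[id] ?? 0 });
    }
  }
  return out;
}
```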
10 unit tests cover both pieces:
- Shared core: innermost-wins aggregation, statement-outside-symbol
observability, zero-statement-symbol → no row, UPSERT idempotence,
orphan cleanup after file deletion, project-root normalisation,
unmatched-file skipping, meta-key writes.
- Istanbul parser: real-shape payload end-to-end, malformed-entry
tolerance.
Closes the second format axis from the plan — an LCOV (`lcov.info`) ingester
that shares the entire write path with the Istanbul ingester from Tracer 2a.
`ingestLcov({db, projectRoot, payload, sourcePath})` is a pure regex
tokenizer over the LCOV record format. Recognised lines:
- `SF:<path>` — start of a file record (sets the "current file")
- `DA:<line>,<exec_count>[,<checksum>]` — one statement per record
- `end_of_record` — closes the current file record
- Comments (`# …`), blank lines, and CRLF endings tolerated
Ignored lines (everything else: `TN:`, `FN:`, `FNDA:`, `FNF:`, `FNH:`,
`BRDA:`, `BRF:`, `BRH:`, `LF:`, `LH:`) — D5 scope is statement coverage
only; the unused records are skipped without churning the parser.
Throws on `DA:` outside an `SF:` block (malformed LCOV — the file would
have nowhere to attach to). Missing `end_of_record` is tolerated by
construction (next `SF:` resets the current file).
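The recognised-lines table above maps onto a small tokenizer. This is an illustrative version — the real ingestLcov additionally normalises paths and hands off to upsertCoverageRows:

```typescript
// Tolerates CRLF, comments, blank lines; throws on DA: outside an SF: block.
function parseLcov(text: string): Map<string, Map<number, number>> {
  const files = new Map<string, Map<number, number>>();
  let current: Map<number, number> | null = null;
  for (const raw of text.split(/\r?\n/)) {               // CRLF tolerated
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;   // blanks / comments
    if (line.startsWith("SF:")) {
      current = new Map();
      files.set(line.slice(3), current);                 // next SF: also closes the prior record
      continue;
    }
    const da = /^DA:(\d+),(\d+)(?:,.*)?$/.exec(line);    // optional trailing checksum field
    if (da) {
      if (!current) throw new Error("DA: record outside an SF: block");
      current.set(Number(da[1]), Number(da[2]));
      continue;
    }
    // end_of_record, TN:, FN:, FNDA:, BRDA:, LF:, LH:, … — ignored (statement scope only)
  }
  return files;
}
```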
Five unit tests cover both the LCOV parser in isolation and the
cross-format equivalence contract:
- `ingestLcov`: well-formed LCOV with multiple SF records →
identical-shape coverage rows; CRLF + comments + blank-line tolerance;
DA-outside-SF throws; optional `DA:` checksum field parsed.
- Cross-format: identical Istanbul + LCOV inputs produce byte-identical
rows in the `coverage` table — the contract that lets Tracer 4 ship
one set of golden snapshots that asserts both formats land on the
same answer.
15 tests total in coverage-engine.test.ts (10 existing + 5 LCOV).
Zero new write-side code — the LCOV parser is pure parse-and-normalise
into the same `CoverageRow[]` the Istanbul parser produces, then both
hand off to `upsertCoverageRows`.
Wires the engine (Tracer 2a/2b) into the CLI surface end-to-end. Smoke-
tested against a real Istanbul payload pointed at src/db.ts:
ingest writes 2 symbols (openDb 100%, closeDb 0%); query --json reads
them back with the expected coverage_pct shape.
cli/cmd-ingest-coverage.ts:
- Single positional arg + --json. No --source flag (per plan: format
is auto-detectable from extension; flag noise dropped).
- resolveArtifact() probes the path: .json → istanbul, .info → lcov,
directory → looks for both filenames, errors if both or neither
present (no precedence guessing — explicit is better than implicit).
- File reads use the canonical Node-vs-Bun split: Bun.file().json() /
.text() on Bun (native parser perf for multi-MB Istanbul payloads),
readFile + JSON.parse on Node. Mirrors config.ts. See packaging.md.
- Plain-text terminal output by default; --json emits the IngestResult
envelope. Errors emit {"error":"…"} on stdout under --json, plain
message on stderr otherwise. Sets process.exitCode (no process.exit)
so piped stdout isn't truncated.
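The extension probe for a single file can be sketched as below; this hypothetical helper covers only the file case — the real resolveArtifact also probes directories (looks for both filenames, errors if both or neither are present):

```typescript
// Assumed single-file probe: the extension alone decides the format.
function detectFormat(path: string): "istanbul" | "lcov" {
  if (path.endsWith(".json")) return "istanbul";
  if (path.endsWith(".info")) return "lcov";
  throw new Error(`cannot detect coverage format from extension: ${path}`);
}
```

Auto-detection is why the verb needs no `--source` flag: the two formats never share an extension, so explicit selection would be pure noise.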
cli/main.ts: dispatch new verb between snippet and query (lazy import
matches every other cmd).
cli/bootstrap.ts: validateIndexModeArgs() recognises ingest-coverage as
a known subcommand; printCliUsage() lists it in a new "Coverage ingest"
section.
7 parser tests cover all paths: subcommand-position requirement, help on
--help/-h, single-path default, --json flag, missing-path error,
unknown-option error, multiple-paths error.
Engine + CLI smoke verified locally:
$ bun src/index.ts ingest-coverage /tmp/cov.json --json
{"ingested":{"symbols":2,"files":1},"skipped":...,"format":"istanbul"}
…acer 4)
Closes the agent-surface contract from D13: every common coverage
question gets a `--recipe` verb. Three v1 recipes ship under
templates/recipes/, auto-discovered by the existing recipes-loader
(`bun src/index.ts query --recipes-json` lists 15 recipes total — was 12).
Bundled recipes:
- `untested-and-dead.{sql,md}` — the killer recipe. Exported functions
with no callers AND zero coverage. Recipe MD documents the v1
name-collision lossiness (D11) and three concrete narrowing patterns
agents can apply: scope by file_path prefix, exclude default exports
(Next.js entry points), restrict to public visibility.
- `files-by-coverage.{sql,md}` — files ranked ascending by statement
coverage. Replaces the deferred `file_coverage` rollup table (D2):
GROUP BY on the symbol-level table is index-bounded by
`idx_coverage_file_name`. NULL coverage_pct (zero-statement files)
sorted last to avoid drowning out actual zero-coverage files.
- `worst-covered-exports.{sql,md}` — top-20 worst-covered exported
functions (LIMIT in the SQL, not configurable in v1 — config-driven
LIMIT defers until a real consumer asks).
Each recipe ships a frontmatter `actions` block (per PR #26) so agents
get a per-row follow-up hint in `--json` output.
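The NULL-last ordering in files-by-coverage can be expressed in SQLite as `ORDER BY coverage_pct IS NULL, coverage_pct ASC` (`IS NULL` evaluates to 0/1, so non-NULL rows sort first); whether the shipped recipe uses exactly that form is an assumption. A JS analogue of the same ordering:

```typescript
// Nulls-last ascending: worst-covered files first, zero-statement files last.
function byCoverageNullsLast(a: number | null, b: number | null): number {
  if (a === null && b === null) return 0;
  if (a === null) return 1;   // NULL (zero-statement file) sorts last
  if (b === null) return -1;
  return a - b;               // ascending coverage_pct
}
```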
Fixture data:
- fixtures/minimal/coverage/coverage-final.json (Istanbul) and
fixtures/minimal/coverage/lcov.info (LCOV) — equivalent partial
coverage shape covering 10 fixture symbols across 6 files. Both
formats use project-relative paths so they work without sed.
Golden runner extension (scripts/query-golden.ts + schema.ts):
- New `setup` array in scenarios.json runs one-time setup steps after
`cm.index()` and before scenarios. Today: `{kind: "ingest-coverage",
path: "..."}` — engine auto-detects format by extension.
- Schema is backward-compatible: parser accepts either the legacy
flat array OR the new `{setup?, scenarios}` object via z.union.
- Setup dispatch reuses the engine functions directly (ingestIstanbul,
ingestLcov) — same write path the CLI verb uses.
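The backward-compatible shape is easiest to see as a normaliser. The real schema.ts does this with a zod z.union, so this plain-TS version is only an illustration of the accepted shapes:

```typescript
// Hypothetical types mirroring the scenarios.json contract described above.
interface SetupStep { kind: "ingest-coverage"; path: string }
interface Scenario { name: string /* …real fields elided… */ }
type ScenariosFile = Scenario[] | { setup?: SetupStep[]; scenarios: Scenario[] };

// Accept either the legacy flat array or the new {setup?, scenarios} object.
function normalizeScenarios(input: ScenariosFile): { setup: SetupStep[]; scenarios: Scenario[] } {
  return Array.isArray(input)
    ? { setup: [], scenarios: input }
    : { setup: input.setup ?? [], scenarios: input.scenarios };
}
```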
5 new golden snapshots prove the surface end-to-end on the fixture
corpus:
- coverage-rows-after-ingest.json: raw `coverage` table contents
- untested-and-dead.json: 6 dead+untested symbols (incl. legacyClient
+ epochMs, both @deprecated)
- files-by-coverage.json: 6 files ranked from 0% (api/client.ts) to
100% (ProductCard, usePermissions)
- worst-covered-exports.json: top exported functions with coverage_pct
- (LCOV cross-format equivalence proven by engine unit test, not
duplicated in goldens — single setup step per scenarios.json keeps
the runner simple)
fixtures/minimal/README.md: new "What's exercised" row + Use section
with `bun src/index.ts ingest-coverage` worked examples.
All 24 golden scenarios pass. Engine unit tests still 15 pass.
… changeset (Tracer 5)

Closes the plan execution per docs/README.md Rule 10 (agent rule + skill lockstep) and Rule 9 (new domain nouns → glossary same-PR).
docs/architecture.md:
- application/ list: add coverage-engine.ts (upsertCoverageRows core + ingestIstanbul / ingestLcov parsers).
- New § coverage table under Schema with the natural-key PK rationale, cross-referencing query_baselines as the dropAll() exclusion precedent.
docs/glossary.md:
- New `coverage` table entry: PK shape, NULL semantics, orphan cleanup.
- New `codemap ingest-coverage` / Istanbul JSON / LCOV / static coverage ingestion entry: format auto-detection, innermost-wins projection rationale (D7), three bundled recipes, no-half-way principle.
.agents/rules/codemap.md + templates/agents/rules/codemap.md (lockstep):
- Index intro mentions coverage as part of indexed structure.
- New CLI table row for `ingest-coverage`.
- Four new trigger-pattern rows: "Is symbol X tested?" → coverage table; "What's structurally dead AND untested?" → --recipe untested-and-dead; "Rank files by test coverage" → --recipe files-by-coverage; "Worst-covered exported functions" → --recipe worst-covered-exports.
- Two new quick-reference query rows: symbol coverage + bundled recipe.
- Drift between .agents/ and templates/agents/ stays CLI-prefix-only (`bun src/index.ts` vs `codemap`).
.agents/skills/codemap/SKILL.md + templates/agents/skills/codemap/SKILL.md (lockstep): new `coverage` table block with the full column schema + the three meta keys.
docs/research/fallow.md:
- C.11 row marked Shipped (link to plan PR + this commit).
- Open question "column on symbols vs separate table?" resolved to "separate table" with D1 + D6 cross-reference.
docs/roadmap.md: drop the "Static coverage ingestion" backlog row (per Rule 2 — when a backlog item ships, it leaves the roadmap).
docs/plans/coverage-ingestion.md: deleted per Rule 3 (plan files have the lifetime of the work; absorbed into architecture / glossary / agent rule on ship).
.changeset/coverage-ingestion.md: minor changeset (per .agents/lessons.md "changesets bump policy" — new tables + SCHEMA_VERSION bump = minor). All 24 golden scenarios pass (5 new — coverage-rows-after-ingest, untested-and-dead, files-by-coverage, worst-covered-exports — plus the existing 19). 737 tests across 43 files, 0 fail.
📝 Walkthrough

The PR implements static coverage ingestion for Codemap.
Sequence diagram:

```mermaid
sequenceDiagram
    actor User
    participant CLI as codemap CLI
    participant Detector as Format Detector
    participant Parser as Istanbul/LCOV Parser
    participant Engine as Ingestion Engine
    participant DB as SQLite DB
    participant Recipe as Query Recipe
    User->>CLI: ingest-coverage ./coverage-final.json
    CLI->>Detector: Resolve artifact path & format
    Detector-->>CLI: { format: "istanbul", absPath }
    CLI->>CLI: Read JSON payload
    CLI->>Parser: ingestIstanbul(payload)
    Parser->>Engine: upsertCoverageRows([statementMap + hits])
    Engine->>Engine: Normalize file paths to project-relative
    Engine->>Engine: Map statements to innermost symbol
    Engine->>Engine: Aggregate hit/total per (file, name, line_start)
    Engine->>DB: BEGIN transaction
    Engine->>DB: DELETE coverage WHERE file_path = ?
    Engine->>DB: INSERT INTO coverage (file_path, name, ..., coverage_pct)
    Engine->>DB: DELETE orphaned coverage rows
    Engine->>DB: UPDATE meta keys (last_ingested_at, format, path)
    Engine->>DB: COMMIT
    DB-->>Engine: IngestResult { ingested, skipped, pruned }
    Engine-->>CLI: Result with counts
    CLI-->>User: Terminal summary or --json output
    User->>CLI: query --recipe untested-and-dead --json
    CLI->>Recipe: SELECT ... FROM symbols LEFT JOIN coverage ...
    DB-->>Recipe: Rows with coverage_pct=0 & no callers
    Recipe-->>User: JSON array of dead/untested exports
```
CI Build (Node, better-sqlite3) failed with "RangeError: The supplied SQL
string contains no statements" on `node dist/index.mjs --full`. Root
cause is the .agents/lessons.md lesson on naive `;` splitting: the new
coverage index comment contained `symbols.{file_path,name,line_start};`,
and runSql() splits multi-statement strings on `;` for better-sqlite3
(one statement per prepare). The trailing semicolon inside the `--`
comment created an empty fragment that better-sqlite3 rejects.
Reworded the comment to use parentheses + a period — same intent, no
semicolon. Verified locally via `bun run build && node dist/index.mjs
--full` against /tmp project: full index now completes on Node.
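The failure mode is easy to reproduce in isolation: a naive `;` split of a multi-statement SQL string treats a semicolon inside a `--` comment as a statement boundary and yields a comment-only fragment. A sketch (the actual runSql lives in src/db.ts):

```typescript
// Naive multi-statement split, as used for the better-sqlite3
// one-statement-per-prepare path. A `;` inside a `--` comment
// still counts as a boundary.
function naiveSplit(sql: string): string[] {
  return sql.split(";").map((s) => s.trim()).filter((s) => s.length > 0);
}

const buggy = `
-- covers symbols.{file_path,name,line_start};
CREATE INDEX idx_coverage_file_name ON coverage (file_path, name);
`;

// The first fragment is comment-only — better-sqlite3's prepare() rejects it
// with "RangeError: The supplied SQL string contains no statements".
const fragments = naiveSplit(buggy);
```

Rewording the comment so it contains no semicolon sidesteps the split without touching the splitter itself.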
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/architecture.md (1)
182-182: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Schema version not updated — still says 5, should be 6 after this PR. The PR objectives state "bump SCHEMA_VERSION 5→6" in db.ts, but line 182 was not updated to match.

📝 Proposed fix

```diff
-Current schema version: **5** — see [Schema Versioning](#schema-versioning) for details.
+Current schema version: **6** — see [Schema Versioning](#schema-versioning) for details.
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `docs/architecture.md` at line 182, update the human-readable schema version from "Current schema version: **5**" to "**6**" so it matches the SCHEMA_VERSION constant bumped in db.ts; search for the literal "Current schema version" or the bolded "5" and replace it with "6" to keep the docs and the SCHEMA_VERSION value in sync.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/architecture.md`:
- Around line 19-25: The markdown link fragment [§ Static coverage
ingestion](`#static-coverage-ingestion`) is broken; update the reference in the
architecture table (the parenthetical after `coverage-engine.ts`) to point to
the actual heading for the coverage section (the heading titled "### `coverage`
— Statement coverage…" or remove the parenthetical entirely). Replace the broken
fragment with the correct slug that matches that heading's anchor (or delete the
parenthetical), so the `coverage-engine.ts` / `ingestIstanbul` / `ingestLcov`
reference links to the real section.
In `@fixtures/minimal/README.md`:
- Line 24: The README line incorrectly implies the bundled coverage recipes
specifically target `@deprecated` symbols; update the description to state that
the bundled recipes (`untested-and-dead`, `files-by-coverage`,
`worst-covered-exports`) operate on exported functions in general and do not
focus on deprecation tags, and note that `legacyClient` only happens to be
`@deprecated` while other exported symbols in the golden snapshot
(`FormatPrice`, `run`, `_epochSeconds`, `nanoseconds`, `_hiResEpoch`) carry
different tags or none; adjust the table cell text to reflect this neutral
behavior toward tags and remove the phrase "exercise the join axis against
`@deprecated` symbols."
In `@templates/recipes/untested-and-dead.md`:
- Line 21: The WHERE clause filtering for symbol kind/visibility is missing
parentheses so the OR binds incorrectly; update the condition that follows the
existing "kind = 'function'" filter to group the OR expression with parentheses
and ensure it is combined with the other predicates, e.g. change the fragment
that references s.visibility (currently "AND s.visibility IS NULL OR
s.visibility = 'public'") to use a grouped expression like "AND (s.visibility IS
NULL OR s.visibility = 'public')" so public/internal checks are applied as
intended.
---
Outside diff comments:
In `@docs/architecture.md`:
- Line 182: Update the human-readable schema version in docs/architecture.md
from "Current schema version: **5**" to "**6**" so it matches the SCHEMA_VERSION
constant you bumped in db.ts; search for the literal "Current schema version" or
the bolded "5" and replace it with "6" to keep docs and the SCHEMA_VERSION value
in sync.
📒 Files selected for processing (35)
- .agents/rules/codemap.md
- .agents/skills/codemap/SKILL.md
- .changeset/coverage-ingestion.md
- docs/architecture.md
- docs/glossary.md
- docs/plans/coverage-ingestion.md
- docs/research/fallow.md
- docs/roadmap.md
- fixtures/golden/minimal/coverage-rows-after-ingest.json
- fixtures/golden/minimal/files-by-coverage.json
- fixtures/golden/minimal/files-hashes.json
- fixtures/golden/minimal/untested-and-dead.json
- fixtures/golden/minimal/worst-covered-exports.json
- fixtures/golden/scenarios.json
- fixtures/minimal/README.md
- fixtures/minimal/coverage/coverage-final.json
- fixtures/minimal/coverage/lcov.info
- scripts/query-golden.ts
- scripts/query-golden/schema.ts
- src/application/coverage-engine.test.ts
- src/application/coverage-engine.ts
- src/cli/bootstrap.ts
- src/cli/cmd-ingest-coverage.test.ts
- src/cli/cmd-ingest-coverage.ts
- src/cli/main.ts
- src/db.test.ts
- src/db.ts
- templates/agents/rules/codemap.md
- templates/agents/skills/codemap/SKILL.md
- templates/recipes/files-by-coverage.md
- templates/recipes/files-by-coverage.sql
- templates/recipes/untested-and-dead.md
- templates/recipes/untested-and-dead.sql
- templates/recipes/worst-covered-exports.md
- templates/recipes/worst-covered-exports.sql
💤 Files with no reviewable changes (2)
- docs/roadmap.md
- docs/plans/coverage-ingestion.md
All three CodeRabbit comments fact-check as ✅ Correct. Apply each:

1. **architecture.md:25 — broken anchor.** `#static-coverage-ingestion` doesn't exist (the section heading is `### coverage — Statement coverage…`, not "Static coverage ingestion"). Fix: drop the parenthetical anchor link, replace with "schema in § Schema → coverage" pointing at the existing § Schema anchor that does exist.
2. **fixtures/minimal/README.md:24 — overstated deprecation focus.** Said "exercise the join axis against `@deprecated` symbols" but only 1 of 6 golden rows is `@deprecated` (legacyClient); the others (`FormatPrice`, `run`, `_epochSeconds`, `nanoseconds`, `_hiResEpoch`) carry `@internal` / `@alpha` / `@private` / no tag. Reword to "across exported functions of every visibility tag" — accurate and tag-neutral.
3. **untested-and-dead.md:21 — SQL precedence bug in the narrowing pattern.** `AND s.visibility IS NULL OR s.visibility = 'public'` parses as `(... AND s.visibility IS NULL) OR (s.visibility = 'public')` per SQL precedence (AND binds tighter than OR). Every row with `visibility = 'public'` would bypass every other WHERE predicate — an agent following this pattern would get a much larger result set than intended. Fix: wrap in parentheses, plus an explicit comment that the parens are load-bearing so future edits don't drop them.

files-hashes.json refreshed for the README touch.
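The precedence trap in the SQL fix mirrors plain boolean operators — AND binds tighter than OR in SQL just as `&&` binds tighter than `||` in TypeScript. A minimal illustration (hypothetical values, not the recipe itself) of why the unparenthesised form leaks rows:

```typescript
// Simulate the WHERE clause for a public-visibility row that FAILS
// some other predicate (e.g. it has callers, so it isn't dead).
const otherPredicates = false;               // row should be excluded
const visibility: string | null = "public";

// Unparenthesised: parses as (otherPredicates && visibility === null) || visibility === "public"
const buggy = otherPredicates && visibility === null || visibility === "public";

// Parenthesised (the shipped fix) — the parens are load-bearing.
const fixed = otherPredicates && (visibility === null || visibility === "public");
```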
CodeRabbit outside-diff comment on PR #57 (architecture.md:182) — caught that I bumped SCHEMA_VERSION 5 → 6 in db.ts (Tracer 1) but left the human-readable callout in architecture.md unchanged. Now in sync. Per `docs/README.md` Rule 6, "schema version" is explicitly listed as a decision value (not inventory) so the hardcoded number is fine — it just needs to track the constant.
…edoc

PR #57's body shipped with literal backslash-backtick artifacts everywhere because the heredoc-into-`gh pr create --body` path shell-escaped every backtick. Add it to .agents/lessons.md alongside the existing template-literal backtick lesson — same root-cause family (backticks + nested quoting), different surface (gh CLI vs TS template strings). Pattern: write body → temp file → `gh pr <verb> --body-file <path>` → delete. One extra tool call, zero rendering surprises.
Summary
Implements PR #56's plan — `codemap ingest-coverage <path>` reads Istanbul JSON or LCOV into a new `coverage` table joinable to `symbols`. Six tracer-bullet commits, each shippable on its own.

Killer recipe (now bundled)
What landed (per plan)
- `coverage` table DDL + `SCHEMA_VERSION` 5→6 + `dropAll()` exclusion + orphan-sweep test
- `upsertCoverageRows` core + Istanbul parser (10 unit tests; innermost-wins projection per D7)
- LCOV ingester (`SF:` / `DA:` / `end_of_record` regex tokenizer; 5 unit tests; cross-format equivalence proven)
- CLI verb `codemap ingest-coverage <path> [--json]` (no `--source` flag — auto-detected by extension)
- Three bundled recipes (`untested-and-dead`, `files-by-coverage`, `worst-covered-exports`) + 5 golden snapshots + scenarios.json setup hook

"No half-way APIs" principle (per plan v3 + v4)
- Works off an existing `bun test --coverage` artifact (user not waiting)
- No `--source` flag — format auto-detected (no API noise)
- Recipe `.md` documents the name-collision lossiness (D11)
- `coverage_pct = NULL` for zero-statement scopes (untested ≠ no testable code)

Test plan
- `bun test` — 737 tests across 43 files, 0 fail (15 new in `coverage-engine`, 7 in `cmd-ingest-coverage`, 1 in `db` schema)
- `bun scripts/query-golden.ts` — all 24 scenarios pass (5 new)
- `bun run typecheck` — 0 errors
- `bun run lint` — 0 warnings, 0 errors
- Smoke test against `src/db.ts` → ingest writes 2 symbols, query returns expected `coverage_pct` shape (a342d48, 277dcff)

Workflow notes
CodeRabbit's outside-diff detection caught one cross-file consistency bug a single-file inline review couldn't surface (`docs/architecture.md:182` schema-version mention left at 5 after the `db.ts` constant bumped to 6). All other plan / impl decisions held under review.