Surface untested source files at 0% coverage in reports#9
Conversation
Introduces a reusable coverage toolchain based on llvm-cov: - coverage/merger.py: per-test profraw -> profdata + object file packaging - coverage/reporter.py: cross-test aggregation, HTML/LCOV/text reports - coverage/effective_coverage.py: justification overlay + effective metrics - coverage/justify.py: justification manifest resolution - coverage/defs.bzl: score_coverage_reporter macro for consumer wiring - coverage/coverage.bazelrc: shared coverage flags - coverage/filter_regexes.txt: baseline source exclusions - coverage/generate_coverage_html.sh: convenience entry point Adds an end-to-end example under tests/coverage exercising the pipeline with a small instrumented library, test, justification file, and consumer filter regexes. Known limitation: source files not linked into any cc_test are not yet included in the report (no instrumented object file -> invisible to llvm-cov).
llvm-cov only reports files linked into at least one test binary.
Sources that exist in the workspace but no test pulls in silently
disappear from the report, causing coverage to appear higher than
it actually is.
Three mechanisms work together to fix this:
Bazel aspect + manifest rule (coverage/defs.bzl):
_collect_sources_aspect walks the dependency graph of all configured
targets and collects C/C++ source files. score_instrumented_sources_manifest
writes a workspace-relative path-per-line manifest.
score_coverage_reporter gains an optional instrumented_sources_manifest
parameter that passes the manifest to the reporter via
--instrumented_sources_manifest.
Reporter augmentation (coverage/reporter.py):
After llvm-cov export, the reporter compares the manifest against
covered sources from the LCOV output. For each missing file a
synthetic 0%-coverage LCOV record is appended (SF + DA per
non-blank line + LF/LH). The llvm-cov text summary TOTALS line is
updated in-place using re.finditer to preserve fixed-width column
alignment. Per-file HTML pages and a "Not Linked Into Tests" index
section are generated for visibility.
Correctness and security fixes applied during review:
- Use str.replace() instead of str.format() for HTML template
rendering so that { and } in C++ source bodies do not crash the
reporter with KeyError/ValueError.
- Separate stderr from stdout for run_llvm_cov_export and
run_llvm_cov_report (separate_stderr=True) so that llvm-cov
warning messages are not mixed into LCOV/summary output.
- Validate that resolved manifest paths stay within workspace_root
via Path.is_relative_to() before reading files.
- Extend _escape_html to cover ' (') and " (") so that
file paths with apostrophes do not break HTML attributes.
- Count only non-blank lines for LF in synthetic LCOV records
to avoid inflating the denominator in aggregate metrics.
Test fixture (tests/coverage/uncovered.cpp, uncovered.h, BUILD.bazel):
cc_library intentionally not linked into any cc_test. Verifies that
the reporter surfaces the file at 0% coverage rather than omitting it.
Docs (coverage/README.md): new section 5a with usage example.
- Use heuristic to identify instrumentable lines instead of counting all non-blank lines. Filters comments, preprocessor directives, lone braces, namespace declarations, and access specifiers to avoid inflating LF values in synthetic LCOV records. - Augment summary.txt and console output with untested file line counts so the visible TOTALS reflect the true coverage including 0%-files. - Parse the llvm-cov column header to determine the Lines group index dynamically instead of hardcoding position 1. - Add workspace-bounds check after resolve() in _find_untested_sources to prevent path traversal via symlinks. - Escape single and double quotes in _escape_html to prevent attribute breakout in generated HTML pages.
- Narrow _NON_EXECUTABLE_RE block-comment pattern from `|\*.*` to `|\*(?:[/\s].*)?` so that pointer dereferences (`*ptr = value;`) are correctly classified as executable. - Add py_test with unit tests for all reporter augmentation helpers: _is_likely_executable, _count_instrumentable_lines, _covered_sources_from_lcov, _find_untested_sources, _append_zero_coverage_lcov, _augment_text_summary, _escape_html. Includes a path-traversal rejection test for _find_untested_sources.
- Add py_library target for reporter so unit tests can import it. - Remove coverable_test from instrumented_sources manifest targets to avoid testonly dependency violation (test sources don't need to appear in the manifest — they're tested by definition).
The heuristic line count (_count_instrumentable_lines) cannot replicate what llvm-cov would report for actually-instrumented objects. Rewriting TOTALS with approximate numbers gives false precision. - _augment_text_summary: no longer rewrites the TOTALS line; appends a clearly-labelled WARNING banner with ~N estimated lines instead. - _inject_untested_section_into_index: injects a prominent banner right after <body> so it is the first thing reviewers see. Detail table uses ~N notation and includes a disclaimer about the heuristic. - _append_zero_coverage_lcov docstring documents the approximation. - Tests updated to match banner-only behavior.
nradakovic
left a comment
There was a problem hiding this comment.
To be honest. I will need a lot of time (and coffee) to do proper review. Hopefully by Monday I will have feedback.
Thanks for the quick feedback Nikola. In fact, I took most of the code from this PR in the communication repository. Additionally, I fixed that files without unittests are shown in the report so not a higher coverage is show that is actually the case. Maybe for the review you seek help in the communication team. Happy review and hopefully good ☕😉 |
And this is also another problem. I'll for sure have some questions why certain implementation blocks are done in such way, but since communication PR is already merged (and in use?) not sure if I want to dive into large discussion and diverge from original PR. Anyway, let's see once I have concrete feedback. If something is on critical path, I will request that communication change their implementation. |
|
So something that I see by just looking at. This implementation relies on a generated Bash wrapper to reconstruct Bazel runfiles and workspace paths before invoking the actual reporter. This feels a bit non-Bazel-native. Can we instead model this as an executable Starlark rule, or move runfiles/path resolution into the reporter binary using Bazel runfiles libraries? |
nradakovic
left a comment
There was a problem hiding this comment.
For start I left 2 comments. It's enough to discuss Bazel implementation blocks (I still haven't check Python scripts) but I think they look OK. Have to do better look.
I see two paths for this PR to get merged:
- I will approve as is but only if we have plan to improve current implementation (tickets to track it with clear plan how to improve it). What communication guys did is on them to support it. This will be untilized by every single module in S-CORE so we do not want to find nasty surprises 2 days before release of S-CORE.
- We start now changing blocks but this will require some time and probably we will need to include more people who have more exp. with Bazel and code coverage.
@RSingh1511 Please provide your thoughts on this as well.
| reporter can add 0%-coverage entries for files that no test linked against.""", | ||
| ) | ||
|
|
||
| def score_coverage_reporter( |
There was a problem hiding this comment.
As I already said in my previous comment, this simply looks wrong from Bazel native perspective. This is one of the biggest anti-patterns in Bazel and people very often fall into this trap which later brings nothing more than pain in the head. Genrules are for simple content transformation (cat, grep, sed), not what we have here.
The solution here would be to encapsulate all this into custom rules with executable python binaries + Python's Runfiles API as dependency.
Then we would benefint in multiple fronts:
- Escaping horrible complexity with genrule (
\\$$,$$,$) - Increasing reproducibility since shell has differences per OS.
- Increasing debug capabilites.
There was a problem hiding this comment.
Thanks for the feedback. Bazel is a b...east 🤪 ... I still try to tame it 😉
Addressed with d762cc5
| # file: | ||
| # | ||
| # coverage --instrumentation_filter="^//<your_top_level_package>[/:]" | ||
| # coverage --coverage_report_generator=//<your_dir>:reporter_wrapper |
There was a problem hiding this comment.
And here is where things become fun. What if reference integration wants to execute code coverage which is in the end overall plan. How reference integration will set reporter for different modules? For start, everything that needs to be set in module .bazelrc file needs to be implemented over module extention (if possible).
|
Hello @olivembo , Thanks for this. Since you referenced communication, could you create an adoption PR in communication or logging showing how we can consume it? A real data example would help with verification and allow everyone to understand the use case. |
There was a problem hiding this comment.
Pull request overview
This PR extends the repo’s Bazel/llvm-cov coverage tooling so that C/C++ source files reachable from configured coverage roots—but not linked into any executed test binary—are still surfaced in coverage outputs as synthetic 0%-coverage entries (LCOV, HTML, and text summary). It does so by generating a manifest of reachable sources via a Bazel aspect and then augmenting the reporter outputs based on a manifest-vs-LCOV diff.
Changes:
- Adds a Bazel aspect +
score_instrumented_sources_manifestrule to enumerate transitive C/C++ sources and plumbs an optionalinstrumented_sources_manifestintoscore_coverage_reporter. - Implements report augmentation in
coverage/reporter.py(LCOV + HTML index/pages + text summary banner) and adds unit tests for the augmentation helpers. - Adds coverage adoption docs/config (
coverage/README.md,coverage.bazelrc, filter regex baseline) and a consumer smoke-test workspace undertests/coverage.
Reviewed changes
Copilot reviewed 5 out of 27 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/MODULE.bazel | Adds rules_python + Python toolchain setup for the test workspace. |
| tests/coverage/uncovered.h | Adds an intentionally untested header fixture for smoke-testing “0% untested file” surfacing. |
| tests/coverage/uncovered.cpp | Adds an intentionally untested source fixture for smoke-testing “0% untested file” surfacing. |
| tests/coverage/reporter_test.py | Adds unit tests for coverage reporter augmentation helpers (LCOV/HTML/text/escaping/path checks). |
| tests/coverage/coverage_justifications.yaml | Adds a sample justification DB used by the coverage smoke test. |
| tests/coverage/coverage_filter_regexes.txt | Adds smoke-test consumer-side extra ignore regexes. |
| tests/coverage/coverable.h | Adds a small API exercised by the coverage smoke test (with an intentionally uncovered branch). |
| tests/coverage/coverable.cpp | Adds the implementation for the smoke-test library. |
| tests/coverage/coverable_test.cpp | Adds a test that intentionally leaves one branch uncovered for report validation. |
| tests/coverage/BUILD.bazel | Wires the smoke test targets, manifest generation, and reporter wrapper. |
| tests/BUILD.bazel | Exports MODULE.bazel so the reporter wrapper can resolve workspace root at runtime. |
| tests/.bazelrc | Imports coverage bazelrc and adds coverage smoke-test settings for the tests workspace. |
| README.md | Updates top-level README to document coverage tooling availability and link to coverage guide. |
| MODULE.bazel | Adds rules_python, rules_shell, Python toolchain, and pip hub wiring for coverage tooling. |
| coverage/requirements.in | Adds pyyaml as the direct dependency for justification processing. |
| coverage/requirements_lock.txt | Adds a locked pyyaml pin with hashes for Bazel pip integration. |
| coverage/reporter.py | Implements final report generation + augmentation with synthetic 0%-coverage entries for untested sources. |
| coverage/README.md | Adds an adoption guide for consumers (setup, configs, optional untested-files manifest). |
| coverage/merger.py | Adds per-test merger that packages profdata + object file metadata for final reporting. |
| coverage/justify.py | Adds YAML + in-source marker processing to generate a justification manifest. |
| coverage/generate_coverage_html.sh | Adds a consumer-facing driver to extract reports, run justification/effective-coverage, and optionally archive artifacts. |
| coverage/filter_regexes.txt | Adds baseline ignore regexes (tests/mocks/fakes/external/benchmarks). |
| coverage/effective_coverage.py | Adds HTML post-processing + effective coverage computation incorporating justifications. |
| coverage/defs.bzl | Adds the manifest aspect/rule and updates score_coverage_reporter to pass the manifest into the reporter. |
| coverage/coverage.bazelrc | Adds generic Bazel coverage flags/config to be imported by consumers. |
| coverage/BUILD.bazel | Adds Bazel targets for merger/reporter/justify/effective_coverage and the driver script. |
| .gitignore | Ignores generated cpp_coverage/ output directories. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - **`clang_tidy/.clang-tidy`** — centralized default check set (conservative baseline, tailorable per module) | ||
| - **`clang_tidy/clang_tidy.bazelrc`** — `--config=clang-tidy` bazelrc config consumers can import | ||
| - **`//coverage:reporter` + `score_coverage_reporter` macro** — llvm-cov source-based | ||
| coverage with a shared baseline of ignore regexes (test/mock/fake/external), pluggable |
…e_reporter The previous implementation used a genrule + heredoc to generate the reporter wrapper shell script. This required three-level escaping (\\$$) to handle Bazel make-variable and bash variable expansion in the same string which makes the code fragile and hard to debug. Replace with a custom Starlark rule (`_score_coverage_reporter_rule`) that uses `ctx.actions.write()` to produce the wrapper script directly. A new `_rlocation_path()` helper computes `Runfiles.Rlocation()`-compatible paths from File objects, eliminating all genrule make-variable gymnastics. The `--workspace_root` argument (previously computed via `readlink -f` in bash) is replaced by `--module_bazel` (a plain rlocation string). The reporter resolves it with `r.Rlocation()` and derives the workspace root in Python. The generated wrapper is now a readable 7-line script with no escaping. `rules_shell` is no longer a dependency of defs.bzl. Addresses review comment: eclipse-score#9 (comment)
|
|
Gentleman still wait or proceed with merge? |
|
Two cents from the guy where this was copied over from. As I wrote already in the Slack, the proper generation of baseline coverage requires a deeper look. Especially, as rules_cc is missing this support today - and I believe for a proper generation of baseline coverage, we need to look there first. But, this will require quite some time. Second, I am still in the workings to make the coverage also correct for QNX. First experiments also showed that the coverage for QNX will bring coverage results for rust included. Thus, I am also not sure yet how to best handle this and if this will be at the end an "cpp_policies" only thing, or might not include Rust also. Third, we still have a problem with constexpr functions, as they appear not covered. Fourth, my original plan would have been, once the three mentioned points are solved, to contribute this solution to toolchains_llvm, as obviously this can be reused by a way broader community. Anyhow, it can be decided that this is "good enough" for now, and these problems are tackled in an incremental way. On a personal note - while the licenses and copyrights that we utilize in this project fully allow a copying of the code to different places without touching the license + copyright, I still think it follows best practices to give at least some credit / blame for the original author. Feels quite bad if a topic where quite some time was invested is just copied over. I would have loved to see at least a clean commit with the original state and author, and then the improvements, refactorings on top of it. (I do not require to change this, but I would like to raise awareness). |
It's not approved yet |
Hi Jan, thanks for the detailed write-up. This is genuinely useful context. On the technical points:
On attribution: you're right to call this out, and I'll be direct: It should have been handled better from the start. A commit that establishes the original state with proper authorship, followed by the changes on top, would have been the clean way to do it. That ship has sailed for the commit history, but I've updated the PR description to explicitly credit your PR #549 as the origin of the pipeline. Not a substitute for getting the history right, but at least the record is there. The plan: Merge this as "good enough for now", and track the three open items as follow-ups. Does that work for you? |
Summary
llvm-covonly reports files whose object files are linked into at least one test.Source files that exist in the workspace but no
cc_testpulls in silently disappear from the coverage report.This PR adds a Bazel aspect that walks the dependency graph to collect all C/C++ sources, compares them against what
llvm-covactually reported, and augments the LCOV, HTML, and text outputs with synthetic 0%-coverage entries for the missing files.The coverage pipeline (merger, reporter, justify, effective_coverage) was originally developed by Jan Schlosser in eclipse-score/communication#549.
This PR centralizes it in score_cpp_policies, adds the baseline-coverage aspect, and improves the reporter.
Approach
Why an aspect + manifest instead of patching
--instrumentation_filter?instrumentation_filtercontrols which targets get built with coverage instrumentation, butllvm-covstill only reports files whose.ois linked into a test binary. There is no Bazel-native way to surface the gap. The aspect walksdeps/srcs/implementation_depstransitively and writes a manifest of all reachable.cpp/.cc/.cxx/.cfiles. The reporter then diffs manifest vs. LCOV output.Why heuristic line counts instead of exact numbers?
Without running
llvm-covagainst actual instrumented object files (which don't exist for untested sources), we can't get exact instrumentable line counts. The heuristic (_count_instrumentable_lines) filters blank lines, comments, preprocessor directives, lone braces, and namespace declarations. All outputs explicitly label these as estimates(
~N, "estimated via heuristic") to avoid false precision. The TOTALS line insummary.txtis intentionally left untouched. Only a WARNING banner is appended.Known limitations (tracked for follow-up)
rules_ccfirst. Current heuristic line counts are estimates.constexprfunctions: These appear uncovered due to how llvm-cov handles them.What changed
coverage/defs.bzl:_collect_sources_aspect,score_instrumented_sources_manifestrule,instrumented_sources_manifestparameter onscore_coverage_reportermacrocoverage/reporter.py: LCOV augmentation, HTML augmentation (top-banner + per-file pages + detail table), text summary banner, workspace-bounds check, HTML escaping including quotescoverage/BUILD.bazel:reporter_libpy_library for testabilitytests/coverage/:uncovered.cpp/.hfixture (library with no test),reporter_test.pywith unit tests for all augmentation helpers including path-traversal rejectionConsumer usage