v0.9.0 — Trustworthy Output: accurate tokens, honest features, pipe output #1
…ross languages
Begins v0.9.0 "Trustworthy Output". Fixes six verified signature/visibility extraction bugs that made advertised flags lie, each with a regression test:
- Java `--visibility` was a total no-op: `get_visibility` returned `All`, so `--visibility public` dropped every symbol and `private` leaked all (B12)
- C/C++ pointer/reference-return functions were silently dropped (B13)
- C++ qualified return types (`std::string`) misread as the function name (B14)
- Rust bodiless trait methods (`function_signature_item`) dropped from signatures and structure counts (B15)
- Python class base lists double-parenthesized: `class User((Base))` (B16)
- Rust `pub(crate)`/`pub(super)` reported as fully public (B18)
Also modernize two `sort_by` calls to satisfy newer clippy under `-D warnings`, and add `docs/research/next-release-roadmap.md` (the full v0.9.0 plan).
Verified: 318 lib tests (all features) + full default suite pass; clippy --all-targets --all-features -D warnings clean; fmt clean.
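The B16 bug comes from wrapping text that already carries its own parentheses. A minimal sketch of the corrected formatting, assuming the extractor has the class name plus the raw source text of tree-sitter-python's `superclasses` field (an `argument_list` node, which includes the surrounding `(` and `)`); the function name `python_class_header` is hypothetical:

```rust
/// Build the `class Name(Bases)` signature line for a Python class.
///
/// `superclasses` is the raw source text of tree-sitter-python's
/// `superclasses` field (an `argument_list` node), e.g. "(Base)" or
/// "(Base, Mixin)". That text already includes its parentheses, so it is
/// appended verbatim; wrapping it again produced `class User((Base))`.
fn python_class_header(name: &str, superclasses: Option<&str>) -> String {
    match superclasses {
        Some(bases) => format!("class {}{}", name, bases),
        None => format!("class {}", name),
    }
}
```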
…m flags
v0.9.0 "Trustworthy Output" — F1 + F7.
F1 — token counting is now model-accurate and selectable:
- New `--encoding {o200k_base|cl100k_base}` flag + `encoding` TOML option,
defaulting to o200k_base (GPT-4o / o-series). The old hardcoded cl100k_base
under-counts every modern OpenAI model.
- Encoding threads through estimate_tokens / count_file_tokens /
count_tree_tokens and the `--token-count` report.
F7 — invalid flag values are now rejected instead of silently coerced:
- `--truncate`, `--visibility`, `--encoding` use clap value_parser, so bad
values error with `[possible values: ...]` and `--help` lists them
(fixes B3 --visibility and B4 --truncate silent acceptance).
Docs: document `--encoding`/`--visibility`, fix the `--truncate` default in
README (was wrongly "none (default)"; clap default is "smart", modes smart|byte).
Verified: 321 lib tests (all features) + full integration suite pass;
clippy --all-targets --all-features -D warnings clean; fmt clean. Manual run
confirms o200k vs cl100k produce different counts and invalid values are rejected.
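The validation half of F7 amounts to parsing a closed set of names and refusing anything else. A minimal std-only sketch; the project itself delegates this to clap's `value_parser`, and the `Encoding` variant names, the `POSSIBLE_VALUES` constant, and the error text here are assumptions (the actual tokenizer lookup is left out):

```rust
use std::fmt;
use std::str::FromStr;

/// Tokenizer encodings selectable via `--encoding` / the `encoding` TOML key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum Encoding {
    /// GPT-4o / o-series tokenizer; the new default.
    #[default]
    O200kBase,
    /// The previously hardcoded tokenizer.
    Cl100kBase,
}

impl Encoding {
    const POSSIBLE_VALUES: [&'static str; 2] = ["o200k_base", "cl100k_base"];

    fn as_str(self) -> &'static str {
        match self {
            Encoding::O200kBase => "o200k_base",
            Encoding::Cl100kBase => "cl100k_base",
        }
    }
}

impl FromStr for Encoding {
    type Err = String;

    /// Reject unknown names with the valid set listed, instead of silently
    /// coercing them to a default.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "o200k_base" => Ok(Encoding::O200kBase),
            "cl100k_base" => Ok(Encoding::Cl100kBase),
            other => Err(format!(
                "invalid value '{}' [possible values: {}]",
                other,
                Self::POSSIBLE_VALUES.join(", ")
            )),
        }
    }
}

impl fmt::Display for Encoding {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}
```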
…e bypass & hash
v0.9.0 "Trustworthy Output" — F4 + B1 + B2.
- `--max-tokens` now counts the real tokenizer (per `--encoding`) on each file's rendered output, replacing the `buf.len()/4` (parallel) and `metadata().len()/4` (serial) byte heuristics. Both paths estimate identically, restoring byte-for-byte determinism between parallel and non-parallel builds.
- Fix B1: the first file no longer bypasses the budget (dropped the `tokens_used > 0` guard); a single oversized file is now omitted with a notice.
- Debit the header + file tree from the budget so `--max-tokens` covers the whole document. Tokenization is skipped entirely when no budget is set (no overhead on normal runs — caught a 50x test-suite slowdown from unconditional counting).
- Fix B2: the content hash now folds in every output-affecting option (line_numbers, max_tokens, encoding, tree-sitter flags, encoding_strategy), so toggling e.g. `--line-numbers` changes the hash. Still hashes raw file content (not rendered output, which embeds volatile mtimes) to stay checkout-stable.
Adds regression tests for first-file truncation and hash sensitivity/determinism.
Verified: 323 lib (all-features) + 209 lib (serial) + full integration pass; clippy --all-targets --all-features -D warnings clean; fmt clean.
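The budget rules above (debit the header and tree first, check every file including the first against what remains, stop at the first file that does not fit) can be sketched as a pure function over pre-rendered chunks. This is an illustration, not the code in `markdown.rs`: `apply_budget` and `BudgetResult` are hypothetical names, and any `Fn(&str) -> usize` can stand in for the real tokenizer.

```rust
/// Outcome of applying a `--max-tokens` budget to already-rendered chunks.
#[derive(Debug, PartialEq)]
struct BudgetResult {
    included: Vec<usize>, // indices of files that fit in the budget
    omitted: usize,       // files dropped once the budget ran out
}

/// `header` is the document header + file tree, debited before any file.
/// `count_tokens` is a stand-in for the real tokenizer (hypothetical here).
fn apply_budget<F>(header: &str, rendered: &[String], budget: usize, count_tokens: F) -> BudgetResult
where
    F: Fn(&str) -> usize,
{
    let mut tokens_used = count_tokens(header);
    let mut included = Vec::new();
    for (idx, chunk) in rendered.iter().enumerate() {
        let cost = count_tokens(chunk);
        // No `tokens_used > 0` guard: the first file is checked like any
        // other, so a single oversized file is omitted rather than emitted (B1).
        if tokens_used + cost > budget {
            return BudgetResult { included, omitted: rendered.len() - idx };
        }
        tokens_used += cost;
        included.push(idx);
    }
    BudgetResult { included, omitted: 0 }
}
```

Counting the rendered chunk (rather than file size on disk) is what makes the parallel and serial paths agree: both feed the tokenizer exactly the bytes that will appear in the document.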
… stray --diff-only
v0.9.0 "Trustworthy Output" — B6 + B7 + B8.
- B6: collapse the two byte-identical, "must stay in sync" config-hash functions (`cache.rs::hash_config` and `state.rs::compute_config_hash`) into one shared `config::config_fingerprint`. No more drift hazard.
- B7: the auto-diff cache hash now reflects the *resolved* CLI values for `--signatures`/`--structure`/`--truncate`/`--visibility` (`effective_config` previously only carried filter/ignore/line_numbers, so the hash saw the config-file values — usually `None` — and toggling these flags reused a stale baseline). Also fold in `encoding_strategy` (it transcodes non-UTF-8 content). Pure output-rendering options (`--diff-only`, `--encoding` tokenizer) are deliberately EXCLUDED so toggling them against an existing baseline does not discard it.
- B8: `--diff-only` now warns when used without `auto_diff` instead of silently emitting full file contents.
Note: the roadmap suggested also hashing diff_only/timestamped_output/output_folder, but `test_diff_only_mode_includes_added_files` proves `--diff-only` must stay toggleable against a baseline; the new `config_fingerprint_sensitivity` test documents the state-vs-output-only distinction.
Verified: 324 lib (all-features) + 209 lib (serial) + full integration pass; clippy --all-targets --all-features -D warnings clean; fmt clean.
v0.9.0 "Trustworthy Output" — F2.
- `-o -` writes the generated document to stdout instead of a file, enabling
`context-builder -f rs -o - | llm`. generate_markdown writes through a
Box<dyn Write + Send> that is either the file or io::stdout().
- The auto-diff path also honors stdout (its in-memory composed document is
written to stdout), so diffs pipe too.
- Human-facing chatter ("Documentation created", timings, context-size warning)
is suppressed in stdout mode so it never corrupts the pipe; genuine warnings
stay on stderr.
- `-` is never folded into an output folder or timestamped name
(resolve_output_path returns early), and no file literally named `-` is created.
Uses the `-o -` convention (no new Args field, so no churn across the ~90 test
literals). Adds a config-resolver regression test; manually verified the pipe is
clean, chatter is absent from stdout, and normal file output is unaffected.
Verified: 325 lib (all-features) + 211 (serial) + full integration pass;
clippy --all-targets --all-features -D warnings clean; fmt clean.
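A minimal sketch of the `-o -` sink selection and chatter routing, assuming the std-only shape described above (a `Box<dyn Write + Send>` that is either the file or `io::stdout()`); the helper names `open_output`, `report`, and `warn` are hypothetical:

```rust
use std::fs::File;
use std::io::{self, Write};

/// Open the document sink: "-" means stdout (pipe mode), anything else is a
/// file path. Rendering code only ever sees a boxed writer.
fn open_output(path: &str) -> io::Result<Box<dyn Write + Send>> {
    let sink: Box<dyn Write + Send> = if path == "-" {
        // Never create a file literally named "-".
        Box::new(io::stdout())
    } else {
        Box::new(File::create(path)?)
    };
    Ok(sink)
}

/// Human-facing chatter ("Documentation created", timings) is printed only
/// when the document is not going to stdout, so it cannot corrupt a pipe.
fn report(to_stdout: bool, msg: &str) {
    if !to_stdout {
        println!("{}", msg);
    }
}

/// Genuine warnings always go to stderr, which a `|` pipe does not capture.
fn warn(msg: &str) {
    eprintln!("warning: {}", msg);
}
```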
…ategy warning
v0.9.0 "Trustworthy Output" — low-severity bug mop-up (B19, B20, B5).
- B19: clamp `find_truncation_point`'s result to a UTF-8 char boundary in the shared dispatcher. The per-language `max_bytes` fallback could land mid-char, panicking once smart truncation slices on it. Node boundaries are already char-safe, so the clamp is a no-op for them. (regression test added)
- B20: write the auto-diff cache atomically (`NamedTempFile` + persist/rename) instead of truncate-then-write under a lock. A crash mid-write no longer leaves a truncated, baseline-dropping cache; readers always see a complete file.
- Drop the `fs2` dependency: `std::fs::File` provides advisory locking (`lock_shared`/`unlock`) since Rust 1.89, which already shadowed fs2's trait. Set `rust-version = "1.89"` to document the resulting MSRV. (roadmap §5 dep-diet)
- B5: warn on an unrecognized `encoding_strategy` in config instead of silently falling back to "detect".
Verified: 326 lib (all-features) + 211 (serial) + full integration pass; clippy --all-targets --all-features -D warnings clean; fmt clean.
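The B19 clamp fits in a few lines of std Rust. This sketch assumes the dispatcher holds a candidate byte offset and rounds it down (the safe direction for truncation); `clamp_to_char_boundary` is a hypothetical name, and `str::is_char_boundary` is the standard-library check:

```rust
/// Move a byte offset back to the nearest UTF-8 char boundary so that
/// `&s[..idx]` can never panic. Offsets past the end clamp to `s.len()`.
fn clamp_to_char_boundary(s: &str, idx: usize) -> usize {
    let mut idx = idx.min(s.len());
    // Offsets 0 and s.len() are always boundaries, so this terminates.
    while !s.is_char_boundary(idx) {
        idx -= 1;
    }
    idx
}
```

For offsets that already sit on a boundary (as tree-sitter node offsets do) the loop body never runs, which matches the commit's note that the clamp is a no-op for node boundaries.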
…oken-count preview
v0.9.0 "Trustworthy Output" — B17 + B9 (completes the low-severity mop-up).
- B17: C++ struct inheritance, enum underlying types, and using/typedef alias targets — plus C typedef aliased types — are now byte-sliced/preserved instead of collapsed to a bare `struct X`/`enum X`/`typedef X`/`using/typedef X`.
- B9: `--token-count` now renders each file through the same `process_file` path the document uses, then tokenizes that — so the preview matches the produced output (encoding transcoding + tree-sitter enrichment) instead of re-reading raw bytes via a divergent path. A file that fails to render contributes 0 tokens (matching the actual output); the unreadable-file test was updated.
Also brings the CHANGELOG up to date for the whole mop-up (B5/B9/B17/B19/B20). Adds a C++ struct/enum/alias regression test.
Verified: 327 lib (all-features) + 211 (serial) + full integration pass; clippy --all-targets --all-features -D warnings clean; fmt clean.
Pull request overview
This PR implements the v0.9.0 “Trustworthy Output” release theme by making token counting/budgeting match real tokenizer behavior, improving pipe-friendly output, and fixing multiple tree-sitter signature/visibility correctness issues across languages. It also unifies the auto-diff/cache config fingerprinting to prevent stale baselines when output-affecting options change.
Changes:
- Adds selectable tokenizer encoding (`--encoding`, config `encoding`) with `o200k_base` default, and wires token counting/budgeting through the chosen encoding.
- Makes output pipe-friendly (`-o -` to stdout) and updates auto-diff + hashing/budgeting behavior for determinism and correctness.
- Fixes tree-sitter extraction/visibility/truncation edge cases (Rust/Java/C/C++/Python) and adds regression tests.
Reviewed changes
Copilot reviewed 28 out of 29 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/test_phase4_integration.rs | Updates integration test args to include encoding |
| tests/test_parallel_memory.rs | Updates parallel tests to include encoding |
| tests/test_determinism.rs | Updates determinism/cache-key tests to include encoding |
| tests/test_cwd_independence.rs | Updates CWD-independence tests to include encoding |
| tests/test_config_resolution.rs | Ensures resolved config includes encoding; updates tests accordingly |
| tests/test_comprehensive_edge_cases.rs | Updates edge-case tests to include encoding |
| tests/test_binary_file_autodiff.rs | Updates autodiff tests to include encoding |
| tests/test_auto_diff.rs | Updates autodiff workflow tests to include encoding |
| tests/cli_integration.rs | Updates CLI integration tests to include encoding |
| src/tree_sitter/truncation.rs | Clamps truncation points to UTF-8 boundaries + regression test |
| src/tree_sitter/languages/rust.rs | Extracts bodiless trait methods and fixes restricted pub(...) visibility handling |
| src/tree_sitter/languages/python.rs | Fixes Python base-list double-parenthesizing + regression test |
| src/tree_sitter/languages/java.rs | Implements real Java visibility filtering + regression tests |
| src/tree_sitter/languages/cpp.rs | Fixes C++ name resolution and preserves struct/enum/alias signature details + tests |
| src/tree_sitter/languages/c.rs | Fixes C pointer-return function extraction and preserves typedef target + test |
| src/token_count.rs | Adds Encoding enum + real tokenizer selection; aligns file/tree token counting with renderer |
| src/state.rs | Unifies config hashing via shared config_fingerprint |
| src/markdown.rs | Adds stdout output, debits header/tree tokens, and uses real-tokenizer budgeting (parallel + serial) |
| src/lib.rs | Pipes stdout mode through the app, updates token-count rendering path, and improves warnings/cache invalidation |
| src/config.rs | Adds encoding to config; introduces shared config_fingerprint and tests for sensitivity |
| src/config_resolver.rs | Resolves encoding; prevents folder/timestamp logic when output is `-` |
| src/cli.rs | Adds --encoding and value-parser validation for truncate/visibility/encoding |
| src/cache.rs | Drops fs2 usage; switches cache writes to temp-file + rename; uses shared config fingerprint |
| README.md | Documents stdout piping and encoding selection; updates CLI option docs |
| docs/research/next-release-roadmap.md | Adds roadmap/analysis document referenced by the release |
| CHANGELOG.md | Adds v0.9.0 in-progress changelog entries summarizing the work |
| Cargo.toml | Sets rust-version = "1.89" and removes fs2 dependency |
| Cargo.lock | Removes fs2 from the lockfile |
| benches/context_bench.rs | Updates benchmarks to include the new encoding arg |
```diff
 for (idx, entry) in files.iter().enumerate() {
-    // Estimate tokens for this file (~4 bytes per token)
-    let file_size = std::fs::metadata(entry.path())
-        .map(|m| m.len())
-        .unwrap_or(0);
-    let estimated_file_tokens = (file_size as usize) / 4;
-
-    if let Some(budget) = max_tokens {
-        if tokens_used + estimated_file_tokens > budget && tokens_used > 0 {
-            let remaining = files.len() - idx;
-            writeln!(output, "---\n")?;
-            writeln!(
-                output,
-                "_⚠️ Token budget ({}) reached. {} remaining files omitted._\n",
-                budget, remaining
-            )?;
-            break;
-        }
-    }
-
-    tokens_used += estimated_file_tokens;
+    // Render the file into a buffer first, then count the rendered chunk
+    // with the real tokenizer — identical to the parallel path.
+    let mut buf: Vec<u8> = Vec::new();
     process_file(
         base_path,
         entry.path(),
-        &mut output,
+        &mut buf,
         line_numbers,
         encoding_strategy,
         ts_config,
     )?;
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5b1f0f5ab1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```rust
let final_args = args;
// `-o -` streams the document to stdout (pipe mode). In this mode all
// human-facing chatter must go to stderr so it doesn't corrupt the pipe.
let to_stdout = final_args.output == "-";
```
Route processing prompts away from pipe output
When `-o -` is used on a project with more than 100 collected files and the user does not pass `--yes`, the later `prompter.confirm_processing(files.len())` path still calls `confirm_processing`, which prints its warning with `print!` to stdout before the markdown is written. That prepends the prompt to the piped document (and can also block in a pipeline), so pipe mode is unsafe unless this prompt is skipped or routed to stderr.
```rust
let vis = self.get_visibility(node, source);
if !vis.matches_filter(visibility_filter) {
    return None;
}
```
Preserve public trait methods under public filtering
When `--signatures --visibility public` is used on a Rust file with a public trait, required trait methods such as `pub trait T { fn f(&self); }` have no `visibility_modifier` on the `function_signature_item` itself, so this call classifies them as private and drops the newly extracted method signatures. The result shows the public trait but omits its public API requirements, which defeats the trait-method extraction added here.
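One way to address this is to let a required method inherit the enclosing trait's visibility. The sketch below works on plain modifier strings rather than tree-sitter nodes and is a simplified, hypothetical model (`Visibility`, `from_modifier`, and `trait_item_visibility` are illustrative names, not the project's API); it also folds in the B18 rule that a restricted `pub(...)` is not fully public:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Visibility {
    Public,
    /// `pub(crate)`, `pub(super)`, `pub(in path)`: visible, but not public API.
    Restricted,
    Private,
}

/// Map the text of a Rust `visibility_modifier` node (if any) to a Visibility.
fn from_modifier(modifier: Option<&str>) -> Visibility {
    match modifier.map(str::trim) {
        Some("pub") => Visibility::Public,
        // Anything else starting with `pub` is a restricted form (B18).
        Some(m) if m.starts_with("pub") => Visibility::Restricted,
        _ => Visibility::Private,
    }
}

/// A trait's required method cannot carry its own modifier (Rust forbids
/// one on trait items), so it takes the enclosing trait's visibility
/// instead of defaulting to private.
fn trait_item_visibility(enclosing_trait_modifier: Option<&str>) -> Visibility {
    from_modifier(enclosing_trait_modifier)
}
```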
…rial streaming)
Three valid findings from the automated PR review (Copilot + Codex), each verified against the source before fixing:
- Pipe mode (`-o -`) no longer corrupts the piped document: the >100-file confirmation prompt (`confirm_processing`, which `print!`s to stdout and blocks on stdin) is now skipped when output is stdout, alongside the existing chatter guards. Pipe mode is non-interactive, so it proceeds as `--yes` would. (Codex P2)
- Public trait methods are no longer dropped under `--visibility public`: a `function_signature_item` carries no visibility modifier (Rust forbids one on trait items), so `extract_function_sig_item` now inherits the enclosing trait's visibility via a new `sig_item_visibility` helper instead of defaulting to private. A public trait's required methods are kept; a private trait's are still filtered out. (Codex P2)
- The non-parallel build streams each file straight to output when no `--max-tokens` budget is set, instead of always rendering into a `Vec` first (buffering is only needed to tokenize the chunk for the budget). Keeps peak memory bounded on large files. Affects `--no-default-features` builds only (default is parallel). (Copilot)
Adds regression tests: `pipe_mode_skips_processing_confirmation` (cli_integration) and `test_public_trait_methods_survive_public_filter` (rust.rs). CHANGELOG updated.
Verified: fmt, clippy (default + all-features, -D warnings), and full tests pass in both parallel and serial builds.
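The first and third fixes reduce to two small decisions. A sketch with hypothetical names (`needs_confirmation`, `should_buffer`, `CONFIRM_THRESHOLD`), assuming the >100-file threshold described above:

```rust
/// More collected files than this triggers an interactive confirmation.
const CONFIRM_THRESHOLD: usize = 100;

/// Pipe mode is non-interactive: the prompt would print to stdout (corrupting
/// the document) and block on stdin, so it proceeds as if `--yes` were passed.
fn needs_confirmation(file_count: usize, assume_yes: bool, to_stdout: bool) -> bool {
    file_count > CONFIRM_THRESHOLD && !assume_yes && !to_stdout
}

/// Serial build: buffering a file's rendered output is only needed to
/// tokenize it for a `--max-tokens` budget; otherwise stream it directly.
fn should_buffer(max_tokens: Option<usize>) -> bool {
    max_tokens.is_some()
}
```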
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b0229f5f3b
```rust
// No modifiers node → package-private, treated as non-public.
Visibility::Private
```
Keep implicit public Java interface methods
When `--signatures --visibility public` processes a Java `public interface Foo { void bar(); }`, the `method_declaration` for `bar` has no `modifiers` child because Java interface methods are implicitly public. This new fallback classifies all modifier-less declarations as `Private`, and `extract_method_signature` then drops them under the public filter, so the generated signatures include the public interface but omit its public API methods.
…ublic
Follow-up to the new Codex PR #1 review comment — the Java analog of the Rust trait-method fix. A Java interface method (`public interface Foo { void bar(); }`) is implicitly public but carries no `modifiers` node, so the B12 fallback classified it package-private and `extract_method_signature` dropped it under `--visibility public`, hiding the interface's API.
Java's `get_visibility` now resolves a modifier-less declaration by context: implicitly public as an interface/annotation member, package-private in a class/enum/record. An explicit `public` still wins, and an explicit `private`/`protected` (including Java 9+ private interface methods) is still respected. Added `is_interface_member` (nearest-enclosing-type walk) and a `modifiers_text` helper.
Verified the bug is unique to Java + Rust among supported languages: TypeScript extracts interfaces as one signature with no per-member filter; C++ `get_visibility` is dead code and never filters (signatures are stamped with the filter value, never dropped); C/Go/Python/JS have no per-member visibility filtering.
Adds regression test `test_interface_methods_survive_public_filter`.
Verified: fmt, clippy --all-features -D warnings, all-features lib tests (329 passed).
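The context rule can be sketched as a function of the modifier text and the nearest enclosing type. This is a hypothetical model, not the project's `get_visibility`: the `Visibility` and `EnclosingType` enums and `java_visibility` are illustrative, and package-private gets its own variant here although the project treats it as non-public:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Visibility {
    Public,
    Protected,
    PackagePrivate,
    Private,
}

/// Kind of the nearest enclosing type declaration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EnclosingType {
    Class,
    Enum,
    Record,
    Interface,
    Annotation,
}

/// Resolve a Java member's visibility. Explicit access modifiers always win;
/// otherwise the nearest enclosing type decides.
fn java_visibility(modifiers: Option<&str>, enclosing: EnclosingType) -> Visibility {
    if let Some(text) = modifiers {
        let words: Vec<&str> = text.split_whitespace().collect();
        if words.contains(&"public") {
            return Visibility::Public;
        }
        if words.contains(&"protected") {
            return Visibility::Protected;
        }
        // Also covers Java 9+ private interface methods.
        if words.contains(&"private") {
            return Visibility::Private;
        }
    }
    // No access modifier (e.g. none at all, or only `static`/`default`).
    match enclosing {
        EnclosingType::Interface | EnclosingType::Annotation => Visibility::Public,
        _ => Visibility::PackagePrivate,
    }
}
```

Using the *nearest* enclosing type matters for nested declarations: a class nested inside an interface still gives its own modifier-less members package-private access.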
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7285e2047f
```rust
config.structure,
config.truncate,
config.visibility,
config.max_tokens,
```
Keep token budgets out of the diff cache key
When `auto_diff` already has a baseline, adding or changing `--max-tokens` now changes `config_fingerprint`, so `CacheManager::new` looks at a different cache file and `previous_state` is `None`; the run loses the actual file diff and behaves like a fresh baseline even though the captured project content is unchanged. `max_tokens` only truncates the rendered document, so it should be excluded from the auto-diff/cache fingerprint just like the other pure rendering options.
```rust
        .clone()
        .unwrap_or_else(|| args.visibility.clone())
},
encoding: if args.encoding != "o200k_base" {
```
Honor explicit default encoding overrides
If `context-builder.toml` sets `encoding = "cl100k_base"`, running `--encoding o200k_base` should override the config, but this branch treats `o200k_base` as an implicit default and falls back to the config value. That leaves users unable to select the default tokenizer from the CLI without editing the config file, so `--token-count`/`--max-tokens` can use the wrong encoding despite an explicit CLI flag.
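The underlying fix is to decide precedence from *where* a value came from, not from what it equals. A std-only sketch that models the value source explicitly; `Source` and `resolve` are hypothetical names, standing in for clap's `ArgMatches::value_source`, which distinguishes command-line values from default values:

```rust
/// Where a CLI value came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Source {
    CommandLine,
    Default,
}

/// Precedence: explicit CLI > config file > built-in default. Comparing the
/// CLI value against the default cannot tell "flag omitted" from "flag
/// explicitly set to the default", so the source is checked instead.
fn resolve<T: Clone>(cli_value: &T, source: Source, config_value: Option<&T>) -> T {
    match (source, config_value) {
        (Source::CommandLine, _) => cli_value.clone(),
        (Source::Default, Some(cfg)) => cfg.clone(),
        (Source::Default, None) => cli_value.clone(),
    }
}
```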
…erride)
Two more valid Codex P2 findings, both verified against the source by tracing the auto-diff capture/compare flow and CLI resolution:
- `config_fingerprint` no longer includes rendering options. The auto-diff baseline is each selected file's RAW captured content (`ProjectState` stores bytes via `read_to_string`; the diff compares those), which no rendering option changes. Keying the cache on line_numbers/signatures/structure/truncate/visibility/max_tokens/encoding_strategy therefore caused spurious baseline resets: toggling e.g. `--max-tokens` or `--signatures` between runs discarded the diff and hid real content changes. The fingerprint now keys on filter+ignore only (the inputs that decide which files form the comparable baseline); the project path is already keyed separately in cache.rs. This supersedes the earlier B7 approach and fixes the same-class bug Codex flagged for max_tokens, for all rendering flags. (Codex P2)
- Explicit CLI flags override config even at their default value. `--encoding`, `--truncate`, and `--visibility` carry a default value, so "value == default" could not be distinguished from "flag omitted" — an explicit `--encoding o200k_base` was ignored when the config set cl100k_base. `run()` now reads clap's `ValueSource` to detect explicitly-passed flags and threads that into the resolver (new `ExplicitCli`), so CLI-explicit > config > default holds even when the explicit value equals the default. (Codex P2)
Regression tests: `config_fingerprint_sensitivity` rewritten to assert only filter/ignore change the key while every rendering option leaves it stable; `explicit_cli_default_value_overrides_config` covers the precedence fix.
Verified end-to-end via the binary that `--encoding o200k_base` overrides a cl100k_base config. fmt, clippy (default + all-features, -D warnings), and full tests pass in both parallel (330 lib) and serial (212 lib) builds.
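A dependency-free sketch of a cache key built from filter + ignore only. The function name `baseline_fingerprint`, the FNV-1a hash, and the order-insensitive sorting are assumptions for illustration rather than the project's implementation; the project path is omitted because, per the commit, it is keyed separately:

```rust
/// One FNV-1a 64-bit pass over `bytes`, updating `hash` in place.
fn fnv1a(hash: &mut u64, bytes: &[u8]) {
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    for &b in bytes {
        *hash ^= u64::from(b);
        *hash = hash.wrapping_mul(PRIME);
    }
}

/// Hypothetical auto-diff cache key. Only the inputs that decide *which*
/// files form the baseline are hashed; rendering options are left out so
/// toggling them keeps the existing baseline.
fn baseline_fingerprint(filter: &[String], ignore: &[String]) -> u64 {
    const OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
    let mut hash = OFFSET;
    for (tag, list) in [("filter", filter), ("ignore", ignore)] {
        // Sorted so the key does not depend on the order the flags were given in.
        let mut sorted: Vec<&String> = list.iter().collect();
        sorted.sort();
        fnv1a(&mut hash, tag.as_bytes());
        for item in sorted {
            fnv1a(&mut hash, item.as_bytes());
            fnv1a(&mut hash, &[0]); // entry terminator, so ["ab"] != ["a", "b"]
        }
        fnv1a(&mut hash, &[0xff]); // section terminator
    }
    hash
}
```

A fixed, hand-rolled hash is used here because the key must be stable across runs; std's `DefaultHasher` documents no guarantee that its algorithm stays the same between Rust releases.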
Bump version 0.8.3 -> 0.9.0 across Cargo.toml, Cargo.lock, SKILL.md (frontmatter + verify string), and the demo banner; finalize the CHANGELOG v0.9.0 heading with its release date. The release content (accurate token counting, honest tree-sitter signatures, pipe output, reliable auto-diff cache, and the PR-review fixes) landed via #1.
v0.9.0 — "Trustworthy Output"
First release after ~4 months of dormancy. Theme: accurate tokens, honest features, and pipe-friendly output — make the tool's core promise (accurate, trustworthy, LLM-optimized output) actually hold. Planned in `docs/research/next-release-roadmap.md` (generated via a multi-agent codebase analysis; bug findings adversarially verified).
Features
- `--encoding {o200k_base|cl100k_base}` (default `o200k_base`) + `encoding` TOML option. The old hardcoded `cl100k_base` under-counts every modern OpenAI model. (F1)
- `-o -` streams to stdout for `context-builder -f rs -o - | llm`; chatter stays off the pipe; auto-diff documents pipe too. (F2)
- `--max-tokens` now counts the real tokenizer on each file's rendered output (both parallel + serial paths estimate identically), debits the header/tree, and skips tokenization entirely when no budget is set. (F4)
- `--truncate`/`--visibility`/`--encoding` reject invalid values via clap `value_parser` (no more silent coercion); `--help` lists the valid set. (F7)

Correctness fixes (18 of 20 verified bugs)
- Java `--visibility` was a total no-op (B12); C/C++ pointer/reference-return functions were dropped (B13); C++ qualified return types were misread as the function name (B14); Rust bodiless trait methods were dropped (B15); Python base lists were double-parenthesized (B16); Rust `pub(crate)` was reported as public (B18); C/C++ struct inheritance / enum underlying types / alias targets were dropped (B17).
- The first file no longer bypasses `--max-tokens` (B1); the content hash now folds in every output-affecting option so toggling e.g. `--line-numbers` changes the hash (B2); the `--token-count` preview renders through the same path as the document so it matches (B9).
- `--diff-only` warns when used without `auto_diff` (B8); the cache is written atomically via temp-file + rename, so a crash can't corrupt the baseline (B20).
- An unrecognized `encoding_strategy` warns instead of silently defaulting (B5).

Maintenance
- Dropped the `fs2` dependency — `std::fs::File` provides advisory locking since Rust 1.89 (MSRV set to 1.89).
- Modernized two `sort_by` calls for newer clippy under `-D warnings`.

Deferred to v0.10
`--format xml` (F3), wiring up `--truncate smart` (F5), and a tree-sitter integration test (F8).

Testing
Every commit passed, in both parallel and serial builds: full test suite (327 lib all-features / 211 serial + integration), `clippy --all-targets --all-features -- -D warnings`, and `cargo fmt --all --check`.