diff --git a/.github/INVENTORY.md b/.github/INVENTORY.md index 2d23b137..e74e1bda 100644 --- a/.github/INVENTORY.md +++ b/.github/INVENTORY.md @@ -123,6 +123,7 @@ This file is the exact path inventory for the live GitHub Copilot catalog in thi - `.github/skills/local-agent-sync-external-resources/SKILL.md` - `.github/skills/local-agent-sync-global-copilot-configs-into-repo/SKILL.md` - `.github/skills/local-agent-sync-install-ai-resources/SKILL.md` +- `.github/skills/local-copilot-log-analyzer/SKILL.md` - `.github/skills/mattpocock-caveman/SKILL.md` - `.github/skills/openai-docx/SKILL.md` - `.github/skills/openai-gh-address-comments/SKILL.md` diff --git a/.github/instructions/internal-python.instructions.md b/.github/instructions/internal-python.instructions.md index d94e9905..3ef22e5e 100644 --- a/.github/instructions/internal-python.instructions.md +++ b/.github/instructions/internal-python.instructions.md @@ -11,7 +11,10 @@ This file is optimized for Copilot code review and should produce only evidenced - Verify guard clauses and error handling make failure modes explicit. - Flag unsafe input handling, shell invocation, or filesystem side effects. - Check function and module boundaries for readability and cohesion. +- Flag behavioral configuration buried in helpers, services, or library modules instead of centralized at the correct boundary: a script entrypoint, `Configuration` section, settings module, adapter, application factory, or composition root. +- Do not flag stable domain invariants merely because they are constants near domain code. - Verify type hints and public interfaces stay consistent with call sites. +- Flag manual formatting churn that fights the repository formatter; when Ruff is configured, prefer `ruff format` and Ruff diagnostics over subjective style edits. - Report dependency usage that is unpinned or unnecessary for the change. - Flag vendored libraries, wheelhouses, copied site-packages, or fallback dependency mirrors. - Flag new external dependencies that are missing hash-locked pins in the owning requirements file. diff --git a/.github/skills/internal-code-review/references/anti-patterns-python.md b/.github/skills/internal-code-review/references/anti-patterns-python.md index 26f7b694..9b2473e4 100644 --- a/.github/skills/internal-code-review/references/anti-patterns-python.md +++ b/.github/skills/internal-code-review/references/anti-patterns-python.md @@ -23,6 +23,7 @@ Baseline owner: `internal-python` | PY-M07 | `print()` instead of `logging` in application/library code | No log level control in production | | PY-M08 | Missing unit tests for new public functions | Violates test coverage mandate | | PY-M09 | Python tests outside repository-root `tests/` or without mirrored source paths | Breaks repository test discoverability and ownership mapping | +| PY-M10 | `rich`, emoji, tables, or panels outside human-facing CLI/reporting boundaries | Mixes terminal UI with importable logic or machine-readable output such as JSON | ## Minor @@ -61,7 +62,7 @@ except: try: result = fetch_data() except requests.RequestException as exc: - logger.warning("⚠️ Fetch failed: %s", exc) + logger.warning("Fetch failed: %s", exc) raise ``` @@ -89,5 +90,5 @@ import logging logger = logging.getLogger(__name__) def process(data: list[dict]) -> None: - logger.info("ℹ️ Processing %d items", len(data)) + logger.info("Processing %d items", len(data)) ``` diff --git a/.github/skills/internal-gateway-execute-plans/SKILL.md b/.github/skills/internal-gateway-execute-plans/SKILL.md index 72d5217e..1f27227a 100644 --- a/.github/skills/internal-gateway-execute-plans/SKILL.md +++ b/.github/skills/internal-gateway-execute-plans/SKILL.md @@ -50,6 +50,7 @@ consumes approved `compact` and `extended` retained plans. - Reject `compact` folders outside the `mini-plan-*` convention. - Ignore `questions.md` during execution. - Maintain a compact execution state and prefer targeted rereads over full file re-ingestion unless new evidence invalidates current state. +- Use `Compact Evidence Reporting` for large validator output: read enough output to decide the state honestly, then retain command, exit code, material counts, header or schema checks, changed files, and exact gaps instead of pasting raw logs. - Infer the execution strategy from `Plan profile`, folder shape, merged control-contract sections in `02-control.md` when applicable, and the validation path. Do not require a separate retained-plan consumer field. - Audit only mandatory requirements that are applicable; do not convert specialist rules into universal policy. - Use `superpowers-verification-before-completion` as the fresh-evidence owner; do not duplicate its mechanics. @@ -71,6 +72,10 @@ scope, anti-scope, and validation path, then iterate: 5. Continue only while evidence improves and no stop condition fires. 6. Stop with `DONE`, a blocker, or an explicit evidence gap. +Apply `Compact Evidence Reporting` after each focused validation: preserve the +exact gap and proof path, but keep large outputs summarized unless the raw +output is itself the missing evidence. + Stop on scope drift, destructive action, owner conflict, missing validation path, human approval need, secret exposure risk, or repeated non-improving failures. @@ -195,6 +200,7 @@ Before declaring any closeout step complete: - Hiding ownership conflicts instead of escalating a next owner and validation path. - Continuing the Agentic Execution Loop after evidence stops improving or a stop condition fires. +- Pasting raw large validator output when compact evidence would preserve the same proof. - Packaging `DONE` while evidence gaps still require `APPLIED_UNVERIFIED`, `PARTIAL`, or `BLOCKED`. - Declaring a non-`DONE` state without writing or updating the `-plan-state.md` marker. - Leaving stale `-plan-state.md` markers behind after a state transition. diff --git a/.github/skills/internal-gateway-idea-brainstorming/SKILL.md b/.github/skills/internal-gateway-idea-brainstorming/SKILL.md index d1b424f2..9f04ddf8 100644 --- a/.github/skills/internal-gateway-idea-brainstorming/SKILL.md +++ b/.github/skills/internal-gateway-idea-brainstorming/SKILL.md @@ -35,6 +35,7 @@ through retained-plan creation. It stops before execution. - Same-conversation support-skill loading is not a lane change. - Idea Gate 0 remains mandatory. - Start with a bounded evidence pass ordered by risk. Read only the smallest local owner evidence needed to classify the request before asking questions. +- For large tabular files, generated reports, or long log exports, keep the bounded evidence pass aggregate-first: collect file sizes, schema or headers, counts, anomalies, and targeted slices before any deeper read. - When authoritative platform semantics control feasibility or ownership, verify them early in the bounded evidence pass. - This gateway is not a specialized execution owner. A concrete task may not be accepted for execution here until Idea Gate 0 is `grill-me satisfied` and `Critical Gate 2` is `confident`. - For a direct concrete operation, emit `Specialization Checkpoint: gated`, explain that this owner cannot decide task ownership or execute yet, and continue with the bounded evidence pass plus mandatory `grill-me`. @@ -59,6 +60,7 @@ State rules: - If the incoming request is already concrete (file edit, command execution, validator run, or implementation step), start with `Specialization Checkpoint: gated` before Idea Gate 0. - At `Specialization Checkpoint: gated`, name the recommended specialized owner (`internal-gateway-simple-task` by default, `internal-gateway-review` for defect-first review, `internal-gateway-critical-master` for pressure testing), but do not ask the user to keep this owner yet. - Continue through the bounded evidence pass, mandatory `grill-me`, and critical gate before asking whether this owner should stay in charge of the task. +- Before the initial numbered block, keep large-file and large-log evidence compact: summarize counts, headers, anomalies, routes, and open gaps unless raw content is itself the missing evidence. - After the evidence pass, load `grill-me` and ask one mandatory numbered bulk question block with recommendations and defaults. - Before the initial numbered block, emit a compact facts/options summary derived from the bounded evidence pass. - Ask further focused numbered bulk blocks only for unresolved, dependent, or reopened branches. diff --git a/.github/skills/internal-gateway-review/SKILL.md b/.github/skills/internal-gateway-review/SKILL.md index 53834ce2..63d60e60 100644 --- a/.github/skills/internal-gateway-review/SKILL.md +++ b/.github/skills/internal-gateway-review/SKILL.md @@ -9,16 +9,69 @@ description: Use when repository-owned work needs same-conversation defect-first - `internal-code-review` - `internal-high-level-review` +- `internal-ai-resource-review` +- `internal-copilot-audit` - `internal-gateway-critical-master` - `internal-gateway-writing-plans` - `internal-agent-support-next-step` -Portable review orchestrator. This skill owns review scope, lens selection, -findings consolidation, critical support, and remediation-plan transition. It -does not apply fixes. +Treat this section as an audit and routing index, not a preload bundle. Load a +referenced skill only when the domain, finding, blocker, or phase requires it. + +Portable review orchestrator. Owns review scope, lens selection, findings +consolidation, critical support, and remediation-plan transition. It does not apply fixes. + +Before any user-visible verdict, run a lightweight internal check for evidence, +severity, false positives, contrary evidence, and scope narrowing. Load +`internal-gateway-critical-master` only for a material challenge. Revise or +reopen when the check exposes a material gap. See `references/review-gate.md` for the review output contract and gate states. +## Lens selection + +Select the review lens from the changed-path families, not from a single default. +A diff may activate more than one lens; load each only when its evidence appears. + +- Code lens (`internal-code-review`): Python, Bash, Terraform, Java, or + Node.js/TypeScript source changes. +- Systems lens (`internal-high-level-review`): architecture, workflow, or + cross-cutting impact beyond line-level defects. +- AI-resource lens (`internal-ai-resource-review`, with `internal-copilot-audit` + as the drift sub-lens): repository-owned AI customization assets, including + `.github/skills/**`, `.github/agents/*.agent.md`, `.github/prompts/*.prompt.md`, + `.github/instructions/**`, `AGENTS.md`, `.github/copilot-instructions.md`, + `.github/INVENTORY.md`, and `**/agents/openai.yaml`. + +When the diff is mainly AI customization assets, the AI-resource lens is +mandatory and the code lens stays scoped to any embedded scripts only. Do not +let the code lens silently skip `.md` skill, agent, or instruction files. + +## Token Discipline + +Inspect diff and failing evidence first; avoid broad repository scans unless an +evidence gap requires one; never preload referenced skills; show at most 5 +material findings unless exhaustive review is requested; summarize omitted +low-risk observations separately, not as findings. + +Use `Compact Evidence Reporting` for large diffs, generated files, tabular +exports, and logs: keep findings defect-first, cite the smallest excerpt or +file point that proves impact, and avoid dumping large raw blocks when a +targeted excerpt plus evidence path preserves the same proof. + +## Review To Plan Transition + +Before creating, accepting, or routing a remediation plan, keep the review +defect-first and map every original material finding: `id`, `status` (`planned`, +`deferred`, `rejected`, or `residual`), `reason`, `next owner`, and `validation +expected`. + +If remediation steps cover less than 100% of material findings, label the +output `partial remediation plan` and keep residual, deferred, or rejected +findings visible. A retained mini-plan is a coverage-preserving handoff authored +by `internal-gateway-writing-plans`; its job is plan creation, not fixes. This +gateway does not choose the execution owner. + ## When to use - The user asks for review of a concrete artifact, diff, workflow, or bundle. @@ -27,7 +80,11 @@ See `references/review-gate.md` for the review output contract and gate states. ## Validation - Findings stay defect-first. +- Lens selection matches the changed-path families; AI customization assets route to `internal-ai-resource-review` (drift via `internal-copilot-audit`) instead of being skipped by the code lens. - Review flow preserves compact context: prioritize diff and failing evidence first, then expand only when an evidence gap remains. -- Review output carries findings, severity, confidence, evidence gap, route or next owner, and a Review Gate outcome before the final verdict. -- Retained remediation plans are authored by `internal-gateway-writing-plans`. +- Large evidence may be reported compactly, but each material finding still keeps severity, confidence, evidence gap, counter-validation result, and route or next owner. +- Review output carries findings, severity, confidence, evidence gap, counter-validation result, route or next owner, and a Review Gate outcome before the final verdict. +- The review cannot present analysis to the user until counter-validation confirms it or reopens material gaps. +- Remediation-plan transitions preserve a 100% material-finding coverage map or explicitly declare a `partial remediation plan`. +- Retained remediation plans are authored by `internal-gateway-writing-plans` and preserve the coverage map. - The gateway stops before fixes. diff --git a/.github/skills/internal-gateway-review/references/review-gate.md b/.github/skills/internal-gateway-review/references/review-gate.md index bb3e0f14..5c0c4f7e 100644 --- a/.github/skills/internal-gateway-review/references/review-gate.md +++ b/.github/skills/internal-gateway-review/references/review-gate.md @@ -8,15 +8,19 @@ Use this reference when `internal-gateway-review` needs to package findings befo - Severity - Confidence - Evidence gap +- Counter-validation - Route or next owner - Review Gate outcome ## Gate States -- `review gate: satisfied` when the findings are specific, routed, and ready for the user-visible verdict. -- `review gate: reopen` when material evidence is missing or the remediation choice needs more challenge. +- `review gate: satisfied` when the findings are specific, routed, counter-validated, and ready for the user-visible verdict. +- `review gate: reopen` when material evidence is missing, counter-validation exposes a material flaw, or the remediation choice needs more challenge. ## Boundary - Keep the gate visible before any fixes. +- Run counter-validation before the final user-visible verdict; challenge each finding for evidence, severity, route, and contrary proof. +- For large diffs, generated files, logs, or tabular exports, keep evidence compact: cite the smallest excerpt or path that proves the finding and summarize omitted raw volume. +- Report only material self-critique results: corrections, confidence changes, evidence gaps, or confirmation that no material issue was found. - Use the gate to route each actionable finding to the smallest next owner. diff --git a/.github/skills/internal-gateway-simple-task/SKILL.md b/.github/skills/internal-gateway-simple-task/SKILL.md index cb0f4e59..dc8c2873 100644 --- a/.github/skills/internal-gateway-simple-task/SKILL.md +++ b/.github/skills/internal-gateway-simple-task/SKILL.md @@ -70,6 +70,33 @@ Classify every simple task before operational work as `full-gate`, - If planning, review, critical pressure, or multi-phase validation becomes the real job, `escalate`. +### Token Budget Gate + +- Run a `Token Budget Gate` before choosing `trivial-skip` or `plan-mode` when + the user asks for low-token execution or the task centers on large tabular + files, log exports, repeated tool output, or broad file changes. +- For Copilot or debug log analysis, start with file size, model-call counts, + prompt or token aggregates, tool-span counts, result-size summaries, and + targeted slices; avoid full JSON dumps or prompt bodies unless they are the + missing evidence. +- Keep compact reporting runner-agnostic: ask for bounded summaries, exit + state, counts, anomalies, and evidence gaps, but do not require `jq`, `awk`, + shell flags, or terminal-only recipes unless they are already the local + workflow being analyzed. +- A cost checkpoint pauses before a new expensive tool burst, broad reread, or + repeated execution loop. It does not interrupt ordinary conversation, + grill-me questioning, or collaborative reasoning when no expensive tool + action is being launched. +- If the user explicitly asks for full output, deeper slices, or continued + execution, name the likely token or context impact before expanding and then + either proceed with the smallest bounded next slice or ask for confirmation + before the new expensive burst. +- Keep `trivial-skip` only for truly tiny local work with obvious validation + and no material completeness risk. +- If context pressure could hide required validation, data integrity, or route + ownership, prefer `plan-mode` and apply the `Plan Profile Selection Guard` + before proposing `compact`. + ## Simple Procedure 1. Inspect local files first. @@ -147,9 +174,15 @@ the output shape changes. ### Profile - Default to `compact` (`tmp/superpowers/mini-plan-*`) with - `01-change-summary.md` and `02-execution.md`. -- Use `extended` only when the task needs multi-slice execution, multiple - independent validators, an articulated anti-scope, or external pins. + `01-change-summary.md` and `02-execution.md` only when the task stays within + one owner, one execution lane, one primary validation path, and low + completeness risk. +- Apply the `Plan Profile Selection Guard` before proposing `compact`. +- Use `extended` when the task needs multi-slice execution, multiple + independent validators, an articulated anti-scope, external pins, + cross-skill token-discipline work, validator-impacting changes, or + exports, generated reports, and datasets that need non-trivial + reconciliation. ### Procedure diff --git a/.github/skills/internal-gateway-simple-task/references/plan-mode.md b/.github/skills/internal-gateway-simple-task/references/plan-mode.md index df9ab530..0fa6fcdc 100644 --- a/.github/skills/internal-gateway-simple-task/references/plan-mode.md +++ b/.github/skills/internal-gateway-simple-task/references/plan-mode.md @@ -39,31 +39,55 @@ signal and ask for explicit confirmation before switching to plan mode. provisioning). - There is material risk that context pressure or chat limits will interrupt the work before it can be validated. +- The task centers on large `.csv`, `.tsv`, `.xlsx`, JSON log exports, + repeated tool output, or broad file changes that would bloat chat context. - The user is asking for a large refactor, migration, or cross-file mechanical change that is safer as a tracked plan. Do not switch to plan mode implicitly without declaring the detected signals and asking for user confirmation. -## Profile selection +### Token Budget Gate + +When the cost signals come mainly from context pressure instead of task count, +prefer a compact evidence posture rather than a raw-output posture. Keep same-chat +execution only for tiny local work; otherwise switch to `plan-mode` and let the +profile guard choose whether `compact` is still safe. + +A cost checkpoint pauses before a new expensive tool burst, broad reread, or +multi-step execution loop. It does not interrupt ordinary conversation, +grill-me analysis, or collaborative study when no expensive tool action is +starting. -- **Default `compact`**: use for a single owner, concrete target, one primary - validation path, and one execution lane. Folder name follows - `tmp/superpowers/mini-plan-*` and contains `01-change-summary.md` and - `02-execution.md`. -- **Use `extended` only when**: the task needs multi-slice execution, several - independent validators, an articulated anti-scope, or external pins that must - be tracked in a control file. +When the user explicitly asks for broader output, deeper analysis, or continued +execution, name the likely token or context impact first and then either +continue with the smallest bounded next slice or ask for confirmation before +the new expensive burst. + +## Profile selection -When in doubt, prefer `compact`. A simple task that needs a plan usually does -not need the overhead of an extended plan. +- **Default `compact`**: use only when the task stays within a single owner, + concrete target, one primary validation path, one execution lane, and low + completeness risk. Folder name follows `tmp/superpowers/mini-plan-*` and + contains `01-change-summary.md` and `02-execution.md`. +- **Plan Profile Selection Guard**: escalate to `extended` when context or + completeness risk is material, especially for cross-skill token-discipline + work, validator-impacting changes, exports or generated reports, datasets + that need non-trivial reconciliation, several independent validators, an + articulated anti-scope, or external pins that must be tracked in a control + file. + +When profile safety is in doubt, prefer `extended` and state why. Prefer +`compact` only when the plan can record the contrary evidence that keeps +lower-context execution safe. ## Confirmation rule for implicit triggers For implicit cost-signal triggers, emit a short statement that: 1. Names the detected cost signals. -2. Proposes `plan-mode` with a default `compact` profile. +2. Proposes `plan-mode` with the safest profile suggested by the signals, + defaulting to `compact` only when the profile guard stays clear. 3. Asks the user to confirm, decline, or choose `extended`. Do not write the retained plan until the user confirms. diff --git a/.github/skills/internal-gateway-writing-plans/SKILL.md b/.github/skills/internal-gateway-writing-plans/SKILL.md index f2b456a9..4063ff2b 100644 --- a/.github/skills/internal-gateway-writing-plans/SKILL.md +++ b/.github/skills/internal-gateway-writing-plans/SKILL.md @@ -44,12 +44,25 @@ New `compact` plans should use `tmp/superpowers/mini-plan-*`. | Profile | When | Required files | | --- | --- | --- | | `compact` | Single owner, concrete target, one validation path, low-to-medium risk, and one execution lane. Best fit for small/fast executors after positive handoff validation. | `01-change-summary.md`, `02-execution.md` | -| `extended` | Cross-family changes, higher risk, lower-context execution, multiple validators, or multi-slice execution state. Thinking-first profile with explicit control files and deterministic read order. | `01-change-summary.md`, `02-control.md`, `03-execution.md`, additional numbered files by category (`04-...`). | +| `extended` | Cross-family changes, higher risk, lower-context execution, multiple validators, or multi-slice execution state. Soft-limit profile: use judgment-based size review with completeness over compression, explicit control files, and deterministic read order. | `01-change-summary.md`, `02-control.md`, `03-execution.md`, additional numbered files by category (`04-...`). | + +### Plan Profile Selection Guard + +Escalate to `extended` when completeness or context-discipline risk is material: +cross-skill token-discipline changes; exports, generated reports, or datasets +with non-trivial reconciliation; validator-impacting contract changes; +external API contracts (credentials, pagination, retries, schema pinning); +executive-facing output; multiple validators; or synced always-on guidance +edits. Do not use `compact` when the executor needs exact sources, target files, validators, blockers, or external pins that only `02-control.md` can provide. +If `compact` is still chosen near one of those edges, the plan must record the +contrary evidence that keeps one owner, one execution lane, and one validation +path sufficient despite lower-context execution. + ## Explicit Constraints - Create retained plans under `tmp/superpowers//`. @@ -70,6 +83,8 @@ can provide. - Compact plans have a 2,000 estimated-token total budget measured as `ceil(UTF-8 bytes / 4)` across plan Markdown files. Keep `02-execution.md` under 1,500 estimated tokens. Treat warnings as required review inputs. +- For `extended`, treat token warnings as review inputs for completeness and + slicing. Prefer splitting into numbered files over compression. - `compact` uses exactly `01-change-summary.md` and `02-execution.md` during authoring. `extended` uses `01-change-summary.md`, `02-control.md`, `03-execution.md`, and optional higher numbered files. @@ -98,6 +113,11 @@ can provide. - For `extended`, implementation-contract sections are merged into `02-control.md` with these exact headings: `Sources`, `Candidate targets`, `Validation commands`, `Blockers and fallback rules`, and `External pins`. +- For `extended`, recommend adding deep companion files only when justified by + triggers, and keep them as recommendations (not ERROR-level required files): + `data-contract.md` for reconciled datasets and schema mappings, + `validation-runbook.md` for multi-validator troubleshooting or rollback paths, + and API/schema pin notes when external dependencies or credentials drive risk. - Apply a say-once rule: each control fact (target, owner, validator, blockers, pins, and source-item coverage) is written once in the owning file, and step files do not restate target/owner/validator. @@ -126,7 +146,10 @@ can provide. `questions.md` file for `compact`. 11. Run scope challenge and plan review gate for non-trivial plans. 12. Run `audit` first, then run `handoff-check`; execute only when ready. -13. Treat token warnings as review inputs for compression or split decisions, not as proof of measured savings. +13. Treat token warnings as review inputs, not as proof of measured savings. For + `extended`, prefer splitting into numbered files over compression, and never + compress away source pins, schema contracts, validation rules, stop + conditions, or failure-investigation steps. ## Validation diff --git a/.github/skills/internal-gateway-writing-plans/references/plan-review-gate.md b/.github/skills/internal-gateway-writing-plans/references/plan-review-gate.md index 32d2ff92..d3da491e 100644 --- a/.github/skills/internal-gateway-writing-plans/references/plan-review-gate.md +++ b/.github/skills/internal-gateway-writing-plans/references/plan-review-gate.md @@ -33,7 +33,7 @@ or handoff. It checks clarity and validity without creating reviewer personas. | Open questions | Is `questions.md` present and set to `- none` for execution handoff, or explicitly blocking handoff? | | Lifecycle status | Is plan state explicit (`scaffold`, `ready`, or `closed`) so an executor does not infer readiness? | | Token discipline | Does the ledger define `Initial evidence pass` and `Reading budget` so the executor can classify the folder with the fewest safe reads? | -| Profile token budget | Is `compact` within the 2,000 estimated-token total budget, with `01-change-summary.md` under 300 and `02-execution.md` under 1,500, or escalated to `extended`? | +| Profile token budget | Is `compact` within the 2,000 estimated-token total budget, with `01-change-summary.md` under 300 and `02-execution.md` under 1,500, or escalated to `extended`? For `extended`, are soft limits reviewed with completeness over compression and split-by-slice decisions when files grow large? | ## Outcomes diff --git a/.github/skills/internal-gateway-writing-plans/scripts/plan_authoring.py b/.github/skills/internal-gateway-writing-plans/scripts/plan_authoring.py index bc091611..6fc4b05a 100644 --- a/.github/skills/internal-gateway-writing-plans/scripts/plan_authoring.py +++ b/.github/skills/internal-gateway-writing-plans/scripts/plan_authoring.py @@ -709,11 +709,16 @@ def _token_warnings(plan_folder: Path, profile: str | None = None) -> list[str]: control_names = {"01-change-summary.md", "02-control.md"} control_tokens = sum(tokens for name, tokens in file_tokens if name in control_names) if total_tokens and control_tokens / total_tokens > 0.7: - warnings.append("Initial control read is disproportionately large; compress or split the control files.") + warnings.append("Initial control read is disproportionately large; prefer splitting control facts into numbered files by delivery slice.") for name, tokens in file_tokens: if tokens > 1200: - warnings.append(f"Estimated token weight is high for {name}; split or compress by delivery slice.") + if profile == "extended": + warnings.append( + f"Informational: estimated token weight is high for {name}; prefer splitting into numbered files by delivery slice." + ) + else: + warnings.append(f"Estimated token weight is high for {name}; split or compress by delivery slice.") return warnings diff --git a/.github/skills/internal-python-project/SKILL.md b/.github/skills/internal-python-project/SKILL.md index fa447852..03dfbe87 100644 --- a/.github/skills/internal-python-project/SKILL.md +++ b/.github/skills/internal-python-project/SKILL.md @@ -5,6 +5,11 @@ description: Use when creating or modifying Python package or application code w # Python Project Skill +## Referenced skills + +- `internal-python-script`: route CLI adapters, direct operator execution, and rich console reporting boundaries. +- `internal-tdd`: load for bugfixes, features, or project behavior changes with a meaningful public or service seam. + ## When to use - Services, use cases, adapters, packages, and modules in Python applications. @@ -26,9 +31,13 @@ description: Use when creating or modifying Python package or application code w ## Compact Python baseline - Prefer early returns, guard clauses, clear names, and readable control flow. +- Keep functions small enough to read without tracing hidden state. Prefer explicit inputs over module-level lookups inside reusable logic. - Add type hints on public or non-trivial function signatures. - Keep comments, docstrings, logs, exceptions, and CLI output in English. - Use the repository-declared runtime before falling back to ambient `python3`. +- Centralize behavioral configuration instead of scattering magic values through services, adapters, or library modules. Put environment-specific and operator-tuned values in a settings module, application factory, CLI adapter, framework configuration, or composition root. +- Pass configuration into reusable project code through typed settings, constructor arguments, or function parameters. Domain and service code should not read environment variables, files, or deployment defaults directly unless that boundary is its explicit responsibility. +- Do not confuse domain invariants with configuration. Stable rules that belong to the domain may stay near the domain code; deployment-specific paths, endpoints, thresholds, defaults, and feature switches should live at the configuration boundary. - Do not vendor libraries, wheelhouses, copied site-packages, or fallback dependency mirrors. - If external packages are introduced, keep exact pins and hashes in the owning requirements file. @@ -38,23 +47,30 @@ description: Use when creating or modifying Python package or application code w - Choose async only when the workload is I/O-bound and the surrounding stack supports it cleanly. - Keep request or transport models, domain logic, and persistence concerns in separate modules. - Prefer a domain/service/adapter decomposition before adding generic catch-all modules. -- Keep reusable module and service logs neutral or structured; reserve emoji log formatting for outer operator-facing entrypoints. +- Keep reusable module and service logs neutral, structured, or framework-native. Log events should be parsable, searchable, and useful in production. +- Design professional reporting as a boundary concern: core project code returns typed results, events, or DTOs; adapters decide whether to render JSON, HTTP responses, framework logs, metrics, or human-facing CLI reports. +- No emoji or `rich` rendering inside importable domain, service, persistence, framework modules, or machine-readable output paths such as JSON. Use `rich` only in human-facing CLI adapter reporting. +- If a project exposes a CLI adapter, keep the CLI adapter thin and route its operator-facing reporting to the script boundary. A CLI adapter may use an `ExecutionReporter`; the core project code should not know that reporter exists. Load `references/examples.md` when you need a minimal module or test example. +Load `references/logging-and-reporting.md` when project code needs a professional logging/reporting layout, structured log context, result DTOs, adapter-owned rendering, or JSON versus human-output boundaries. + ## Testing - Follow the repository pytest defaults. - BDD-like names: `given_when_then` style. - Prefer fixtures, parameterization, and mocking only when they reduce duplication or isolate real external boundaries. - Use coverage reports to close meaningful behavioral gaps, not as a blanket 100% doctrine. -- For modify tasks: edit implementation first, run existing tests, then update tests only for intentional behavior changes. +- For bugfixes, features, and intentional behavior changes, start test-first through the public API, service boundary, adapter contract, or framework-owned seam: add or update the failing test, confirm it fails for the intended reason, then implement the smallest fix. +- For refactors, prose-only updates, generated fixtures, or mechanical formatting with no executable behavior change, run existing focused tests and syntax validation instead of manufacturing speculative tests. ## Architecture and framework guidance - Follow the repository's existing framework before introducing FastAPI, Flask, Django, or a new dependency stack. - Use dataclasses or typed DTOs for internal contracts, and boundary-validation models where the framework already expects them. - Keep async flows end-to-end; do not mix blocking libraries into async request paths without an explicit bridge. +- When Ruff is configured for the project, let `ruff format` own formatting and use `ruff check` before broader test runs. Avoid hand-formatting that creates churn against the configured formatter. ## Test-shape guidance @@ -70,5 +86,6 @@ Load `references/common-mistakes.md` for the full mistake table. ## Validation - `python -m compileall ` (syntax check) +- `pip install --require-hashes -r requirements.txt` (dependency integrity check, only when requirements change) - `pytest tests/` (run tests) - Lint with project's configured linter. diff --git a/.github/skills/internal-python-project/references/common-mistakes.md b/.github/skills/internal-python-project/references/common-mistakes.md index 592bdf60..a4c1eb43 100644 --- a/.github/skills/internal-python-project/references/common-mistakes.md +++ b/.github/skills/internal-python-project/references/common-mistakes.md @@ -6,8 +6,10 @@ | Mutable default arguments (`def f(items=[])`) | Shared state between calls — classic Python gotcha | Use `None` default + create inside function | | Bare `except:` or `except Exception:` | Swallows `KeyboardInterrupt`, `SystemExit` | Catch specific exceptions | | No type hints on public API | Hard to understand contracts, no static analysis | Add type hints on function signatures | +| Updating dependency requirements without refreshed hashes | Reproducible installs break or drift silently | Regenerate exact pins and hashes, then validate with `pip install --require-hashes -r requirements.txt` | | Tests that depend on execution order | Fragile test suite, non-deterministic failures | Each test must be self-contained | | Forcing async into CPU-bound or simple flows | Adds complexity without throughput benefit | Keep it synchronous unless I/O concurrency is the real bottleneck | | Mocking internal implementation details | Makes tests brittle and hides real regressions | Mock only true external boundaries | +| Using `rich`, emoji, tables, or panels outside human-facing CLI adapter reporting | Mixes terminal UI with project behavior or machine-readable output such as JSON | Keep project logs neutral or structured, keep data output plain, and put `rich` reporting in a CLI adapter | | Treating line coverage as the goal | Inflates test volume without improving defect detection | Target coverage around changed behavior and risky paths | | God classes with 10+ methods | Hard to test, hard to reason about | Split by responsibility into focused classes | diff --git a/.github/skills/internal-python-project/references/logging-and-reporting.md b/.github/skills/internal-python-project/references/logging-and-reporting.md new file mode 100644 index 00000000..682e8337 --- /dev/null +++ b/.github/skills/internal-python-project/references/logging-and-reporting.md @@ -0,0 +1,101 @@ +# Python Project Logging And Reporting + +Use this reference when Python project code needs professional logging, reporting layout, structured log context, result DTOs, adapter-owned rendering, or a clear boundary between JSON/data output and human-facing CLI reporting. + +## Boundary + +- Project internals should expose behavior through typed results, domain events, DTOs, return values, exceptions, or framework contracts. +- Domain, service, persistence, and framework modules should use standard `logging` or the repository framework's native logging. +- Logs from importable modules should be neutral, structured when useful, and parsable in production. +- Human-facing rendering belongs to adapters: CLI, admin command, report command, or delivery script. +- Machine-readable outputs such as JSON, API responses, event payloads, or exported files must stay plain data. Do not decorate them with `rich`, emoji, color, panels, or tables. +- A CLI adapter may use the script `ExecutionReporter` pattern or `rich`, but the project core should not import or know about that reporter. + +## Professional Layout + +Prefer this ownership split when the project needs both reusable behavior and operator-facing reporting: + +```text +src/{package}/ +├── domain/ # entities, value objects, domain rules; no logging UI +├── services/ # use cases; structured logging and typed results +├── adapters/ +│ ├── cli.py # optional human-facing rendering boundary +│ ├── http.py # framework/API response boundary +│ └── persistence.py +└── observability.py # logger setup helpers only when the project owns setup +``` + +Use existing repository structure first. Do not create these folders just to satisfy the shape when the current project has a clearer convention. + +## Logging Shape + +Use stable event names and explicit context. Prefer values that help production search, alerting, and diagnosis. + +```python +from __future__ import annotations + +import logging +from dataclasses import dataclass +from pathlib import Path + +logger = logging.getLogger(__name__) + + +@dataclass(frozen=True) +class ImportSummary: + imported_count: int + skipped_count: int + output_path: Path + + +def import_records(source_path: Path, output_path: Path) -> ImportSummary: + logger.info( + "records_import_started", + extra={"source_path": source_path.as_posix(), "output_path": output_path.as_posix()}, + ) + + summary = ImportSummary(imported_count=12, skipped_count=1, output_path=output_path) + + logger.info( + "records_import_completed", + extra={ + "imported_count": summary.imported_count, + "skipped_count": summary.skipped_count, + "output_path": summary.output_path.as_posix(), + }, + ) + return summary +``` + +## Adapter Rendering + +Adapters translate project results into the output contract for that boundary. + +```python +def summary_to_json(summary: ImportSummary) -> dict[str, object]: + return { + "imported_count": summary.imported_count, + "skipped_count": summary.skipped_count, + "output_path": summary.output_path.as_posix(), + } + + +def render_human_summary(summary: ImportSummary, reporter: object) -> None: + reporter.summary( + status="completed", + counts={"imported": summary.imported_count, "skipped": summary.skipped_count}, + produced_files=[summary.output_path], + diagnostics=[], + ) +``` + +The JSON adapter returns plain data. The human adapter may use `ExecutionReporter` or `rich` if the CLI/reporting boundary owns that dependency. + +## Review Checklist + +- Does the core return typed results or framework-native responses instead of printing? +- Are logs searchable and useful without terminal formatting? +- Are secrets, tokens, bearer values, passwords, credentials, and sensitive payloads omitted or redacted? +- Is JSON or other machine-readable output plain data? +- If `rich` appears, is it isolated to a human-facing CLI/reporting adapter with a dependency decision note? diff --git a/.github/skills/internal-python-script/SKILL.md b/.github/skills/internal-python-script/SKILL.md index 95ddd2b8..509f942c 100644 --- a/.github/skills/internal-python-script/SKILL.md +++ b/.github/skills/internal-python-script/SKILL.md @@ -5,6 +5,11 @@ description: Use when creating or modifying standalone Python scripts, CLIs, or # Python Script Skill +## Referenced skills + +- `internal-python-project`: route away when imported package, application, service, or framework behavior becomes the primary contract. +- `internal-tdd`: load for bugfixes, features, or script behavior changes with a meaningful executable seam. + ## When to use - New standalone Python scripts. @@ -27,23 +32,30 @@ description: Use when creating or modifying standalone Python scripts, CLIs, or - Standalone tools should default to a dedicated folder or toolkit root, not a loose top-level `.py` file. - Keep entrypoints thin: parse arguments, resolve paths, orchestrate helpers, and return an exit code through `main() -> int` plus `raise SystemExit(main())`. +- Keep script-owned configuration visible at the entrypoint boundary. In single-file scripts, place a clearly named `Configuration` section near the end of the file, after helper definitions and before `main()` or `raise SystemExit(main())`. +- Name configuration values by purpose, not by type: paths, file names, field lists, thresholds, defaults, mappings, filters, and output modes should explain what behavior they control. +- Do not hide script-specific configuration inside helper modules or libraries. Helpers should accept explicit parameters or a small typed settings object when several values travel together. - Keep single-file scripts under 400 lines when possible. At 300 lines, review whether orchestration and helper boundaries stay clear; at 400 lines, split-or-justify is required. - Place shared helper logic in local helper modules, preferably under `utils/` when the toolkit structure supports that layout. - For operator-facing script work, crossing the 400-line threshold should move toward a toolkit or project structure according to the primary contract, not an ever-growing single entrypoint. - Keep policy checks focused on maintained source; generated outputs and large fixture data are excluded unless directly edited. - Prefer `argparse`, `pathlib.Path`, and small helper functions for operator-facing tools. -- Keep emoji logs at operator-facing boundaries such as start, success, warning, and failure states; keep reusable helpers free of decorative log formatting. +- Keep operator-facing console reporting centralized in a dedicated reporter, for example `ExecutionReporter`. Application logic should call semantic reporter methods instead of constructing styled strings or scattered `print()` calls. +- Use `rich` as the preferred console rendering library for polished human-facing CLI reports when the terminal experience is part of the contract. Keep it out of `--format json`, other machine-readable outputs, and reusable helper logic. +- Keep emoji, panels, tables, and color at human-facing boundaries such as banners, sections, success, warning, error, and summaries. Keep reusable helpers and machine-readable output paths free of decorative log formatting. +- Load `references/reporting.md` when a script needs professional console reporting, `rich` rendering, an `ExecutionReporter` shape, redaction rules, or verbose/debug output boundaries. - When a tool can be called from subdirectories, resolve the repository root explicitly instead of assuming the current working directory. - Use type hints on non-trivial public helpers and CLI-facing boundaries. - Use `asyncio` only when the script truly coordinates multiple I/O-bound tasks. - Reach for `pathlib`, context managers, and small helper functions before adding framework-like structure to a script. -- Add machine-readable output such as `--format json` only when the tool has a real automation consumer. Keep text output as the default operator path. +- Add machine-readable output such as `--format json` only when the tool has a real automation consumer. Keep text output as the default operator path, and do not decorate machine-readable output with `rich`, emoji, color, or tables. - When machine-readable output can become large and the script is agent-facing, add a bounded mode such as `--format compact` that preserves status, blocker or finding counts, key path evidence, and next action without dumping full detail. - Keep full `--format json` available for durable audit/debug use; do not replace it with compact mode. ## Compact Python baseline - Prefer early returns, guard clauses, clear names, and readable control flow. +- Keep script-owned configuration at the entrypoint boundary with simple descriptive names. - Add type hints on public or non-trivial function signatures. - Keep comments, docstrings, logs, exceptions, and CLI output in English. - Use the repository-declared runtime before falling back to ambient `python3`. @@ -64,12 +76,15 @@ Dependency decision note - Keep the note short and task-specific. - Compare the standard library with realistic third-party candidates. - If the final choice uses external libraries, create or update the local `requirements.txt` before finishing the task. +- Keep exact pins and current hashes in `requirements.txt`. Use `pip-compile --generate-hashes` or an equivalent repository-approved workflow, then validate with `pip install --require-hashes -r requirements.txt` when the requirements file changes. - If several entrypoints share the same lock file, record the decision once at the shared toolkit `requirements.txt` rather than repeating it in every script. ## Layout and templates Load `references/layout-and-templates.md` when you need the default folder layout, a repo-aligned multi-tool toolkit layout, a minimal entry point, a hash-locked `requirements.txt`, or the launcher pattern. +Load `references/reporting.md` when the script needs a richer `ExecutionReporter`, `rich` console rendering, status tables, redaction behavior, or a final operator summary. + Keep these rules visible while drafting: - Use a dedicated tool folder or toolkit root rather than a loose top-level `.py` file. @@ -82,7 +97,9 @@ Keep these rules visible while drafting: - Follow the repository pytest defaults. - Use coverage reports to inspect missing behavior on touched code, not to force blanket 100% coverage. -- For modify tasks: edit implementation first, run existing tests, then update tests only for intentional behavior changes. +- For bugfixes, features, and intentional behavior changes, start test-first through the public CLI or stable helper seam: add or update the failing test, confirm it fails for the intended reason, then implement the smallest fix. +- For refactors, prose-only updates, generated fixtures, or mechanical formatting with no executable behavior change, run the existing focused tests plus `py_compile` or `compileall` instead of manufacturing speculative tests. +- When Ruff is configured, run `ruff format` for formatting-only Python edits and `ruff check` for lint feedback before wider test runs. - Prefer existing repository commands such as `make lint`, `make test`, or a shared script runner before inventing a one-off validation path. ## Runtime guidance @@ -99,5 +116,6 @@ Load `references/common-mistakes.md` for the full mistake table. - `python -m py_compile .py` (syntax check) - `bash -n run.sh` (launcher syntax check, only when `run.sh` exists) +- `pip install --require-hashes -r requirements.txt` (dependency integrity check, only when requirements change) - `pytest tests/` (run tests) - `python -m compileall ` or the repository's canonical shared runner when the tool already lives inside a maintained toolkit diff --git a/.github/skills/internal-python-script/references/common-mistakes.md b/.github/skills/internal-python-script/references/common-mistakes.md index d1f551cf..e104cfdb 100644 --- a/.github/skills/internal-python-script/references/common-mistakes.md +++ b/.github/skills/internal-python-script/references/common-mistakes.md @@ -9,6 +9,7 @@ | No argument parsing | Caller has to modify script source to change behavior | Use `argparse` for any configurable parameter | | Installing deps globally or without hash-locked version pinning | Non-reproducible environment and hidden setup drift | Keep dependencies in the local `requirements.txt` with exact pins and hashes | | Adding an empty `requirements.txt` to a stdlib-only tool | Adds noise and implies missing setup steps | Omit `requirements.txt` when the script uses only the standard library | +| Updating `requirements.txt` without refreshed hashes | Breaks reproducible installs and hides dependency drift | Regenerate exact pins and hashes, then validate with `pip install --require-hashes -r requirements.txt` | | Wrapping a stdlib-only script in Bash | Adds setup indirection without solving a real dependency problem | Document direct `python3