Skip to content

Add canonical PAPERSCOUT_<FIELD> process-env overrides for every Sett…#67

Merged
wpak-ai merged 4 commits into
developfrom
refactor/flat-config
Jun 17, 2026
Merged

Add canonical PAPERSCOUT_<FIELD> process-env overrides for every Sett…#67
wpak-ai merged 4 commits into
developfrom
refactor/flat-config

Conversation

@henry0816191

@henry0816191 henry0816191 commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Summary

Add canonical PAPERSCOUT_<FIELD> process-env overrides for every Settings field while keeping legacy unprefixed names (DATABASE_URL, SLACK_BOT_TOKEN, etc.) fully supported in .env files and as fallbacks.

The Settings model was already flat; this PR documents the 1:1 env mapping and wires a dual-read loader so operators can override any setting via Docker -e, shell export, or CI without editing .env.

Changes

  • config.py: _PrefixedEnvSource + settings_customise_sources; ENV_VAR_MAP and expanded docstring
  • test_config.py: tests for prefixed load, legacy fallback, precedence, and field coverage
  • Docs: .env.example, README, onboarding, SERVER_SETUP — document canonical PAPERSCOUT_* names

Behavior

Source Example Notes
Process env (canonical) PAPERSCOUT_DATABASE_URL Wins over legacy when both set
.env / legacy env DATABASE_URL Unchanged; existing deployments unaffected

Test plan

  • uv run pytest tests/test_config.py — 7 passed
  • uv run pytest tests/ — 369 passed, 2 skipped

Related issue

Summary by CodeRabbit

  • New Features

    • Added canonical PAPERSCOUT_-prefixed environment variables with higher precedence than legacy unprefixed names.
    • Prefixed values are read from process environment and do not load from .env, while legacy .env behavior remains supported.
  • Documentation

    • Updated README, onboarding, server setup, and .env.example to document the canonical naming scheme, override behavior, and example settings.
  • Tests

    • Added/expanded coverage for precedence rules, boolean parsing, complete field-to-environment mapping, and dotenv handling.

@henry0816191 henry0816191 requested a review from wpak-ai as a code owner June 17, 2026 17:41
@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c5ee3d6f-7c21-402d-97ca-1b6a1eced2a8

📥 Commits

Reviewing files that changed from the base of the PR and between b418335 and 85f99e1.

📒 Files selected for processing (1)
  • .env.example
✅ Files skipped from review due to trivial changes (1)
  • .env.example

📝 Walkthrough

Walkthrough

src/paperscout/config.py gains a custom _PrefixedEnvSource that reads PAPERSCOUT_-prefixed keys from the process environment only (not .env), a settings_customise_sources override placing it above the legacy env/dotenv source, and an exported ENV_VAR_MAP. Tests verify the priority and fallback behavior. Documentation in README.md, docs/onboarding.md, deploy/SERVER_SETUP.md, .env.example, and the CI workflow are updated to describe the dual naming scheme.

Changes

Canonical PAPERSCOUT_ env var prefix support

Layer / File(s) Summary
Prefixed env source and ENV_VAR_MAP in config.py
src/paperscout/config.py
Introduces ENV_VAR_PREFIX, prefixed_env_name/legacy_env_name helpers, _PrefixedEnvSource (reads only PAPERSCOUT_-prefixed keys from process env, not .env), settings_customise_sources inserting that source between constructor kwargs and legacy env/dotenv, and ENV_VAR_MAP mapping each Settings field to its (prefixed, legacy) env var name pair.
Tests for prefixed/legacy loading and ENV_VAR_MAP completeness
tests/test_config.py
Adds imports of ENV_VAR_MAP and prefixed_env_name; adds tests confirming poll_interval_minutes is read from the prefixed var, falls back to the legacy var, prefixed takes precedence over legacy, ENV_VAR_MAP covers all Settings.model_fields with correctly derived names, and dotenv behavior respects source priority.
Example config and documentation updates
.env.example, .github/workflows/ci.yml, README.md, deploy/SERVER_SETUP.md, docs/onboarding.md
Updates .env.example with header documentation, ENABLE_OPEN_STD=false entry, and message queue internals section; updates README.md env-var tables, onboarding prose (two locations), SERVER_SETUP.md, and CI workflow comments to describe PAPERSCOUT_<FIELD> canonical overrides, their precedence over legacy .env names, and ENV_VAR_MAP reference in config.py.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • wpak-ai

Poem

🐰 Hoppity-hop through the env-var maze,
PAPERSCOUT_ leads the way these days!
Legacy names still answer the call,
But prefixed keys stand tallest of all.
A map was drawn so none go astray —
The rabbit configured it perfectly today! 🗺️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.08% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding canonical PAPERSCOUT_ process-env overrides while keeping legacy names supported.
Description check ✅ Passed The PR description comprehensively documents the changes, includes a detailed test plan section with test results, and clearly links the related issue #64.
Linked Issues check ✅ Passed The PR fulfills all six acceptance criteria from #64: flattens config namespace via ENV_VAR_MAP and dual-read loader, documents 1:1 PAPERSCOUT_ mapping, updates documentation, maintains backward compatibility, and includes comprehensive tests with no regressions.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the namespace flattening objective: config loader implementation, documentation updates, and test coverage. No unrelated changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/flat-config

Comment @coderabbitai help to get the list of available commands and usage tips.

@henry0816191

Copy link
Copy Markdown
Collaborator Author

@coderabbitai full review again

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== src/paperscout/config.py ==="
cat src/paperscout/config.py

echo ""
echo "=== tests/test_config.py ==="
cat tests/test_config.py

echo ""
echo "=== .env.example ==="
cat .env.example

echo ""
echo "=== README.md ==="
cat README.md

echo ""
echo "=== docs/onboarding.md ==="
cat docs/onboarding.md

echo ""
echo "=== deploy/SERVER_SETUP.md ==="
cat deploy/SERVER_SETUP.md

echo ""
echo "=== .github/workflows/ci.yml ==="
cat .github/workflows/ci.yml

Length of output: 72623



@henry0816191 Here is a full review of PR #67.


Review of PR #67 — Canonical PAPERSCOUT_* Environment Variable Overrides

Summary

The PR correctly implements a dual-read settings loader: _PrefixedEnvSource reads PAPERSCOUT_<FIELD> from the process environment (not .env), the default env_settings reads unprefixed names from the process environment, and dotenv_settings reads the .env file. The priority order is well-documented and the tests cover the main paths. A few issues are worth addressing.


src/paperscout/config.py

[Medium] ENV_VAR_MAP uses a bare positional tuple — no named positions

ENV_VAR_MAP: dict[str, tuple[str, str]] = _build_env_var_map()

Callers must remember that index 0 is the prefixed name and index 1 is the legacy name. A NamedTuple makes this self-documenting and prevents future index-order bugs:

from typing import NamedTuple

class EnvVarNames(NamedTuple):
    prefixed: str   # PAPERSCOUT_<FIELD>
    legacy: str     # <FIELD>.upper()

ENV_VAR_MAP: dict[str, EnvVarNames] = {
    name: EnvVarNames(prefixed_env_name(name), legacy_env_name(name))
    for name in Settings.model_fields
}

The test's unpacking prefixed, legacy = ENV_VAR_MAP[field_name] still works unchanged because NamedTuple supports positional unpacking.


[Medium] legacy_env_name() diverges from pydantic-settings' actual resolution when Field(validation_alias=...) or Field(alias=...) is used

def legacy_env_name(field_name: str) -> str:
    return field_name.upper()

pydantic-settings resolves env var names using the field's validation_alias (or alias) when present, falling back to field_name. The current implementation silently ignores aliases. The existing fields happen to have no custom aliases, but this is a latent bug. A more robust version reads the field info directly:

def legacy_env_name(field_name: str) -> str:
    field = Settings.model_fields.get(field_name)
    if field and field.validation_alias:
        alias = field.validation_alias
        if isinstance(alias, str):
            return alias.upper()
    return field_name.upper()

Alternatively, add a comment explicitly documenting the assumption that no field uses a custom alias.


[Low] _PrefixedEnvSource does not assert it ignores .env

The docstring says "not .env" but there is no enforcement. EnvSettingsSource indeed does not load .env files (that is DotEnvSettingsSource's job), so the behavior is correct — but adding a brief comment or an __init__ that explicitly sets env_file=None (if the API supports it) would make the intent durable against future pydantic-settings changes.


tests/test_config.py

[Medium] test_env_mapping_covers_all_fields is tautological

def test_env_mapping_covers_all_fields():
    for field_name in Settings.model_fields:
        assert field_name in ENV_VAR_MAP
        ...

_build_env_var_map() builds the map by iterating over Settings.model_fields, so by construction every field is always in the map. This test can only fail if _build_env_var_map is rewritten to use a different data source — which is unlikely. Consider replacing the field_name in ENV_VAR_MAP assertion with something with real regression value, e.g. verifying the map has no extra keys:

assert set(ENV_VAR_MAP.keys()) == set(Settings.model_fields.keys())

Or add a test that verifies a specific well-known field to catch a full replacement of _build_env_var_map:

assert "database_url" in ENV_VAR_MAP
assert ENV_VAR_MAP["database_url"].prefixed == "PAPERSCOUT_DATABASE_URL"
assert ENV_VAR_MAP["database_url"].legacy == "DATABASE_URL"

[Low] No test covering non-integer prefixed fields

All three behavioral tests (test_prefixed_process_env_loads, test_legacy_process_env_fallback, test_prefixed_wins_over_legacy) use poll_interval_minutes (integer). There are no tests for string fields (database_url), boolean fields (enable_bulk_wg21), list fields (probe_prefixes), or Path fields (data_dir). pydantic-settings coercion for booleans (e.g. "true"/"false") and lists (JSON strings) should be exercised at least once via the prefixed path.


[Low] Missing test: PAPERSCOUT_* keys in .env are silently ignored

The docstring and .env.example explicitly state that prefixed names are not read from .env. This is a documented contract that has no test coverage. A regression test using tmp_path + a fake .env file would protect against a future pydantic-settings change that causes _PrefixedEnvSource to start reading .env:

def test_prefixed_name_in_dotenv_is_ignored(monkeypatch, tmp_path):
    monkeypatch.setenv("_PAPERSCOUT_TESTING", "1")
    monkeypatch.delenv("PAPERSCOUT_POLL_INTERVAL_MINUTES", raising=False)
    env_file = tmp_path / ".env"
    env_file.write_text("PAPERSCOUT_POLL_INTERVAL_MINUTES=77\n")
    monkeypatch.chdir(tmp_path)
    s = Settings()
    assert s.poll_interval_minutes == 30  # default, not 77

README.md

[Low] "Required" table has a "Canonical process-env" column; all other tables do not

The inconsistency may confuse operators who see canonical names for DATABASE_URL but not for PORT, POLL_INTERVAL_MINUTES, etc. Either add the column to all tables (possibly as a simple tooltip note like "use PAPERSCOUT_<NAME> for process-env") or add a single note at the top of the section explaining the pattern applies universally. The latter is shorter and avoids doubling every table's width.


.env.example

[Low] Several Settings fields are absent

The following fields have defaults and are documented in the Settings class but have no corresponding entry in .env.example (making them hard to discover):

Field Would be
enable_open_std ENABLE_OPEN_STD=false
wg21_index_timeout_s WG21_INDEX_TIMEOUT_S=30
mq_max_retries MQ_MAX_RETRIES=8
mq_circuit_breaker_threshold MQ_CIRCUIT_BREAKER_THRESHOLD=5
mq_circuit_breaker_cooldown_seconds MQ_CIRCUIT_BREAKER_COOLDOWN_SECONDS=60
mq_max_size MQ_MAX_SIZE=1000

These may be intentionally omitted (internal / rarely tuned), but if so a brief comment in .env.example saying "message-queue internals are tunable via MQ_* vars; see config.py" would make the omission explicit.


Minor nits

  • config.py Line 26: _PrefixedEnvSource is a private class (single underscore) but is imported by tests/test_config.py indirectly through settings_customise_sources. No action needed, but worth keeping the import chain in mind if the class is ever renamed.
  • .github/workflows/ci.yml: The added comment ends with etc. but only shows two examples; could say "see ENV_VAR_MAP in config.py" to be more useful to a CI reader.

Overall, the implementation is sound and the backward-compatibility design is correct. The main actionable items are the NamedTuple for ENV_VAR_MAP, the legacy_env_name alias fragility, and the missing .env-isolation regression test.

henry0816191 and others added 2 commits June 18, 2026 02:05
Add tests that PAPERSCOUT_* keys in .env are ignored while legacy keys
still load, and clarify that all README env tables support canonical
process-env overrides via the PAPERSCOUT_ prefix.

Co-authored-by: Cursor <cursoragent@cursor.com>
@henry0816191 henry0816191 force-pushed the refactor/flat-config branch from babe7e9 to 2d926f7 Compare June 17, 2026 18:06

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.env.example:
- Line 30: The `ENABLE_OPEN_STD` variable is currently out of order and should
appear before scheduling-related variables like `POLL_INTERVAL_MINUTES`.
Reorganize the environment variables in the `.env.example` file so that all
`ENABLE_*` flags appear together as a logical group before any numeric polling
or scheduling settings. This ensures consistent alphabetical and functional
ordering throughout the configuration file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 635233e8-301e-47b0-8455-7d9398de664b

📥 Commits

Reviewing files that changed from the base of the PR and between 2d926f7 and b418335.

📒 Files selected for processing (4)
  • .env.example
  • .github/workflows/ci.yml
  • src/paperscout/config.py
  • tests/test_config.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_config.py

Comment thread .env.example
@wpak-ai wpak-ai merged commit e56e8e2 into develop Jun 17, 2026
10 checks passed
@wpak-ai wpak-ai deleted the refactor/flat-config branch June 17, 2026 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flat config namespace restructuring

2 participants