Skip to content

Add CI guard for manual .xlf edits in human-authored PRs#8902

Open
Evangelink wants to merge 3 commits into
mainfrom
dev/amauryleve/xlf-manual-edit-guard
Open

Add CI guard for manual .xlf edits in human-authored PRs#8902
Evangelink wants to merge 3 commits into
mainfrom
dev/amauryleve/xlf-manual-edit-guard

Conversation

@Evangelink
Copy link
Copy Markdown
Member

Summary

The repository policy in copilot-instructions.md → Localization Guidelines is explicit:

NEVER manually modify *.xlf files. Instead, regenerate them by running dotnet msbuild <project>.csproj /t:UpdateXlf on the owning project.

Despite that, hand-edited .xlf files do occasionally land in PRs. They get silently reverted by the next OneLocBuild bulk update, so the fix is lost and any regression returns. This PR encodes the rule as a CI check so it's a property of the repository rather than reviewer vigilance.

What the guard does

  • Triggers on every pull_request whose diff touches **/*.xlf or **/*.resx.
  • Bot PRs are exempt (dotnet-bot, dotnet-maestro, any *[bot] account) — OneLocBuild owns the bulk update path.
  • For human PRs, every changed .xlf is mapped to its resource basename (e.g. Resources/xlf/Resource.cs.xlfResource). If the diff doesn't also contain a .resx with the matching basename, the check fails with a clear remediation message.
  • Basename matching (not strict path matching) is used because testfx places .xlf in an xlf/ subdirectory while the matching .resx lives in the parent Resources/ directory.

Why blocking rather than a soft notice

  • The policy uses "NEVER" — this is a hard rule, not a guideline.
  • The escape hatch is trivial (touch the .resx too, or get the OneLocBuild bot to do it), so the cost of a false positive is low.
  • Translation typo fixes that are genuinely one-off go via the OneLocBuild submission flow; hand-edits get overwritten anyway.

If reviewers prefer a soft warning (::notice annotation), I can switch the workflow — just say the word. I went with blocking because the policy wording justifies it.

Files added

  • .github/scripts/check_xlf_manual_edit.py — diff parser, bot detection, basename matching, exit-code reporting
  • .github/workflows/xlf-manual-edit-guard.yml — paths-scoped pull_request workflow, contents: read only, 5-minute timeout

Tested locally

Five synthetic diffs covering: bot exemption, [bot]-suffix wildcard, locale-bearing .xlf with matching .resx, no-locale .xlf, multi-segment locale (zh-Hans), and the testfx Resources/xlf/ cross-directory layout. All cases produced the expected exit code.


Part of a small batch of repo-maintenance improvements identified from a one-month review of PR review-comment patterns (companion PRs: #8898 expert-reviewer extensions, #8899 msbuild-reviewer Rule B-3, #8900 PublicAPI init-accessor guard, #8901 CLI help-text drift reminder).

The repository policy (.github/copilot-instructions.md, Localization
Guidelines section) is explicit:

    NEVER manually modify *.xlf files. Instead, regenerate them by
    running `dotnet msbuild <project>.csproj /t:UpdateXlf` on the
    owning project.

The OneLocBuild bot (dotnet-bot) owns the bulk translation update
path. A human PR may legitimately touch *.xlf files only when those
changes are the output of /t:UpdateXlf after a *.resx edit -- in which
case the same PR will also include the *.resx change. A PR that
hand-edits *.xlf without any *.resx change is the defect this guard
catches.

What the guard does:

* On every pull_request whose diff touches *.xlf or *.resx, run
  .github/scripts/check_xlf_manual_edit.py.
* Bot PRs (dotnet-bot, dotnet-maestro, any *[bot] account) are exempt
  and the script exits 0 immediately.
* For human PRs the script extracts the basename of each changed *.xlf
  (e.g. `Resources/xlf/Resource.cs.xlf` -> `Resource`) and checks
  whether the diff also includes a *.resx with the same basename.
  Basename matching is used (not strict path matching) because testfx
  places *.xlf in a `xlf/` subdirectory while the matching *.resx
  lives in the parent `Resources/` directory.
* If any *.xlf is unaccompanied, the workflow fails with a clear
  remediation message (edit the *.resx, run /t:UpdateXlf, commit
  both).

Why blocking rather than a soft notice:

* The policy uses "NEVER" -- this is a hard rule, not a guideline.
* The escape hatch is trivial (touch the *.resx too, or get the
  OneLocBuild bot to do it), so the cost of a false positive is low.
* Translation typo fixes that are intentionally one-off can be done
  via the OneLocBuild submission flow; manual *.xlf hand-edits would
  be overwritten on the next bulk update anyway.

Why this matters now:

* Hand-edited *.xlf files get silently reverted by the next
  OneLocBuild run, so the fix is lost and the bug regresses.
* The current PR-review process catches this inconsistently; encoding
  the rule as a CI check makes it a property of the repository rather
  than reviewer vigilance.

Tested locally with five synthetic diffs covering: bot exemption,
[bot]-suffix wildcard, locale-bearing *.xlf with matching *.resx,
no-locale *.xlf, multi-segment locale (zh-Hans), and the testfx
`Resources/xlf/` cross-directory layout. All cases produced the
expected exit code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 7, 2026 08:33
Comment thread .github/scripts/check_xlf_manual_edit.py Fixed
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a pull-request-time CI guard that enforces the repo’s localization policy: human-authored PRs must not include standalone .xlf changes (they must be accompanied by a corresponding .resx change), while bot-authored localization updates are exempt.

Changes:

  • Introduces a new pull_request workflow scoped to .xlf/.resx changes that runs a fast Python-based validation step.
  • Adds a Python script that detects whether changed .xlf files have a matching .resx basename also changed in the same PR, and emits a clear remediation report on failure.
  • Implements bot author exemptions (known bot list + *[bot] suffix) to allow OneLocBuild/dependency-bot localization updates.
Show a summary per file
File Description
.github/workflows/xlf-manual-edit-guard.yml New PR workflow to run the guard on .xlf/.resx diffs with minimal permissions and short timeout.
.github/scripts/check_xlf_manual_edit.py Implements the diff path parsing, bot exemption logic, basename matching, and step-summary reporting used by the workflow.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 0

Copy link
Copy Markdown
Member Author

@Evangelink Evangelink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

🤖 Automated review by GitHub Copilot. Posted via a maintainer's GitHub token, so it appears under their account — the account owner did not write or approve this content personally. Generated by the Expert Code Review workflow. To request a follow-up action, reply by tagging @copilot directly.

Review Summary

This PR adds a new GitHub Actions workflow and a Python script to enforce the repository's localization policy: .xlf files must never be edited by hand — they must be regenerated via dotnet msbuild /t:UpdateXlf after editing the corresponding .resx.

Verdict table

Dimension Result
Logic / Correctness ✅ LGTM
Security ✅ LGTM
CI/CD Workflow Quality ✅ LGTM
Error Handling 💡 One suggestion (non-blocking, see inline comment)

Notable positives

  • Regex is correct. r"\.([a-zA-Z]{2,3}(?:-[A-Za-z0-9]+)*)\.xlf$" correctly strips the locale suffix (e.g. Strings.de.xlfStrings) and falls back gracefully to Path.stem for locale-less XLF files. Basename-only matching is intentionally permissive to accommodate xlf/ subdirectory layouts.
  • Bot exemption logic is clean: both the KNOWN_BOTS allowlist and the [bot]-suffix catch cover all expected automated actors (OneLocBuild, Maestro, GitHub Actions).
  • Minimal permissions (contents: read only) and timeout-minutes: 5 are appropriate.
  • pull_request (not pull_request_target) is the correct, safer trigger for a read-only guard that needs to run on fork PRs.
  • fetch-depth: 0 is required and present for the three-dot git diff origin/<base>...HEAD diff to resolve correctly.
  • write_step_summary is guarded against OSError so the script never crashes if GITHUB_STEP_SUMMARY is unavailable.

One non-blocking suggestion

See the inline comment on _run: subprocess.run itself can raise FileNotFoundError / OSError if git is not on PATH. In the ubuntu-latest CI environment this never fires in practice, but a small try/except OSError guard around the call would make the documented exit code 2 ("usage / IO error") accurate for that edge case and produce a friendlier error message than a raw Python traceback.

Generated by Expert Code Review (on open) for issue #8902 · sonnet46 3.4M



def _run(cmd: List[str]) -> str:
result = subprocess.run(cmd, capture_output=True, text=True, check=False)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion (non-blocking): subprocess.run can raise FileNotFoundError / OSError when the executable (git) is not found on PATH. The current code assigns result before any returncode check, so if subprocess.run itself raises, the exception propagates uncaught as a raw Python traceback (exit code 1) rather than the documented exit code 2.

In the ubuntu-latest CI runner git is always present, so this never fires in practice. Consider a small guard for robustness:

def _run(cmd: List[str]) -> str:
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except OSError as exc:
        sys.stderr.write(f"Failed to launch {cmd[0]!r}: {exc}\n")
        sys.exit(2)
    if result.returncode != 0:
        ...

Copy link
Copy Markdown
Member

@Youssef1313 Youssef1313 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at current implementation. But I think the only reliable way to detect this properly would be:

  1. Revert all XLF changes in the given PR.
  2. Touch the corresponding resx (so that it has higher timestamp and MSBuild can see that UpdateXlf is not update to date)
  3. Run dotnet build /t:UpdateXlf on the corresponding project, and ensure it's the same set of changes that were in the PR.

- name: Run xlf manual-edit guard
env:
BASE_SHA: origin/${{ github.event.pull_request.base.ref }}
PR_AUTHOR: ${{ github.event.pull_request.user.login }}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessarily complexity. The check can be done in YML right away without passing it down to the Python script.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — fixed in 41ee685. The whole job now if:-skips when the PR author is a bot (user.type == 'Bot' covers GitHub App identities; explicit logins cover dotnet-bot / dotnet-maestro user accounts). Removed KNOWN_BOTS, is_bot(), --author, and the PR_AUTHOR env var from the Python script — it's now pure diff analysis.

runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- uses: actions/checkout@v4
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use pinned sha

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 41ee685. Pinned to: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1 and actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0 (current tip-of-v4 / tip-of-v5 commits as of 2026-06-07).

* Move bot-author exemption from Python to the workflow `if:`.
  Per the reviewer, the Python script should focus on diff analysis
  and not need to know about workflow context. Removed `KNOWN_BOTS`,
  `is_bot()`, `--author`, and the `PR_AUTHOR` env var from the
  script. The workflow now skips the whole job via:

      if: >-
        github.event.pull_request.user.type != 'Bot' &&
        github.event.pull_request.user.login != 'dotnet-bot' &&
        github.event.pull_request.user.login != 'dotnet-maestro'

  `user.type == 'Bot'` covers GitHub App identities; the explicit
  logins cover the dotnet-bot / dotnet-maestro user accounts.

* Pin actions to SHA per the repository's lock-file convention. The
  workflow now uses:

      actions/checkout@34e1148 # v4.3.1
      actions/setup-python@a26af69 # v5.6.0

  These are the current tip-of-v4 / tip-of-v5 commits as of
  2026-06-07.

* Empty except: narrow `except Exception` to
  `except (AttributeError, ValueError, OSError)` and add an
  explanatory comment. The previous bare except swallowed any
  programmer error from `sys.stdout.reconfigure`; the new tuple
  matches the documented failure modes only.

* Drive-by while in the workflow: also switch the diff base from
  `origin/<base.ref>` to the immutable `base.sha` from the event
  payload, mirroring the fix applied to PR #8900. A base-branch
  advance between the event and the checkout fetch could otherwise
  pull in unrelated commits.

* Rename the env var to `BASE_REF` (with `BASE_SHA` fallback in the
  script's `--base` default for backward compatibility) for
  consistency with the rest of this PR series.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread .github/scripts/check_xlf_manual_edit.py Fixed
The previous fix added an explanatory comment ABOVE the try/except, but
github-code-quality's empty-except rule looks for a comment INSIDE the
except body. Move/extend the comment so it lives next to the `pass`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 7, 2026 11:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 3

Comment on lines +54 to +60
- name: Run xlf manual-edit guard
env:
# Pin the diff base to the immutable PR-event base SHA so a
# base-branch advance between the event and the checkout fetch
# can't pull in unrelated commits and false-flag .xlf paths.
BASE_REF: ${{ github.event.pull_request.base.sha }}
run: python .github/scripts/check_xlf_manual_edit.py
Comment on lines +135 to +149
def find_violations(paths: Iterable[str]) -> List[Tuple[str, str]]:
"""Return `(xlf_path, expected_resx_basename)` for every `.xlf` change
that has no `.resx` change with a matching basename in the diff."""
norm_paths = [p.replace("\\", "/") for p in paths]
resx_basenames: Set[str] = {
Path(p).stem for p in norm_paths if p.endswith(".resx")
}
violations: List[Tuple[str, str]] = []
for p in norm_paths:
if not p.endswith(".xlf"):
continue
base = xlf_to_resx_basename(p)
if base not in resx_basenames:
violations.append((p, base + ".resx"))
return violations
Comment on lines +97 to +98
if "/" in line or line.endswith((".xlf", ".resx", ".cs")):
paths.append(line)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants