
fix(skill): force utf-8 stdout in playwright_patterns.md to prevent Windows cp1252 crash #8

Open
raykuo998 wants to merge 1 commit into microsoft:main from raykuo998:fix/windows-cp1252-stdout-aria-snapshot

@raykuo998

Summary

Fixes #7: the canonical Python skeletons in skills/webwright/reference/playwright_patterns.md crash with UnicodeEncodeError on Windows as soon as aria_snapshot() output (or any other page text passed to print()) contains a glyph outside cp1252.

Two real-world cases that already trigger it:

  • arxiv search results page — the "▽ More" abstract toggle is U+25BD
  • Any page with typography symbols, emoji, CJK characters, math symbols, etc.

A Windows user copy-pasting the recommended explore skeleton from the doc hits the crash on the first non-trivial page. See issue #7 for the verbatim repro + traceback.
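The root cause can be reproduced without a browser. The sketch below is an illustration, not code from the PR. It encodes a few sample strings with the cp1252 codec, which is what print() does when stdout is cp1252-encoded. A character that cp1252 maps (é) encodes fine. The arxiv toggle glyph U+25BD and CJK text raise the same UnicodeEncodeError. The sample strings are made up for the example.

```python
# Illustrative only: which characters cp1252 can and cannot encode.
SAMPLES = {
    "accented Latin": "caf\u00e9",      # é is in cp1252 (0xE9)
    "arxiv toggle": "\u25bd More",      # U+25BD WHITE DOWN-POINTING TRIANGLE
    "CJK": "\u691c\u7d22",              # 検索, not in cp1252
}


def encodes_in(text: str, encoding: str) -> bool:
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


results = {name: encodes_in(text, "cp1252") for name, text in SAMPLES.items()}

# Only ASCII is printed here, so this loop is itself safe on a cp1252 stdout.
for name, ok in results.items():
    print(f"{name:15} cp1252={'ok' if ok else 'UnicodeEncodeError'}")
```

Every sample encodes cleanly as UTF-8. That is why forcing stdout to UTF-8 removes the crash and does not require escaping or stripping page text.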

Changes

Single file: skills/webwright/reference/playwright_patterns.md.

  1. Browser launch skeleton: add import sys and sys.stdout.reconfigure(encoding="utf-8") near the top of the heredoc, with an inline comment explaining why.
  2. Final-script instrumentation: the same reconfigure at the top, plus encoding="utf-8" passed explicitly to LOG.write_text(...) and LOG.open("a", ...) so that non-cp1252 glyphs written to the log file cannot crash on Windows either.
  3. Rules section: a short "Windows note" explaining the line and the equivalent PYTHONIOENCODING=utf-8 environment override.
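The changes above can be sketched as the preamble below. This is a minimal illustration that assumes a pathlib.Path-based LOG like the one the Final-script skeleton writes to. Three parts are hypothetical additions for the example, not lines from the PR:

- the log location
- the log() helper
- the hasattr guard

```python
import sys
import tempfile
from pathlib import Path

# Force UTF-8 stdout so print() of page text (e.g. aria_snapshot() output)
# cannot raise UnicodeEncodeError on a cp1252 stdout. The hasattr guard is an
# assumption for hosts that replace sys.stdout with an object lacking reconfigure().
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")

# Hypothetical log path; the real skeleton defines its own LOG.
LOG = Path(tempfile.gettempdir()) / "webwright_example.log"

# Start a fresh log, explicitly UTF-8 rather than the platform default encoding.
LOG.write_text("", encoding="utf-8")


def log(line: str) -> None:
    """Hypothetical helper: append one line to LOG, encoded as UTF-8."""
    with LOG.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")


log('ARIA: - button "\u25bd More"')
print(LOG.read_text(encoding="utf-8"), end="")
```

Passing encoding= on every file call matters independently of the stdout fix. Without it, open() and Path.write_text() fall back to the locale encoding, which on the Windows setup described here is cp1252.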

On Linux/macOS, where stdout is normally already utf-8, the reconfigure call is effectively a no-op, so the skeletons keep working unchanged there.
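One way to check both the PYTHONIOENCODING=utf-8 override and the no-op claim is to start a child interpreter with that variable set. The child reports its stdout encoding before and after the reconfigure call. This is an illustrative check, not part of the PR. It normalizes codec names with codecs.lookup so that spellings such as "UTF-8" and "utf-8" compare equal.

```python
import codecs
import os
import subprocess
import sys

# Child program: report stdout's encoding before and after the patch line.
CHILD = (
    "import sys; before = sys.stdout.encoding; "
    "sys.stdout.reconfigure(encoding='utf-8'); "
    "print(before, sys.stdout.encoding)"
)

# Equivalent to setting PYTHONIOENCODING=utf-8 in the shell before launching.
env = dict(os.environ, PYTHONIOENCODING="utf-8")
proc = subprocess.run(
    [sys.executable, "-c", CHILD],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)

# Normalize both reported names to the canonical codec name.
before, after = (codecs.lookup(name).name for name in proc.stdout.split())
print(f"before={before} after={after}")
```

If stdout is already UTF-8, whether through the environment variable or a UTF-8 locale, the reconfigure call leaves the encoding unchanged.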

Verification

Without the fix (verbatim heredoc from the doc, default Windows cp1252 stdout):

UnicodeEncodeError: 'charmap' codec can't encode character '▽' in position 4958: character maps to <undefined>
  File ".../Lib/encodings/cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]

With the fix on the same arxiv search results page:

ARIA: <full accessibility tree printed cleanly, U+25BD included>
exit code: 0
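The before/after result can also be approximated on any OS, without Windows or a browser. The trick is to write to an in-memory text stream encoded as cp1252, which mirrors what print() does against a cp1252 stdout. This is an illustrative simulation, not the PR's verification run. ARIA_TEXT is a made-up stand-in for real aria_snapshot() output.

```python
import io

# Made-up stand-in for aria_snapshot() output from the arxiv results page.
ARIA_TEXT = '- button "\u25bd More"'


def try_write(stream: io.TextIOWrapper, text: str) -> bool:
    """Write text to stream; return False if the stream's codec rejects it."""
    try:
        stream.write(text + "\n")
        stream.flush()
        return True
    except UnicodeEncodeError:
        return False


# Before: a stream encoded like a default cp1252 stdout.
before_buf = io.BytesIO()
before = io.TextIOWrapper(before_buf, encoding="cp1252")
ok_before = try_write(before, ARIA_TEXT)

# After: same starting encoding, then the reconfigure call from the patch.
after_buf = io.BytesIO()
after = io.TextIOWrapper(after_buf, encoding="cp1252")
after.reconfigure(encoding="utf-8")
ok_after = try_write(after, ARIA_TEXT)

print(f"cp1252 stream: {'ok' if ok_before else 'UnicodeEncodeError'}")
print(f"reconfigured:  {'ok' if ok_after else 'UnicodeEncodeError'}")
```

The reconfigure call has to run before anything is written, which is why the patch places it at the very top of each skeleton.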

Test plan

  • Confirmed that the verbatim heredoc from the pre-fix doc crashes on Windows 11 + Python 3.12 + Playwright Firefox.
  • Confirmed the patched skeleton runs clean on the same machine, same page.
  • Verified the change is a no-op on POSIX (reconfigure to utf-8 when already utf-8 changes nothing).
  • Ran the full Webwright arxiv-search task end-to-end with the patched Final-script skeleton — all critical points verified, no encoding errors in log or stdout.

…indows cp1252 crash

Both Python skeletons (Browser launch + Final-script instrumentation) now
call sys.stdout.reconfigure(encoding="utf-8") at the top. The Final-script
LOG file writes also pass encoding="utf-8" explicitly so non-cp1252 glyphs
landing in the log cannot crash on Windows either. Added a Windows note
under the Rules section explaining why the line is needed.

The reconfigure is a no-op on POSIX where stdout is already utf-8, so the
skeletons keep working unchanged on Linux/macOS.

Fixes microsoft#7.
@raykuo998
Author

@microsoft-github-policy-service agree



Development

Successfully merging this pull request may close these issues.

playwright_patterns.md 'Browser launch skeleton' crashes with UnicodeEncodeError on Windows when aria_snapshot output contains non-cp1252 characters
