Environment
- OS: Windows 11 Home 10.0.26200
- Python: 3.12.10 (Windows Store distribution)
- Default sys.stdout.encoding: cp1252
- Playwright: Firefox 146.0.1 (playwright firefox v1509)
- Webwright commit: 29fc4b4
Summary
The canonical "Browser launch skeleton" code block in skills/webwright/reference/playwright_patterns.md (lines 16–46) and the sibling pattern on line 65 both end with print(...aria_snapshot()). On any page whose accessibility tree contains characters outside cp1252, the print call crashes with UnicodeEncodeError under Windows's default cp1252 stdout encoding. An arXiv search results page is enough to trigger it: its "▽ More" abstract toggle contains U+25BD WHITE DOWN-POINTING TRIANGLE.
A Windows user who copy-pastes the recommended explore skeleton from Webwright's own reference doc will hit this on the first non-trivial page they open. The "Final-script instrumentation" example (lines 129–166) fails for the same reason at the print(line, end="") call on line 143. The file write uses open(..., encoding="utf-8"), so the log file itself is fine, but the mirrored print to stdout still goes through cp1252.
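To make the asymmetry concrete, here is a minimal sketch of that log-and-mirror pattern. The helper name log_line, the temporary log path, and the one-line snapshot text are hypothetical stand-ins, not code from playwright_patterns.md; an in-memory io.TextIOWrapper with encoding="cp1252" plays the role of a redirected Windows stdout.

```python
import io
import os
import sys
import tempfile


def log_line(line, log_path, stream=None):
    """Hypothetical helper mirroring the instrumentation pattern: append `line`
    to a UTF-8 log file, then echo it to `stream` (sys.stdout by default)."""
    stream = sys.stdout if stream is None else stream
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(line)  # explicit UTF-8, so any glyph is accepted
    print(line, end="", file=stream)  # encoded with the stream's own encoding


# Stand-in for a redirected Windows stdout (assumption: cp1252 code page).
cp1252_stdout = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
log_path = os.path.join(tempfile.mkdtemp(), "run.log")

mirror_failed = False
try:
    log_line('- link "\u25bd More"\n', log_path, stream=cp1252_stdout)
except UnicodeEncodeError:
    mirror_failed = True

with open(log_path, encoding="utf-8") as f:
    logged = f.read()

print("mirror print failed:", mirror_failed)        # mirror print failed: True
print("log file kept U+25BD:", "\u25bd" in logged)   # log file kept U+25BD: True
```

The file write completes before the print is attempted, so the log keeps the glyph even though the stdout mirror raises.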
Reproduction
```bash
# Bash-style shell on Windows (e.g. Git Bash; the heredoc below is not PowerShell syntax),
# Python 3.12 default install, no PYTHONIOENCODING set.
# Piping through `cat` keeps stdout redirected, as it is when an agent captures the output.
python - <<'PY' | cat
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as pw:
        browser = await pw.firefox.launch(headless=True)
        ctx = await browser.new_context(viewport={"width": 1280, "height": 1800})
        page = await ctx.new_page()
        await page.goto(
            "https://arxiv.org/search/?query=browser+agent&searchtype=all",
            wait_until="domcontentloaded",
        )
        snapshot = await page.locator("body").aria_snapshot()
        print("ARIA:", snapshot)  # crashes here on a cp1252 stdout
        await browser.close()

asyncio.run(main())
PY
```
Expected
ARIA snapshot prints to stdout without crashing.
Actual
```text
  File ".../Lib/encodings/cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '▽' in position 4958: character maps to <undefined>
```
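The failure does not depend on Playwright or the network. As a minimal sketch, assuming stdout behaves like a cp1252 text stream (simulated here with io.TextIOWrapper over a BytesIO, and a hypothetical one-line snapshot excerpt), the same print call raises immediately:

```python
import io

# Stand-in for Python's stdout on a Western-locale Windows system when output
# is redirected to a pipe or file (assumption: cp1252 code page).
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")

snapshot = '- link "\u25bd More"'  # hypothetical excerpt of an aria_snapshot() result

error = None
try:
    print("ARIA:", snapshot, file=fake_stdout)
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; `exc` is cleared when the except block ends

print("reproduced:", type(error).__name__)  # reproduced: UnicodeEncodeError
```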
Root cause
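When stdout is redirected to a pipe or file, which is the normal case when an agent or subprocess captures a script's output, Python on Windows encodes sys.stdout with the ANSI code page (cp1252 on Western-locale systems) unless PYTHONIOENCODING=utf-8 or PYTHONUTF8=1 is set in the environment or the script explicitly reconfigures stdout. (An interactive console window is not affected: since Python 3.6, console output reports utf-8 and is written through the Windows console API.) Playwright's aria_snapshot() returns the full accessibility tree as a string, which frequently contains non-cp1252 glyphs from the page (typographic symbols, emoji, CJK characters, math symbols).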
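A quick way to see which case a given run falls into is to print the inputs Python uses when choosing the stdout encoding. This is a diagnostic sketch; can_print is a hypothetical helper, not part of Webwright or Playwright, and the printed values depend on how the script is launched:

```python
import locale
import sys

# Facts that decide which encoding print() will use for sys.stdout.
print("stdout encoding:     ", sys.stdout.encoding)
print("stdout is a terminal:", sys.stdout.isatty())
print("UTF-8 mode:          ", bool(sys.flags.utf8_mode))
print("preferred encoding:  ", locale.getpreferredencoding(False))


def can_print(text, encoding=None):
    """Return True if `text` encodes cleanly with `encoding`
    (defaults to stdout's encoding, or UTF-8 if the stream reports none)."""
    enc = encoding or sys.stdout.encoding or "utf-8"
    try:
        text.encode(enc)
    except UnicodeEncodeError:
        return False
    return True


print("can print U+25BD:    ", can_print("\u25bd"))
```

Run it once from a console window and once with its output piped (for example into `cat` or a file) to compare the two cases.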
Suggested fix
At the top of every code skeleton in skills/webwright/reference/playwright_patterns.md (the "Browser launch skeleton" example and the "Final-script instrumentation" example), force utf-8 stdout:
```python
import sys
sys.stdout.reconfigure(encoding="utf-8")
```
Add a short "Windows note" callout next to the skeleton explaining why this line is needed. The fix is portable: it is effectively a no-op wherever stdout is already UTF-8 (the usual case on Linux and macOS), and it keeps the patterns reliable on the platform where the bug bites hardest.
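A slightly more defensive variant of the suggested line, sketched below, also covers stderr (where tracebacks quoting page text would land) and skips streams that have no reconfigure() method, such as a replacement object installed by a test harness. The helper name force_utf8 is hypothetical.

```python
import io
import sys


def force_utf8(streams=None):
    """Switch text streams to UTF-8 where that is possible.

    Defaults to sys.stdout and sys.stderr. Objects without a reconfigure()
    method (it is defined on io.TextIOWrapper) are skipped instead of
    raising AttributeError.
    """
    if streams is None:
        streams = (sys.stdout, sys.stderr)
    for stream in streams:
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8")


# In a skeleton: call once, before the first print().
force_utf8()

# Check against a stand-in for a redirected cp1252 stdout.
buf = io.BytesIO()
fake_stdout = io.TextIOWrapper(buf, encoding="cp1252")
force_utf8([fake_stdout])
print("\u25bd More", end="", file=fake_stdout)
fake_stdout.flush()
print(buf.getvalue())  # b'\xe2\x96\xbd More'
```

Setting PYTHONUTF8=1 or PYTHONIOENCODING=utf-8 in the environment has the same effect without code changes, but only for users who know to set it. If something downstream must keep receiving cp1252, reconfigure(errors="backslashreplace") is an alternative that keeps the encoding and escapes unencodable characters instead of crashing.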
Happy to send a PR.