playwright_patterns.md 'Browser launch skeleton' crashes with UnicodeEncodeError on Windows when aria_snapshot output contains non-cp1252 characters #7

@raykuo998

Environment

  • OS: Windows 11 Home 10.0.26200
  • Python: 3.12.10 (Windows Store distribution)
  • Default sys.stdout.encoding: cp1252
  • Playwright: Firefox 146.0.1 (playwright firefox v1509)
  • Webwright commit: 29fc4b4

Summary

The canonical "Browser launch skeleton" code block in skills/webwright/reference/playwright_patterns.md (lines 16–46) and the sibling pattern on line 65 both end with print(...aria_snapshot()). On any page whose accessibility tree contains characters outside cp1252, the print call crashes with UnicodeEncodeError under Windows's default cp1252 stdout encoding. One example is an arXiv search results page, which carries U+25BD WHITE DOWN-POINTING TRIANGLE inside the "▽ More" abstract toggle.

A Windows user who copy-pastes the recommended explore skeleton from Webwright's own reference doc will hit this on the very first non-trivial page they look at. The "Final-script instrumentation" example on lines 129–166 is affected at line 143's print(line, end="") for the same reason (the file write uses open(..., encoding="utf-8") so the log file itself is fine, but the mirror print to stdout still hits cp1252).

Reproduction

# Windows PowerShell, Python 3.12 default install, no PYTHONIOENCODING set
# (PowerShell has no bash-style heredoc, so the script is piped in as a here-string)
@'
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as pw:
        browser = await pw.firefox.launch(headless=True)
        ctx = await browser.new_context(viewport={"width": 1280, "height": 1800})
        page = await ctx.new_page()
        await page.goto(
            "https://arxiv.org/search/?query=browser+agent&searchtype=all",
            wait_until="domcontentloaded",
        )
        # The snapshot contains U+25BD from the "More" abstract toggle
        snapshot = await page.locator("body").aria_snapshot()
        print("ARIA:", snapshot)  # raises UnicodeEncodeError when stdout is cp1252
        await browser.close()

asyncio.run(main())
'@ | python -

Expected

ARIA snapshot prints to stdout without crashing.

Actual

UnicodeEncodeError: 'charmap' codec can't encode character '▽' in position 4958: character maps to <undefined>
  File ".../Lib/encodings/cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]

Root cause

Python on Windows typically falls back to the system ANSI code page (cp1252 on this machine) for sys.stdout when output is piped or captured by another process, unless PYTHONIOENCODING=utf-8 or PYTHONUTF8=1 is set in the environment or the script explicitly reconfigures stdout. Playwright's aria_snapshot() returns the full accessibility tree as a string, and that string frequently contains glyphs from the page that cp1252 cannot represent (typography symbols, emoji, CJK characters, math symbols).
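The encoding failure can be shown without Playwright or a browser. The sketch below is illustrative only: `can_encode` and `GLYPH_TEXT` are hypothetical names, not part of Webwright, and the string is just the "▽ More" label from the arXiv page.

```python
# Minimal sketch: check whether a string survives a given stdout encoding.
# can_encode and GLYPH_TEXT are illustrative names, not Webwright code.

GLYPH_TEXT = "\u25bd More"  # U+25BD WHITE DOWN-POINTING TRIANGLE + " More"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        # This is the same exception print() raises on a cp1252 stdout.
        return False


if __name__ == "__main__":
    for enc in ("utf-8", "cp1252"):
        print(enc, can_encode(GLYPH_TEXT, enc))
```

The same check can be pointed at `sys.stdout.encoding` to see whether a given environment is affected.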

Suggested fix

At the top of every code skeleton in skills/webwright/reference/playwright_patterns.md (the "Browser launch skeleton" example and the "Final-script instrumentation" example), force utf-8 stdout:

import sys
sys.stdout.reconfigure(encoding="utf-8")

Add a short "Windows note" callout next to the skeleton explaining why this line is needed. The fix is portable (effectively a no-op on POSIX systems where stdout is already UTF-8) and keeps the patterns reliable on the platform where the bug bites hardest.
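If stdout may have been replaced by an object without reconfigure() (some test runners and embedding hosts do this), a guarded variant of the two-line fix may be safer. This is a sketch under that assumption, not the wording proposed for the doc. `force_utf8_stdout` and `demo` are hypothetical names. `reconfigure()` itself is a standard `io.TextIOWrapper` method available since Python 3.7.

```python
import io
import sys


def force_utf8_stdout(stream=None) -> bool:
    """Switch a text stream to UTF-8 if it supports reconfigure().

    Returns True if the stream was reconfigured, False if it has no
    reconfigure() method (for example io.StringIO).
    """
    stream = sys.stdout if stream is None else stream
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(encoding="utf-8")
    return True


def demo() -> bytes:
    """Simulate a cp1252 stdout, apply the fix, and return the bytes written."""
    raw = io.BytesIO()
    fake_stdout = io.TextIOWrapper(raw, encoding="cp1252", newline="\n")
    force_utf8_stdout(fake_stdout)
    print("\u25bd More", file=fake_stdout)  # would raise without the fix
    fake_stdout.flush()
    data = raw.getvalue()
    fake_stdout.detach()  # keep the wrapper from closing raw on cleanup
    return data
```

Calling `force_utf8_stdout()` with no argument at the top of a script has the same effect as the two-line fix when `sys.stdout` is a regular text stream.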

Happy to send a PR.
