Skip to content

fix junitxml bin_xml_escape: supplementary plane characters incorrectly escaped#14484

Merged
bluetech merged 4 commits into
pytest-dev:mainfrom
EternalRights:fix-junitxml-supplementary-plane-regex
May 28, 2026
Merged

fix junitxml bin_xml_escape: supplementary plane characters incorrectly escaped#14484
bluetech merged 4 commits into
pytest-dev:mainfrom
EternalRights:fix-junitxml-supplementary-plane-regex

Conversation

@EternalRights
Copy link
Copy Markdown
Contributor

The illegal_xml_re regex in bin_xml_escape was using \u10000-\u10ffff to cover the supplementary plane range, but Python's \u escape only takes 4 hex digits. So \u10000 was parsed as \u1000 + 0 and \u10ffff as \u10ff + ff, meaning the entire supplementary plane (U+10000 to U+10FFFF) was missing from the "valid" character set.

This caused all supplementary plane characters to be incorrectly escaped in JUnit XML output -- emoji, CJK Extension B-F, mathematical symbols, etc. For example bin_xml_escape("test_😀") returned "test_#x1F600" instead of "test_😀".

The fix: change \u10000-\u10ffff to \U00010000-\U0010ffff (8-digit unicode escape).

Also uncommented the supplementary plane code points in the existing test_bin_xml_escape test, and added a dedicated test_bin_xml_escape_supplementary_plane test with emoji, CJK Ext B, and musical symbol cases.

Closes #14483

@psf-chronographer psf-chronographer Bot added the bot:chronographer:provided (automation) changelog entry is part of PR label May 15, 2026
Comment thread testing/test_junitxml.py Outdated
def test_bin_xml_escape_supplementary_plane() -> None:
assert bin_xml_escape(chr(0x1F600)) == chr(0x1F600)
assert bin_xml_escape("test_😀") == "test_😀"
assert bin_xml_escape("test_𠀀") == "test_𠀀"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kinda whitespace is this

We may need to rename the function as we ought to avoid leaving likeness as is

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll fold the supplementary plane checks into the existing test and drop the separate function. Pushing a fix shortly.

Copy link
Copy Markdown
Member

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please see my comments.

Comment thread testing/test_junitxml.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to adjust this comment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the outdated XXX comments since the test now covers supplementary plane codepoints

Comment thread changelog/14483.bugfix.rst Outdated
@@ -0,0 +1 @@
Fixed ``bin_xml_escape`` in junitxml incorrectly escaping supplementary plane characters (U+10000 and above, including emoji) due to using ``\u`` instead of ``\U`` for the supplementary plane range in the ``illegal_xml_re`` regex.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rewrite this changelog entry to be user-facing? A user doesn't know what bin_xml_escape is, and don't really care what was causing the bug (can click through to the issue if interested). The entry should describe the problem as the user might see it, in this case invalid escaping for high Unicode codepoints.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And optionally sign with -- by :user:`EternalRights`

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewritten to describe the user-visible problem, as suggested

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added the signature

Comment thread testing/test_junitxml.py Outdated
@@ -1122,8 +1122,7 @@ def test_invalid_xml_escape() -> None:
0xFFFE,
0x0FFFF,
) # , 0x110000)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this entry commented out? Do we need new entries here or in the valid tuple?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the comment — 0x110000 exceeds the max Unicode codepoint so it can't be passed to chr()

Copy link
Copy Markdown
Member

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@bluetech bluetech added the backport 9.0.x apply to PRs at any point; backports the changes to the 9.0.x branch label May 28, 2026
@bluetech bluetech merged commit 4748bb1 into pytest-dev:main May 28, 2026
36 checks passed
@patchback
Copy link
Copy Markdown

patchback Bot commented May 28, 2026

Backport to 9.0.x: 💚 backport PR created

✅ Backport PR branch: patchback/backports/9.0.x/4748bb15b9053a791c04783d4d84eba512c7cf79/pr-14484

Backported as #14532

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

bluetech pushed a commit that referenced this pull request May 28, 2026
bluetech added a commit that referenced this pull request May 28, 2026
…748bb15b9053a791c04783d4d84eba512c7cf79/pr-14484

[PR #14484/4748bb15 backport][9.0.x] fix junitxml bin_xml_escape: supplementary plane characters incorrectly escaped
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 9.0.x apply to PRs at any point; backports the changes to the 9.0.x branch bot:chronographer:provided (automation) changelog entry is part of PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bin_xml_escape: supplementary plane characters (U+10000+) incorrectly escaped due to wrong unicode escape in regex

4 participants