Fix segfault in TypeTreeHelper boost when reading past buffer end #374

Open
SNWCreations wants to merge 1 commit into K0lb3:master from SNWCreations:fix/typetree-boost-segfault

SNWCreations commented May 30, 2026

Problem

When read_typetree (C++ boost path) parses certain MonoBehaviour objects whose type tree describes more data than is actually available in the object's byte range, the parser crashes with an access violation (exit code 0xC0000005 on Windows) instead of raising a Python exception.

This happens with MonoBehaviours that have complex serialization structures (e.g. [SerializeReference] fields with ManagedReferencesRegistry). The embedded type tree correctly describes the field structure, but the actual serialized data for some fields is shorter than what the type tree implies. Asset Studio handles the same scenario gracefully with a warning:

[Warning] Error reading field 'references': Unable to read beyond the end of the stream.
[Info] Error while read type, read 16776 bytes but expected 18320 bytes

UnityPy's pure Python read_value path would also handle this correctly (bounds checks raise Python exceptions). The crash only occurs in the C++ boost path.

Root Cause

When a bounds check fires deep in a recursive read_typetree_value call chain, the function correctly returns NULL and sets the Python error indicator. However, if a caller has already consumed part of the data and makes further recursive calls before the NULL has propagated all the way up, the parser keeps running from an inconsistent state. In addition, the return value of PyList_New(length) was never checked for NULL: if a large (but positive) length is read from misaligned data, the allocation can fail, and the subsequent PyList_SET_ITEM on the NULL pointer causes the segfault.

Fix

  1. Added an exhausted flag to ReaderT: Once any bounds check fails, this flag is set. At the entry of read_typetree_value and read_typetree_value_array, the flag is checked and NULL is returned immediately. Even if one code path does not propagate NULL correctly, all subsequent parsing attempts fail fast with a proper Python ValueError.

  2. Added NULL checks after PyList_New: In the generic array path and read_pair_array, the return value of PyList_New is now checked before use.

  3. Set exhausted flag in all bounds-check failure paths: Every read_* function that detects an out-of-bounds condition now sets reader->exhausted = true before returning NULL/error.
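The sticky-flag idea in the list above can be modeled in a few lines of Python. This is only an illustrative sketch, not the C++ code from this PR: the `Reader` class, its `take` method, the `read_value` function, and the tuple-based type tree format are all made up for the example, and Python exceptions stand in for the C++ pattern of returning NULL with a ValueError set.

```python
import struct


class Reader:
    """Toy stand-in for ReaderT: a byte buffer, a cursor, and a sticky flag."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0
        self.exhausted = False  # set by any failed bounds check, never cleared

    def take(self, size: int) -> bytes:
        # Bounds check: mark the reader exhausted before reporting the error.
        if size < 0 or self.pos + size > len(self.data):
            self.exhausted = True
            raise ValueError("read past end of buffer")
        chunk = self.data[self.pos:self.pos + size]
        self.pos += size
        return chunk


def read_value(reader: Reader, node):
    """Recursively read a value described by a tiny, hypothetical type tree."""
    # Entry guard: once exhausted, every later call fails fast.
    if reader.exhausted:
        raise ValueError("reader exhausted by an earlier failed read")
    kind = node[0]
    if kind == "int":
        return struct.unpack("<i", reader.take(4))[0]
    if kind == "array":
        length = struct.unpack("<i", reader.take(4))[0]
        if length < 0:
            # A negative count can only come from bad or misaligned data.
            reader.exhausted = True
            raise ValueError("negative array length")
        return [read_value(reader, node[1]) for _ in range(length)]
    if kind == "struct":
        return {name: read_value(reader, child) for name, child in node[1]}
    raise TypeError(f"unknown node kind: {kind!r}")


# A struct with one int field and one int array, then the same data truncated.
tree = ("struct", [("a", ("int",)), ("items", ("array", ("int",)))])
good = struct.pack("<iiii", 7, 2, 10, 20)
print(read_value(Reader(good), tree))

short = Reader(good[:-2])
try:
    read_value(short, tree)
except ValueError as exc:
    print("failed cleanly:", exc, "| exhausted =", short.exhausted)
```

The entry guard is what makes the flag useful: a caller that swallows or mishandles the first error still cannot make progress on the same reader, because every later `read_value` call checks the flag before touching the buffer.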

Impact

  • Callers of obj.read() now receive a ValueError exception instead of a process crash when the object data cannot be fully parsed.
  • There is no change in behavior for objects that parse successfully.
  • The fix is defensive: it does not attempt to partially parse or recover data. It only ensures that the failure mode is a catchable Python exception rather than a segfault.
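On the caller side, the practical consequence is that a batch job can skip unreadable objects instead of dying. A minimal sketch, assuming only that each object exposes a `read()` method that raises ValueError on truncated data: `read_all`, `OkObject`, and `TruncatedObject` are hypothetical names for this example, and the two stand-in classes exist only so the snippet runs without UnityPy installed.

```python
def read_all(objects):
    """Read each object, collecting failures instead of aborting the run."""
    parsed, failed = [], []
    for obj in objects:
        try:
            parsed.append(obj.read())
        except ValueError as exc:
            # With this PR, truncated data is expected to surface here
            # rather than terminating the interpreter.
            failed.append((obj, exc))
    return parsed, failed


class OkObject:
    """Stand-in for an object whose data matches its type tree."""

    def read(self):
        return "data"


class TruncatedObject:
    """Stand-in for an object whose data ends before the type tree does."""

    def read(self):
        raise ValueError("read past end of buffer")


parsed, failed = read_all([OkObject(), TruncatedObject(), OkObject()])
print(len(parsed), "parsed,", len(failed), "failed")
```

In real use the iterable would presumably be the objects of a loaded UnityPy environment. Catching ValueError specifically, rather than a bare `except`, keeps unrelated bugs visible.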

Reproduction

The crash reproduces with any Unity bundle that contains MonoBehaviour objects with [SerializeReference] fields whose serialized data is shorter than the type tree's full structure. It occurs when calling obj.read() on such objects with UnityPyBoost available.

Sample files:
erroring_bundles.zip

Add an 'exhausted' flag to ReaderT that is set when any bounds check
fails. Check this flag at the entry of read_typetree_value and
read_typetree_value_array to immediately return NULL (with a Python
ValueError set) instead of continuing to operate on invalid state.

Also add NULL checks after PyList_New calls that could fail when
allocation is requested for a large count derived from misaligned data.

nesrak1 commented May 30, 2026

Unrelated to the PR which would be good to have, your sample files look fine so this is just a bug in parsing SerializeReference in whatever version of AssetStudio you're using and UnityPy. It may be a case of nested managedReference or some other bug. We should probably make a new Github issue about it.

SNWCreations (Author) commented

> Unrelated to the PR which would be good to have, your sample files look fine so this is just a bug in parsing SerializeReference in whatever version of AssetStudio you're using and UnityPy. It may be a case of nested managedReference or some other bug. We should probably make a new Github issue about it.

Thanks for the note. Just to clarify, this PR addresses exactly the crash scenario described: when the C++ boost path encounters a SerializeReference/ManagedReferencesRegistry structure where the serialized data is shorter than the type tree implies, it currently segfaults instead of raising a Python exception. The fix ensures it fails gracefully with a ValueError. Whether the data mismatch itself is a Unity serialization quirk or an upstream bug is a separate question, but at least UnityPy won't crash on it anymore.
