gh-152235: Defer GC tracking of set and frozenset to end of construction #152237

Merged
corona10 merged 2 commits into python:main from corona10:gh-152235-set/frozenset
Jun 26, 2026

@corona10 corona10 commented Jun 25, 2026

Member

@corona10 corona10 requested a review from rhettinger as a code owner June 25, 2026 20:50
@corona10 corona10 requested review from methane and vstinner June 25, 2026 20:50
@corona10 corona10 added the labels "needs backport to 3.13" (bugs and security fixes), "needs backport to 3.14" (bugs and security fixes), and "needs backport to 3.15" (pre-release feature fixes, bugs and security fixes) Jun 25, 2026
@corona10

Member Author

TSAN script

import gc, threading

N = 4000
stop = threading.Event()
box = []

def reader():
    while not stop.is_set():
        if box:
            o = box[0]
            try:
                len(o); repr(o); hash(o)
            except Exception:
                pass

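class It:
    # Iterator consumed by frozenset(). On its 6th call it scans the GC for a
    # partially filled frozenset (heuristic: 0 < len < N) and publishes it
    # to the reader threads through `box`.
    def __init__(self): self.i = 0
    def __iter__(self): return self
    def __next__(self):
        if self.i == 5 and not box:
            for o in gc.get_objects():
                if type(o) is frozenset and 0 < len(o) < N:
                    box.append(o)
                    break
        if self.i >= N: raise StopIteration
        self.i += 1; return self.i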

def builder():
    try:
        for _ in range(30):
            box.clear(); frozenset(It())
    finally:
        stop.set()

ts = [threading.Thread(target=reader) for _ in range(2)]
b = threading.Thread(target=builder)
for t in ts: t.start()
b.start(); b.join()
for t in ts: t.join(timeout=2)

print("observed half-built frozenset:", bool(box), "(False = fixed)")

AS-IS

SUMMARY: ThreadSanitizer: data race setobject.c:321 in set_add_entry_takeref
==================
==================
WARNING: ThreadSanitizer: data race (pid=89894)
  Atomic write of size 8 at 0x00030c150460 by thread T3:
    #0 set_add_entry_takeref setobject.c:323 (python.exe:arm64+0x100176708)
    #1 set_add_key setobject.c:608 (python.exe:arm64+0x100179a04)
    #2 set_update_iterable_lock_held setobject.c:1233 (python.exe:arm64+0x10017eafc)
    #3 set_update_local setobject.c:1280 (python.exe:arm64+0x10017efa4)
    #4 make_new_frozenset setobject.c:1419 (python.exe:arm64+0x1001837d8)
    #5 frozenset_vectorcall setobject.c:1458 (python.exe:arm64+0x100178ae8)
    #6 _PyEval_EvalFrameDefault generated_cases.c.h:2418 (python.exe:arm64+0x100290ca4)
    #7 _PyEval_Vector ceval.c:2141 (python.exe:arm64+0x10028a828)
    #8 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008fda4)
    #9 _PyObject_VectorcallPrepend call.c:855 (python.exe:arm64+0x1000914e8)
    #10 method_vectorcall classobject.c:55 (python.exe:arm64+0x1000947f8)
    #11 context_run context.c:731 (python.exe:arm64+0x1002d5e50)
    #12 method_vectorcall_FASTCALL_KEYWORDS descrobject.c:421 (python.exe:arm64+0x1000a6ce8)
    #13 PyObject_Vectorcall call.c:327 (python.exe:arm64+0x10008f77c)
    #14 _Py_VectorCallInstrumentation_StackRefSteal ceval.c:768 (python.exe:arm64+0x10028b300)
    #15 _PyEval_EvalFrameDefault generated_cases.c.h:1906 (python.exe:arm64+0x10028fccc)
    #16 _PyEval_Vector ceval.c:2141 (python.exe:arm64+0x10028a828)
    #17 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008fda4)
    #18 _PyObject_VectorcallPrepend call.c:855 (python.exe:arm64+0x1000914e8)
    #19 method_vectorcall classobject.c:55 (python.exe:arm64+0x1000947f8)
    #20 _PyObject_Call call.c:348 (python.exe:arm64+0x10008fa24)
    #21 PyObject_Call call.c:373 (python.exe:arm64+0x10008fa98)
    #22 thread_run _threadmodule.c:388 (python.exe:arm64+0x1004426a8)
    #23 pythread_wrapper thread_pthread.h:234 (python.exe:arm64+0x100382284)

  Previous read of size 8 at 0x00030c150460 by thread T2:
    #0 set_repr setobject.c:816 (python.exe:arm64+0x1001777d0)
    #1 PyObject_Repr object.c:784 (python.exe:arm64+0x10013f370)
    #2 builtin_repr bltinmodule.c:2677 (python.exe:arm64+0x100285fec)
    #3 _PyEval_EvalFrameDefault generated_cases.c.h:2712 (python.exe:arm64+0x1002919a4)
    #4 _PyEval_Vector ceval.c:2141 (python.exe:arm64+0x10028a828)
    #5 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008fda4)
    #6 _PyObject_VectorcallPrepend call.c:855 (python.exe:arm64+0x1000914e8)
    #7 method_vectorcall classobject.c:55 (python.exe:arm64+0x1000947f8)
    #8 context_run context.c:731 (python.exe:arm64+0x1002d5e50)
    #9 method_vectorcall_FASTCALL_KEYWORDS descrobject.c:421 (python.exe:arm64+0x1000a6ce8)
    #10 PyObject_Vectorcall call.c:327 (python.exe:arm64+0x10008f77c)
    #11 _Py_VectorCallInstrumentation_StackRefSteal ceval.c:768 (python.exe:arm64+0x10028b300)
    #12 _PyEval_EvalFrameDefault generated_cases.c.h:1906 (python.exe:arm64+0x10028fccc)
    #13 _PyEval_Vector ceval.c:2141 (python.exe:arm64+0x10028a828)
    #14 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008fda4)
    #15 _PyObject_VectorcallPrepend call.c:855 (python.exe:arm64+0x1000914e8)
    #16 method_vectorcall classobject.c:55 (python.exe:arm64+0x1000947f8)
    #17 _PyObject_Call call.c:348 (python.exe:arm64+0x10008fa24)
    #18 PyObject_Call call.c:373 (python.exe:arm64+0x10008fa98)
    #19 thread_run _threadmodule.c:388 (python.exe:arm64+0x1004426a8)
    #20 pythread_wrapper thread_pthread.h:234 (python.exe:arm64+0x100382284)

  Thread T3 (tid=2000779, running) created by main thread at:
    #0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x33834)
    #1 do_start_joinable_thread thread_pthread.h:281 (python.exe:arm64+0x1003814c8)
    #2 PyThread_start_joinable_thread thread_pthread.h:323 (python.exe:arm64+0x100381304)
    #3 ThreadHandle_start _threadmodule.c:475 (python.exe:arm64+0x1004424b8)
    #4 do_start_new_thread _threadmodule.c:1919 (python.exe:arm64+0x100441f8c)
    #5 thread_PyThread_start_joinable_thread _threadmodule.c:2042 (python.exe:arm64+0x100441034)
    #6 cfunction_call methodobject.c:564 (python.exe:arm64+0x100136864)
    #7 _PyObject_MakeTpCall call.c:242 (python.exe:arm64+0x10008ec24)
    #8 PyObject_Vectorcall call.c:327 (python.exe:arm64+0x10008f810)
    #9 _Py_VectorCall_StackRefSteal ceval.c:726 (python.exe:arm64+0x10028abe8)
    #10 _PyEval_EvalFrameDefault generated_cases.c.h (python.exe:arm64+0x1002936a4)
    #11 PyEval_EvalCode ceval.c:679 (python.exe:arm64+0x10028a404)
    #12 run_eval_code_obj pythonrun.c:1369 (python.exe:arm64+0x10035ed1c)
    #13 run_mod pythonrun.c:1472 (python.exe:arm64+0x10035ea7c)
    #14 _PyRun_SimpleFileObject pythonrun.c:518 (python.exe:arm64+0x10035a090)
    #15 _PyRun_AnyFileObject pythonrun.c:81 (python.exe:arm64+0x10035982c)
    #16 pymain_run_file main.c:430 (python.exe:arm64+0x10039b4d4)
    #17 Py_RunMain main.c:796 (python.exe:arm64+0x10039a81c)
    #18 pymain_main main.c:826 (python.exe:arm64+0x10039ad8c)
    #19 Py_BytesMain main.c:850 (python.exe:arm64+0x10039ae88)
    #20 main python.c:15 (python.exe:arm64+0x100000a04)

  Thread T2 (tid=2000778, running) created by main thread at:
    #0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x33834)
    #1 do_start_joinable_thread thread_pthread.h:281 (python.exe:arm64+0x1003814c8)
    #2 PyThread_start_joinable_thread thread_pthread.h:323 (python.exe:arm64+0x100381304)
    #3 ThreadHandle_start _threadmodule.c:475 (python.exe:arm64+0x1004424b8)
    #4 do_start_new_thread _threadmodule.c:1919 (python.exe:arm64+0x100441f8c)
    #5 thread_PyThread_start_joinable_thread _threadmodule.c:2042 (python.exe:arm64+0x100441034)
    #6 cfunction_call methodobject.c:564 (python.exe:arm64+0x100136864)
    #7 _PyObject_MakeTpCall call.c:242 (python.exe:arm64+0x10008ec24)
    #8 PyObject_Vectorcall call.c:327 (python.exe:arm64+0x10008f810)
    #9 _Py_VectorCall_StackRefSteal ceval.c:726 (python.exe:arm64+0x10028abe8)
    #10 _PyEval_EvalFrameDefault generated_cases.c.h (python.exe:arm64+0x1002936a4)
    #11 PyEval_EvalCode ceval.c:679 (python.exe:arm64+0x10028a404)
    #12 run_eval_code_obj pythonrun.c:1369 (python.exe:arm64+0x10035ed1c)
    #13 run_mod pythonrun.c:1472 (python.exe:arm64+0x10035ea7c)
    #14 _PyRun_SimpleFileObject pythonrun.c:518 (python.exe:arm64+0x10035a090)
    #15 _PyRun_AnyFileObject pythonrun.c:81 (python.exe:arm64+0x10035982c)
    #16 pymain_run_file main.c:430 (python.exe:arm64+0x10039b4d4)
    #17 Py_RunMain main.c:796 (python.exe:arm64+0x10039a81c)
    #18 pymain_main main.c:826 (python.exe:arm64+0x10039ad8c)
    #19 Py_BytesMain main.c:850 (python.exe:arm64+0x10039ae88)
    #20 main python.c:15 (python.exe:arm64+0x100000a04)

SUMMARY: ThreadSanitizer: data race setobject.c:323 in set_add_entry_takeref
==================
ThreadSanitizer:DEADLYSIGNAL
==89894==ERROR: ThreadSanitizer: SEGV on unknown address 0x000102000100 (pc 0x000103b03aec bp 0x00016e402000 sp 0x00016e401fc0 T2000777)
==89894==The signal is caused by a READ memory access.
    #0 __tsan_atomic64_load <null> (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x5faec)
    #1 _PyMem_MiFree obmalloc.c:298 (python.exe:arm64+0x100163bd4)
    #2 PyMem_Free obmalloc.c:1286 (python.exe:arm64+0x100165c54)
    #3 list_dealloc listobject.c:569 (python.exe:arm64+0x1000e793c)
    #4 _Py_Dealloc object.c:3319 (python.exe:arm64+0x10013df24)
    #5 _Py_MergeZeroLocalRefcount object.c (python.exe:arm64+0x10013e180)
    #6 set_repr setobject.c:816 (python.exe:arm64+0x1001778d4)
    #7 PyObject_Repr object.c:784 (python.exe:arm64+0x10013f370)
    #8 builtin_repr bltinmodule.c:2677 (python.exe:arm64+0x100285fec)
    #9 _PyEval_EvalFrameDefault generated_cases.c.h:2712 (python.exe:arm64+0x1002919a4)
    #10 _PyEval_Vector ceval.c:2141 (python.exe:arm64+0x10028a828)
    #11 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008fda4)
    #12 _PyObject_VectorcallPrepend call.c:855 (python.exe:arm64+0x1000914e8)
    #13 method_vectorcall classobject.c:55 (python.exe:arm64+0x1000947f8)
    #14 context_run context.c:731 (python.exe:arm64+0x1002d5e50)
    #15 method_vectorcall_FASTCALL_KEYWORDS descrobject.c:421 (python.exe:arm64+0x1000a6ce8)
    #16 PyObject_Vectorcall call.c:327 (python.exe:arm64+0x10008f77c)
    #17 _Py_VectorCallInstrumentation_StackRefSteal ceval.c:768 (python.exe:arm64+0x10028b300)
    #18 _PyEval_EvalFrameDefault generated_cases.c.h:1906 (python.exe:arm64+0x10028fccc)
    #19 _PyEval_Vector ceval.c:2141 (python.exe:arm64+0x10028a828)
    #20 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008fda4)
    #21 _PyObject_VectorcallPrepend call.c:855 (python.exe:arm64+0x1000914e8)
    #22 method_vectorcall classobject.c:55 (python.exe:arm64+0x1000947f8)
    #23 _PyObject_Call call.c:348 (python.exe:arm64+0x10008fa24)
    #24 PyObject_Call call.c:373 (python.exe:arm64+0x10008fa98)
    #25 thread_run _threadmodule.c:388 (python.exe:arm64+0x1004426a8)
    #26 pythread_wrapper thread_pthread.h:234 (python.exe:arm64+0x100382284)
    #27 __tsan_thread_start_func <null> (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x337a0)
    #28 _pthread_start <null> (libsystem_pthread.dylib:arm64e+0x6c54)
    #29 thread_start <null> (libsystem_pthread.dylib:arm64e+0x1c18)

==89894==Register values:
 x[0] = 0x0000000103a94000   x[1] = 0x0000100204000200   x[2] = 0x00000000c58c34ff   x[3] = 0x0000000000000008
 x[4] = 0x0000000000000003   x[5] = 0x0000000000000000   x[6] = 0x0000000000000000   x[7] = 0x0000000000000000
 x[8] = 0x0000000000000000   x[9] = 0x0000000000000034  x[10] = 0x00000000c0000000  x[11] = 0x0000000000000000
x[12] = 0x00000001103cb8f0  x[13] = 0x0000100204000200  x[14] = 0x0000000000000003  x[15] = 0x0000000000001f8c
x[16] = 0x0000000000000000  x[17] = 0x0000000103b58a40  x[18] = 0x0000000000000000  x[19] = 0x0000000102000100
x[20] = 0x0000000103a94000  x[21] = 0x0000000102bf3bd8  x[22] = 0x0000000000000000  x[23] = 0x000000016e4030e0
x[24] = 0x0000000000000000  x[25] = 0x0000000000000000  x[26] = 0x0000000000000011  x[27] = 0x000000000000001f
x[28] = 0x0000000000000011     fp = 0x000000016e402000     lr = 0x0000000103b03adc     sp = 0x000000016e401fc0
ThreadSanitizer can not provide additional info.
SUMMARY: ThreadSanitizer: SEGV obmalloc.c:298 in _PyMem_MiFree
==89894==ABORTING

TO-BE

observed half-built frozenset: False (False = fixed)

@corona10 corona10 requested a review from sergey-miryanov June 25, 2026 22:08
Comment thread Objects/setobject.c
- so = (PySetObject *)type->tp_alloc(type, 0);
+ // Allocate untracked: the fill below runs user code, and a half-built
+ // set must not be reachable from another thread via gc.get_objects().
+ so = (PySetObject *)_PyType_AllocNoTrack(type, 0);
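
For reviewers who want to check the reachability claim in the code comment without threads or TSAN, here is a minimal single-threaded sketch. It is not part of the PR. The helper name `half_built_frozenset_visible` and the `marker` object are hypothetical names chosen for illustration. The probe runs inside the iterable while `frozenset()` is still consuming it, and reports whether the set under construction already appears in `gc.get_objects()`. I would expect it to report True on builds that track the set at allocation time and False on builds with this change, but the result depends on the interpreter build, so no output is shown.

```python
import gc


def half_built_frozenset_visible(n=50):
    """Return True if a frozenset under construction appears in gc.get_objects()."""
    marker = object()  # unique first element, used to recognise our own set
    seen = []          # holds the single probe result

    def items():
        yield marker
        for i in range(n):
            if i == 5 and not seen:
                # This runs while frozenset() is still consuming the generator,
                # so any frozenset containing `marker` is the half-built one.
                found = any(
                    type(o) is frozenset and marker in o
                    for o in gc.get_objects()
                )
                seen.append(found)
            yield i

    frozenset(items())
    return seen[0]


print("half-built frozenset visible:", half_built_frozenset_visible())
```

The probe only records a boolean and never keeps a reference to the partially filled set, so it avoids the cross-thread access that triggers the crash in the TSAN report.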

Contributor

Should we update the type spec and remove PyType_GenericAlloc from tp_alloc?

Member Author

Nice catch :)

@corona10 corona10 requested a review from sergey-miryanov June 26, 2026 01:06
@corona10 corona10 merged commit 908f438 into python:main Jun 26, 2026
102 of 104 checks passed
@miss-islington-app

Thanks @corona10 for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14, 3.15.
🐍🍒⛏🤖

@bedevere-app

bedevere-app Bot commented Jun 26, 2026

GH-152242 is a backport of this pull request to the 3.15 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.15 pre-release feature fixes, bugs and security fixes label Jun 26, 2026
@miss-islington-app

Sorry, @corona10, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 908f438e198a753d40d1166b5f8725e650a9ed6e 3.13

@bedevere-app

bedevere-app Bot commented Jun 26, 2026

GH-152243 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.14 bugs and security fixes label Jun 26, 2026
corona10 added a commit that referenced this pull request Jun 26, 2026
…nstruction (gh-152237) (gh-152243)

gh-152235: Defer GC tracking of set and frozenset to end of construction (gh-152237)
(cherry picked from commit 908f438)

Co-authored-by: Donghee Na <donghee.na@python.org>
Labels

needs backport to 3.13 bugs and security fixes
