Merged
92 changes: 80 additions & 12 deletions CHANGELOG.md
@@ -1,23 +1,91 @@
# Changelog

## 2.4.0 (Unreleased)
## 3.0.0 (Unreleased)

### Added
### Breaking Changes

- **Simplified execution model** - Only two public execution modes: `worker` and `owngil`
- `worker`: Dedicated pthread per context with stable thread affinity (default)
- `owngil`: Dedicated pthread + subinterpreter with own GIL (Python 3.14+)
- Removed `multi_executor` and `free_threaded` from public API
- Internal capability detection still tracks Python features

- **Removed `py:num_executors/0`** - Contexts now use per-context worker threads
instead of a shared executor pool. This function is no longer needed.

- **`py:execution_mode/0` returns `worker | owngil`** - Based on the `context_mode`
application configuration. Previously returned internal capabilities like
`free_threaded`, `subinterp`, or `multi_executor`.

- **Context thread affinity** - Contexts in MULTI_EXECUTOR mode are now assigned a
fixed executor thread at creation. All operations (call, eval, exec) from the same
context run on the same OS thread, preventing thread state corruption in libraries
like numpy and PyTorch that have thread-local state.
- **Removed `py:async_stream/3,4`** - Streaming async generators was never
implemented behind the API and always returned `{error, stream_not_implemented}`.
Use `py:stream_start/3,4` for sync generators; async-generator support may
return in a later release.

- **Removed `num_executors` / `num_async_workers` configuration** - Both keys
were no-ops after the v3.0 worker rework. Configure context count via
`num_contexts` and the rate-limit ceiling via `max_concurrent`.

- **Strict context-mode validation at the NIF boundary** - `py_nif:context_create/1`
now returns `{error, {invalid_mode, Atom}}` for anything other than `worker | owngil`.
Previously, callers that bypassed `py_context` (notably `py_reactor_context`)
silently mapped any unknown atom — including legacy `auto` and `subinterp` —
to worker mode. Code that relied on that loophole must pass `worker` (or
`owngil`) explicitly.
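
A migration shim for callers that relied on the silent fallback might look like the sketch below. The module and function names (`ctx_mode_compat`, `normalize/1`) are hypothetical and not part of erlang_python. Only the `worker | owngil` atoms, the legacy `auto` and `subinterp` atoms, and the `{error, {invalid_mode, Atom}}` shape are taken from the entry above.

```erlang
-module(ctx_mode_compat).
-export([normalize/1]).

%% Hypothetical helper, not part of erlang_python.
%% Maps a configured mode atom to one the 3.0 NIF boundary accepts.
-spec normalize(atom()) ->
    {ok, worker | owngil} | {error, {invalid_mode, atom()}}.
normalize(worker)    -> {ok, worker};
normalize(owngil)    -> {ok, owngil};
%% Legacy atoms used to fall through to worker mode silently;
%% spell that mapping out instead of relying on the old loophole.
normalize(auto)      -> {ok, worker};
normalize(subinterp) -> {ok, worker};
%% Anything else mirrors the error shape the NIF now returns.
normalize(Other) when is_atom(Other) -> {error, {invalid_mode, Other}}.
```

Running configuration values through a function like this before creating a context keeps the legacy mapping visible in one place, so it can be deleted once old configs are gone.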

### Fixed

- **`py:async_call/3,4` + `py:async_await/1,2` round-trip** - Previously the
await receive matched `{py_response, _, _}` while the event loop sent
`{async_result, _, _}`, causing every async call to silently time out.
Async calls now go directly through `py_event_loop:create_task` and
`py_event_loop:await`.

- **`py:async_gather/1,2` actually executes** - Reimplemented as concurrent
`async_call` submission with sequential `async_await`. Returns
`{ok, [Result1, ...]}` on success or `{error, {gather_failed, [{Idx, Reason}, ...]}}`
if any call fails. The previous implementation returned `gather_not_implemented`.
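
The two return shapes described for `py:async_gather/1,2` can be handled with a plain pattern match. The following is a minimal sketch; `gather_result` and `describe/1` are hypothetical names, not part of erlang_python, and the entry above does not say whether `Idx` is zero- or one-based, so the helper passes it through untouched.

```erlang
-module(gather_result).
-export([describe/1]).

%% Hypothetical helper, not part of erlang_python.
%% Takes the documented return value of py:async_gather/1,2 and either
%% passes the results through or lists the indexes of the failed calls.
-spec describe({ok, [term()]} | {error, {gather_failed, [{term(), term()}]}}) ->
    {ok, [term()]} | {failed, [term()]}.
describe({ok, Results}) when is_list(Results) ->
    {ok, Results};
describe({error, {gather_failed, Failures}}) when is_list(Failures) ->
    %% Each failure is an {Idx, Reason} pair; keep only the indexes.
    {failed, [Idx || {Idx, _Reason} <- Failures]}.
```

A caller would wrap the gather call directly, for example `gather_result:describe(py:async_gather(Calls))`, and decide from the returned indexes which calls to retry.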

### Changed

- **`py:execution_mode/0` now returns actual mode** - Returns `worker` (default),
`owngil`, `free_threaded`, or `multi_executor` based on actual configuration
instead of Python capability. Previously returned `subinterp` even when using
worker mode.
- **Per-context worker threads** - Each context now gets its own dedicated pthread
that handles all Python operations. This provides stable thread affinity for
numpy/torch/tensorflow compatibility without needing a shared executor pool.

- **Async NIF dispatch** - Context operations use async NIFs with message passing
instead of blocking dirty schedulers. This improves concurrency under load.

- **Request queue per context** - Replaced single-slot request pattern with proper
request queues that support multiple concurrent callers.

- **No global asyncio policy install on Python 3.14+.** `asyncio.set_event_loop_policy`
was deprecated in 3.14 and is scheduled for removal in 3.16. The Erlang integration's run path
already uses `loop_factory=` (`erlang.run/1`, `asyncio.Runner`) so the global
policy was only a convenience for bare `asyncio.run()` inside `py:exec`. We now
skip the install on 3.14+ to avoid the deprecation warning. On 3.14+ use
`erlang.run(main)` or `asyncio.Runner(loop_factory=erlang.new_event_loop)`
explicitly. Behavior on Python 3.9–3.13 is unchanged. `erlang.install()` raises
`RuntimeError` on 3.14+; on 3.12–3.13 it still works but emits a `DeprecationWarning`.

### Removed

- **Removed obsolete subinterp test references** - Test suites updated to reflect
the removal of subinterpreter mode. Tests now use `worker` or `owngil` modes.
- Multi-executor pool (`g_executors[]`, `multi_executor_start/stop`)
- `context_dispatch_call/eval/exec` functions (dead code)
- References to `PY_MODE_MULTI_EXECUTOR` in context operations
- `py_async_pool` legacy gen_server (unused after async API rewire)
- **Explicit `py:subinterp_*` handle API removed.** `py:subinterp_create/0`,
`subinterp_destroy/1`, `subinterp_call/4,5`, `subinterp_eval/2,3`,
`subinterp_exec/2`, `subinterp_cast/4`, `subinterp_async_call/4`,
`subinterp_await/1,2`, and `subinterp_pool_*` are all gone. Use
`py_context:new(#{mode => owngil})` instead — it gives the same
parallelism with OTP supervision and automatic cleanup.
`py:subinterp_supported/0` (capability probe) and `py:parallel/1`
(which routes through the context API) stay.
- Internal `py_execution_mode_t` collapsed from 3 values to 2 (`free_threaded`
/ `gil`); `py_nif:execution_mode/0` returns `free_threaded | gil` instead
of the old `free_threaded | subinterp | multi_executor`.
- `examples/reactor_owngil_example.erl` deleted (called nonexistent
`py:subinterp_reactor_*` functions; pre-existing breakage).

## 2.3.1 (2026-04-01)

42 changes: 16 additions & 26 deletions README.md
@@ -16,10 +16,9 @@ evaluate expressions, and stream from generators - all without blocking Erlang
schedulers.

**Parallelism options:**
- **Worker mode** (default, recommended) - Works with any Python version. With free-threaded Python (3.13t+), provides true parallelism automatically
- **SHARED_GIL sub-interpreters** (Python 3.12+) - Isolated namespaces, shared GIL (isolation improves in 3.14+)
- **OWN_GIL sub-interpreters** (Python 3.14+) - Each interpreter has its own GIL, true parallelism
- **BEAM processes** - Fan out work across lightweight Erlang processes
- **Worker mode** (default, recommended) - Works with any Python version. With free-threaded Python (3.13t+), provides true parallelism automatically.
- **OWN_GIL sub-interpreters** (Python 3.14+) - Each interpreter has its own GIL, true parallelism.
- **BEAM processes** - Fan out work across lightweight Erlang processes.

Key features:
- **Process-bound environments** - Each Erlang process gets isolated Python state, enabling OTP-supervised Python actors
@@ -302,14 +301,11 @@ Ref = py:async_call(aiohttp, get, [<<"https://api.example.com/data">>]),
{ok, Response} = py:async_await(Ref).

%% Gather multiple async calls concurrently
{ok, Results} = py:async_gather([
{ok, [Users, Posts, Comments]} = py:async_gather([
{aiohttp, get, [<<"https://api.example.com/users">>]},
{aiohttp, get, [<<"https://api.example.com/posts">>]},
{aiohttp, get, [<<"https://api.example.com/comments">>]}
]).

%% Stream from async generators
{ok, Chunks} = py:async_stream(mymodule, async_generator, [args]).
```

## Parallel Execution with Sub-interpreters
@@ -328,7 +324,7 @@ True parallelism without GIL contention using Python 3.14+ OWN_GIL sub-interpret
%% Each call runs in its own interpreter with its own GIL
```

For Python 3.12/3.13, use SHARED_GIL sub-interpreters (`mode => subinterp`) for namespace isolation, but note that parallelism is limited by the shared GIL.
The public modes are `worker` (default) and `owngil` (Python 3.14+ only), so on Python 3.12/3.13 only `worker` is available. Versions before 3.14 run all contexts under the shared main interpreter via dedicated worker threads; namespace isolation between contexts is local-dict based, not via subinterpreters.

## Parallel Processing with BEAM Processes

@@ -590,9 +586,9 @@ ok = py:clear_traces().
%% sys.config
[
{erlang_python, [
{num_workers, 4}, %% Python worker pool size
{max_concurrent, 17}, %% Max concurrent operations (default: schedulers * 2 + 1)
{num_executors, 4} %% Executor threads (multi-executor mode)
{num_contexts, 8}, %% Number of contexts (default: schedulers)
{context_mode, worker}, %% worker | owngil
{max_concurrent, 17} %% Max concurrent operations (default: schedulers * 2 + 1)
]}
].
```
@@ -605,40 +601,34 @@ When creating Python contexts, you can choose the execution mode:

| Mode | Python Version | Description |
|------|----------------|-------------|
| `worker` | Any | Main interpreter, shared namespace (default, recommended) |
| `subinterp` | 3.12+ | SHARED_GIL sub-interpreter, isolated namespace |
| `owngil` | 3.14+ | OWN_GIL sub-interpreter, true parallelism |
| `worker` | Any | Dedicated pthread per context, main interpreter namespace (default) |
| `owngil` | 3.14+ | Dedicated pthread + subinterpreter with its own GIL, true parallelism |

```erlang
%% Default: worker mode (recommended)
%% With free-threaded Python (3.13t+), provides true parallelism automatically
{ok, Ctx} = py_context:new(#{}).

%% Explicit subinterpreter with shared GIL (Python 3.12+)
%% Provides namespace isolation but no parallelism
{ok, Ctx} = py_context:new(#{mode => subinterp}).

%% OWN_GIL mode for true parallelism (Python 3.14+ required)
%% Each context runs in its own pthread with independent GIL
{ok, Ctx} = py_context:new(#{mode => owngil}).
```

**Worker mode is recommended** because it works with any Python version and automatically benefits from free-threaded Python (3.13t+) when available.
**Worker mode is recommended** because it works with any Python version and automatically benefits from free-threaded Python (3.13t+) when available. Each context owns a dedicated pthread, providing stable thread affinity for libraries with thread-local state (numpy, torch, tensorflow).

**Why OWN_GIL requires Python 3.14+**: Some C extensions (e.g., `_decimal`, `numpy`) have global state bugs in sub-interpreters on Python 3.12/3.13. These are fixed in Python 3.14. SHARED_GIL mode works on 3.12+ but with caveats for C extensions with global state.
**Why OWN_GIL requires Python 3.14+**: Some C extensions (e.g., `_decimal`, `numpy`) have global state bugs in sub-interpreters on Python 3.12/3.13. These are fixed in Python 3.14.
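
Because `owngil` is only usable on Python 3.14+, a caller that wants it where available and `worker` everywhere else could gate the choice on the interpreter version. This is a sketch under assumptions: `mode_picker:resolve/2` is a hypothetical helper, not part of erlang_python, and how the `{Major, Minor}` tuple is obtained is left to the caller.

```erlang
-module(mode_picker).
-export([resolve/2]).

%% Hypothetical helper, not part of erlang_python.
%% Returns owngil only when it was requested AND the Python version is
%% at least 3.14; every other combination falls back to worker.
-spec resolve(atom(), {non_neg_integer(), non_neg_integer()}) -> worker | owngil.
resolve(owngil, {Major, Minor}) when {Major, Minor} >= {3, 14} ->
    owngil;
resolve(_Requested, {_Major, _Minor}) ->
    worker.
```

The result can then be passed as the `mode` option, for example `py_context:new(#{mode => mode_picker:resolve(owngil, Version)})`. Note that this sketch downgrades silently; returning an error tuple instead would be the stricter design.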

### Runtime Detection

Check the current execution mode:
Check the current execution mode (mirrors the `context_mode` application env):
```erlang
py:execution_mode(). %% => free_threaded | subinterp | multi_executor
py:execution_mode(). %% => worker | owngil
```

| Mode | Python Version | Parallelism |
|------|----------------|-------------|
| Free-threaded | 3.13+ (nogil) | True parallel, no GIL |
| Sub-interpreter | 3.12+ | Per-interpreter GIL |
| Multi-executor | Any | GIL contention |
| `worker` (default) | Any | One pthread per context; true parallelism on free-threaded 3.13t+ |
| `owngil` | 3.14+ | Per-interpreter GIL, true parallelism across contexts |

## Error Handling

16 changes: 11 additions & 5 deletions c_src/py_convert.c
@@ -95,13 +95,19 @@ static void shared_dict_capsule_destructor(PyObject *capsule) {
* @return true if obj is a numpy ndarray, false otherwise
*/
static inline bool is_numpy_ndarray(PyObject *obj) {
/* Use cached type for fast isinstance check when available.
* The cache is only valid in the main interpreter - subinterpreters
* have their own object space, so we fall back to attribute detection. */
if (g_numpy_ndarray_type != NULL && g_execution_mode != PY_MODE_SUBINTERP) {
/* The cache is populated in the main interpreter. On builds where
* subinterpreters can be created (and the runtime isn't free-threaded,
* which short-circuits subinterp use) a context may be running inside
* a subinterpreter where the cached type is invalid -- fall back to
* duck typing in that case. */
#if defined(HAVE_SUBINTERPRETERS) && !defined(HAVE_FREE_THREADED)
/* Build supports subinterpreters and isn't free-threaded:
* skip the cached fast path. */
#else
if (g_numpy_ndarray_type != NULL) {
return PyObject_IsInstance(obj, g_numpy_ndarray_type) == 1;
}

#endif
/* Fallback: duck typing via attribute detection.
* Check for both 'tolist' method and 'ndim' attribute. */
return PyObject_HasAttrString(obj, "tolist") &&