
Python runtime rework #4222

Open
cehongwang wants to merge 2 commits into main from python-runtime-rework-patches

Conversation

Collaborator

@cehongwang cehongwang commented Apr 29, 2026

Unify Python and C++ TensorRT runtimes

PythonTorchTensorRTModule is removed. Both runtimes now live behind the same TorchTensorRTModule, with a single use_python_runtime flag (on CompilationSettings) selecting which path to take at engine setup.

What's changed
  • One module, two backends. TorchTensorRTModule.setup_engine() constructs either torch.classes.tensorrt.Engine (C++) or TRTEngine (Python) and binds the matching op to self.execute_engine_op. forward() just calls the bound op — no per-iteration branching, no separate module class.
  • Two equivalent ops. tensorrt::execute_engine (C++) and tensorrt::execute_engine_python (Python) are both registered and have the same signature, fake kernel, and semantics. The Python op is registered unconditionally so it's available even when the C++ runtime is loaded.
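The setup-time binding described above can be sketched in plain Python. This is a minimal analogue, not the actual implementation: `_execute_cpp` and `_execute_python` are hypothetical stand-ins for the two registered ops.

```python
class EngineModuleSketch:
    """Minimal analogue of the one-module/two-backends pattern:
    the backend op is bound once during setup, so forward() has
    no per-call branching."""

    def __init__(self, use_python_runtime: bool):
        self.setup_engine(use_python_runtime)

    def setup_engine(self, use_python_runtime: bool) -> None:
        # Bind the matching op to a single attribute, mirroring
        # self.execute_engine_op in the PR description.
        if use_python_runtime:
            self.execute_engine_op = self._execute_python
        else:
            self.execute_engine_op = self._execute_cpp

    # Hypothetical stand-ins for the C++ and Python ops;
    # both deliberately share one signature.
    def _execute_cpp(self, inputs):
        return ("cpp", inputs)

    def _execute_python(self, inputs):
        return ("python", inputs)

    def forward(self, inputs):
        # No branching here: call whatever was bound at setup.
        return self.execute_engine_op(inputs)
```

Because the choice is made once in `setup_engine`, switching runtimes never touches the hot path in `forward()`.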

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

@meta-cla meta-cla Bot added the cla signed label Apr 29, 2026
@github-actions github-actions Bot added documentation Improvements or additions to documentation component: tests Issues re: Tests component: conversion Issues re: Conversion stage component: core Issues re: The core compiler component: api [Python] Issues re: Python API component: runtime component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Apr 29, 2026
@github-actions github-actions Bot requested a review from zewenli98 April 29, 2026 00:01
Collaborator

@narendasan narendasan left a comment


Not sure this PR is clean yet

: TRTEngine::ResourceAllocationStrategy::kStatic);
})
.def_readwrite("use_pre_allocated_outputs", &TRTEngine::use_pre_allocated_outputs)
.def_readwrite("pre_allocated_outputs", &TRTEngine::pre_allocated_outputs)
Collaborator


Why do we need both?

Collaborator Author


The first one is the flag, and the second one is the tensor. There is a test that compares the output tensor pointer to verify it is the same tensor, so the second one is needed.
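A minimal pure-Python analogue of that flag/tensor pairing, where object identity stands in for the `data_ptr()` comparison a real test would perform on the output tensor. All names here are illustrative, not the actual TRTEngine API.

```python
class EngineSketch:
    """Illustrative analogue of the pre-allocated-outputs pair:
    a flag controlling the behavior, plus the cached buffer itself."""

    def __init__(self):
        self.use_pre_allocated_outputs = False  # the flag
        self.pre_allocated_outputs = None       # the cached output buffer

    def run(self):
        # When the flag is set, allocate the output once and reuse it;
        # otherwise allocate a fresh buffer on every call.
        if self.use_pre_allocated_outputs:
            if self.pre_allocated_outputs is None:
                self.pre_allocated_outputs = [0.0] * 4
            return self.pre_allocated_outputs
        return [0.0] * 4
```

With the flag on, two consecutive calls return the same buffer object, which is exactly what comparing output pointers verifies; exposing the cached buffer is what makes that check possible from a test.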

Comment thread docsrc/tutorials/deployment/distributed_inference.rst
Comment thread py/torch_tensorrt/dynamo/runtime/_serialized_engine_layout.py
Comment thread py/torch_tensorrt/dynamo/_exporter.py
Comment thread scripts/remove_set_runtime_backend_with.py Outdated
@cehongwang cehongwang force-pushed the python-runtime-rework-patches branch from daad718 to a328e80 Compare April 29, 2026 20:59
Collaborator

@zewenli98 zewenli98 left a comment


Is this PR based on #4164? Why do we need two separate PRs?
Can you complete the descriptions of the PRs so that other folks can understand why and what the PRs are doing?

@cehongwang cehongwang force-pushed the python-runtime-rework-patches branch from a328e80 to 00e4ea2 Compare April 30, 2026 21:04
@cehongwang cehongwang changed the title Python runtime rework patches Python runtime rework Apr 30, 2026
@cehongwang cehongwang force-pushed the python-runtime-rework-patches branch from 00e4ea2 to 168bdea Compare April 30, 2026 22:08
@cehongwang
Collaborator Author

I will revert the commit 168bdea547a614b26c256527f65217e2d0fbf222 once the pytorch changes are landed.

@cehongwang cehongwang force-pushed the python-runtime-rework-patches branch 3 times, most recently from 2b3b974 to c6b8edf Compare May 4, 2026 23:59

@github-actions github-actions Bot left a comment


There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py	2026-05-04 23:59:29.343126+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py	2026-05-04 23:59:48.643676+00:00
@@ -38,10 +38,11 @@
    Optional[SerializedTensorRTEngineFmt],
    List[str],
    List[str],
]

+
class TorchTensorRTModule(torch.nn.Module):  # type: ignore[misc]
    """``nn.Module`` that runs a TensorRT engine inside PyTorch.

    When the C++ Torch-TensorRT runtime is available, execution uses
    ``torch.classes.tensorrt.Engine`` and ``torch.ops.tensorrt.execute_engine``.
@@ -157,10 +158,11 @@
            if k == "engine":
                object.__setattr__(result, k, v)  # shallow: reuse the same C++ Engine
            else:
                object.__setattr__(result, k, copy.deepcopy(v, memo))
        return result
+
    def _resolve_target_device(self) -> torch.device:
        """Resolve the engine's target CUDA device from compilation settings."""
        if self.settings.device is not None:
            return torch.device(f"cuda:{self.settings.device.gpu_id}")
        return torch.device(f"cuda:{torch.cuda.current_device()}")
--- /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/partitioning/test_000_resource_partitioning.py	2026-05-04 23:59:29.378025+00:00
+++ /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/partitioning/test_000_resource_partitioning.py	2026-05-04 23:59:52.160386+00:00
@@ -17,10 +17,11 @@
    ResourcePartitioner,
)

# Fixed RSS value to make memory-budget calculations deterministic.
_FIXED_RSS_BYTES = 512 * 1024 * 1024  # 512 MB
+

class TestResourcePartitioning(TestCase):
    def test_atomic_subgraph_correction(self):
        class net(nn.Module):
            def __init__(self):
@@ -110,7 +111,8 @@
            # The fusion should be fixed after the step
            partitioner._verify_all_fusion_nodes_in_same_subgraph(new_subgraphs)

            break

+
if __name__ == "__main__":
    run_tests()
--- /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/partitioning/test_001_resource_partitioning.py	2026-05-04 23:59:29.378025+00:00
+++ /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/partitioning/test_001_resource_partitioning.py	2026-05-04 23:59:52.471763+00:00
@@ -25,10 +25,11 @@
    resource_partition,
)

# Fixed RSS value used across all tests to make memory-budget calculations deterministic.
_FIXED_RSS_BYTES = 512 * 1024 * 1024  # 512 MB
+

class TestResourcePartitioning(TestCase):
    def test_resource_partitioning(self):
        class net(nn.Module):
            def __init__(self):
@@ -413,7 +414,8 @@
            == 4
        ), "The graph should have 4 accelerated subgraphs"

        torch._dynamo.reset()

+
if __name__ == "__main__":
    run_tests()

@cehongwang cehongwang force-pushed the python-runtime-rework-patches branch 2 times, most recently from b04e443 to edd97c1 Compare May 5, 2026 23:24
@cehongwang cehongwang force-pushed the python-runtime-rework-patches branch 2 times, most recently from 15503b0 to 42972f0 Compare May 5, 2026 23:43
@cehongwang cehongwang force-pushed the python-runtime-rework-patches branch from 42972f0 to 29673a1 Compare May 5, 2026 23:52
