Python runtime rework #4222
cehongwang wants to merge 2 commits into main
Conversation
narendasan (Collaborator) reviewed Apr 29, 2026, leaving a comment:
Not sure this PR is clean yet
          : TRTEngine::ResourceAllocationStrategy::kStatic);
    })
    .def_readwrite("use_pre_allocated_outputs", &TRTEngine::use_pre_allocated_outputs)
    .def_readwrite("pre_allocated_outputs", &TRTEngine::pre_allocated_outputs)
cehongwang (Author) replied:
The first one is the flag, and the second one is the tensor. There is a test that compares the output tensor pointer to verify it is the same tensor. So the second one is needed
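The pointer-identity check described above can be sketched as follows. This is a minimal illustration of the idea (comparing `data_ptr()` across calls to verify the same pre-allocated buffer is returned); `run_engine` is a stand-in for engine execution, not the PR's actual test code:

```python
import torch

# Pre-allocated output buffer held by the module (stand-in for the
# pre_allocated_outputs tensor exposed via def_readwrite above).
pre_allocated = torch.empty(4)

def run_engine(inp: torch.Tensor, out: torch.Tensor) -> torch.Tensor:
    # Stand-in for engine execution writing into the pre-allocated buffer.
    out.copy_(inp * 2)
    return out

first = run_engine(torch.ones(4), pre_allocated)
second = run_engine(torch.full((4,), 3.0), pre_allocated)

# Both calls return the same underlying storage: the buffer is reused.
assert first.data_ptr() == second.data_ptr() == pre_allocated.data_ptr()
```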
cehongwang (Author): I will revert the commit.
There are some changes that do not conform to Python style guidelines:
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py 2026-05-04 23:59:29.343126+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py 2026-05-04 23:59:48.643676+00:00
@@ -38,10 +38,11 @@
Optional[SerializedTensorRTEngineFmt],
List[str],
List[str],
]
+
class TorchTensorRTModule(torch.nn.Module): # type: ignore[misc]
"""``nn.Module`` that runs a TensorRT engine inside PyTorch.
When the C++ Torch-TensorRT runtime is available, execution uses
``torch.classes.tensorrt.Engine`` and ``torch.ops.tensorrt.execute_engine``.
@@ -157,10 +158,11 @@
if k == "engine":
object.__setattr__(result, k, v) # shallow: reuse the same C++ Engine
else:
object.__setattr__(result, k, copy.deepcopy(v, memo))
return result
+
def _resolve_target_device(self) -> torch.device:
"""Resolve the engine's target CUDA device from compilation settings."""
if self.settings.device is not None:
return torch.device(f"cuda:{self.settings.device.gpu_id}")
return torch.device(f"cuda:{torch.cuda.current_device()}")
--- /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/partitioning/test_000_resource_partitioning.py 2026-05-04 23:59:29.378025+00:00
+++ /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/partitioning/test_000_resource_partitioning.py 2026-05-04 23:59:52.160386+00:00
@@ -17,10 +17,11 @@
ResourcePartitioner,
)
# Fixed RSS value to make memory-budget calculations deterministic.
_FIXED_RSS_BYTES = 512 * 1024 * 1024 # 512 MB
+
class TestResourcePartitioning(TestCase):
def test_atomic_subgraph_correction(self):
class net(nn.Module):
def __init__(self):
@@ -110,7 +111,8 @@
# The fusion should be fixed after the step
partitioner._verify_all_fusion_nodes_in_same_subgraph(new_subgraphs)
break
+
if __name__ == "__main__":
run_tests()
--- /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/partitioning/test_001_resource_partitioning.py 2026-05-04 23:59:29.378025+00:00
+++ /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/partitioning/test_001_resource_partitioning.py 2026-05-04 23:59:52.471763+00:00
@@ -25,10 +25,11 @@
resource_partition,
)
# Fixed RSS value used across all tests to make memory-budget calculations deterministic.
_FIXED_RSS_BYTES = 512 * 1024 * 1024 # 512 MB
+
class TestResourcePartitioning(TestCase):
def test_resource_partitioning(self):
class net(nn.Module):
def __init__(self):
@@ -413,7 +414,8 @@
== 4
), "The graph should have 4 accelerated subgraphs"
torch._dynamo.reset()
+
if __name__ == "__main__":
run_tests()
Unify Python and C++ TensorRT runtimes
PythonTorchTensorRTModule is removed. Both runtimes now live behind the same TorchTensorRTModule, with a single use_python_runtime flag (on CompilationSettings) selecting which path to take at engine setup.
What's changed
One module, two backends. TorchTensorRTModule.setup_engine() constructs either torch.classes.tensorrt.Engine (C++) or TRTEngine (Python) and binds the matching op to self.execute_engine_op. forward() simply calls the bound op: no per-iteration branching, no separate module class.
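The setup-time binding described above can be modeled with a small self-contained sketch. The class and method names here are illustrative stand-ins, not the actual Torch-TensorRT API:

```python
class UnifiedModule:
    """Minimal model of the one-module, two-backend dispatch pattern."""

    def __init__(self, use_python_runtime: bool):
        self.use_python_runtime = use_python_runtime
        self.execute_engine_op = None

    def setup_engine(self) -> None:
        # Bind the matching op exactly once, at engine setup.
        if self.use_python_runtime:
            self.execute_engine_op = self._execute_engine_python
        else:
            self.execute_engine_op = self._execute_engine_cpp

    def _execute_engine_python(self, x):
        return ("python", x)

    def _execute_engine_cpp(self, x):
        return ("cpp", x)

    def forward(self, x):
        # No per-iteration branching: just call whatever was bound.
        return self.execute_engine_op(x)
```

Binding once at setup keeps the hot path (forward) free of backend checks, which is the design choice the description highlights.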
Two equivalent ops. tensorrt::execute_engine (C++) and tensorrt::execute_engine_python (Python) are both registered and have the same signature, fake kernel, and semantics. The Python op is registered unconditionally so it's available even when the C++ runtime is loaded.
Checklist: