fix(gfx): copy subExecParam from device before reading timing data #323
Open
fcui-amd wants to merge 1 commit into
Pull request overview
This PR fixes a crash in the GPU GFX executor when collecting per-Transfer/per-iteration timing and CU data on systems where VRAM is not host-accessible (e.g., no large/resizable BAR) by copying SubExecParam back from device before reading fields on the host.
Changes:
- Copy rss.subExecParamGpuPtr from device to host before collecting per-iteration CU IDs in multi-stream mode.
- Copy exeInfo.subExecParamGpu from device to host before computing per-Transfer timing/CU IDs in single-stream mode.
The GFX executor read startCycle/stopCycle/xccId/hwId directly from subExecParamGpu (device memory allocated via hipMalloc). This assumed that VRAM is host-accessible, which is not the case on GPUs without large/resizable BAR (e.g. PCIe cards), so the host-side read causes a SIGSEGV. The params are now copied back to host via hipMemcpy before being read, in both the single-launch and multistream paths.

Signed-off-by: Flora Cui <flora.cui@amd.com>
674a070 to 45d6c97
Motivation
The GFX executor crashes with a SIGSEGV when collecting per-Transfer timing
data on GPUs that do not expose VRAM to the host (e.g. PCIe cards without
large/resizable BAR).
Technical Details
Copy the subExecutor parameters back to host with
hipMemcpy(..., DeviceToHost) before reading timing/CU data, in both affected paths. There is no behavior change on
hardware where direct access previously worked.
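As a minimal sketch of the copy-then-read pattern described above (not the PR's actual diff), the C++ below snapshots a Transfer's subexecutor parameter array into host memory with one bulk copy and then reads fields only from that snapshot. The simplified SubExecParam layout, the hypothetical CollectTiming helper and TransferTiming struct, and the assumption that a Transfer spans from its earliest subexecutor start to its latest stop are all illustrative. The device-to-host copy is passed in as a callable so the sketch builds without ROCm.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical, simplified SubExecParam: only the fields named in this PR.
// The real TransferBench struct has more members and may use other types.
struct SubExecParam {
  std::int64_t startCycle;
  std::int64_t stopCycle;
  std::int32_t xccId;
  std::int32_t hwId;
};

// Device-to-host copy callback (dst, src, bytes). With ROCm this would wrap
// hipMemcpy(dst, src, bytes, hipMemcpyDeviceToHost).
using DeviceToHostCopy = std::function<void(void*, const void*, std::size_t)>;

// Hypothetical per-Transfer summary built from the host snapshot.
struct TransferTiming {
  std::int64_t minStartCycle;
  std::int64_t maxStopCycle;
  std::vector<std::pair<std::int32_t, std::int32_t>> cuIds;  // (xccId, hwId) per subexecutor
};

// Copies all subexecutor params to host first, then reads fields only from
// the host copy, so subExecParamGpu is never dereferenced on the host.
TransferTiming CollectTiming(const SubExecParam* subExecParamGpu,
                             std::size_t numSubExecs,
                             const DeviceToHostCopy& copyToHost) {
  TransferTiming t{std::numeric_limits<std::int64_t>::max(),
                   std::numeric_limits<std::int64_t>::min(),
                   {}};
  if (numSubExecs == 0) return t;  // nothing to copy; sentinels remain

  std::vector<SubExecParam> host(numSubExecs);
  copyToHost(host.data(), subExecParamGpu, numSubExecs * sizeof(SubExecParam));

  for (const SubExecParam& p : host) {
    t.minStartCycle = std::min(t.minStartCycle, p.startCycle);
    t.maxStopCycle = std::max(t.maxStopCycle, p.stopCycle);
    t.cuIds.emplace_back(p.xccId, p.hwId);
  }
  return t;
}
```

With ROCm available, the callback could be `[](void* d, const void* s, std::size_t n) { hipMemcpy(d, s, n, hipMemcpyDeviceToHost); }`, ideally checking the returned hipError_t. Copying the whole array once per Transfer keeps it to a single device-to-host transfer instead of one access per field, and the host never touches device memory directly, which is what makes the pattern safe without large/resizable BAR.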
Test Plan
Test Result
Submission Checklist