fix(gfx): copy subExecParam from device before reading timing data #323

Open
fcui-amd wants to merge 1 commit into develop from users/fcui/gfx-subexecparam-host-deref-segfault
@fcui-amd

Motivation

The GFX executor crashes with a SIGSEGV when collecting per-Transfer timing
data on GPUs that do not expose VRAM to the host (e.g. PCIe cards without
large/resizable BAR).

Technical Details

Copy the subExecutor parameters back to host with hipMemcpy(..., DeviceToHost)
before reading timing/CU data, in both affected paths. No behavior change on
hardware where direct access previously worked.
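The copy-back pattern described above can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: the field names `startCycle`, `stopCycle`, `hwId`, and `xccId` come from the commit message, but their types, the struct layout, and the helpers `CopyParamsToHost` and `ElapsedCycles` are assumptions made for the example. A small host-only shim stands in for `hipMemcpy` unless `USE_HIP` (a hypothetical macro) is defined, so the sketch compiles without ROCm installed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

#ifdef USE_HIP
#include <hip/hip_runtime.h>
#else
// Host-only shim (assumption for illustration): mimics the one HIP call used
// below so the sketch builds and runs on a machine without ROCm.
enum hipError_t { hipSuccess = 0 };
enum hipMemcpyKind { hipMemcpyDeviceToHost = 2 };
inline hipError_t hipMemcpy(void* dst, const void* src, std::size_t bytes, hipMemcpyKind) {
  std::memcpy(dst, src, bytes);
  return hipSuccess;
}
#endif

// Hypothetical subset of the per-subexecutor parameters; the real struct in
// TransferBench has more fields and may use different types.
struct SubExecParam {
  std::int64_t  startCycle;
  std::int64_t  stopCycle;
  std::uint32_t hwId;
  std::uint32_t xccId;
};

// Copy `count` parameter records from device memory into a host-side vector.
// Reading devPtr[i] directly on the host is only safe when VRAM is mapped
// into the host address space, which is the assumption this PR removes.
inline std::vector<SubExecParam> CopyParamsToHost(const SubExecParam* devPtr, std::size_t count) {
  std::vector<SubExecParam> host(count);
  if (count == 0) return host;
  assert(devPtr != nullptr);
  hipError_t err = hipMemcpy(host.data(), devPtr, count * sizeof(SubExecParam),
                             hipMemcpyDeviceToHost);
  if (err != hipSuccess) throw std::runtime_error("hipMemcpy (DeviceToHost) failed");
  return host;
}

// Example consumer: cycles between the earliest start and the latest stop,
// computed purely from the host-side copy.
inline std::int64_t ElapsedCycles(const std::vector<SubExecParam>& params) {
  if (params.empty()) return 0;
  std::int64_t first = params[0].startCycle;
  std::int64_t last  = params[0].stopCycle;
  for (const SubExecParam& p : params) {
    first = std::min(first, p.startCycle);
    last  = std::max(last, p.stopCycle);
  }
  return last - first;
}
```

When built with `USE_HIP`, `devPtr` would be a pointer obtained from `hipMalloc`; in the shim build it is an ordinary host pointer, which is enough to exercise the control flow. The point of the design is that all field reads happen on the host copy, so the result no longer depends on whether the device allocation is host-visible.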

Test Plan

Test Result

Submission Checklist

Copilot AI left a comment

Pull request overview

This PR fixes a crash in the GPU GFX executor that occurs when collecting per-Transfer/per-iteration timing and CU data on systems where VRAM is not host-accessible (e.g., no large/resizable BAR). The fix copies SubExecParam back from the device before any of its fields are read on the host.

Changes:

  • Copy rss.subExecParamGpuPtr from device to host before collecting per-iteration CU IDs in multi-stream mode.
  • Copy exeInfo.subExecParamGpu from device to host before computing per-Transfer timing/CU IDs in single-stream mode.

Comment thread src/header/TransferBench.hpp Outdated
The GFX executor previously read startCycle/stopCycle/xccId/hwId directly
from subExecParamGpu (device memory allocated via hipMalloc). This assumed
that VRAM is host-accessible, which fails on GPUs without large/resizable
BAR (e.g. PCIe cards) and causes a SIGSEGV. The params are now copied back
to the host via hipMemcpy before being read, in both the single-launch and
multistream paths.

Signed-off-by: Flora Cui <flora.cui@amd.com>
fcui-amd force-pushed the users/fcui/gfx-subexecparam-host-deref-segfault branch from 674a070 to 45d6c97 on June 13, 2026 05:29