Add opt-in GPU spreading for the parallel test suite by michel2323 · Pull Request #588 · JuliaGPU/oneAPI.jl

michel2323 · 2026-06-18T15:46:54Z

Summary

Adds an opt-in mechanism to spread the parallel test suite across multiple GPU tiles instead of oversubscribing device 0.

With ONEAPI_TEST_SPREAD_GPUS=1, each test worker process is pinned to a distinct GPU via ZE_AFFINITY_MASK. Devices are claimed round-robin through an atomic mkdir counter, and the mask is set before `using oneAPI` so the Level Zero driver picks it up at init. The variable is passed through the ParallelTestRunner env kwarg.
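The claim-and-pin step could look roughly like the sketch below. It is not the code from this PR: `claim_slot`, `device_index`, and `maybe_pin!` are hypothetical names, the counter directory and device count are assumed to be supplied by the caller, the exact `== "1"` check is a guess at how the flag is read, and the mask is assumed to be a plain device index. The sketch relies only on `mkdir` either creating a directory or throwing, which is what makes the counter safe across processes.

```julia
# Hypothetical helper: claim the next free slot by creating numbered directories.
# mkdir either creates the directory or throws, so two processes can never
# both succeed on the same index.
function claim_slot(counter_dir::AbstractString)
    mkpath(counter_dir)
    slot = 0
    while true
        path = joinpath(counter_dir, string(slot))
        try
            mkdir(path)
            return slot
        catch err
            # An existing directory means another claimer owns this index;
            # anything else is a real error and is rethrown.
            (err isa Base.IOError && isdir(path)) || rethrow()
            slot += 1
        end
    end
end

# Map a claimed slot onto one of `ndevices` devices, round-robin.
device_index(slot::Integer, ndevices::Integer) = slot % ndevices

# Hypothetical entry point: set ZE_AFFINITY_MASK only when the opt-in flag is set.
# This has to run before `using oneAPI`, because the driver reads the mask at init.
function maybe_pin!(counter_dir::AbstractString, ndevices::Integer; env = ENV)
    get(env, "ONEAPI_TEST_SPREAD_GPUS", "") == "1" || return nothing
    mask = string(device_index(claim_slot(counter_dir), ndevices))
    env["ZE_AFFINITY_MASK"] = mask   # assumption: plain device index as the mask
    return mask
end
```

With six claimers and four devices, the slots wrap around, so the fifth and sixth workers share devices 0 and 1 with the first two.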

device() is task-local and Malt runs each test in a fresh task, so a device! call in init_worker_code would not stick. Process-level pinning is the robust approach.
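The difference between task-local and process-level state can be illustrated without a GPU. The sketch below does not call oneAPI: it uses Julia's generic `task_local_storage` as a stand-in for a task-local device setting, and both the `:device` key and the `DEMO_PINNED_DEVICE` variable are made up for the example.

```julia
# Stand-in for a task-local "current device" setting; :device is a hypothetical key.
task_local_storage(:device, 3)

# The task that stored the value sees it.
in_same_task = get(task_local_storage(), :device, 0)

# A freshly created task starts with its own empty storage and sees the fallback.
in_fresh_task = fetch(@async get(task_local_storage(), :device, 0))

# An environment variable belongs to the process, so every task sees it.
ENV["DEMO_PINNED_DEVICE"] = "3"   # hypothetical variable, used only for this demo
env_in_fresh_task = fetch(@async ENV["DEMO_PINNED_DEVICE"])
```

Here `in_same_task` is 3 while `in_fresh_task` falls back to 0, whereas the environment variable reads "3" from the new task as well.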

Notes

Fully opt-in: when ONEAPI_TEST_SPREAD_GPUS is unset, behavior is identical to current main (every worker stays on the first device).

ZE_AFFINITY_MASK is a standard Level Zero environment variable and is stack-agnostic.

🤖 Generated with Claude Code

ONEAPI_TEST_SPREAD_GPUS=1 pins each test worker process to a distinct GPU via ZE_AFFINITY_MASK (claimed round-robin through an atomic mkdir counter, set before `using oneAPI` so the Level Zero driver picks it up at init). This spreads the suite across all tiles instead of oversubscribing device 0.

device() is task-local and Malt runs each test in a fresh task, so a device! in init_worker_code would not stick; process-level pinning is the robust approach.

Default (unset) keeps every worker on the first device, preserving single-tile oversubscription, which is useful for surfacing contention bugs.

Verified: 6 concurrent claimers -> 6 distinct device UUIDs; real harness with --jobs=4 spreads cleanly (SUCCESS).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov · 2026-06-18T19:10:09Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.89%. Comparing base (e995a63) to head (dd1ba6b).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #588      +/-   ##
==========================================
- Coverage   80.92%   80.89%   -0.04%     
==========================================
  Files          48       48              
  Lines        3234     3234              
==========================================
- Hits         2617     2616       -1     
- Misses        617      618       +1

michel2323 wants to merge 1 commit into main from test-gpu-spreading
