test(e2e): run gpu workloads from manifest #1709

Draft
elezar wants to merge 3 commits into feat/1476-gpu-workload-images/elezar from feat/1472-gpu-validation-tests/elezar

elezar commented Jun 3, 2026

Summary

This PR adds manifest-driven GPU workload execution tests on top of the workload image artifacts from #1484. It keeps the existing GPU device-selection coverage, adds workload execution coverage under the umbrella gpu target, and documents how to build workload images locally before running the GPU e2e suite.

Related Issue

Closes #1472

Changes

  • Switch GPU workload execution tests from a single image env var to a YAML workload manifest consumed by the Rust e2e harness.
  • Run each manifest-defined workload through openshell sandbox create --gpu --from <image> -- <command> and enforce its declared pass or fail expectation.
  • Load the local manifest from e2e/gpu/images/.build/workloads.yaml by default, with OPENSHELL_E2E_WORKLOAD_MANIFEST available for external manifests.
  • Update the Docker GPU e2e wrapper to point users at the workload manifest flow when no local manifest exists.
  • Add serde_yaml to the e2e crate for manifest parsing.
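As a rough illustration of the second bullet, the sketch below shows how one manifest entry might be turned into the arguments for openshell sandbox create --gpu --from <image> -- <command>, and how a declared expectation could be checked against the observed result. The manifest schema is not shown in this PR, so the `Workload` and `Expectation` types, their field names, and both helper functions are hypothetical and are not taken from the harness. Only the CLI argument order comes from the description above.

```rust
/// Hypothetical expectation declared for a workload.
/// The real manifest schema is not shown in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Expectation {
    Pass,
    Fail,
}

/// Hypothetical in-memory form of one manifest entry.
#[derive(Debug, Clone)]
pub struct Workload {
    pub name: String,
    pub image: String,
    pub command: Vec<String>,
    pub expect: Expectation,
}

/// Builds the argument list for
/// `openshell sandbox create --gpu --from <image> -- <command>`.
pub fn sandbox_args(workload: &Workload) -> Vec<String> {
    let mut args: Vec<String> = ["sandbox", "create", "--gpu", "--from"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    args.push(workload.image.clone());
    // Everything after `--` is the command run inside the sandbox.
    args.push("--".to_string());
    args.extend(workload.command.iter().cloned());
    args
}

/// Returns true when the observed result agrees with the declared expectation:
/// a `Pass` workload must succeed and a `Fail` workload must not.
pub fn meets_expectation(expect: Expectation, succeeded: bool) -> bool {
    match expect {
        Expectation::Pass => succeeded,
        Expectation::Fail => !succeeded,
    }
}
```

In a test, the returned arguments could be passed to `std::process::Command::new("openshell").args(...)`, with the exit status fed into `meets_expectation`. Keeping argument construction separate from process execution makes the mapping checkable without a GPU.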

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Validation run:

  • mise run e2e:workloads:build
  • mise run e2e:docker:gpu

Notes:

  • mise run pre-commit currently fails in python:proto with ModuleNotFoundError: No module named 'grpc_tools'; this failure is not caused by the workload-manifest changes.
  • Build workload images and generate the local manifest with mise run e2e:workloads:build before running mise run e2e:docker:gpu locally.
  • External catalogs can be exercised by setting OPENSHELL_E2E_WORKLOAD_MANIFEST=/abs/path/to/workloads.yaml.
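The manifest lookup described above (local default path, with OPENSHELL_E2E_WORKLOAD_MANIFEST as an override) could be resolved along these lines. This is a minimal sketch using only the standard library. The function names are hypothetical, and treating an empty variable as unset is an assumption, not behavior confirmed by the PR.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Environment variable named in the PR description.
pub const MANIFEST_ENV: &str = "OPENSHELL_E2E_WORKLOAD_MANIFEST";
/// Default local manifest path named in the PR description.
pub const DEFAULT_MANIFEST: &str = "e2e/gpu/images/.build/workloads.yaml";

/// Picks the manifest path: the override when one is provided, else the default.
/// Assumption: an empty override is treated the same as no override.
pub fn resolve_manifest(override_path: Option<OsString>) -> PathBuf {
    match override_path {
        Some(path) if !path.is_empty() => PathBuf::from(path),
        _ => PathBuf::from(DEFAULT_MANIFEST),
    }
}

/// Hypothetical convenience wrapper that reads the override from the environment.
pub fn manifest_from_env() -> PathBuf {
    resolve_manifest(env::var_os(MANIFEST_ENV))
}
```

Passing the override in as a parameter keeps the resolution logic testable without mutating process-wide environment state.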

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

copy-pr-bot Bot commented Jun 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

elezar force-pushed the feat/1476-gpu-workload-images/elezar branch from 5cc2d92 to efe4d25 (June 4, 2026 12:56)
elezar force-pushed the feat/1472-gpu-validation-tests/elezar branch from 5a84bca to 1c8f7b7 (June 4, 2026 14:13)

copy-pr-bot Bot commented Jun 4, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.
