fix: Fix failing train tests for v3 by mohamedzeidan2021 · Pull Request #5814 · aws/sagemaker-python-sdk

mohamedzeidan2021 · 2026-04-30T18:53:39Z

Fix instance type for JumpStart training integration test

test_jumpstart_train[huggingface-spc-bert-base-cased] was failing with ValueError: Training is not supported for model ID with instance type: ml.g5.xlarge. The model's SupportedTrainingInstanceTypes only includes ml.g4dn.* and ml.p3.* variants. Replaced ml.g5.xlarge with ml.g4dn.xlarge.
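The failure comes from checking the requested instance type against the model's supported training instance types before a job is launched. The sketch below is a hypothetical stand-in for that check, not the SageMaker SDK's actual validation code. It assumes the supported list can be written as glob patterns (`ml.g4dn.*`, `ml.p3.*`) to match the description above. `SUPPORTED_TRAINING_INSTANCE_TYPES` and `check_training_instance_type` are illustrative names.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for the model's SupportedTrainingInstanceTypes, written as
# glob patterns to mirror the "ml.g4dn.* and ml.p3.* variants" described in the PR.
SUPPORTED_TRAINING_INSTANCE_TYPES = ["ml.g4dn.*", "ml.p3.*"]


def check_training_instance_type(model_id, instance_type,
                                 supported=SUPPORTED_TRAINING_INSTANCE_TYPES):
    """Return instance_type if it matches a supported pattern, else raise ValueError.

    Hypothetical helper for illustration; not the SDK's real implementation.
    """
    if not any(fnmatchcase(instance_type, pattern) for pattern in supported):
        raise ValueError(
            f"{instance_type} is not a supported training instance type for {model_id}"
        )
    return instance_type


# ml.g4dn.xlarge (the replacement) passes; ml.g5.xlarge (the original) is rejected.
check_training_instance_type("huggingface-spc-bert-base-cased", "ml.g4dn.xlarge")
try:
    check_training_instance_type("huggingface-spc-bert-base-cased", "ml.g5.xlarge")
except ValueError as err:
    print(err)
```

Checking the instance type up front means the test fails fast with a clear message instead of failing later inside a training job.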

Note on test_base_model_false_still_works failure:

The other failing test (TestLLMAsJudgeBaseModelFix::test_base_model_false_still_works) is a pre-existing flaky test unrelated to both this change and the release commits. None of the release changes touch the evaluation pipeline code path:

"Make _PipelineExecution a public class" affects the pipeline execution class visibility, not the evaluation pipeline template rendering or _get_or_create_pipeline logic.
"Add CodeArtifact support for ModelTrainer" and "Wire FrameworkProcessor code_location" affect training source code/dependency installation, not the evaluator module.
The S3 bucket/path fixes are in core S3 utilities, not in the evaluation pipeline template selection.
The git_utils and service-2.json changes are unrelated to evaluation.
The test fails due to a race condition in the test itself: pytest-xdist runs test_base_model_evaluation_uses_correct_weights (evaluate_base_model=True) and test_base_model_false_still_works (evaluate_base_model=False) in parallel. Both call _get_or_create_pipeline with the same pipeline name prefix, so one test's pipeline.update() overwrites the other's pipeline definition before execution starts. This is a shared-resource concurrency issue in the test infrastructure that predates these release changes. A separate fix (e.g., marking the class @pytest.mark.serial) is needed to address it.
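The overwrite described above can be reproduced deterministically by replaying a fixed interleaving instead of running real threads. This is a minimal sketch under stated assumptions: `FakePipelineStore` is a hypothetical in-memory stand-in for the pipeline registry, `"eval-pipeline"` is a hypothetical shared name standing in for the common prefix, and `upsert` plays the role of `_get_or_create_pipeline` plus `pipeline.update()`. It shows that when both tests upsert before either executes, the `evaluate_base_model=False` test runs the other test's definition, while the serial order keeps each definition intact.

```python
from dataclasses import dataclass, field


@dataclass
class FakePipelineStore:
    """Hypothetical in-memory stand-in for the pipeline registry.

    Pipelines are keyed by name, so a second upsert under the same name replaces
    the definition stored by the first (mirroring pipeline.update()).
    """

    definitions: dict = field(default_factory=dict)

    def upsert(self, name, definition):
        self.definitions[name] = definition

    def execute(self, name):
        # An execution runs whatever definition is stored now, not the one the
        # caller uploaded earlier.
        return self.definitions[name]


def run(schedule):
    """Replay a fixed sequence of (test, action) steps against one shared store."""
    store = FakePipelineStore()
    name = "eval-pipeline"  # hypothetical name shared by both tests
    own_definition = {
        "base_true": "evaluate_base_model=True",
        "base_false": "evaluate_base_model=False",
    }
    executed = {}
    for test, action in schedule:
        if action == "upsert":
            store.upsert(name, own_definition[test])
        else:
            executed[test] = store.execute(name)
    return executed


# Interleaving parallel workers can produce: both upsert before either executes.
PARALLEL = [("base_false", "upsert"), ("base_true", "upsert"),
            ("base_false", "execute"), ("base_true", "execute")]
# Serial order: each test upserts and executes before the next one starts.
SERIAL = [("base_true", "upsert"), ("base_true", "execute"),
          ("base_false", "upsert"), ("base_false", "execute")]

print(run(PARALLEL))
print(run(SERIAL))
```

In the parallel schedule, the `base_false` test ends up executing `evaluate_base_model=True`. This matches the symptom of `test_base_model_false_still_works` failing.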

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

mujtaba1747 · 2026-05-01T02:12:15Z

 }

+@pytest.mark.serial


If this test passes when run serially, there may be a race condition. We can fix it post-release.
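For reference, a minimal sketch of marking the class follows. It assumes both tests live in `TestLLMAsJudgeBaseModelFix`. The method bodies are placeholders, not the real tests. Decorating a class applies the mark to every test method in it. A custom marker is only a label, though. pytest-xdist does not treat `serial` specially, so CI has to select the marked tests and run them outside the worker pool. One way is `pytest -n auto -m "not serial"` followed by `pytest -m serial`. The PR does not show how this repository's CI handles the split, so that part is an assumption, as is the `pytest_configure` registration shown (normally placed in `conftest.py`).

```python
import pytest


# Apply the custom "serial" marker to the whole class so every test in it carries it.
# Method bodies are placeholders; the real tests call _get_or_create_pipeline.
@pytest.mark.serial
class TestLLMAsJudgeBaseModelFix:
    def test_base_model_evaluation_uses_correct_weights(self):
        pass  # placeholder for the evaluate_base_model=True path

    def test_base_model_false_still_works(self):
        pass  # placeholder for the evaluate_base_model=False path


def pytest_configure(config):
    # Registering the marker avoids PytestUnknownMarkWarning. Where the repository
    # actually registers it (conftest.py, setup.cfg, tox.ini) is an assumption.
    config.addinivalue_line("markers", "serial: run outside the pytest-xdist worker pool")
```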

mohamedzeidan2021 · 2026-05-01T17:31:07Z

All integ tests passed.

I accidentally pushed a commit and then removed it, so the checks restarted. I took this screenshot before that, though.

mollyheamazon · 2026-05-01T17:35:30Z

Found the successful run: https://github.com/aws/sagemaker-python-sdk/actions/runs/25191463012

fix: Fix failing train tests for v3

a182f2f

mohamedzeidan2021 temporarily deployed to auto-approve April 30, 2026 18:53 — with GitHub Actions Inactive

serial

dc4eff3

mohamedzeidan2021 had a problem deploying to auto-approve April 30, 2026 20:33 — with GitHub Actions Error

mohamedzeidan2021 temporarily deployed to auto-approve April 30, 2026 20:33 — with GitHub Actions Inactive

Merge branch 'master' into master

eb3473f

mohamedzeidan2021 temporarily deployed to auto-approve April 30, 2026 20:33 — with GitHub Actions Inactive

mohamedzeidan2021 temporarily deployed to auto-approve April 30, 2026 20:34 — with GitHub Actions Inactive

only keep instance type fix

c9743cf

mohamedzeidan2021 temporarily deployed to auto-approve April 30, 2026 21:10 — with GitHub Actions Inactive

added serial back since it worked

9de4f23

mohamedzeidan2021 temporarily deployed to auto-approve April 30, 2026 22:00 — with GitHub Actions Inactive

mohamedzeidan2021 temporarily deployed to auto-approve April 30, 2026 22:01 — with GitHub Actions Inactive

mujtaba1747 previously approved these changes May 1, 2026

mohamedzeidan2021 dismissed mujtaba1747’s stale review via 6d1237a May 1, 2026 17:25

mohamedzeidan2021 temporarily deployed to auto-approve May 1, 2026 17:25 — with GitHub Actions Inactive

mohamedzeidan2021 force-pushed the master branch from 6d1237a to 9de4f23 May 1, 2026 17:28

mohamedzeidan2021 temporarily deployed to auto-approve May 1, 2026 17:29 — with GitHub Actions Inactive

mollyheamazon approved these changes May 1, 2026

aviruthen approved these changes May 1, 2026

mohamedzeidan2021 merged commit 1885e4c into aws:master May 1, 2026
73 of 108 checks passed

mohamedzeidan2021 merged 5 commits into aws:master from mohamedzeidan2021:master

4 participants
