Fix/modelbuilder deploy datacacheconfig 5750#5764
Fix/modelbuilder deploy datacacheconfig 5750#5764aviruthen wants to merge 12 commits intoaws:masterfrom
Conversation
…eateInferenceCom (5750)
|
sagemaker-core and sagemaker-rain integ tests should not be triggered, changes only made to sagemaker-serve module |
| # Return immediately when endpoint is no longer creating. | ||
| # Log fetching below is best-effort and must not block completion. | ||
| return desc |
There was a problem hiding this comment.
Log fetching below is best-effort and must not block completion.
What does this mean ?
Looks like desc will be returned for most cases and not when endpoint is no longer creating
There was a problem hiding this comment.
I was running into issues where the endpoint reached a non-creating status but was not returning from monitoring deployment progress. This change is just meant to emphasize that the moment the endpoint is no longer creating we should return and immediately continue with ModelBuilder logic
| progress_tracker.update_status(endpoint_status) | ||
|
|
||
| # Return desc if we should stop polling | ||
| if stop: |
There was a problem hiding this comment.
Seems like the current fix returns desc for any non-Creating status, including transient states
like Updating or RollingBack. The original stop variable was meant to gate the return on terminal states only.
The issue requests exposing additional CreateInferenceComponent API parameters through ModelBuilder.deploy(), primarily DataCacheConfig, BaseInferenceComponentName, Container specification, and VariantName. The _deploy_core_endpoint method in model_builder.py builds InferenceComponentSpecification but does not pass through these parameters. The sagemaker.core.shapes module already has InferenceComponentDataCacheConfig and related shapes. The fix requires: (1) adding new optional parameters to the deploy() method and _deploy_core_endpoint(), (2) wiring those parameters into the InferenceComponentSpecification and InferenceComponent.create() call, and (3) making variant_name configurable instead of hardcoded to 'AllTraffic'. The deploy wrappers in model_builder_servers.py pass **kwargs through to _deploy_core_endpoint, so they require no changes.
Includes unit and integration tests.