feat(providers): add Docker Model Runner provider #1312
Closed
`@@ -0,0 +1,35 @@`

```rust
// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0

use crate::{DiscoveredProvider, ProviderError, ProviderPlugin};

pub struct ModelRunnerProvider;

impl ProviderPlugin for ModelRunnerProvider {
    fn id(&self) -> &'static str {
        "model-runner"
    }

    fn discover_existing(&self) -> Result<Option<DiscoveredProvider>, ProviderError> {
        Ok(Some(DiscoveredProvider::default()))
    }
}

#[cfg(test)]
mod tests {
    use super::ModelRunnerProvider;
    use crate::ProviderPlugin;

    #[test]
    fn model_runner_provider_id_is_correct() {
        assert_eq!(ModelRunnerProvider.id(), "model-runner");
    }

    #[test]
    fn model_runner_discover_returns_default_provider() {
        let result = ModelRunnerProvider
            .discover_existing()
            .expect("discovery should succeed");
        assert!(result.is_some());
    }
}
```
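The plugin in this diff implements two trait methods. `id()` returns the stable identifier `model-runner`, and `discover_existing()` reports a provider. The sketch below shows how a caller might look up a plugin by that identifier and then run discovery. This PR does not show the crate's real definitions of `ProviderPlugin`, `DiscoveredProvider`, and `ProviderError`, so the block uses simplified stand-ins. `find_plugin` is a hypothetical helper, not part of the crate's API.

```rust
// Simplified stand-ins for crate types referenced in the diff.
// These are assumptions for illustration, not the crate's real definitions.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct DiscoveredProvider;

#[derive(Debug)]
pub struct ProviderError(pub String);

pub trait ProviderPlugin {
    fn id(&self) -> &'static str;
    fn discover_existing(&self) -> Result<Option<DiscoveredProvider>, ProviderError>;
}

pub struct ModelRunnerProvider;

impl ProviderPlugin for ModelRunnerProvider {
    fn id(&self) -> &'static str {
        "model-runner"
    }

    // Mirrors the diff: discovery always reports a default provider.
    fn discover_existing(&self) -> Result<Option<DiscoveredProvider>, ProviderError> {
        Ok(Some(DiscoveredProvider::default()))
    }
}

/// Hypothetical registry helper: returns the first plugin whose `id()` matches.
pub fn find_plugin<'a>(
    plugins: &[&'a dyn ProviderPlugin],
    id: &str,
) -> Option<&'a dyn ProviderPlugin> {
    plugins.iter().copied().find(|p| p.id() == id)
}

fn main() {
    let model_runner = ModelRunnerProvider;
    let plugins: [&dyn ProviderPlugin; 1] = [&model_runner];

    if let Some(plugin) = find_plugin(&plugins, "model-runner") {
        match plugin.discover_existing() {
            Ok(Some(_)) => println!("{}: discovered", plugin.id()),
            Ok(None) => println!("{}: not present", plugin.id()),
            Err(e) => eprintln!("{}: discovery failed: {}", plugin.id(), e.0),
        }
    }
}
```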
`docs/get-started/tutorials/inference-docker-model-runner.mdx`: 128 changes (128 additions, 0 deletions)
`@@ -0,0 +1,128 @@`

```yaml
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Run Local Inference with Docker Model Runner"
sidebar-title: "Inference with Docker Model Runner"
slug: "get-started/tutorials/inference-docker-model-runner"
description: "Route sandbox inference requests to Docker Model Runner running on your host machine using the built-in model-runner provider type."
keywords: "Generative AI, Cybersecurity, Tutorial, Inference Routing, Docker Model Runner, Local Inference, Sandbox"
---
```

This tutorial shows how to route `inference.local` requests from OpenShell sandboxes to Docker Model Runner on your host machine.

Docker Model Runner is built into Docker Desktop. It runs models locally and serves them through an OpenAI-compatible API, so no external service or API key is required.

After completing this tutorial, you will know how to:

- Pull and run a model with Docker Model Runner.
- Create a `model-runner` provider in OpenShell.
- Set Docker Model Runner as the `inference.local` backend.
- Verify inference from inside a sandbox.

## Prerequisites

- A working OpenShell installation. Complete the [Quickstart](/get-started/quickstart) before proceeding.
- Docker Desktop with Docker Model Runner enabled (Docker Desktop 4.40 or later).

## Verify Docker Model Runner Is Available

Confirm that Docker Model Runner is available on your host:

```shell
docker model version
```

If Docker Model Runner is not available, upgrade Docker Desktop or enable the feature in Docker Desktop settings under the **Beta Features** tab.

<Steps toc={true}>

### Pull a Model

Pull a model to use for inference. A small model is a good starting point:

```shell
docker model pull ai/smollm2
```

Verify that the model is available:

```shell
docker model list
```

### Create a Provider

Create a `model-runner` provider. No credentials are needed: Docker Model Runner does not require an API key, and it is reached over Docker's internal network at `model-runner.docker.internal`.

```shell
openshell provider create --name model-runner --type model-runner
```

### Set Inference Routing

Point `inference.local` at the `model-runner` provider and choose a model:

```shell
openshell inference set --provider model-runner --model ai/smollm2
```

OpenShell verifies that the upstream endpoint is reachable before saving the configuration. If the model has not finished loading, wait a few seconds and retry.

Confirm the configuration:

```shell
openshell inference get
```
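The retry advice in this step can be approximated with a plain TCP reachability probe. This is a minimal Rust sketch, not OpenShell's actual check. `probe` only confirms that something accepts TCP connections on `host:port`; it does not confirm that a model is loaded. `wait_until_reachable` is a hypothetical helper that retries with a fixed delay. The target `model-runner.docker.internal:80` comes from the provider profile in this PR. That name may not resolve outside Docker's network, and in that case the probe returns `false`.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::thread;
use std::time::Duration;

/// Returns true if any resolved address for `host:port` accepts a TCP
/// connection within `timeout`. This checks reachability only; it does not
/// verify that a model is loaded or that the HTTP API responds.
pub fn probe(host: &str, port: u16, timeout: Duration) -> bool {
    let addrs = match (host, port).to_socket_addrs() {
        Ok(addrs) => addrs,
        Err(_) => return false, // name did not resolve
    };
    addrs
        .into_iter()
        .any(|addr| TcpStream::connect_timeout(&addr, timeout).is_ok())
}

/// Hypothetical helper: calls `check` up to `attempts` times, sleeping `delay`
/// between failures. Returns the 1-based attempt that succeeded, if any.
pub fn wait_until_reachable<F: FnMut() -> bool>(
    attempts: u32,
    delay: Duration,
    mut check: F,
) -> Option<u32> {
    for attempt in 1..=attempts {
        if check() {
            return Some(attempt);
        }
        if attempt < attempts {
            thread::sleep(delay);
        }
    }
    None
}

fn main() {
    // Assumed endpoint, taken from the provider profile in this PR.
    let (host, port) = ("model-runner.docker.internal", 80);
    let result = wait_until_reachable(3, Duration::from_secs(1), || {
        probe(host, port, Duration::from_millis(500))
    });
    match result {
        Some(n) => println!("{host}:{port} reachable after {n} attempt(s)"),
        None => println!("{host}:{port} not reachable; check `docker model ps`"),
    }
}
```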

### Verify from a Sandbox

Run a request through `https://inference.local`:

```shell
openshell sandbox create -- \
  curl https://inference.local/v1/chat/completions \
  --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
```

A JSON response from the model confirms end-to-end connectivity.
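The `--json` payload in this step is an OpenAI-style chat completions body. Code that builds the same request inside a sandbox has one main pitfall: user text must be escaped correctly. The dependency-free Rust sketch below produces the same body. `chat_body` and `escape_json` are hypothetical helpers; in real code, a JSON library such as `serde_json` is the safer choice. Like the curl example, the body omits a `model` field. This assumes the model configured with `openshell inference set` is used.

```rust
/// Escapes a string for embedding in a JSON string literal (RFC 8259).
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the same body as the curl example: one user message and a token cap.
fn chat_body(user_message: &str, max_tokens: u32) -> String {
    format!(
        r#"{{"messages":[{{"role":"user","content":"{}"}}],"max_tokens":{}}}"#,
        escape_json(user_message),
        max_tokens
    )
}

fn main() {
    // POST this to https://inference.local/v1/chat/completions from inside a sandbox.
    println!("{}", chat_body("hello", 10));
}
```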

</Steps>

## Model Recommendations

| Use case | Model | Notes |
|---|---|---|
| Smoke test | `ai/smollm2` | Small, fast, good for verifying setup |
| Coding and reasoning | `ai/llama3.2` | Strong general-purpose model |
| Chat | `ai/gemma3` | Lightweight with good instruction following |

Search for additional models with:

```shell
docker model search <query>
```

## Troubleshooting

Common issues and fixes:

- **`docker model version` fails**: Docker Desktop is not running, or Docker Model Runner is disabled. Enable it in Docker Desktop settings.
- **`openshell inference set` fails with connection refused**: The model may still be loading. Run `docker model ps` to check. If no model is loaded, run `docker model run --detach ai/smollm2` to preload it.
- **Model not found**: Run `docker model list` to confirm the model is present, and run `docker model pull <model>` if it is not.
- **HTTPS vs. HTTP**: Code inside sandboxes must call `https://inference.local`, not `http://inference.local`.
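The HTTPS-vs-HTTP mistake can be caught before any request is sent. The following is a minimal sketch, assuming sandbox code builds its endpoint from a configurable base URL. `check_inference_base_url` is a hypothetical helper, and it only inspects the scheme and host.

```rust
/// Hypothetical pre-flight check for code running inside a sandbox:
/// the base URL must use HTTPS and target the `inference.local` host.
fn check_inference_base_url(url: &str) -> Result<(), String> {
    let rest = match url.strip_prefix("https://") {
        Some(rest) => rest,
        None if url.starts_with("http://") => {
            return Err(format!("{url}: use https://inference.local, not http://"))
        }
        None => return Err(format!("{url}: expected an https:// URL")),
    };
    // Host is everything before the first '/', with any ':port' removed.
    let authority = rest.split('/').next().unwrap_or("");
    let host = authority.split(':').next().unwrap_or("");
    if host == "inference.local" {
        Ok(())
    } else {
        Err(format!("{url}: expected host inference.local, found {host}"))
    }
}

fn main() {
    for url in ["https://inference.local/v1", "http://inference.local/v1"] {
        match check_inference_base_url(url) {
            Ok(()) => println!("ok: {url}"),
            Err(e) => println!("error: {e}"),
        }
    }
}
```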

Useful commands:

```shell
openshell status
openshell inference get
openshell provider get model-runner
docker model ps
docker model list
```

## Next Steps

- To learn more about managed inference, refer to [Inference Routing](/sandboxes/inference-routing).
- To configure a different self-hosted backend, refer to [Inference Routing](/sandboxes/inference-routing#configure-inference-routing).
- To learn how to use Ollama for local inference, refer to [Inference with Ollama](/get-started/tutorials/inference-ollama).
`@@ -0,0 +1,15 @@`

```yaml
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

id: model-runner
display_name: Docker Model Runner
description: Local AI inference via Docker Model Runner
category: inference
inference_capable: true
endpoints:
  - host: model-runner.docker.internal
    port: 80
    protocol: rest
    access: read-write
    enforcement: enforce
binaries: [/usr/local/bin/docker]
```
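The review question on this PR asks which process actually makes the API call. The answer matters because the profile pairs an endpoint with a `binaries` list. The Rust sketch below rests on an interpretation that the PR does not confirm: that a direct connection is allowed only when both the destination and the calling executable match. It models the profile fields as structs and shows that, under this profile, only `/usr/local/bin/docker` would be allowed a direct connection to `model-runner.docker.internal:80`. The struct names and `is_allowed` are hypothetical.

```rust
/// Hypothetical in-memory form of the profile's endpoint entry.
#[derive(Debug)]
struct Endpoint {
    host: &'static str,
    port: u16,
}

/// Hypothetical in-memory form of the provider profile.
#[derive(Debug)]
struct ProviderProfile {
    id: &'static str,
    endpoints: Vec<Endpoint>,
    binaries: Vec<&'static str>,
}

impl ProviderProfile {
    /// Assumed semantics: allow only if the destination matches a declared
    /// endpoint AND the calling executable is listed in `binaries`.
    fn is_allowed(&self, binary: &str, host: &str, port: u16) -> bool {
        let endpoint_ok = self.endpoints.iter().any(|e| e.host == host && e.port == port);
        let binary_ok = self.binaries.iter().any(|b| *b == binary);
        endpoint_ok && binary_ok
    }
}

fn model_runner_profile() -> ProviderProfile {
    // Values copied from the profile in this diff.
    ProviderProfile {
        id: "model-runner",
        endpoints: vec![Endpoint { host: "model-runner.docker.internal", port: 80 }],
        binaries: vec!["/usr/local/bin/docker"],
    }
}

fn main() {
    let profile = model_runner_profile();
    for bin in ["/usr/local/bin/docker", "/usr/bin/curl"] {
        let ok = profile.is_allowed(bin, "model-runner.docker.internal", 80);
        println!("{} via {}: {}", profile.id, bin, if ok { "allowed" } else { "denied" });
    }
}
```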
Is docker making the API call or would it be a model harness, agent workload, etc?
In this instance it's the inference provider. Docker Model Runner is effectively an alternative to Ollama.