
feat(lib): add get_model_capabilities for runtime parameter routing#3124

Open
SinhSinhAn wants to merge 2 commits into openai:main from SinhSinhAn:feat/model-capabilities

Conversation

@SinhSinhAn

Summary

Closes #3073.

Add a hand-curated model capability registry so applications that support multiple OpenAI models can decide which parameters to send (and which UI controls to render) without prefix-matching model strings themselves. Lives in src/openai/lib/, which CONTRIBUTING.md designates as the area Stainless will not regenerate.

from openai import get_model_capabilities

caps = get_model_capabilities("gpt-5.4-mini")
caps.family                      # "gpt-5.4"
caps.supports_temperature        # False
caps.supports_reasoning          # True
caps.reasoning_effort_options    # ('none', 'minimal', 'low', 'medium', 'high', 'xhigh')

get_model_capabilities("gpt-4.1").supports_reasoning              # False
get_model_capabilities("gpt-5-chat-latest").supports_temperature  # True (chat variant)
get_model_capabilities("nonexistent-model")                       # None

Background

#3073 (filed by @pamelafox while migrating azure-search-openai-demo to the Responses API) describes the current pain: applications targeting multiple model families maintain hardcoded checks like

def is_reasoning_model(model: str) -> bool:
    return model.startswith("gpt-5")

These break every time a new family launches, and the version-based logic for reasoning_effort levels is reverse-engineered from documentation rather than provided by the SDK.

What this PR adds

  • ModelCapabilities: a frozen dataclass with family, supports_temperature, supports_reasoning, and reasoning_effort_options fields.
  • get_model_capabilities(model: str) -> ModelCapabilities | None: looks up a model identifier and returns the matching capabilities, or None if the model is not registered.

Both symbols are re-exported from openai (matching the pydantic_function_tool pattern) and from openai.lib.

Lookup behaviour

  • Longest-prefix matching so "gpt-5.4" beats "gpt-5", and "o1-pro" beats "o1".
  • Date-suffix aware: "gpt-5.4-mini-2026-03-17" resolves to the gpt-5.4 family.
  • Size-variant aware: -mini, -nano, -pro are handled automatically by prefix matching (no special-casing).
  • Chat / search variants: *-chat-latest and *-search-preview models override their family defaults to (temperature: True, reasoning: False). The family field still reports the underlying family so callers can group variants together.
  • Unknown models return None. Callers should treat that as "fall back to the API's own validation" — send your default parameters and handle a 400 if the model rejects them.
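The behaviour listed above can be sketched as follows. This is a minimal, self-contained illustration with a toy registry; the names, entries, and regex here are assumptions for the sketch, not the PR's actual implementation:

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ModelCapabilities:
    family: str
    supports_temperature: bool
    supports_reasoning: bool
    reasoning_effort_options: Tuple[str, ...]

# Toy registry with a handful of illustrative entries.
_FAMILIES: Tuple[ModelCapabilities, ...] = (
    ModelCapabilities("gpt-5", False, True, ("minimal", "low", "medium", "high")),
    ModelCapabilities("gpt-5.4", False, True, ("none", "minimal", "low", "medium", "high", "xhigh")),
    ModelCapabilities("gpt-4.1", True, False, ()),
)

_DATE_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")

def get_model_capabilities(model: str) -> Optional[ModelCapabilities]:
    candidate = _DATE_SUFFIX.sub("", model)  # "gpt-5.4-mini-2026-03-17" -> "gpt-5.4-mini"
    best: Optional[ModelCapabilities] = None
    for entry in _FAMILIES:
        # Match only at a segment boundary, so size variants like
        # "-mini" match but e.g. "gpt-5.40" would not match "gpt-5.4".
        if candidate != entry.family and not candidate.startswith(entry.family + "-"):
            continue
        if best is None or len(entry.family) > len(best.family):
            best = entry  # longest prefix wins: "gpt-5.4" beats "gpt-5"
    return best
```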

Effort scales encoded

Family                       Effort options
o1, o3, o4-mini              low, medium, high
gpt-5                        minimal, low, medium, high
gpt-5.1, gpt-5.2, gpt-5.3    none, minimal, low, medium, high
gpt-5.4+                     none, minimal, low, medium, high, xhigh

This matches the canonical ReasoningEffort literal in types/shared/reasoning_effort.py.
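As a usage sketch, a caller might translate a capabilities object into request parameters along these lines (the helper name, defaults, and clamping policy are illustrative, not part of the PR):

```python
from types import SimpleNamespace

def build_request_kwargs(model: str, caps, temperature: float = 0.7, effort: str = "medium") -> dict:
    """Illustrative helper: include only the parameters the model accepts."""
    kwargs: dict = {"model": model}
    if caps is None:
        # Unknown model: send defaults and fall back to the API's own validation.
        kwargs["temperature"] = temperature
        return kwargs
    if caps.supports_temperature:
        kwargs["temperature"] = temperature
    if caps.supports_reasoning:
        # Clamp to an effort level the family actually supports.
        chosen = effort if effort in caps.reasoning_effort_options else caps.reasoning_effort_options[-1]
        kwargs["reasoning"] = {"effort": chosen}
    return kwargs

# Stand-in capabilities object; real callers would pass the result of
# get_model_capabilities() instead.
gpt5_caps = SimpleNamespace(
    family="gpt-5",
    supports_temperature=False,
    supports_reasoning=True,
    reasoning_effort_options=("minimal", "low", "medium", "high"),
)
```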

Files changed

  • src/openai/lib/_models.py (new, 200 lines): the registry and lookup implementation
  • src/openai/lib/__init__.py: re-export the public symbols
  • src/openai/__init__.py: re-export at the top level (matching pydantic_function_tool)
  • tests/lib/test_model_capabilities.py (new, 35 tests): family-by-family coverage, edge cases, and a realistic "route parameters by model" usage scenario
  • helpers.md: documentation for the public surface

Test plan

  • pytest tests/lib/ — 140 tests pass (35 new + 105 existing)
  • ruff check and ruff format --check clean on touched files
  • pyright clean on touched files
  • mypy reports no new errors on touched files (12 pre-existing errors in unrelated files)
  • python -c "import openai" works
  • All four scenarios from issue #3073 ("Add convenience methods for detecting model parameter support (temperature, reasoning)") verified:
    • gpt-4.1-mini -> supports temperature, no reasoning
    • gpt-5 -> rejects temperature, effort minimal/low/medium/high
    • gpt-5.1 -> adds none
    • gpt-5.4 -> adds xhigh

Design notes

I considered exposing a stringly-typed API like openai.models.supports(model, "temperature") (one of the alternatives in the issue), but the dataclass approach is more Pythonic, plays better with type checkers, and avoids adding string-keyed parameter names to the public surface that would themselves drift over time. Field access is also strictly more discoverable in IDEs.

The registry is hand-maintained (not derived from a server-side source) for the same reason _old_api.py and the Azure helpers are: capability metadata is documented behaviour, not OpenAPI schema. New families need a corresponding entry in _FAMILIES. I kept the surface narrow (4 fields) to minimise that maintenance burden — vision/audio/structured-output capabilities can be added later without breaking existing callers thanks to the dataclass shape.

Applications that support multiple OpenAI models (gpt-4.1, gpt-5.x,
o-series, etc.) currently maintain prefix checks like
`model.startswith("gpt-5")` to decide which parameters to send. These
checks break every time a new family launches, and the version-based
logic for reasoning_effort levels is reverse-engineered from
documentation rather than provided by the SDK.

Add a hand-curated capability registry under `openai.lib._models`
exposing two public symbols:

- `ModelCapabilities`: a frozen dataclass with `family`,
  `supports_temperature`, `supports_reasoning`, and
  `reasoning_effort_options` fields.
- `get_model_capabilities(model)`: looks up a model identifier and
  returns the matching `ModelCapabilities`, or `None` if unknown.

The lookup is date-suffix-aware (gpt-5.4-mini-2026-03-17 -> gpt-5.4),
recognises size variants (-mini, -nano, -pro), and treats *-chat-latest
and *-search-preview models as non-reasoning chat variants.

Both symbols are re-exported from `openai` and `openai.lib`.

Includes 35 unit tests covering each family, edge cases (unknown
models, non-string inputs, frozen instances), and the realistic
"route temperature/effort decisions by model" use case from the
issue. helpers.md documents the new public surface.

The registry lives in `src/openai/lib/`, which CONTRIBUTING.md
designates as the area Stainless will not regenerate.

Closes openai#3073
@SinhSinhAn SinhSinhAn requested a review from a team as a code owner April 25, 2026 16:13

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3b79e58bf6


Comment thread src/openai/lib/_models.py Outdated
# beats "o1".
best: Optional[ModelCapabilities] = None
for entry in _FAMILIES:
    if not candidate.startswith(entry.family):

P1 Badge Enforce family boundary in prefix matching

Using candidate.startswith(entry.family) without a separator check causes unknown model IDs to be treated as known families, which breaks the documented None fallback path. For example, gpt-5.10 currently matches gpt-5.1, and typos like o1-previewed match o1-preview, so callers may route parameters based on incorrect capabilities instead of treating the model as unknown.
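Both collisions are easy to reproduce with a bare startswith check:

```python
# A bare prefix check matches across segment boundaries:
assert "gpt-5.10".startswith("gpt-5.1")        # unknown model treated as gpt-5.1
assert "o1-previewed".startswith("o1-preview") # typo treated as o1-preview
```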


Comment thread src/openai/lib/_models.py Outdated
# Effort scales reused across families.
_EFFORT_O_SERIES: Tuple[ReasoningEffort, ...] = ("low", "medium", "high")
_EFFORT_GPT5: Tuple[ReasoningEffort, ...] = ("minimal", "low", "medium", "high")
_EFFORT_GPT5_1: Tuple[ReasoningEffort, ...] = ("none", "minimal", "low", "medium", "high")

P1 Badge Align gpt-5.1 effort options with API-supported values

The gpt-5.1 capability tuple includes "minimal", but the generated API docs in this repo (src/openai/types/shared/reasoning.py, lines 27-29) state gpt-5.1 supports none, low, medium, and high. Returning minimal here can lead consumers to render/send an unsupported effort value and trigger avoidable 400 responses.


Two P1 corrections from the automated Codex review on PR openai#3124:

1. Enforce segment boundary in prefix matching.

   `startswith()` was matching unknown identifiers like `gpt-5.10` as
   `gpt-5.1` and `o1-previewed` as `o1-preview`, causing the documented
   `None` fallback to be skipped. The lookup now requires the family
   prefix to either equal the model exactly or be followed by a `-`.

2. Align gpt-5 family effort options with the canonical SDK docs in
   `src/openai/types/shared/reasoning.py`:

   - gpt-5.1 (and -codex / -codex-max / -mini): `none/low/medium/high`
     -- removed `minimal`, which would have surfaced an unsupported
     value and triggered avoidable 400 responses.
   - gpt-5.2 / gpt-5.3 / gpt-5.4: `none/low/medium/high/xhigh`
     (`xhigh` is supported for models *after* gpt-5.1-codex-max).
   - gpt-5-pro: split into its own family with the high-only effort
     scale per the SDK docstring ("defaults to and only supports high").

Other improvements:

- Chat / search variant detection now matches dated snapshots like
  `gpt-4o-search-preview-2025-03-11` via a regex instead of a strict
  `endswith` check.
- New `TestBoundaryMatching` class with parametrized regression tests
  for prefix-collision cases (`gpt-5.10`, `gpt-5.1foo`, `o1-previewed`)
  and dated chat / search variants.
- New `TestGpt5ProFamily` class covering the high-only effort scale.

49 of 49 new tests pass; 154 of 154 lib tests pass overall.
@SinhSinhAn
Author

SinhSinhAn commented Apr 25, 2026

Both P1 issues are addressed in 2171fb7:

P1 #1 — Family boundary in prefix matching. Replaced the bare startswith with a segment-aware check that requires the registered prefix to either equal the model exactly or be followed by -. Verified:

get_model_capabilities("gpt-5.10")    # None
get_model_capabilities("gpt-5.1foo")  # None
get_model_capabilities("o1-previewed")  # falls through to broader "o1" family, not "o1-preview"

P1 #2 — gpt-5.1 effort options. Re-read src/openai/types/shared/reasoning.py as the canonical source and corrected the registry:

  • gpt-5.1 (and -codex, -codex-max, -mini): ('none', 'low', 'medium', 'high') — removed minimal
  • gpt-5.2 / gpt-5.3 / gpt-5.4: ('none', 'low', 'medium', 'high', 'xhigh'); xhigh is supported for models after gpt-5.1-codex-max per the docstring
  • gpt-5 base (unchanged): ('minimal', 'low', 'medium', 'high')

While I was in the file, I also addressed two related issues raised by the same logic chain:

  • gpt-5-pro split into its own family with the high-only effort scale (('high',)), matching "The gpt-5-pro model defaults to (and only supports) high reasoning effort."
  • Dated chat/search variants like gpt-4o-search-preview-2025-03-11 are now recognized correctly via a regex that tolerates a trailing -YYYY-MM-DD snapshot.
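The dated-variant detection could look something like the following; the exact pattern is an assumption, not necessarily what the commit uses:

```python
import re

# Match "-chat-latest" or "-search-preview", optionally followed by a
# trailing "-YYYY-MM-DD" snapshot date.
_CHAT_OR_SEARCH = re.compile(r"-(chat-latest|search-preview)(-\d{4}-\d{2}-\d{2})?$")

def is_chat_or_search_variant(model: str) -> bool:
    return _CHAT_OR_SEARCH.search(model) is not None
```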

Added a TestBoundaryMatching class with parametrized regression tests for the prefix-collision cases and a TestGpt5ProFamily class for the high-only path. Test count: 49 new (up from 35), all passing; 154 lib tests pass overall, lint/pyright/format all clean.



Development

Successfully merging this pull request may close these issues.

Add convenience methods for detecting model parameter support (temperature, reasoning)
