
feat(wan22): add WAN 2.2 text-to-video adapter and dataset for MLPerf inference #293

Open
wu6u3tw wants to merge 15 commits into mlcommons:main from wu6u3tw:feat/wan22

Conversation

@wu6u3tw
Collaborator

@wu6u3tw wu6u3tw commented Apr 22, 2026

Summary

  • Adds a new wan22 module with Wan22Adapter, Wan22Accumulator, Wan22Dataset, and Pydantic wire types for the trtllm-serve POST /v1/videos/generations endpoint.
  • Wan22Adapter uses response_format=video_path: the server saves the encoded video to shared storage (Lustre) and returns only the file path, avoiding 3–5 MB of base64 video bytes per request over
    HTTP and ZMQ transport.
  • Wan22Dataset loads MLPerf WAN2.2 prompt text files (one prompt per line); dataset ID wan22_mlperf is registered with DataLoaderFactory for --dataset CLI use.
  • Registers APIType.WAN22 and wires Wan22Adapter/Wan22Accumulator into HTTPClientConfig.
  • Adds an example offline benchmark YAML for Lyris (GB200/GB300) targeting a local trtllm-serve instance with MLPerf-standard params (720×1280, 5 s, 20 steps, guidance 4.0/3.0, seed 42, 248 prompts).
  • Fixes with_updates() to reset adapter and accumulator when api_type changes.
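As a rough illustration of the wire exchange described above, the perf-mode request and response can be sketched with plain dicts; field names beyond `prompt`, `response_format`, and `video_path` are assumptions here, not the exact trtllm-serve schema:

```python
import json

# Hypothetical perf-mode request: ask the server to write the encoded video
# to shared storage (Lustre) and return only its path, avoiding 3-5 MB of
# base64 video bytes per response over HTTP/ZMQ.
request = {
    "prompt": "a golden retriever running on a beach",
    "response_format": "video_path",
}
body = json.dumps(request).encode()

# Illustrative response shape in video_path mode:
response = {"video_id": "v-001", "video_path": "/lustre/videos/v-001.mp4"}
```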

Test plan

  • pytest -m unit tests/unit/wan22/ — adapter, dataset, factory, types, init, registration unit tests
  • pytest -m integration tests/integration/wan22/ — adapter round-trip with mock server
  • pre-commit run --all-files passes clean
  • Offline benchmark runs end-to-end against a live trtllm-serve instance on Lyris (manual)

What does this PR do?

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@wu6u3tw wu6u3tw requested a review from a team April 22, 2026 22:25
@wu6u3tw wu6u3tw self-assigned this Apr 22, 2026
@wu6u3tw wu6u3tw marked this pull request as draft April 22, 2026 22:25
@github-actions

github-actions Bot commented Apr 22, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for the WAN2.2 MLPerf text-to-video benchmark, including a new adapter, dataset loader, and associated Pydantic models. It also adds comprehensive documentation and an example configuration for running benchmarks on Lyris. The review feedback identifies several critical omissions and inconsistencies: the VideoPathRequest and Wan22Dataset are missing the latent_path field required for MLPerf reproducibility, and there is a mismatch between the adapter implementation and unit tests regarding the response_format and handling of VideoPayloadResponse. Additionally, the feedback suggests using None as a default for negative_prompt to allow server-side defaults and injecting the canonical MLPerf negative prompt into the dataset.

Comment thread src/inference_endpoint/videogen/types.py Outdated
Comment thread src/inference_endpoint/videogen/adapter.py Outdated
Comment thread src/inference_endpoint/videogen/adapter.py
Comment thread src/inference_endpoint/videogen/dataset.py Outdated
@wu6u3tw wu6u3tw force-pushed the feat/wan22 branch 2 times, most recently from b0878b5 to 3230fec Compare April 23, 2026 22:22
@wu6u3tw wu6u3tw marked this pull request as ready for review April 30, 2026 02:24
wu6u3tw added 5 commits April 29, 2026 19:32
…er, Wan22Dataset→VideoGenDataset

- Rename src/inference_endpoint/wan22/ → videogen/
- Rename tests/unit/wan22/ → tests/unit/videogen/
- Rename tests/integration/wan22/ → tests/integration/videogen/
- APIType.WAN22 → APIType.VIDEOGEN ("wan22" → "videogen")
- Wan22Adapter → VideoGenAdapter
- Wan22Dataset → VideoGenDataset
- Wan22Accumulator → VideoGenAccumulator
- Update all imports, maps, __all__, tests, docs, and example yaml
- Keep dataset_id="wan22_mlperf" and model_params.name="wan22" (MLPerf identifiers)
…bytes)

encode_query: response_format defaults to "video_bytes" but can be overridden
via query.data["response_format"] = "video_path" for Lustre-path mode.

decode_response: dispatches on response shape — "video_bytes" key → VideoPayloadResponse,
otherwise → VideoPathResponse.
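A minimal stdlib-only sketch of that shape dispatch, with plain dicts standing in for the Pydantic response models (the real decode_response differs):

```python
import json

def decode_response(content: bytes) -> dict:
    """Dispatch on response shape: a "video_bytes" key selects the payload
    response, otherwise the path response. Names are illustrative only."""
    body = json.loads(content)
    if "video_bytes" in body:
        return {"kind": "VideoPayloadResponse", "video_bytes": body["video_bytes"]}
    return {"kind": "VideoPathResponse", "video_path": body["video_path"]}
```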
Collaborator Author


@arekay-nv Quick Q: We don't want to have this jsonl file in the PR, right?

Collaborator


Keeping it here should be fine given that it is relatively small.

Collaborator

@arekay-nv arekay-nv left a comment


Made a quick pass - will make another later.

Collaborator


Revert this file to pass pre-commit.


settings:
  runtime:
    min_duration_ms: 60000 # 1 minute warm-up
Collaborator


This isn't a warm-up duration; it is counted in the performance runs.

Collaborator


This shouldn't be needed. What would be nice here instead are instructions on which model (HF link) to use and how to launch both a server instance and the benchmark.

Collaborator


Keeping it here should be fine given that it is relatively small.

Collaborator


Revert this file as well - likely related to the original datasets folder.

Collaborator


Revert.

import pytest
from aiohttp import web

DUMMY_VIDEO_PATH = "/lustre/videos/mock_video_001.mp4"
Collaborator


Use temp dir fixture.


@classmethod
def decode_sse_message(cls, json_bytes: bytes) -> str:
    raise NotImplementedError("WAN 2.2 does not use SSE streaming")
Collaborator


Use the model name or adapter name here - this might be run with a different model.

Comment on lines +114 to +120
WAN 2.2 uses non-streaming HTTP. This class exists only to satisfy
the HTTPClientConfig.accumulator type contract.
"""

def __init__(self, query_id: str, stream_all_chunks: bool) -> None:
    self.query_id = query_id
    # stream_all_chunks is intentionally ignored: WAN 2.2 is non-streaming.
Collaborator


Same here: don't use "Wan2.2".

DUMMY_VIDEO_BYTES = b"\x00\x00\x00\x20ftypmp42" + b"\x00" * 24


class MockTrtllmServe:
Collaborator


Is it possible to reuse the existing echo server with a new video-gen route? That would keep things simple and reduce duplication.


Copilot AI left a comment


Pull request overview

Adds WAN 2.2 text-to-video (trtllm-serve) support to the inference-endpoint client by introducing a new videogen module (adapter + wire types + dataset) and wiring it into the existing endpoint client + dataset loader factory.

Changes:

  • Introduce APIType.VIDEOGEN with VideoGenAdapter/VideoGenAccumulator and Pydantic request/response wire models for POST /v1/videos/generations.
  • Add VideoGenDataset registered as predefined dataset wan22_mlperf, plus unit/integration tests and example offline benchmark config/scripts.
  • Update dataset factory + templates/docs to recognize the new workload.

Reviewed changes

Copilot reviewed 30 out of 32 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
uv.lock Bumps transformers and adds videogen extras to lock metadata.
pyproject.toml Adds a videogen optional dependency group (currently empty).
src/inference_endpoint/videogen/types.py Adds Pydantic wire models for video generation request/response.
src/inference_endpoint/videogen/adapter.py Adds HTTP adapter + no-op accumulator for non-streaming video endpoint.
src/inference_endpoint/videogen/dataset.py Adds predefined dataset wan22_mlperf (prompt text file loader).
src/inference_endpoint/videogen/__init__.py Exposes videogen public API symbols.
src/inference_endpoint/core/types.py Registers APIType.VIDEOGEN and default route /v1/videos/generations.
src/inference_endpoint/endpoint_client/config.py Wires adapter/accumulator into ADAPTER_MAP/ACCUMULATOR_MAP.
src/inference_endpoint/dataset_manager/factory.py Modifies predefined dataset loader invocation to pass path=.
src/inference_endpoint/dataset_manager/__init__.py Imports VideoGenDataset so it registers into Dataset.PREDEFINED.
src/inference_endpoint/dataset_manager/dataset.py Adds mypy ignore on datasets imports.
src/inference_endpoint/dataset_manager/predefined/shopify_product_catalogue/__init__.py Adds mypy ignore on datasets import.
src/inference_endpoint/evaluation/livecodebench/generate.py Adds mypy ignore on datasets import.
src/inference_endpoint/config/templates/online_template_full.yaml Updates documented api_type options to include videogen.
src/inference_endpoint/config/templates/offline_template_full.yaml Updates documented api_type options to include videogen.
src/inference_endpoint/config/templates/concurrency_template_full.yaml Updates documented api_type options to include videogen.
tests/unit/videogen/test_types.py Unit tests for videogen Pydantic wire models.
tests/unit/videogen/test_adapter.py Unit tests for adapter encode/decode behavior and accumulator contract.
tests/unit/videogen/test_dataset.py Unit tests for dataset loading + sample shaping.
tests/unit/videogen/test_factory.py Unit tests asserting factory can create the videogen predefined dataset.
tests/unit/videogen/test_registration.py Unit tests for enum + adapter/accumulator registration.
tests/unit/videogen/test_init.py Unit tests for videogen module public exports.
tests/unit/videogen/__init__.py Test package init (licensing header).
tests/integration/videogen/conftest.py Adds aiohttp mock trtllm-serve fixtures for adapter integration tests.
tests/integration/videogen/test_adapter.py Integration tests for encode→POST→decode round-trip and error cases.
tests/integration/videogen/__init__.py Test package init (licensing header).
examples/09_Wan22_VideoGen_Example/offline_wan22.yaml Example offline benchmark config targeting videogen endpoint.
examples/09_Wan22_VideoGen_Example/setup_and_test.sh Example script to set up venv and run videogen tests.
examples/09_Wan22_VideoGen_Example/wan22_prompts.jsonl Bundled 248-prompt dataset artifact for the example benchmark.
endpoints_changed.md Design summary / documentation for WAN2.2 videogen integration.
AGENTS.md Adds videogen module to repo architecture documentation.
.gitignore Ignores .worktrees/.


- http://localhost:8000
api_key: null # API key
api_type: openai # API type: openai or sglang | options: openai, sglang
api_type: openai # API type: openai or sglang | options: openai, sglang, videogen

Copilot AI May 1, 2026


The inline comment still says “API type: openai or sglang” even though videogen is now supported. Update the comment text to avoid contradicting the listed options.

Suggested change
api_type: openai # API type: openai or sglang | options: openai, sglang, videogen
api_type: openai # API type: openai, sglang, or videogen | options: openai, sglang, videogen

- http://localhost:8000
api_key: null # API key
api_type: openai # API type: openai or sglang | options: openai, sglang
api_type: openai # API type: openai or sglang | options: openai, sglang, videogen

Copilot AI May 1, 2026


The inline comment still says “API type: openai or sglang” even though videogen is now supported. Update the comment text to avoid contradicting the listed options.

Suggested change
api_type: openai # API type: openai or sglang | options: openai, sglang, videogen
api_type: openai # API type: openai, sglang, or videogen | options: openai, sglang, videogen

Comment on lines +79 to +80
# exclude_none so optional fields fall back to server-side defaults
# (MLPerf: omit negative_prompt and latent_path unless explicitly set).

Copilot AI May 1, 2026


The comment here says “MLPerf: omit negative_prompt and latent_path unless explicitly set”, but this PR’s VideoGenDataset injects the MLPerf canonical negative_prompt by default (i.e., it is explicitly set for the common path). Consider rewording this comment to reflect the actual behavior: fields are omitted only when their value is None in query.data.

Suggested change
# exclude_none so optional fields fall back to server-side defaults
# (MLPerf: omit negative_prompt and latent_path unless explicitly set).
# exclude_none so optional fields with value None fall back to
# server-side defaults. In particular, negative_prompt and
# latent_path are omitted only when their value in query.data is None.

Comment on lines 83 to 87
return ds_cls.get_dataloader(
    transforms=preset_transforms,
    num_repeats=num_repeats,
    path=dataset_path,
    **kwargs,

Copilot AI May 1, 2026


DataLoaderFactory now always passes path=dataset_path into ds_cls.get_dataloader(...) for all predefined datasets. Most predefined datasets use the base Dataset.get_dataloader(), which forwards **kwargs into cls.generate(...); many generate() implementations (e.g. AIME25/CNNDailyMail/GPQA/OpenOrca) do not accept a path kwarg, so this will raise TypeError: generate() got an unexpected keyword argument 'path' whenever those datasets are loaded. Consider only passing path for dataset classes that explicitly accept it (e.g. VideoGenDataset), or mapping config.path to the existing datasets_dir parameter instead of introducing a new kwarg.

Comment on lines +53 to +66
query = Query(id="q1", data={"prompt": "a golden retriever running on a beach"})
request_bytes = VideoGenAdapter.encode_query(query)

status, content = _post(
    f"{mock_trtllm_serve.url}/v1/videos/generations", request_bytes
)
assert status == 200

result = VideoGenAdapter.decode_response(content, query.id)
assert result.id == "q1"
assert result.error is None
assert "video_bytes" in result.metadata
decoded = base64.b64decode(result.metadata["video_bytes"])
assert decoded == DUMMY_VIDEO_BYTES

Copilot AI May 1, 2026


This integration test asserts that a default request returns video_bytes, but VideoGenAdapter.encode_query() defaults response_format to video_path. As written, the test only passes because the mock server always returns video_bytes, so it doesn’t reflect real server behavior. Either set response_format='video_bytes' in query.data for this test, or change the mock + assertions to expect video_path for the default case (and add a separate test for the video_bytes path).

Comment on lines +27 to +33
datasets:
  - name: wan22_prompts
    path: examples/09_Wan22_VideoGen_Example/wan22_prompts.jsonl
    format: jsonl
    type: "performance"
    samples: 248


Copilot AI May 1, 2026


This YAML says WAN2.2 MLPerf params are baked into VideoGenAdapter defaults and “only prompt is required from the dataset”, but the configured dataset is a JSONL file that (as provided) includes extra fields like negative_prompt/mode. Those fields will be forwarded into query.data and can override adapter defaults (e.g., empty negative_prompt). Consider either switching the example to wan22_mlperf (text prompts) so the adapter truly only receives prompt, or trimming the JSONL to only the fields you intend to send.

Comment on lines +59 to +60
# Fixed latent path is injected into every request via VideoGenDataset.
# Set latent_path in dataset config or pass it when constructing VideoGenDataset.

Copilot AI May 1, 2026


These comments state that a fixed latent path is injected “via VideoGenDataset”, but this example configuration does not use the predefined wan22_mlperf/VideoGenDataset loader (it loads a JSONL file directly). As a result, latent_path will not be injected unless it is explicitly present in the JSONL or added via a transform. Update the comments (or the dataset config) so the example reflects the actual injection mechanism.

Suggested change
# Fixed latent path is injected into every request via VideoGenDataset.
# Set latent_path in dataset config or pass it when constructing VideoGenDataset.
# This example loads prompts directly from JSONL and does not use
# `VideoGenDataset`, so `latent_path` is not injected automatically here.
# If required, include `latent_path` in the JSONL or add it via a transform.

Comment thread tests/integration/videogen/conftest.py Outdated
Comment on lines +86 to +93
return web.json_response(
    {
        "video_id": video_id,
        "video_bytes": base64.b64encode(DUMMY_VIDEO_BYTES).decode(),
    }
)



Copilot AI May 1, 2026


MockTrtllmServe claims to support both response_format='video_path' and 'video_bytes', but _handle_sync() always returns a payload containing video_bytes and never returns video_path (nor does it inspect the request’s response_format). This makes the integration suite unable to validate the adapter’s video_path decode branch and can mask regressions. Update the handler to branch on body.get('response_format') and return video_path when requested.

Suggested change
return web.json_response(
    {
        "video_id": video_id,
        "video_bytes": base64.b64encode(DUMMY_VIDEO_BYTES).decode(),
    }
)
response_format = body.get("response_format")
response_body = {"video_id": video_id}
if response_format == "video_path":
    response_body["video_path"] = DUMMY_VIDEO_PATH
else:
    response_body["video_bytes"] = base64.b64encode(DUMMY_VIDEO_BYTES).decode()
return web.json_response(response_body)

Comment thread endpoints_changed.md Outdated
HTTPEndpointClient
│ selects adapter via APIType.WAN22

Copilot AI May 1, 2026


The design doc refers to APIType.WAN22, but the implementation added in this PR is APIType.VIDEOGEN. This mismatch is likely to confuse readers and makes the doc inaccurate; update the diagram/text to match the actual enum value and wiring.

Suggested change
│ selects adapter via APIType.WAN22
│ selects adapter via APIType.VIDEOGEN

Comment thread endpoints_changed.md Outdated
videogen/
├── __init__.py Public exports
├── types.py Pydantic wire models: VideoPathRequest,
│ VideoPathResponse, VideoPayloadResponse, HealthResponse

Copilot AI May 1, 2026


The “New Files” section lists a HealthResponse type in videogen/types.py, but no such model is implemented/exported in this PR. Either add the missing type or remove it from the document to keep the doc accurate.

Suggested change
│ VideoPathResponse, VideoPayloadResponse, HealthResponse
│ VideoPathResponse, VideoPayloadResponse

wu6u3tw added 5 commits May 1, 2026 16:09
After rebasing onto origin/main, these files needed updates from:
- ruff lint/format (test imports, line wrapping)
- prettier (markdown table formatting, YAML alignment)
- regenerate-templates (api_type docstring now lists "videogen")
- uv lock refresh (pyproject.toml now has "videogen" optional group)

Also tightens an over-broad pytest.raises(Exception) in the videogen
integration tests to (ValidationError, json.JSONDecodeError) for the
500-response case and json.JSONDecodeError for the malformed-JSON case
(B017).
…t_path

Address MR review feedback (Task 3 of wan22-trtllm-plan.md):

types.py:
- VideoPathRequest.negative_prompt: str = "" -> str | None = None
- Add VideoPathRequest.latent_path: str | None = None field
  (per-request fixed latent tensor path for MLPerf reproducibility)

adapter.py:
- encode_query: read negative_prompt and latent_path with .get() (no
  default), and serialise with model_dump_json(exclude_none=True) so
  optional fields fall back to server-side defaults when absent.

dataset.py:
- Add _MLPERF_NEGATIVE_PROMPT module constant (canonical MLPerf string).
- VideoGenDataset injects this negative prompt into every sample by
  default; pass negative_prompt=None to omit. Accepts latent_path as
  a per-dataset config so all samples share the same fixed latent.
- load() conditionally includes negative_prompt and latent_path in
  each sample dict only when set, so adapter exclude_none does the
  right thing end-to-end.

Tests:
- Update test_types defaults (negative_prompt None, latent_path None)
- Update test_dataset for the canonical negative-prompt default,
  add coverage for negative_prompt=None and latent_path propagation.
- Add adapter tests for exclude_none behaviour and latent_path
  forwarding.

Note for reviewer: the other two review comments (response_format
hardcoded to video_path; decode_response only handling VideoPathResponse)
are stale; both were addressed in 7a1b4d3 "fix: make response_format
optional in VideoGenAdapter (default video_bytes)". Current adapter
already defaults response_format to video_bytes and dispatches in
decode_response on whether "video_bytes" is present in the response.
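The exclude_none behaviour described above can be approximated without Pydantic: dropping None-valued fields from the serialised payload is what lets the server apply its own defaults. A sketch under that assumption (an illustrative stand-in, not the actual VideoPathRequest model):

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class VideoPathRequest:  # illustrative stand-in for the Pydantic model
    prompt: str
    negative_prompt: Optional[str] = None
    latent_path: Optional[str] = None

def encode_query(data: dict) -> bytes:
    req = VideoPathRequest(**data)
    # mimic model_dump_json(exclude_none=True): omit unset optional fields
    # so the server falls back to its own defaults for them
    payload = {k: v for k, v in asdict(req).items() if v is not None}
    return json.dumps(payload).encode()
```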
wu6u3tw added 5 commits May 1, 2026 16:09
Aligns three previously-inconsistent statements about the request default:

- Adapter `encode_query` previously fell back to "video_bytes" when
  query.data did not specify a response_format, but the Pydantic field
  default on VideoPathRequest was "video_path" — the latter was dead
  because the adapter always supplied a value.
- Dataset docstring claimed "always requests video_bytes"; types
  docstring described a perf/accuracy split.

Pick the perf/accuracy split (Option A): default = video_path (perf),
opt-in to video_bytes via query.data["response_format"] (accuracy).

- adapter.py: flip `data.get("response_format", ...)` default to
  "video_path"; rewrite class + encode_query docstrings to match.
- dataset.py: drop the "always requests video_bytes" line.
- test_adapter.py (unit + integration): split the old
  test_encode_query_always_requests_video_bytes test into
  default-is-video_path + accuracy-mode-override tests.

Also rewrite endpoints_changed.md:

- Replace the "always video_path" framing with the dual-mode reality.
- Document VideoPayloadResponse and the decode_response shape dispatch.
- Fix the payload-size claim (300 MB -> 3-5 MB; 300 MB was raw uncompressed).
- Drop stale "Pending" tasks 2 (latent_path -- already wired) and 3
  (negative_prompt None -- already done).
- Update module name `wan22` -> `videogen` and `api_type` example.

Verified on aarch64 GB200: 58 unit+integration videogen tests pass.
…wan22 refs

Move datasets/wan22_prompts.jsonl into the example folder so the example
is self-contained and drop the absolute Lustre path baked into the setup
script.

setup_and_test.sh:
- Remove PROMPTS_TXT (hardcoded /lustre/share/... path) and the entire
  prompts.txt -> JSONL conversion block. The JSONL is now bundled with
  the example, so regeneration from a Lustre source is no longer needed.
- Retarget PROMPTS_JSONL to ${SCRIPT_DIR}/wan22_prompts.jsonl.
- Drop the now-orphaned PYTHON variable (only used by the conversion
  heredoc).
- Fix stale post-rename references that were left over from
  ddac990 (wan22 -> videogen): pip extras [wan22,test] -> [videogen,test]
  and test paths tests/unit/wan22 / tests/integration/wan22 ->
  tests/unit/videogen / tests/integration/videogen. Without these the
  script failed on a fresh setup (no [wan22] extra) and collected zero
  tests.

offline_wan22.yaml: dataset path -> examples/09_Wan22_VideoGen_Example/wan22_prompts.jsonl.

endpoints_changed.md: update bundled-dataset path reference.
…ader

VideoGenDataset duplicated functionality the generic JsonlLoader already
provides, was bugged on JSONL input (read each line as a raw text prompt
instead of parsing JSON), and wasn't actually invoked by the example
config: offline_wan22.yaml uses `name: wan22_prompts`, which doesn't
match its dataset_id (`wan22_mlperf`), so DataLoaderFactory already
routed to JsonlLoader. The class was dead code in the only path the
example exercises.

Bake the MLPerf canonical negative_prompt into every row of the bundled
JSONL so runtime injection is unnecessary, then delete the workload-
specific dataset class.

- examples/09_Wan22_VideoGen_Example/wan22_prompts.jsonl: replace
  empty negative_prompt with canonical MLPerf string in all 248 rows.
- src/inference_endpoint/videogen/dataset.py: deleted.
- src/inference_endpoint/dataset_manager/__init__.py: drop the
  side-effect VideoGenDataset import and __all__ entry.
- tests/unit/videogen/test_dataset.py, test_factory.py: deleted.
- src/inference_endpoint/videogen/types.py: update negative_prompt
  field docstring to point at the bundled JSONL instead of
  VideoGenDataset.
- examples/09_Wan22_VideoGen_Example/offline_wan22.yaml: drop
  `format: jsonl` (the factory tries DatasetFormat("jsonl") and
  crashes because enum values are `.jsonl`; auto-detection from the
  path extension works), and update the comment block.
- endpoints_changed.md: replace the dataset.py / VideoGenDataset
  section with a brief note about the bundled JSONL + JsonlLoader.

Verified on aarch64 GB200:
- pre-commit run --all-files: all hooks pass
- 42 videogen tests pass (down from 58: 14 dataset tests + 3 factory
  tests removed; adapter/types/init/registration tests retained)
- end-to-end smoke: DataLoaderFactory creates a Dataset with 248
  samples, each carrying prompt + canonical negative_prompt;
  VideoGenAdapter.encode_query produces a valid request with
  response_format=video_path.
The `metrics:` top-level block was rejected by `BenchmarkConfig`
(extra="forbid", no `metrics` field in the schema), so loading the
YAML via `BenchmarkConfig.from_yaml_file()` failed validation. The
block had no effect on metrics collection — that's controlled by
`settings.runtime` and the metrics aggregator service.

Caught by an end-to-end functional smoke test that loads the YAML,
runs encode → POST → decode against an inline mock trtllm-serve in
both perf (video_path) and accuracy (video_bytes) modes, and bulk-
encodes all 248 samples.
… simplify adapter, broaden tests

Five small fixes from a pre-review pass:

- factory.py: drop `path=dataset_path` kwarg on the predefined-dataset
  branch. It was added in 9ded0e6 to feed VideoGenDataset (since deleted
  in 6ce6bfa); none of the remaining predefined datasets' generate()
  signatures accept `path`, so any user passing both `name=<predefined>`
  and `path=<file>` would TypeError. Restores the pre-9ded0e6 behavior.
  Verified the videogen example still loads end-to-end (it routes
  through Dataset.load_from_file, not the predefined branch).

- adapter.py encode_query: replace the 12-line data.get() boilerplate
  with `VideoPathRequest.model_validate({k: data[k] for k in known if
  k in data})`. Pydantic applies defaults, eliminating the drift risk
  between adapter.py and types.py. Extra keys in query.data
  (sample_id, sample_index, mode, ...) are now ignored cleanly.

- integration mock: MockTrtllmServe._handle_sync now honors
  body["response_format"] and routes to a VideoPathResponse when the
  request asks for video_path. Previously the mock always returned
  video_bytes regardless of the request, so the integration tests
  never exercised the perf-mode decode branch end-to-end.

- integration tests: add test_perf_mode_round_trip_returns_video_path
  asserting result.metadata == {video_path: ...} via real HTTP. Rename
  the renamed counterpart to test_accuracy_mode_round_trip_returns_video_bytes.
  Replace the misleadingly named test_missing_video_bytes_field_raises_validation_error
  with two targeted tests, one per decode dispatch branch.

- endpoints_changed.md: drop two stale references — `HealthResponse`
  (never existed in types.py) and `APIType.WAN22` (the actual enum is
  APIType.VIDEOGEN). Trim the doc from 155 → 84 lines (~46% reduction)
  by collapsing the architecture diagram and removing repetition;
  factual content is unchanged.

Verified on aarch64 GB200:
- pre-commit run --all-files: all hooks pass
- 43 videogen tests pass (one new perf-mode round-trip test added)
- end-to-end smoke (load YAML → factory → JsonlLoader → adapter →
  mock server) passes both perf and accuracy modes; encode_query
  ignores extra keys cleanly
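The filtered-construction pattern mentioned above (unknown keys in query.data dropped before validation) looks roughly like this, with a plain dict standing in for `model_validate` and a hypothetical field list:

```python
# Hypothetical field list; the real VideoPathRequest fields may differ.
KNOWN_FIELDS = ("prompt", "negative_prompt", "latent_path", "response_format")

def filter_known(data: dict) -> dict:
    # Extra keys in query.data (sample_id, sample_index, mode, ...) are
    # ignored cleanly; only known wire fields reach the request model.
    return {k: data[k] for k in KNOWN_FIELDS if k in data}
```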