Add multimodal preprocessing metrics #4640

Open: CUHKSZzxy wants to merge 7 commits into InternLM:main from CUHKSZzxy:feat/multimodal-metrics
CUHKSZzxy (Collaborator) commented Jun 1, 2026

Summary

  • Add multimodal preprocessing metrics for VLM requests, including request/item counters, total preprocessing latency, per-stage latency, per-request item count, and preprocessing failure counters.
  • Record multimodal metrics through the existing metrics processor and Prometheus logger path.
  • Export multimodal detailed metrics by default whenever the metrics system is enabled; no extra CLI flag is required.
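The accounting listed above could be organized roughly as in the sketch below. This is not the PR's implementation: the class name `MultimodalStatsSketch`, its fields, and the `stage()` helper are hypothetical, chosen only to illustrate per-request item counts, per-stage and total latency, and failure counters keyed by stage and reason.

```python
import time
from collections import Counter
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class MultimodalStatsSketch:
    """Hypothetical per-request accumulator; not the PR's MultimodalStats."""
    item_counts: Counter = field(default_factory=Counter)  # modality -> item count
    stage_latency: dict = field(default_factory=dict)      # stage name -> seconds
    failures: Counter = field(default_factory=Counter)     # (stage, reason) -> count

    def record_item(self, modality: str, n: int = 1) -> None:
        self.item_counts[modality] += n

    def record_failure(self, stage: str, reason: str) -> None:
        self.failures[(stage, reason)] += 1

    @contextmanager
    def stage(self, name: str):
        """Time one stage; count a failure and re-raise if the body raises."""
        start = time.perf_counter()
        try:
            yield
        except Exception as exc:
            self.record_failure(name, type(exc).__name__)
            raise
        finally:
            elapsed = time.perf_counter() - start
            self.stage_latency[name] = self.stage_latency.get(name, 0.0) + elapsed

    @property
    def total_latency(self) -> float:
        return sum(self.stage_latency.values())

    @property
    def total_items(self) -> int:
        return sum(self.item_counts.values())
```

Wrapping each stage in `with stats.stage('media_io'): ...` keeps timing and failure counting in one place, so a stage that raises still contributes its elapsed time to the total.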

Validation

  • Focused multimodal metrics tests passed.
  • Existing metrics logger tests passed.
  • Syntax checks for touched metrics, serve, CLI, and message modules passed.
  • API server help was checked to confirm only the existing metrics switch remains.

Benchmark

  • Local OpenAI-compatible API macrobenchmarks compared this branch with main on a VLM image workload.
  • No throughput regression was observed in the large image payload run; the branch was within expected run-to-run variance.

Terminal Log

When a logging interval includes multimodal preprocessing, the terminal stats line now appends the average multimodal preprocessing latency:

[2026-06-01 12:18:04 Engine 000] Avg thr (in/out): 0.0 / 0.0 tokens/s, Server (succeeded/failed/routed/waiting): 0 / 1 / 0 / 0, Engine (running/waiting): 0 / 0, KV cache: 0.0%, Avg MM preprocess: 0.051 s/req,
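One way such an interval average could be computed and appended is sketched here. The function name and its parameter are assumptions for illustration; only the `Avg MM preprocess: ... s/req` field format and the rule that the field appears only when the interval included multimodal preprocessing come from the description above.

```python
from typing import Sequence


def format_mm_suffix(latencies_s: Sequence[float]) -> str:
    """Return the extra stats-line field, or '' if no multimodal request was seen."""
    if not latencies_s:
        return ''
    avg = sum(latencies_s) / len(latencies_s)
    return f'Avg MM preprocess: {avg:.3f} s/req, '


# Hypothetical usage: append the field to the tail of an existing stats line.
line = 'KV cache: 0.0%, ' + format_mm_suffix([0.051])
print(line)
```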

Assistance

Assisted with Codex + GPT-5.5 xHigh Fast

Copilot AI review requested due to automatic review settings June 1, 2026 07:54

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds end-to-end metrics for multimodal (VLM) prompt preprocessing, wiring new stats collection from request processing through the existing metrics processor and Prometheus/console loggers, and documenting the exported metrics.

Changes:

  • Introduces MultimodalStats to track multimodal item counts, per-stage/total preprocessing latency, and failures.
  • Instruments MultimodalProcessor and AsyncEngine.generate() to collect and emit multimodal preprocessing stats via metrics_processor.
  • Extends metrics loggers (Prometheus + periodic console logger) and updates EN/ZH metrics documentation; adds targeted tests for the new multimodal metrics.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/test_lmdeploy/test_metrics_multimodal.py | Adds unit tests covering multimodal stats behavior and Prometheus emission. |
| lmdeploy/serve/processors/multimodal.py | Instruments multimodal parsing + VLM preprocessing stages and threadsafe stats updates. |
| lmdeploy/serve/core/async_engine.py | Creates/records per-request multimodal stats in the request path. |
| lmdeploy/metrics/stats.py | Adds MultimodalStats for multimodal preprocessing accounting. |
| lmdeploy/metrics/metrics_processor.py | Adds record_multimodal() to emit multimodal stats through configured loggers. |
| lmdeploy/metrics/loggers.py | Implements multimodal metric export for Prometheus and aggregates for console logging. |
| docs/zh_cn/advance/metrics.md | Documents newly exported multimodal preprocessing metrics (CN). |
| docs/en/advance/metrics.md | Documents newly exported multimodal preprocessing metrics (EN). |

mm_stats.record_failure('media_io', 'unknown')
raise NotImplementedError(f'unknown type: {item_type}')

out_message['content'].append({'type': modality.value, 'data': data, **item_params})

CUHKSZzxy (Collaborator, Author) replied:

This is existing behavior from main: the stored value is a string, and Modality.__eq__ supports comparing against that string.
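For readers unfamiliar with the pattern, an enum whose members compare equal to their raw string values can be sketched as below. This is an illustrative stand-in, not lmdeploy's actual `Modality` class: the name `ModalitySketch` and its member names and values are assumed.

```python
from enum import Enum


class ModalitySketch(Enum):
    """Illustrative stand-in; member names and values are assumed."""
    IMAGE = 'image'
    VIDEO = 'video'

    def __eq__(self, other):
        # Compare equal to the raw string value as well as to the member itself.
        if isinstance(other, str):
            return self.value == other
        return super().__eq__(other)

    # Defining __eq__ sets __hash__ to None, so restore hashing explicitly.
    def __hash__(self):
        return hash(self.value)


# A message item stores the modality as a plain string.
item = {'type': ModalitySketch.IMAGE.value, 'data': None}
matches_image = ModalitySketch.IMAGE == item['type']
matches_video = ModalitySketch.VIDEO == item['type']
```

Hashing on the value keeps `hash()` consistent with the widened equality, so a member and its string land in the same dictionary or set slot.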
