feat(blog): MiniMax M3 day-0 — H200 beats B200 at low concurrency on vLLM FP8 by Oseltamivir · Pull Request #454 · SemiAnalysisAI/InferenceX-app

Oseltamivir · 2026-06-13T08:10:15Z

Summary

Day-0 MiniMax M3 benchmark post for the InferenceX blog. Anchors the headline on the H200-vs-B200 low-concurrency inversion seen in the launch-window discussion and answers why it happens.

Headline: On vLLM FP8, 1024/1024, non-MTP, H200 delivers up to 3.5x the throughput per GPU of B200 at 70 tok/s/user. At matched recipe (TP=8, conc 4) H200 runs 113 tok/s/GPU @ 8.4ms TPOT vs B200's 59 @ 16.2ms — 1.9x throughput, half the latency, on the weaker chip.
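The matched-recipe figures can be checked with a few lines of arithmetic. The sketch below uses only the four numbers quoted above. The conversion from TPOT to per-user interactivity (1000 / TPOT in ms) is an assumption about how the tok/s/user axis is defined, not something this PR states.

```python
# Matched recipe from the PR description: TP=8, conc 4, 1024/1024, non-MTP.
h200 = {"tok_s_per_gpu": 113, "tpot_ms": 8.4}
b200 = {"tok_s_per_gpu": 59, "tpot_ms": 16.2}

# Throughput and latency ratios, H200 relative to B200.
throughput_ratio = h200["tok_s_per_gpu"] / b200["tok_s_per_gpu"]
latency_ratio = h200["tpot_ms"] / b200["tpot_ms"]


def interactivity_tok_s_user(tpot_ms: float) -> float:
    """Approximate per-user decode rate; assumes interactivity ~= 1 / TPOT."""
    return 1000.0 / tpot_ms


print(f"throughput ratio (H200/B200): {throughput_ratio:.2f}x")
print(f"TPOT ratio (H200/B200): {latency_ratio:.2f}")
print(f"H200 ~{interactivity_tok_s_user(h200['tpot_ms']):.0f} tok/s/user")
print(f"B200 ~{interactivity_tok_s_user(b200['tpot_ms']):.0f} tok/s/user")
```

Under that assumption, the matched-recipe point puts the two SKUs at quite different positions on the interactivity axis, which is why the iso-interactivity comparison is reported separately from the matched-concurrency one.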

Root cause (the learning): vLLM defaults Blackwell's FP8 block-scale MoE GEMM to DeepGEMM (large-batch-tuned, high fixed-latency floor at small batch); Hopper runs Marlin (low-concurrency-tuned). Identical M3 weights / MSA / routing on both SKUs — the low-batch spread is kernel selection. On-paper specs confirm it's software: B200 carries 2.27x H200's dense FP8 FLOPS and 1.67x the HBM bandwidth yet returns half the tokens at low batch. Fix in flight: flashinfer PR #3504 (MXFP8 MoE SwiGLU gated-activation params).
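To make the "on-paper specs" argument concrete, the sketch below compares the spec ratios quoted above with the observed matched-recipe ratio. It is an illustration only: treating the spec ratio as the expected speedup is a simplifying assumption, and the variable names are hypothetical.

```python
# Spec ratios quoted in the PR description (B200 relative to H200).
FP8_FLOPS_RATIO = 2.27
HBM_BANDWIDTH_RATIO = 1.67

# Observed matched-recipe throughput (TP=8, conc 4), tok/s/GPU.
H200_TOK_S_GPU = 113
B200_TOK_S_GPU = 59

observed_ratio = B200_TOK_S_GPU / H200_TOK_S_GPU

# How far B200 falls short of each naive spec-based expectation.
shortfall_vs_bandwidth = HBM_BANDWIDTH_RATIO / observed_ratio
shortfall_vs_flops = FP8_FLOPS_RATIO / observed_ratio

print(f"observed B200/H200: {observed_ratio:.2f}x")
print(f"shortfall vs bandwidth-scaled expectation: {shortfall_vs_bandwidth:.1f}x")
print(f"shortfall vs FLOPS-scaled expectation: {shortfall_vs_flops:.1f}x")
```

A gap of this size in either direction is hard to attribute to hardware alone, which is consistent with the kernel-selection explanation given above.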

Honest crossover: the 8192/1024 table shows H200 winning conc 4–16 and B200 retaking conc 32–64 (1.34x at conc 64) where DeepGEMM amortizes. MTP raises the crossover to ~60 tok/s/user.

Data provenance

MCP DB tools and the GitHub data dump were unavailable (dump predates M3), so numbers were pulled live from the production API (/api/v1/benchmarks?model=MiniMax-M3, run 27451860491, date 2026-06-13). Iso-interactivity figures run through the bundled spline helper so they match the dashboard chart.

⚠️ Before merge

Chart images not yet added — packages/app/public/images/minimax-m3-vllm-fp8-h200-vs-b200-low-concurrency/benchmark-{light,dark}.png. The two <Figure> blocks reference these paths; they will 404 on the preview until the PNGs are dropped in.
Verify the engineer attributions in Acknowledgments (Roger Wang, Thien, Yongye Zhu).

Overlay support: N/A — this is a static MDX blog post, not an inference-chart feature.

Note

Low Risk
Content-only addition (static MDX); no application logic, auth, or data pipeline changes. Main merge risk is broken image paths until assets are committed.

Overview
Adds a new InferenceX blog post (minimax-m3-vllm-fp8-h200-vs-b200-low-concurrency.mdx) covering day-0 MiniMax M3 benchmarks on vLLM FP8 (2026-06-13 run), with the headline that H200 can deliver up to ~3.5× B200 throughput per GPU in the high-interactivity / low-concurrency regime while B200/B300 still win at high batch.

The post explains the inversion as a vLLM kernel-selection gap: Blackwell defaults FP8 block-scale MoE to DeepGEMM (high fixed latency at small batch) vs Hopper’s Marlin path, with identical M3 weights. It includes TP=8 tables for 1024/1024 and 8192/1024, iso-interactivity throughput and $/M token comparisons, MTP crossover notes, links to InferenceX and flashinfer PR #3504, plus DashboardCTA, Figure assets (light/dark, including 8k/1k charts), and FAQ JsonLd.

Before merge: chart PNGs under public/images/minimax-m3-vllm-fp8-h200-vs-b200-low-concurrency/ are referenced but not in this diff (preview 404s until added).

Reviewed by Cursor Bugbot for commit 6aeb395. Bugbot is set up for automated code reviews on this repo.


vercel · 2026-06-13T08:10:21Z


Project	Deployment	Actions	Updated (UTC)
inferencemax-app	Ready	Preview, Comment	Jun 13, 2026 8:25am

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit e16a580.

cursor · 2026-06-13T08:11:32Z

+|                         50 |     368 |     560 |     278 |     463 |       2.02x |
+|                         60 |     297 |     444 |     179 |     336 |       2.48x |
+|                     **70** | **241** | **371** | **106** | **209** |   **3.50x** |
+|                         80 |     199 |     317 |      32 |      80 |      10.00x |


Iso table ratio mismatch

Medium Severity

At 80 tok/s/user the iso-interactivity table shows H200 / B200 as 10.00x while the same row lists 317 and 32 tok/s/GPU; dividing those printed values gives about 9.91x, so the ratio does not match the displayed throughputs unlike other rows (e.g. 3.50x at 70).
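The check Bugbot describes can be reproduced from the four rows in the diff excerpt. The sketch below is a hypothetical verification script, not part of the PR. It assumes the ratio column divides the second numeric column by the third (the way the comment reads the 80 tok/s/user row), and it also tests one possible explanation: that the printed throughputs are rounded to integers while the ratio is computed from unrounded values. Nothing in the PR confirms that explanation.

```python
# Rows from the diff excerpt: (tok/s/user, H200 tok/s/GPU, B200 tok/s/GPU, printed ratio).
# Assumption: ratio column = second numeric column / third numeric column.
ROWS = [
    (50, 560, 278, 2.02),
    (60, 444, 179, 2.48),
    (70, 371, 106, 3.50),
    (80, 317, 32, 10.00),
]


def recomputed_ratio(h200: float, b200: float) -> float:
    """Ratio obtained by dividing the printed (rounded) throughputs."""
    return h200 / b200


def rounding_bounds(h200: float, b200: float) -> tuple[float, float]:
    """Range of ratios possible if each printed value was rounded to the nearest integer."""
    return (h200 - 0.5) / (b200 + 0.5), (h200 + 0.5) / (b200 - 0.5)


def is_consistent(h200: float, b200: float, printed: float) -> bool:
    """True if the printed ratio (itself rounded to 2 decimals) overlaps the bounds."""
    low, high = rounding_bounds(h200, b200)
    return low <= printed + 0.005 and printed - 0.005 <= high


for interactivity, h200, b200, printed in ROWS:
    low, high = rounding_bounds(h200, b200)
    status = "within rounding" if is_consistent(h200, b200, printed) else "inconsistent"
    print(
        f"{interactivity} tok/s/user: printed {printed:.2f}x, "
        f"recomputed {recomputed_ratio(h200, b200):.2f}x, "
        f"bounds [{low:.2f}, {high:.2f}] -> {status}"
    )
```

Because the B200 value at 80 tok/s/user is small, a half-unit of rounding moves the ratio by several percent, so the printed and recomputed values can differ without either being wrong. Whether the post should print the ratio of the rounded values or the unrounded one is an editorial choice for the author.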



feat(blog): MiniMax M3 day-0 — H200 beats B200 at low concurrency on …

e16a580


Oseltamivir requested a review from adibarra as a code owner June 13, 2026 08:10

vercel Bot deployed to Preview June 13, 2026 08:10

cursor Bot reviewed Jun 13, 2026


feat(blog): add MiniMax M3 8K/1K throughput chart + align figure captions

64acb8b

vercel Bot deployed to Preview June 13, 2026 08:17

feat(blog): use clean 1K/1K hero + add 8K/1K chart, no deployment banner

6aeb395

vercel Bot deployed to Preview June 13, 2026 08:25

Oseltamivir wants to merge 3 commits into master from blog/minimax-m3-vllm-fp8-h200-vs-b200-low-concurrency
