[Metax] support FLASH_ATTN by Tryorish · Pull Request #7914 · PaddlePaddle/FastDeploy

Tryorish · 2026-05-25T08:09:20Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

(cherry picked from commit 8130e7c5a77ba39fdb47cce4db586257a3cf10e0) # Conflicts: # custom_ops/metax_ops/apply_rope_qkv.cu # custom_ops/metax_ops/maca_version.h # fastdeploy/spec_decode/mtp.py # fastdeploy/worker/input_batch.py # fastdeploy/worker/metax_model_runner.py

(cherry picked from commit 49a405b5ab0867d297c1a74643fdf83e3bb1bed5)

support cuda graph (cherry picked from commit f78cbfbe0b69eac20bad4f5b1ed7aec25f12ce73)

(cherry picked from commit a0ca9aef03a1e7fa50a205c1737dcdf084f18685)

(cherry picked from commit e8bfe916642e78ff317a398317b846a7bd448772) # Conflicts: # fastdeploy/envs.py # fastdeploy/worker/input_batch.py

(cherry picked from commit 0890acc6f4f94740c14e9788903ada9bbdaaf469)

(cherry picked from commit 712fd9c106109e54a7cba4e93ee90e8181d87a3d)

paddle-bot · 2026-05-25T08:09:29Z

Thanks for your contribution!

Copilot

Pull request overview

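This PR migrates changes from rel2.5 for the Metax (MACA) platform. It fills in or replaces the attention backends and related custom operators, and wires the new forward meta and input-cache fields into the Worker/SpecDecode paths to support the new FlashAttention/Triton Attention compute path.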

Changes:

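Adds/switches attention backends on the Metax platform (FlashAttention + Triton) and extends MetaxForwardMeta to support rotary_embs_bf16.
The Worker / MTP inference path gains rope_emb_bf16, routing replay initialization, and coordination between MTP reorder/insert and index_to_batch_id.
Extends and integrates several Metax custom operators (RoPE, KV cache write, FlashAttention) and adjusts the custom ops compile/link arguments.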

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
fastdeploy/worker/metax_worker.py	Initializes the routing replay manager according to config during cache initialization
fastdeploy/worker/metax_model_runner.py	Switches to `MetaxForwardMeta`, adds `rope_emb_bf16`, and adjusts the MTP call arguments
fastdeploy/worker/input_batch.py	Disables some pin_memory usage on MACA; ProposerInputBatch gains pre_ids and a platform check
fastdeploy/spec_decode/mtp.py	Conditionally imports `MetaxForwardMeta` and `rope_emb_bf16` on MACA
fastdeploy/spec_decode/mtp_cuda.py	On MACA, forward_meta uses `MetaxForwardMeta` and is passed `rotary_embs_bf16`
fastdeploy/platforms/maca.py	Extends the selectable attention backends (FLASH/TRITON) and updates the log message
fastdeploy/platforms/base.py	Adds `TRITON_ATTN` to the `_Backend` enum
fastdeploy/model_executor/layers/backends/metax/attention/triton_attn_metax_backend.py	New: Metax Triton attention backend (Python-side wrapper)
fastdeploy/model_executor/layers/backends/metax/attention/triton_attn_kernels.py	New: Triton kernel for unified attention (prefill/decode)
fastdeploy/model_executor/layers/backends/metax/attention/flash_attn_metax_backend.py	New: Metax FlashAttention backend (two PD modes: split and mix)
fastdeploy/model_executor/layers/backends/metax/__init__.py	Exports the new Metax Flash/Triton attention backends
fastdeploy/model_executor/forward_meta.py	Adds `MetaxForwardMeta` with the additional `rotary_embs_bf16` field
fastdeploy/envs.py	Adds a Metax FA split switch and a KV cache lock switch
custom_ops/setup_ops.py	Adds the new Metax operator source files and library/header paths
custom_ops/metax_ops/write_cache_kv.cu	New: operator that writes K/V into the paged KV cache
custom_ops/metax_ops/write_cache_kv_with_rope.cu	New: cache-write operator with RoPE (including a speculate branch)
custom_ops/metax_ops/rotary_position_embedding.cu	New: RoPE operator supporting variable length, Neox, and partial rotary
custom_ops/metax_ops/flash_attention.cu	New: varlen/kvcache forward operators that call into mcFlashAttn
custom_ops/metax_ops/maca_version.h	Deleted: MACA version macro header
custom_ops/metax_ops/fused_moe_gemm_kernels.h	Removes the MACA_VERSION conditional branches and unifies the call argument types
custom_ops/metax_ops/apply_rope_qkv.cu	Deleted: old apply_rope_qkv implementation
custom_ops/gpu_ops/gelu_tanh.cu	Fixes the block thread-count calculation (avoids exceeding 1024)
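
The gelu_tanh row describes a clamp that went the wrong way (elsewhere in this thread it is summarized as `std::max` → `std::min`). The kernel source is not quoted on this page, so the following is only a minimal Python sketch of the arithmetic; the parameter `d` (some per-row element count) and both function names are hypothetical.

```python
# Upper limit on threads per block, as named in the review table (1024).
MAX_THREADS_PER_BLOCK = 1024


def block_threads_before(d: int) -> int:
    # max() returns at least 1024 and grows with d, so large inputs
    # request more threads than the limit allows.
    return max(d, MAX_THREADS_PER_BLOCK)


def block_threads_after(d: int) -> int:
    # min() caps the request at the limit and leaves small inputs unchanged.
    return min(d, MAX_THREADS_PER_BLOCK)


for d in (256, 4096):
    print(d, block_threads_before(d), block_threads_after(d))
```

With `max`, every value of `d` yields at least 1024 threads and anything larger exceeds the limit; with `min`, the result never exceeds 1024.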

Comments suppressed due to low confidence (1)

custom_ops/metax_ops/flash_attention.cu:400

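Same as above: no error is actually thrown here either, so a failure is silently ignored and execution continues, which may lead to NaNs, out-of-bounds accesses, or other downstream problems. Suggest switching to PD_THROW to terminate immediately and surface the error code.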

  if (status != MCFLASHATTN_STATUS_SUCCESS) {
    phi::errors::External("Error in McFlashAttn, error code is %d", status);
  }

PaddlePaddle-bot · 2026-05-25T09:05:39Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-28 11:03:14

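This CI report was generated from the following code (updated every 30 minutes):

PR commit: c5bf2d1
Merge base: 91ca3d1 (branch: develop)
View full diff
CI details

1 Task overview

Two Required tasks are still failing: one is a coverage-threshold failure and one needs manual Approval. Please resolve the failing Required tasks before merging.

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
42(0)	42	36	6	0	0	0

2 Task status summary

About the log column: failed tasks link directly to the log; running tasks link to the Job.

2.1 Required tasks: 8/10 passed

Required tasks block merging; failures must be handled first.

Status	Task	Duration	Root cause	Suggested fix	Log	Rerun
❌	`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	1h25m	PR issue: insufficient coverage of the new FLASH_ATTN branch	Add a MACA FLASH_ATTN unit test	Job	-
❌	`Approval`	18s	Approval required	Obtain manual approval	Job	-
✅	The other 8 required tasks passed	-	-	-	-	-

2.2 Optional tasks: 28/32 passed

Optional tasks do not block merging; failures are for reference only.

Status	Task	Duration	Log	Rerun
❌	`Run iluvatar Tests / run_iluvatar_cases`	2m18s	Job	-
❌	`Check PR Template`	29s	Job	-
❌	`CI_HPU`	1h4m	Job	-
❌	`Trigger Jenkins for PR`	12s	Job	-
✅	The other 28 optional tasks passed	-	-	-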

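3 Failure details (required only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: coverage threshold (confidence: high)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage

Status: ❌ Failed
Error type: coverage threshold
Confidence: high
Root cause summary: insufficient coverage of the new FLASH_ATTN branch
Analyzer: ci_analyze_unittest_fastdeploy

Failed test cases: none. The log shows TEST_EXIT_CODE: 0 and prints All tests passed; the failure occurs in the coverage check step.

Root cause details:
diff_coverage.json reports a total diff coverage of 71%, below the 80% threshold; fastdeploy/platforms/maca.py is only 50% covered, with violations on lines L64-L65. The source shows that the PR added/modified the _Backend.FLASH_ATTN branch of MACAPlatform.get_attention_backend_cls(), which returns MetaxFlashAttentionBackend, but the existing tests/platforms/test_platforms.py::TestMACAPlatform covers only NATIVE/APPEND/INVALID and not the new branch.

Key log: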

Coverage generation failed (exit code 9)
GPU Patch Coverage Details:
{"src_stats": {"fastdeploy/platforms/maca.py": {"percent_covered": 50.0,
"violation_lines": [64, 65], "covered_lines": [61, 63]},
"fastdeploy/model_executor/forward_meta.py": {"percent_covered": 100.0}},
"total_num_lines": 7, "total_num_violations": 2,
"total_percent_covered": 71}
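
The 71% and 50% figures can be reproduced from the counts in the log above. The sketch below parses that JSON and recomputes both numbers; truncating the total to an integer is an assumption made to match the logged value, not documented behaviour of the coverage tool, and the 80% threshold is taken from the report text.

```python
import json

# JSON copied from the key log above.
report = json.loads("""
{"src_stats": {"fastdeploy/platforms/maca.py": {"percent_covered": 50.0,
"violation_lines": [64, 65], "covered_lines": [61, 63]},
"fastdeploy/model_executor/forward_meta.py": {"percent_covered": 100.0}},
"total_num_lines": 7, "total_num_violations": 2,
"total_percent_covered": 71}
""")

THRESHOLD = 80  # percent, as stated in the CI report

total = report["total_num_lines"]
violations = report["total_num_violations"]
# Assumption: the total is truncated to an integer (5/7 = 71.4...% -> 71).
total_percent = int(100 * (total - violations) / total)

maca = report["src_stats"]["fastdeploy/platforms/maca.py"]
covered = len(maca["covered_lines"])
missed = len(maca["violation_lines"])
maca_percent = 100 * covered / (covered + missed)

print(total_percent, maca_percent, total_percent >= THRESHOLD)
```

Covering the two violation lines (64 and 65) would bring both figures to 100%, which is why a single assertion on the FLASH_ATTN branch is enough to clear the threshold.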

Suggested fix:

Add a unit test for the FLASH_ATTN branch to TestMACAPlatform in tests/platforms/test_platforms.py, for example a new assertion near L94: self.assertIn("MetaxFlashAttentionBackend", MACAPlatform.get_attention_backend_cls(_Backend.FLASH_ATTN)).
After the fix, re-trigger Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage and confirm that the total coverage in diff_coverage.json is ≥ 80%.
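
The suggested assertion could look roughly like the test below. To keep the sketch runnable outside the repository, `_Backend` and `MACAPlatform` are hypothetical stand-ins defined inline, and the returned module paths are placeholders; only the class name `MetaxFlashAttentionBackend` and the method name come from the CI report. In the real test file the two stand-ins would be replaced by imports from FastDeploy.

```python
import enum
import io
import unittest


class _Backend(enum.Enum):
    # Hypothetical stand-in for the real _Backend enum.
    NATIVE_ATTN = enum.auto()
    FLASH_ATTN = enum.auto()


class MACAPlatform:
    # Hypothetical stand-in for the real platform class.
    @classmethod
    def get_attention_backend_cls(cls, selected_backend):
        if selected_backend == _Backend.NATIVE_ATTN:
            return "placeholder.path.PaddleNativeAttnBackend"
        if selected_backend == _Backend.FLASH_ATTN:
            # Placeholder path; only the class name is taken from the report.
            return "placeholder.path.MetaxFlashAttentionBackend"
        raise ValueError(f"unsupported backend: {selected_backend}")


class TestMACAPlatform(unittest.TestCase):
    def test_get_attention_backend_cls_flash_attn(self):
        # Exercises the branch reported as uncovered.
        backend = MACAPlatform.get_attention_backend_cls(_Backend.FLASH_ATTN)
        self.assertIn("MetaxFlashAttentionBackend", backend)


def run_suite():
    # Runs the test case quietly and returns the unittest result object.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMACAPlatform)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```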

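Suggested fix summary: add a MACA FLASH_ATTN unit test

Related changes: fastdeploy/platforms/maca.py L63-L65; suggested test addition near tests/platforms/test_platforms.py L94.
Link: view log

Approval: manual approval (confidence: high)

This Job requires manual Approval; CI continues only after approval is granted.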

Copilot

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 8 comments.

Copilot

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 13 comments.

codecov-commenter · 2026-05-26T08:02:37Z

Codecov Report

❌ Patch coverage is 0.95238% with 208 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@91ca3d1). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
...ckends/metax/attention/flash_attn_metax_backend.py	0.00%	196 Missing ⚠️
fastdeploy/model_executor/forward_meta.py	25.00%	3 Missing ⚠️
fastdeploy/platforms/maca.py	25.00%	2 Missing and 1 partial ⚠️
fastdeploy/worker/metax_model_runner.py	0.00%	3 Missing ⚠️
fastdeploy/worker/metax_worker.py	0.00%	2 Missing ⚠️
...y/model_executor/layers/backends/metax/__init__.py	0.00%	1 Missing ⚠️
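
The 0.95238% headline is consistent with the per-file rows above. As a rough check, the sketch below infers each file's patch size from its percentage and uncovered count (treating the one partial line as uncovered, which is an assumption about how the headline number is computed) and recomputes the overall patch coverage.

```python
# (file, patch coverage %, uncovered lines) copied from the table above.
# maca.py is listed as "2 Missing and 1 partial", counted here as 3 uncovered.
rows = [
    ("flash_attn_metax_backend.py", 0.00, 196),
    ("forward_meta.py", 25.00, 3),
    ("maca.py", 25.00, 3),
    ("metax_model_runner.py", 0.00, 3),
    ("metax_worker.py", 0.00, 2),
    ("metax/__init__.py", 0.00, 1),
]


def patch_lines(percent: float, uncovered: int) -> int:
    # uncovered = total * (1 - percent/100), solved for total.
    return round(uncovered / (1 - percent / 100))


total_uncovered = sum(uncovered for _, _, uncovered in rows)
total_lines = sum(patch_lines(pct, uncovered) for _, pct, uncovered in rows)
patch_percent = 100 * (total_lines - total_uncovered) / total_lines

print(total_uncovered, total_lines, f"{patch_percent:.5f}%")
```

Under these assumptions the patch has 210 lines with 2 covered, so nearly all of the gap comes from the 196 untested lines in the new FlashAttention backend.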

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7914   +/-   ##
==========================================
  Coverage           ?   63.60%           
==========================================
  Files              ?      468           
  Lines              ?    65244           
  Branches           ?     9987           
==========================================
  Hits               ?    41496           
  Misses             ?    20945           
  Partials           ?     2803
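
The project-level figure in the block above follows directly from its counts: hits, misses, and partials add up to the line total, and hits divided by lines gives the reported percentage. A small sanity check, using only the numbers shown:

```python
# Counts copied from the coverage diff block above.
lines = 65244
hits, misses, partials = 41496, 20945, 2803

accounted = hits + misses + partials  # every line falls in one bucket
coverage = 100 * hits / lines         # partial lines do not count as hits

print(accounted == lines, f"{coverage:.2f}%")
```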

Flag	Coverage Δ
GPU	`72.86% <25.00%> (?)`
XPU	`7.04% <0.00%> (?)`

Copilot

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.

+  if (status != MCFLASHATTN_STATUS_SUCCESS) {
+    phi::errors::External("Error in McFlashAttn, error code is %d", status);
+  }


Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 15 comments.

+  if (status != MCFLASHATTN_STATUS_SUCCESS) {
+    phi::errors::External("Error in McFlashAttn, error code is %d", status);
+  }


+        if num_requests < self.max_num_seqs:
+            self.block_tables_buffer[num_requests:] = self.block_tables_buffer[num_requests - 1]


            return "fastdeploy.model_executor.layers.attention.PaddleNativeAttnBackend"
        elif selected_backend == _Backend.APPEND_ATTN:
-            logger.info("Using FLASH ATTN backend to instead of attend attention.")
+            logger.info("Using FLASH ATTN backend to instead of APPEND ATTN.")


            extra_compile_args=metax_extra_compile_args,
            library_dirs=[os.path.join(maca_path, "lib")],
-            extra_link_args=["-lruntime_cu", "-lmctlassEx"],
+            extra_link_args=["-lruntime_cu", "-lmctlassEx", "-lmcFlashAttn"],


PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-27 10:32:53

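📋 Review summary

PR overview: adds Flash Attention support for Metax (沐曦) GPUs, replaces the old RoPE implementation, and fixes the gelu_tanh block size calculation.
Scope of changes: custom_ops/metax_ops/, fastdeploy/worker/, fastdeploy/model_executor/layers/backends/metax/
Impact tags: [Metax] [OP]

Issues

Severity	File	Summary
🔴 Bug	`custom_ops/metax_ops/flash_attention.cu:182`	`phi::errors::External(...)` only constructs the error object and does not throw; execution silently continues after the mha call fails
🔴 Bug	`custom_ops/metax_ops/flash_attention.cu`	The same problem exists in `flash_attn_kvcache_forward` (nothing is thrown after `mha_fwd_kvcache` fails)

📝 PR convention check

The PR title lacks a functional tag ([Metax] alone is semantically incomplete; adding [Feature] is suggested), and every section of the PR description is an empty template with no substantive content.

Suggested title (can be copied directly):

[Metax][Feature] Support Flash Attention for Metax GPU

Suggested PR description (can be copied directly)

## Motivation
Adds Flash Attention support for Metax (沐曦) GPUs, using the McFlashAttn library to replace the previous custom RoPE+Attention implementation and improve inference performance. Also fixes the block size upper-bound calculation in the gelu_tanh kernel (`std::max` → `std::min`) and removes obsolete MACA version compatibility code (minimum version requirement raised to > 3.3.2.0).

## Modifications
- `custom_ops/metax_ops/flash_attention.cu`: adds the Flash Attention operators, supporting two modes: `flash_attn_varlen_forward` (variable-length sequence prefill) and `flash_attn_kvcache_forward` (KV cache in the decode stage)
- `custom_ops/metax_ops/rotary_position_embedding.cu`: adds the RoPE position-embedding kernel (GQA support, with neox/partial variants), replacing the old `apply_rope_qkv.cu`
- `custom_ops/metax_ops/apply_rope_qkv.cu`: deletes the old RoPE implementation
- `custom_ops/metax_ops/maca_version.h`: deletes the version compatibility header
- `custom_ops/metax_ops/fused_moe_gemm_kernels.h`: removes the MACA version conditional-compilation branches
- `custom_ops/gpu_ops/gelu_tanh.cu`: fixes the block size calculation (`std::max` → `std::min`)
- `fastdeploy/model_executor/layers/backends/metax/attention/flash_attn_metax_backend.py`: adds the Python-side wrapper that calls the Flash Attention backend
- `fastdeploy/worker/metax_model_runner.py`, `metax_worker.py`: adapt to the new attention backend
- `custom_ops/setup_ops.py`: updates the list of source files to compile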

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

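Overall assessment

The new Metax Flash Attention feature is clearly structured overall, but the two phi::errors::External(...) calls have a serious error-handling defect that must be fixed before merging. The PR description also needs to be filled in.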

PaddlePaddle-bot · 2026-05-27T02:36:19Z

+
+  if (status != MCFLASHATTN_STATUS_SUCCESS) {
+    phi::errors::External("Error in McFlashAttn, error code is %d", status);
+  }


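🔴 Bug phi::errors::External(...) only constructs the error object and does not throw it. After mha_varlen_fwd fails, the program silently continues: the subsequent release_tensor calls run normally, but the output is invalid data.

The same problem exists after the mha_fwd_kvcache call in flash_attn_kvcache_forward.

Suggested fix:

  if (status != MCFLASHATTN_STATUS_SUCCESS) {
    PADDLE_THROW(phi::errors::External(
        "McFlashAttn failed with error code %d", status));
  }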
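
The difference between building an error object and actually raising it is easy to miss in review. The snippet below is a Python analogy of the pattern, not Paddle's API: `ExternalError`, `STATUS_SUCCESS`, and both functions are hypothetical names chosen for illustration.

```python
STATUS_SUCCESS = 0  # hypothetical stand-in for MCFLASHATTN_STATUS_SUCCESS


class ExternalError(RuntimeError):
    """Hypothetical stand-in for an external-library error type."""


def check_status_silent(status: int) -> str:
    if status != STATUS_SUCCESS:
        # The exception object is created and immediately discarded,
        # so control falls through as if nothing had happened.
        ExternalError(f"McFlashAttn failed with error code {status}")
    return "continued"


def check_status_raising(status: int) -> str:
    if status != STATUS_SUCCESS:
        # Raising the exception stops execution and surfaces the code.
        raise ExternalError(f"McFlashAttn failed with error code {status}")
    return "continued"
```

Calling `check_status_silent(1)` returns normally, while `check_status_raising(1)` raises; the suggested fix moves the C++ code from the first behaviour to the second.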

PaddlePaddle-bot · 2026-05-28T09:36:41Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-30 10:14:39

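This CI report was generated from the following code (updated every 30 minutes):

PR commit: c5bf2d1
Merge base: 91ca3d1 (branch: develop)
View full diff
CI details

1 Task overview

Two required tasks have failed and must be handled before merging.

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
42(0)	42	36	6	0	0	0

2 Task status summary

2.1 Required tasks: 8/10 passed

Required tasks block merging; failures must be handled first.

Status	Task	Duration	Root cause	Suggested fix	Log	Rerun
❌	`Approval`	18s	Approval required	Obtain manual approval	Job	-
❌	`run_tests_with_coverage`	1h25m	PR issue: FLASH_ATTN branch coverage 71% < 80%	Add a unit test for the FLASH_ATTN branch	Job	-
✅	The other 8 required tasks passed	-	-	-	-	-

2.2 Optional tasks: 28/32 passed

Optional tasks do not block merging; failures are for reference only.

Status	Task	Duration	Log	Rerun
❌	`Run iluvatar Tests / run_iluvatar_cases`	2m18s	Job	-
❌	`Check PR Template`	29s	Job	-
❌	`CI_HPU`	1h4m	Job	-
❌	`Trigger Jenkins for PR`	12s	Job	-
✅	The other 28 optional tasks passed	-	-	-

3 Failure details (required only)

Approval: manual approval required (confidence: high)

This Job requires manual Approval; CI continues only after approval is granted.

run_tests_with_coverage: coverage threshold (confidence: high)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage

Status: ❌ Failed
Error type: coverage threshold
Confidence: high
Root cause summary: insufficient coverage of the new FLASH_ATTN branch
Analyzer: ci_analyze_unittest_fastdeploy

Failed test cases: none. The log shows TEST_EXIT_CODE: 0 and prints All tests passed; the failure occurs in the coverage check step.

Root cause details:
diff_coverage.json reports a total diff coverage of 71%, below the 80% threshold; fastdeploy/platforms/maca.py is only 50% covered, with violations on lines L64-L65. The source shows that the PR added/modified the _Backend.FLASH_ATTN branch of MACAPlatform.get_attention_backend_cls(), which returns MetaxFlashAttentionBackend, but the existing tests/platforms/test_platforms.py::TestMACAPlatform covers only NATIVE/APPEND/INVALID and not the new branch.

Key log: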

Coverage generation failed (exit code 9)
GPU Patch Coverage Details:
{"src_stats": {"fastdeploy/platforms/maca.py": {"percent_covered": 50.0,
"violation_lines": [64, 65], "covered_lines": [61, 63]},
"fastdeploy/model_executor/forward_meta.py": {"percent_covered": 100.0}},
"total_num_lines": 7, "total_num_violations": 2,
"total_percent_covered": 71}

Suggested fix:

Add a unit test for the FLASH_ATTN branch to TestMACAPlatform in tests/platforms/test_platforms.py, for example a new assertion near L94: self.assertIn("MetaxFlashAttentionBackend", MACAPlatform.get_attention_backend_cls(_Backend.FLASH_ATTN)).
After the fix, re-trigger run_tests_with_coverage and confirm that the total coverage in diff_coverage.json is ≥ 80%.

Related changes: fastdeploy/platforms/maca.py L63-L65; suggested test addition near tests/platforms/test_platforms.py L94.
Link: view log

xiaozude and others added 8 commits May 25, 2026 09:41

replace triton unified attention with flash_attn_varlen_forward

8a60fe8

(cherry picked from commit 49a405b5ab0867d297c1a74643fdf83e3bb1bed5)

support cuda graph

b03056e

support cuda graph (cherry picked from commit f78cbfbe0b69eac20bad4f5b1ed7aec25f12ce73)

update flash attn metax backend and recovery triton attn metax backend

6d9dbb4

(cherry picked from commit a0ca9aef03a1e7fa50a205c1737dcdf084f18685)

add environment variable to control PD split/mixed mode

fe33a03

(cherry picked from commit e8bfe916642e78ff317a398317b846a7bd448772) # Conflicts: # fastdeploy/envs.py # fastdeploy/worker/input_batch.py

support cuda graph again

a351153

(cherry picked from commit 0890acc6f4f94740c14e9788903ada9bbdaaf469)

refactor flash_attn_metax_backend

9071c88

(cherry picked from commit 712fd9c106109e54a7cba4e93ee90e8181d87a3d)

[Metax]fix bug

90ba92b

Copilot AI review requested due to automatic review settings May 25, 2026 08:09

Tryorish had a problem deploying to Metax_ci May 25, 2026 08:09 — with GitHub Actions Failure

paddle-bot Bot added the contributor External developers label May 25, 2026

Copilot started reviewing on behalf of Tryorish May 25, 2026 08:09 View session

Copilot AI reviewed May 25, 2026

remove redundant code

2707a5d

Tryorish had a problem deploying to Metax_ci May 26, 2026 02:44 — with GitHub Actions Error

revert

0edaab1

Copilot AI review requested due to automatic review settings May 26, 2026 02:52

Tryorish had a problem deploying to Metax_ci May 26, 2026 02:52 — with GitHub Actions Failure

Copilot AI reviewed May 26, 2026

Tryorish had a problem deploying to Metax_ci May 26, 2026 03:04 — with GitHub Actions Failure

Tryorish had a problem deploying to Metax_ci May 26, 2026 03:23 — with GitHub Actions Error

Tryorish had a problem deploying to Metax_ci May 26, 2026 03:24 — with GitHub Actions Failure

Tryorish had a problem deploying to Metax_ci May 26, 2026 03:49 — with GitHub Actions Failure

revert

f5ff324

Tryorish had a problem deploying to Metax_ci May 26, 2026 06:06 — with GitHub Actions Error

revert

469c909

Copilot AI review requested due to automatic review settings May 26, 2026 06:22

Tryorish had a problem deploying to Metax_ci May 26, 2026 06:22 — with GitHub Actions Failure

Tryorish changed the title ~~[Metax] Migrate from rel2.5~~ [Metax] support FLASH_ATTN May 26, 2026

Copilot AI reviewed May 26, 2026

add rotary_embs_bf16

5741329

Tryorish had a problem deploying to Metax_ci May 26, 2026 07:35 — with GitHub Actions Failure

Copilot AI review requested due to automatic review settings May 26, 2026 09:11

plusNew001 had a problem deploying to Metax_ci May 26, 2026 09:11 — with GitHub Actions Failure

Copilot started reviewing on behalf of plusNew001 May 26, 2026 09:11 View session

plusNew001 had a problem deploying to Metax_ci May 26, 2026 09:17 — with GitHub Actions Failure

Copilot AI reviewed May 26, 2026

plusNew001 had a problem deploying to Metax_ci May 26, 2026 09:26 — with GitHub Actions Failure

plusNew001 had a problem deploying to Metax_ci May 26, 2026 09:27 — with GitHub Actions Failure

update doc

c5bf2d1

Copilot AI review requested due to automatic review settings May 27, 2026 02:26

Tryorish had a problem deploying to Metax_ci May 27, 2026 02:26 — with GitHub Actions Failure

Copilot AI reviewed May 27, 2026

Tryorish force-pushed the migrate-from-rel2.5 branch from bde3c09 to c5bf2d1 Compare May 27, 2026 02:30

Tryorish had a problem deploying to Metax_ci May 27, 2026 02:30 — with GitHub Actions Failure

PaddlePaddle-bot suggested changes May 27, 2026
