APITest test optimization: 🐛 Fix the torch installation dependency in paddleonly mode #660
Open
cangtianhuang wants to merge 1 commit into
🧭 Background
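In PaddleAPITest, `paddle_only` mode checks on its own whether the engine can parse a Paddle API config and whether that config runs correctly on the Paddle side. In principle it does not need PyTorch. In the current test framework, accuracy, performance, and some GPU cache features depend on torch. Those dependencies should not affect Paddle-only runs, because installing torch adds a lot of environment setup cost. This change targets the main entry points `engineV2.py` and `engineV4.py`. It separates the paddleonly path from torch-related class loading, cache clearing, and GPU cache data generation. As a result, neither a plain paddleonly run nor one with `--use_gpu_cache_mode=True` fails when torch is missing.
🔎 Root cause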
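In the old logic, initializing the main process or a worker imported every tester class at once. This indirectly loaded torch-dependent modules such as accuracy and performance. In addition, `tester/base.py` and `tester/api_config/config_analyzer.py` ran `import torch` at the top level, so importing only the Paddle-only tester still required torch to be installed. Further investigation showed a second problem with `paddle_only` plus `--use_gpu_cache_mode=True`. The GPU cache check and data generation steps still called `torch.cuda.is_available()`, created Torch random tensors, and performed DLPack conversions. For paddleonly, Paddle can handle all of these checks and the tensor generation.
🛠️ Implementation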
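The main entry points now load code lazily based on the mode. Paddleonly loads only `APIConfig` and `APITestPaddleOnly`. Torch-dependent testers load only in modes that actually need torch, such as accuracy, CINN, performance, and custom device. Likewise, cache clearing calls `torch.cuda.empty_cache()` only in torch modes. On the shared tester side, torch is accessed lazily, so importing a module no longer forces torch to load. For the GPU cache, Paddle-only inputs and output gradients are now generated and cached natively with Paddle. The accuracy path keeps the original paired Paddle/Torch gradient logic, so that in comparison mode the paddle grad and the torch grad come from the same source.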
🔧 Main changes
1. Main entry points load torch-related functionality lazily
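`engineV2.py` and `engineV4.py` add wrappers for mode detection, test-class loading, and cache clearing. The paddleonly worker and the single-case path no longer import torch unconditionally. They also no longer load the accuracy/performance testers unconditionally.
2. The Paddle-only tester no longer depends on torch at initialization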
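`APITestBase` now supports a `use_torch` switch, and `APITestPaddleOnly` explicitly turns off torch initialization. `tester/base.py` and `tester/api_config/config_analyzer.py` use a lazy torch proxy, so torch loads only when code actually uses torch functionality.
3. The paddleonly GPU cache now uses a native Paddle path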
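In `config_analyzer.py`, the GPU cache availability check now uses Paddle's current device instead of `torch.cuda.is_available()`. The Paddle-only GPU input cache is generated with Paddle tensors, so the code no longer creates torch tensors just to produce Paddle inputs.
📁 Changed files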