
fix(telemetry): propagate span context in async generators and fix member agent input tracing #77

Merged: weimch merged 1 commit into trpc-group:main from Eliozhang:fix/telemetry-span-context-propagation on Jun 11, 2026


Eliozhang (Contributor) commented Jun 8, 2026

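Problem

1. Span context is lost in async generators

When an async generator is cancelled, the __aexit__ of the context manager returned by start_as_current_span is not guaranteed to run (a known behavior of Python async generators). As a result:

  • context.detach() is never called, so the span context is lost
  • Child spans (agent_run, call_llm, execute_tool, etc.) cannot resolve their parent span
  • The trace breaks, and the complete call chain is no longer visible

Reproduction: when TeamAgent invokes a member agent, the member agent executes as an async generator, and the span context is lost when that generator is cancelled.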

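2. Member agent trace input is inaccurate

trace_agent always records the agent input from user_content. When TeamAgent delegates to a member agent, however, user_content still holds the original message the user sent to the leader agent, not the override_messages the leader agent forwarded to the member agent. The member agent's input in the trace therefore does not match what it actually executed.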

Fix

Fix 1: runners.py + agents/_base_agent.py: span context propagation

Replace start_as_current_span with start_span + context_api.attach/detach:

  • start_span creates the span but does not automatically make it current
  • context_api.attach(set_span_in_context(span, current_ctx)) makes the span current manually and returns a token
  • context_api.detach(token) is called in the finally clause of a try/finally
  • Key point: try/finally also runs under CancelledError (PEP 492), whereas the context manager's __aexit__ is not guaranteed to
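The attach/detach pattern above can be sketched without the opentelemetry package. This is a minimal, stdlib-only model, not the SDK's code: a hypothetical module-level ContextVar stands in for the OpenTelemetry context (OTel's default runtime context is also built on contextvars), and the hypothetical Span/start_span helpers mimic tracer.start_span(). In the real change the corresponding calls would be tracer.start_span, trace.set_span_in_context, context_api.attach, context_api.detach and span.end().

```python
import asyncio
import contextvars
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional

# Hypothetical stand-in for the OpenTelemetry "current span" slot.
_current_span = contextvars.ContextVar("current_span", default=None)


@dataclass
class Span:
    name: str
    parent: Optional["Span"] = None
    ended: bool = False

    def end(self) -> None:
        self.ended = True


def start_span(name: str) -> Span:
    # Like tracer.start_span(): the parent is the current span,
    # but the new span is NOT made current.
    return Span(name, parent=_current_span.get())


async def agent_run(name: str, spans: List[Span]) -> AsyncIterator[str]:
    span = start_span(name)
    spans.append(span)
    # Like context_api.attach(set_span_in_context(span, current_ctx)).
    token = _current_span.set(span)
    try:
        child = start_span("call_llm")  # resolves its parent via the current span
        spans.append(child)
        child.end()
        yield f"{name}: done"
        await asyncio.sleep(0)
    finally:
        # Like context_api.detach(token); runs on normal exit and when
        # CancelledError is raised inside the generator body.
        _current_span.reset(token)
        span.end()


async def run_demo():
    spans: List[Span] = []
    events = [e async for e in agent_run("agent_run", spans)]
    return events, spans, _current_span.get()
```

asyncio.run(run_demo()) yields one event, a call_llm span whose parent is the agent_run span, both spans ended, and no current span once the generator has finished.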

Fix 2: telemetry/_trace.py: prefer override_messages

trace_agent now checks invocation_context.override_messages first:

  • If present, the text parts extracted from override_messages are recorded as the input
  • Otherwise, it falls back to the existing user_content logic
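The input-selection rule above can be sketched as follows. Part, Content and InvocationContext here are simplified, hypothetical stand-ins; only the field names override_messages and user_content come from the PR. Joining text parts with a newline and treating an empty override_messages list like a missing one are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Part:
    text: Optional[str] = None  # non-text parts leave this as None


@dataclass
class Content:
    role: str
    parts: List[Part] = field(default_factory=list)


@dataclass
class InvocationContext:
    user_content: Optional[Content] = None
    override_messages: Optional[List[Content]] = None


def _text_parts(contents: List[Content]) -> List[str]:
    # Keep only parts that carry non-empty text.
    return [p.text for c in contents for p in c.parts if p.text]


def agent_input_text(ctx: InvocationContext) -> str:
    # Prefer the messages the leader agent forwarded to this member agent.
    if ctx.override_messages:
        return "\n".join(_text_parts(ctx.override_messages))
    # Otherwise fall back to the original user_content.
    if ctx.user_content is not None:
        return "\n".join(_text_parts([ctx.user_content]))
    return ""
```

For a delegated member agent this returns the leader's forwarded text; for a single agent run directly, override_messages is unset and the result is the same user_content text as before.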

Changed files

| File | Change |
|---|---|
| trpc_agent_sdk/runners.py | start_span + attach/detach to propagate span context |
| trpc_agent_sdk/agents/_base_agent.py | Same as above |
| trpc_agent_sdk/telemetry/_trace.py | override_messages takes priority over user_content |

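Testing

  • Ran the TeamAgent + member agent scenario locally and confirmed the span chain is complete
  • Cancelled member agent execution and confirmed the span is closed correctly with no detach-token error
  • Confirmed that the input in the member agent trace is the content forwarded by the leader, not the original user input
  • Behavior is unchanged in non-TeamAgent scenarios (a single agent run directly)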
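The cancellation check could be automated along these lines. This is a stdlib-only sketch with hypothetical member_agent/consume helpers, where a ContextVar again stands in for the OpenTelemetry context. It only models cancellation delivered while the generator is suspended on its own await; it does not cover a consumer that abandons the generator at a yield, where cleanup waits until aclose() or finalization.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the OpenTelemetry context.
current_span = contextvars.ContextVar("current_span", default=None)


async def member_agent(log: list):
    token = current_span.set("member_agent")  # stands in for context_api.attach(...)
    try:
        yield "started"
        await asyncio.sleep(3600)  # long-running step; cancellation lands here
        yield "never reached"
    finally:
        current_span.reset(token)  # stands in for context_api.detach(token)
        log.append(("detached", current_span.get()))


async def consume(log: list):
    async for event in member_agent(log):
        log.append(("event", event))


async def cancel_member_agent() -> list:
    log = []
    task = asyncio.create_task(consume(log))
    await asyncio.sleep(0.01)  # let the generator reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append(("cancelled", None))
    return log
```

The finally clause runs inside the cancelled task, so the reset happens in the same context that created the token and the slot is empty again afterwards.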

github-actions Bot commented Jun 8, 2026

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

Eliozhang (Contributor, Author)

I have read the CLA Document and I hereby sign the CLA

Eliozhang force-pushed the fix/telemetry-span-context-propagation branch from 67b17c9 to 56ee6e8 on June 11, 2026 02:22

codecov Bot commented Jun 11, 2026

Codecov Report

❌ Patch coverage is 63.15789% with 7 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@240e3c9).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| trpc_agent_sdk/telemetry/_trace.py | 36.36364% | 7 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main         #77   +/-   ##
==========================================
  Coverage        ?   87.36905%           
==========================================
  Files           ?         416           
  Lines           ?       40472           
  Branches        ?           0           
==========================================
  Hits            ?       35360           
  Misses          ?        5112           
  Partials        ?           0           


Eliozhang force-pushed the fix/telemetry-span-context-propagation branch from 56ee6e8 to d9731fe on June 11, 2026 02:45

Eliozhang (Contributor, Author)

I have read the CLA Document and I hereby sign the CLA

Eliozhang force-pushed the fix/telemetry-span-context-propagation branch from d9731fe to ec1abfa on June 11, 2026 03:11
…mber agent input tracing

- Use start_span + attach/detach instead of start_as_current_span in
  runners.py and _base_agent.py to properly propagate span context
  in async generators (CancelledError safe per PEP 492)
- Fix trace_agent to prefer override_messages over user_content when
  tracing member agents delegated by TeamAgent
Eliozhang force-pushed the fix/telemetry-span-context-propagation branch from ec1abfa to 7971826 on June 11, 2026 06:14
Rook1ex added a commit to trpc-group/cla-database that referenced this pull request Jun 11, 2026
weimch closed this Jun 11, 2026
weimch reopened this Jun 11, 2026
weimch merged commit ea2e4bb into trpc-group:main Jun 11, 2026
10 checks passed
