Skip to content

feat(channel): complete bidirectional file support#386

Open
xiami762 wants to merge 4 commits into
devfrom
feat/channel-file-roundtrip
Open

feat(channel): complete bidirectional file support#386
xiami762 wants to merge 4 commits into
devfrom
feat/channel-file-roundtrip

Conversation

@xiami762
Copy link
Copy Markdown
Contributor

@xiami762 xiami762 commented Jun 6, 2026

Summary

  • Complete bidirectional file/image support across the three IM channels: wecom (ported from PR feat(wecom): support inbound and outbound file attachments #190), dingtalk, and telegram.
  • Each channel can now download inbound attachments to a local FilePart and send outbound media (image / file / video / audio / animation) with correct kind routing. The inbound dispatcher is refactored into a per-channel hook registry so future channels can plug in without touching dispatcher.py.

Key Changes

Per-channel inbound (download → FilePart)

  • wecom (flocks/channel/builtin/wecom/inbound_media.py)
    • AES-256-CBC decrypt via wecom_aibot_sdk, 30 MB cap.
    • Nested mixed-message aeskey resolution.
    • Filename extraction from Content-Disposition.
  • dingtalk (flocks/channel/builtin/dingtalk/inbound_media.py)
    • download_code exchange via OAPI /v1.0/robot/messageFiles/download, 20 MB cap.
    • Separate error classification for exchange vs. download steps.
  • telegram (flocks/channel/builtin/telegram/inbound_media.py)
    • Resolve telegram://<kind>/<file_id> via getFile + download from
      https://api.telegram.org/file/bot<token>/<file_path>.

Per-channel outbound (send_media)

  • wecom — SDK upload_mediasend_media_message / reply_media; companion text sent as a follow-up markdown message.
  • dingtalk — OAPI multipart upload; msgKey=file (downloadCode + fileName) for any type, msgKey=image (photoURL) for remote image
    URLs.
  • telegram — Route to sendPhoto / sendDocument / sendVideo / sendAudio / sendVoice / sendAnimation by inferred kind; agent
    can force document via telegram:document:<url> prefix.

Dispatcher refactor (flocks/channel/inbound/dispatcher.py)

  • register_inbound_media_downloader(channel_id, downloader) + module-level _DOWNLOADERS table.
  • Dynamic per-channel lookup so test monkeypatches on the channel's inbound_media module still apply.
  • SSE message.part.updated events emitted for both the FilePart and the rewritten text part; placeholder text ([图片消息] /
    [文件消息] / [文件消息: x]) replaced with Attached files: <path>.

Brought along on the branch (precursor fixes required for the round-trip)

  • #369 — assign session ownership and restrict ownerless access for channel-originated sessions.
  • #356/clear slash command in channel sessions.
  • #350 — clear history + surface Feishu websocket disconnects; per-account restart with backoff.
  • #344 — archive previous session on IM /new.
  • 28af9f9aSessionCompaction.process return type now Literal["continue", "stop", "skipped"]; channel /compact reports the three
    states distinctly.

Impact Scope

  • User-visible behavior: IM users on wecom / dingtalk / telegram can now send and receive images, PDFs, audio, video, and generic
    files. Inbound media is surfaced to the agent as a FilePart; outbound media is rendered natively in each channel. Placeholder text is
    replaced with Attached files: <path>.
  • Compatibility:
    • feishu already registered a downloader; behavior is unchanged.
    • Channel session ownership semantics changed by #369: ownerless sessions are no longer shared across all users; only admins can
      manage legacy ownerless ones. No data migration — existing sessions are covered by the new code path.
    • SessionCompaction.process return type widened to include "skipped" — callers that did result == "continue" only are unaffected;
      callers that compared against the previous union are updated.
  • Configuration & environment:
    • Each channel's existing bot/app credentials (token, app key, AES key, etc.) are reused. No new config keys.
    • Telegram outbound now uses public api.telegram.org endpoints — requires network egress to Telegram from the server (same as before;
      the bot token path was already in use).
  • Dependencies: No new third-party packages. wecom_aibot_sdk was already a dep (used by feat(wecom): support inbound and outbound file attachments #190).
  • Performance & resources:
    • Inbound: streamed decrypt/download with 20–30 MB caps enforced per channel. Larger files are rejected, not buffered.
    • Outbound: multipart upload to dingtalk OAPI; binary POST to Telegram. No large in-memory buffers beyond one attachment.
  • Security & permissions:
    • Filenames are taken from Content-Disposition (wecom) or file_name (dingtalk) — passed through unchanged into the local FilePart
      path. Filename sanitization for downstream consumers (TUI / WebUI) is unchanged from existing policy.
    • Telegram telegram:document:<url> prefix is a forced-route hint parsed at the channel layer; URLs are passed to the standard
      sendDocument call without additional validation.
    • Session ownership change in #369 tightens who can manage channel-originated sessions; previously any authenticated local user could.

Business Logic to Review

The following logic carries the most risk and warrants close review:

  1. Dispatcher hook registry (flocks/channel/inbound/dispatcher.py)

    • register_inbound_media_downloader is called at import time by each channel module. Verify the registration order is idempotent and
      that monkeypatched downloaders (used in tests) are picked up by the dynamic lookup.
    • _is_placeholder_text must match exactly the three placeholder forms emitted by upstream channel parsers. A drift here silently
      leaks placeholders to the user.
    • The placeholder text is replaced in-place on the rewritten text part — confirm the rewritten text part is the same MessagePart
      that gets streamed, not a duplicate.
  2. wecom inbound (wecom/inbound_media.py)

    • Mixed-message aeskey resolution: verify the nested path matches the upstream wecom mixed-message spec; the existing test
      (test_mixed_file_aeskey) covers the happy path only.
    • AES-256-CBC: PKCS#7 padding and IV are derived from the message; confirm they match the SDK defaults. The decrypt path is the
      confidentiality boundary — a single offset error produces garbage that still passes the magic-byte check downstream.
  3. dingtalk inbound (dingtalk/inbound_media.py)

    • Two-step download_code exchange + actual download. Errors are classified separately so retry can target the right step. Verify the
      classifier does not double-classify a 5xx from the download endpoint as an exchange error.
    • 20 MB cap is enforced after the download — confirm we stream the response (or use Content-Length pre-check) so we don't OOM on
      a multi-GB response with a missing/lying Content-Length.
  4. telegram inbound (telegram/inbound_media.py)

    • getFile returns a file_path that is then fetched from a different host (api.telegram.org/file/bot<token>/...). Confirm the
      bot token is interpolated only into the URL and not logged.
    • Kind inference from MIME / extension — confirm the default falls back to sendDocument rather than guessing sendPhoto.
  5. Outbound kind routing (telegram)

    • telegram:document:<url> prefix is parsed before the regular kind inference. Verify the prefix is not a valid URL that an agent
      could be tricked into typing, and that it is consumed (not re-emitted) when calling sendDocument.
  6. Session ownership change (#369)

    • Ownerless sessions are now admin-only for write/delete, but read stays shared. Confirm this matches the intended product behavior
      — the prior behavior was "any authenticated user can do anything," which is the change.
    • Channel-created sessions resolve a local admin owner at create time. If no admin exists, the session remains ownerless. Verify the
      resolver does not pick a random user.
  7. SessionCompaction.process "skipped" state (PR feat/context compaction v2 #321 follow-up)

    • Callers that previously assumed process returned a 2-state union must now handle "skipped". The channel /compact handler is
      updated; audit other call sites (run_compaction is updated, but please grep for compaction.process( to confirm).

Why This Approach

  • Per-channel hook registry over hard-coded dispatcher branches. Future channels register their own downloader; dispatcher.py stops
    growing.
  • Reuse PR feat(wecom): support inbound and outbound file attachments #190 (wecom) verbatim for the SDK path so we don't fork the AES decrypt logic. Extend the same shape to dingtalk and
    telegram.
  • Attached files: <path> as the placeholder text keeps downstream prompts/UI stable — a FilePart is still emitted as a separate
    part, the text is just user-facing fluff.
  • No new deps. wecom_aibot_sdk is already in the lockfile; dingtalk and telegram use aiohttp already pulled in by the channel
    layer.

Test Plan

  • uv run pytest tests/channel/ -q — all 354 channel tests pass.
  • New tests/channel/test_e2e_file_roundtrip.py — 7 in-process fake-server tests covering real PNG byte round-trips for all five
    channels (feishu + wecom + dingtalk + telegram).
  • New unit tests:
    • wecom: +14 (send_media, inbound_media, content-disposition, mixed file)
    • dingtalk: +9 (download_code exchange, oversized guard, send_media routing, text-after-file, image URL inline)
    • telegram: +15 (file_id resolution, kind inference, endpoint routing, kind override prefix, error path)
    • dispatcher: +9 (per-channel routing, placeholder detection, end-to-end pipeline)
  • uv run ruff check . — clean.
  • Manual verification on a staging wecom / dingtalk / telegram bot with a real attachment in each direction. (No live bot tokens in
    CI.)

xiami762 and others added 4 commits June 6, 2026 23:36
…ingtalk, telegram

Apply PR #190 (wecom inbound + outbound file attachments) and extend the
same pattern to dingtalk and telegram. Refactor the inbound dispatcher
into a per-channel hook registry so new channels no longer need to
modify dispatcher.py.

Inbound (download to local FilePart):
- wecom: AES-256-CBC decrypt via wecom_aibot_sdk, 30MB cap, nested
  mixed-message aeskey, Content-Disposition filename extraction
- dingtalk: exchange download_code via OAPI /v1.0/robot/messageFiles/download,
  20MB cap, separate exchange/download error classification
- telegram: resolve telegram://<kind>/<file_id> via getFile +
  https://api.telegram.org/file/bot<token>/<file_path> download

Outbound (send_media per channel):
- wecom: SDK upload_media → send_media_message / reply_media; companion
  text sent as a follow-up markdown message
- dingtalk: OAPI multipart upload → msgKey=file (downloadCode+fileName)
  for any type; msgKey=image (photoURL) for remote image URLs
- telegram: route to sendPhoto / sendDocument / sendVideo / sendAudio /
  sendVoice / sendAnimation based on inferred kind; agent can force
  document via telegram:document:<url> prefix

Dispatcher:
- register_inbound_media_downloader() + _DOWNLOADERS table
- dynamic per-channel lookup so test monkeypatches on the channel's
  inbound_media module still apply
- SSE message.part.updated events for both FilePart and the rewritten
  text part; placeholder text replaced with 'Attached files: <path>'

Tests:
- wecom: +14 (send_media, inbound_media, content-disposition, mixed file)
- dingtalk: +9 (download_code exchange, oversized guard, send_media
  routing, text-after-file, image URL inline)
- telegram: +15 (file_id resolution, kind inference for image/pdf/gif/ogg,
  endpoint routing, kind override prefix, error path)
- dispatcher: +9 (per-channel routing, placeholder detection,
  end-to-end per-channel pipeline)
- test_e2e_file_roundtrip.py: 7 new tests covering real PNG byte
  round-trips for all five channels with in-process fake servers

All 354 channel tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Documents the bidirectional file/image contract for all built-in
channels, the dispatcher refactor, the per-channel downloader hook
pattern, and the channel-specific outbound quirks reviewers need
to know before approving changes to the media path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…escription)

The channel file/image review notes are better kept in the PR
description itself (where reviewers see them first) than in a
root-level doc that the gitignore policy excludes from the
docs/ folder.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ge points, impact scope, and review focus

Rewrite the "Pull Request Guidelines" section in CONTRIBUTING.md so the
required PR description structure is explicit:

- Key Changes (改动点) — concrete deltas grouped by area.
- Impact Scope (影响范围) — user-visible behavior, compatibility,
  configuration, dependencies, performance, security.
- Business Logic to Focus On During Review (需重点 Review 的业务逻辑) —
  the parts of the change that deserve extra reviewer attention.

The previous section listed five reviewer-facing questions but did not
constrain the order or depth of the description, which led to PRs that
mentioned the impact but omitted the logic that needed a careful read.

Also update the PR description template to match the new structure and
add a Why-This-Approach section plus an explicit Compatibility, Migration
& Rollback section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant