
Reduce record payload memory for large prompts #8

Merged
cquil11 merged 2 commits into SemiAnalysisAI:cjq/agentx-v0.3 from weireweire:record-strip-payload-bytes-v0.3 on Jun 15, 2026


weireweire commented Jun 11, 2026

Summary

  • Add an opt-in AIPERF_RECORD_STRIP_PAYLOAD_BYTES setting for large-prompt runs.
  • Drop canonical request payload bytes from RecordContext after dispatch when the setting is enabled.
  • Add unit coverage for the payload-stripping path.
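
As a rough illustration of what "opt-in" means for the first bullet, the sketch below reads the flag from the environment and treats anything other than an explicit truthy value as off. The helper name `strip_payload_bytes_enabled` and the accepted truthy strings are assumptions made for this example; the actual setting is defined in src/aiperf/common/environment.py and may be parsed differently.

```python
import os
from typing import Mapping, Optional

# Assumed set of truthy spellings; the real parser may accept others.
_TRUTHY = {"1", "true", "yes", "on"}


def strip_payload_bytes_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: True only when the flag is explicitly enabled."""
    env = os.environ if environ is None else environ
    raw = env.get("AIPERF_RECORD_STRIP_PAYLOAD_BYTES", "")
    # An unset or unrecognized value leaves the feature off (the default).
    return raw.strip().lower() in _TRUTHY
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in a unit test without mutating the process environment.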

Validation

  • PYTHONPATH=tests uv run --extra dev pytest tests/unit/workers/test_inference_client.py -q
  • uv run --extra dev ruff check src/aiperf/common/environment.py src/aiperf/workers/inference_client.py tests/unit/workers/test_inference_client.py

Note

Medium Risk
Opt-in strip mode changes which record metrics and exports are available; the DAG fast-path behavior change affects memory and routing for PAYLOAD_BYTES subagent runs but aligns with verbatim replay semantics.

Overview
Adds AIPERF_RECORD_STRIP_PAYLOAD_BYTES (default off). When enabled, InferenceClient sets RecordContext.payload_bytes to None after dispatch while leaving RequestInfo.payload_bytes intact. This shrinks ZMQ/record-pipeline memory for huge prompts, but client-side input tokenization, body-derived media counts, and raw request export are no longer available for the stripped records.
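
The stripping behavior described above can be sketched roughly as follows. The dataclasses here are simplified stand-ins that borrow the names RequestInfo and RecordContext for readability; their fields, the `dispatch_and_record` function, and the `send` callback are assumptions for illustration, not the actual InferenceClient code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RequestInfo:
    """Simplified stand-in: owns the canonical request body."""
    payload_bytes: Optional[bytes]


@dataclass
class RecordContext:
    """Simplified stand-in: metadata forwarded to the record pipeline."""
    request_id: str
    payload_bytes: Optional[bytes]


def dispatch_and_record(
    request: RequestInfo,
    ctx: RecordContext,
    send: Callable[[bytes], None],
    strip_payload_bytes: bool,
) -> RecordContext:
    """Hypothetical flow: send the request, then optionally drop the record copy."""
    send(request.payload_bytes or b"")
    if strip_payload_bytes:
        # Only the record-side reference is cleared; the request keeps its bytes.
        ctx.payload_bytes = None
    return ctx
```

Because only the record-side reference is cleared, anything downstream that needs the body (tokenization, media counting, raw export) would have nothing to read, which matches the trade-off stated in the overview.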

In PAYLOAD_BYTES mmap mode, Worker._process_credit no longer forces DAG children (agent_depth > 0) through the session path; every credit that can load pre-encoded bytes uses the fast path, avoiding duplicated full-history payloads in worker memory.
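
A minimal sketch of the routing change, under the assumption that the decision reduces to two inputs: whether pre-encoded bytes can be loaded for the credit, and the credit's agent_depth. The `Credit` class, the `Route` enum, and both functions are hypothetical and only contrast the old and new rules; Worker._process_credit itself is not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    FAST = "fast"        # send pre-encoded bytes as-is
    SESSION = "session"  # rebuild the request through the session path


@dataclass(frozen=True)
class Credit:
    """Hypothetical credit carrying only the fields this sketch needs."""
    agent_depth: int
    has_preencoded_bytes: bool


def route_before(credit: Credit) -> Route:
    # Old rule: DAG children (agent_depth > 0) were forced through the session path.
    if credit.has_preencoded_bytes and credit.agent_depth == 0:
        return Route.FAST
    return Route.SESSION


def route_after(credit: Credit) -> Route:
    # New rule: any credit that can load pre-encoded bytes takes the fast path.
    return Route.FAST if credit.has_preencoded_bytes else Route.SESSION
```

For a subagent credit such as `Credit(agent_depth=2, has_preencoded_bytes=True)`, the old rule yields the session path and the new rule yields the fast path; credits without pre-encoded bytes are routed the same way under both rules.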

Unit tests cover stripping and updated fast-path routing for subagents.

Reviewed by Cursor Bugbot for commit 3bcd555.

github-actions Bot commented Jun 11, 2026

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@3bcd555dfedc18899c8c1465c3ad8d3c82307192

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@3bcd555dfedc18899c8c1465c3ad8d3c82307192

Last updated for commit: 3bcd555

cquil11 marked this pull request as ready for review on June 15, 2026, 18:35
cquil11 merged commit 0c101fc into SemiAnalysisAI:cjq/agentx-v0.3 on Jun 15, 2026
1 of 2 checks passed