UN-3420 [FIX] Remove duplicate INFILE write in _write_pipeline_outputs #1920
The INFILE overwrite (fs.json_dump to input_file_path) appeared twice in _write_pipeline_outputs(), causing an unnecessary extra MinIO PUT per file execution. Remove the duplicate that was a copy-paste artifact from PR #1849.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| Filename | Overview |
|---|---|
| workers/file_processing/structure_tool_task.py | Removes duplicate fs.json_dump(path=input_file_path, ...) call in _write_pipeline_outputs; write count corrected from 4 to 3, fixing pre-existing test assertion failures. |
Sequence Diagram
sequenceDiagram
participant C as Caller
participant W as _write_pipeline_outputs
participant FS as fs (MinIO/Storage)
C->>W: call(structured_output, ...)
Note over W,FS: Write #1 — primary output file
W->>FS: json_dump(output_path, structured_output)
Note over W,FS: Write #2 — INFILE overwrite (intentional)
W->>FS: json_dump(input_file_path, structured_output)
Note over W,FS: Write #3 — COPY_TO_FOLDER
W->>FS: mkdir(copy_to_folder)
W->>FS: json_dump(copy_output_path, structured_output)
Note over W,FS: REMOVED: Write #4 was duplicate of #2
W-->>C: return None
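
The write sequence in the diagram can be sketched in Python as follows. This is a minimal illustration, not the actual code in structure_tool_task.py: the function name write_pipeline_outputs_sketch, its parameter list, and the example paths are assumptions made for the example, and fs is a unittest.mock.MagicMock standing in for the real storage object. Only the order of the json_dump and mkdir calls is taken from the diagram.

```python
from unittest.mock import MagicMock


def write_pipeline_outputs_sketch(
    fs,
    structured_output,
    output_path,
    input_file_path,
    copy_to_folder,
    copy_output_path,
):
    """Hypothetical stand-in for _write_pipeline_outputs (signature assumed)."""
    # Write 1: primary output file.
    fs.json_dump(path=output_path, data=structured_output)

    # Write 2: INFILE overwrite, performed exactly once after the fix.
    fs.json_dump(path=input_file_path, data=structured_output)

    # Write 3: COPY_TO_FOLDER copy of the same data.
    fs.mkdir(copy_to_folder)
    fs.json_dump(path=copy_output_path, data=structured_output)

    # The removed duplicate would have repeated write 2 here.
    return None


# Illustrative paths only; the real paths are built elsewhere in the worker.
fs = MagicMock()
write_pipeline_outputs_sketch(
    fs,
    {"field": "value"},
    output_path="out/result.json",
    input_file_path="in/INFILE",
    copy_to_folder="out/COPY_TO_FOLDER",
    copy_output_path="out/COPY_TO_FOLDER/result.json",
)

# Collect the destination of every json_dump call, in call order.
written_paths = [c.kwargs["path"] for c in fs.json_dump.call_args_list]
```

With a mock in place of the storage layer, checking fs.json_dump.call_count and counting how often input_file_path appears in written_paths is enough to detect a reintroduced duplicate write without touching MinIO.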



What
Remove a duplicate fs.json_dump(path=input_file_path, data=structured_output) call in _write_pipeline_outputs() in workers/file_processing/structure_tool_task.py. The function wrote the INFILE overwrite twice: identical path, identical data, no intervening mutation.
Why
PR #1849 (UN-3266) introduced a copy-paste artifact that causes an unnecessary extra MinIO PUT request per file execution. The _write_pipeline_outputs() function performs 4 writes when only 3 are needed:
1. Primary output file (output_path): intentional
2. INFILE overwrite (input_file_path): intentional
3. COPY_TO_FOLDER (copy_output_path): intentional
4. INFILE overwrite (input_file_path): duplicate of #2
This wastes I/O and causes existing test assertions (json_dump.call_count == 3) to fail.
How
Deleted the 3-line duplicate block (logger.info + fs.json_dump) that repeated the INFILE overwrite. Pure deletion, no logic changes.
Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
No. The deleted write is an exact duplicate: same destination path, same data, with no mutation of structured_output between the original and the duplicate. The INFILE is still overwritten exactly once (write #2), satisfying the destination connector's requirement to read JSON instead of the original PDF. The COPY_TO_FOLDER write is untouched.
Database Migrations
None.
Env Config
No changes.
Relevant Docs
N/A
Related Issues or PRs
Dependencies Versions
No dependency changes.
Notes on Testing
- workers/tests/test_sanity_phase3.py already asserts json_dump.call_count == 3 in two places; these assertions were written for the correct behavior and now pass
- One json_dump(path=input_file_path call remains in _write_pipeline_outputs
Screenshots
N/A — backend-only fix, no UI changes.
Checklist
I have read and understood the Contribution Guidelines.