Skip to content

Truncate large coro repr in retry log output#9197

Merged
crusaderky merged 6 commits into
dask:mainfrom
ernestprovo23:fix-retry-excessive-logs
May 19, 2026
Merged

Truncate large coro repr in retry log output#9197
crusaderky merged 6 commits into
dask:mainfrom
ernestprovo23:fix-retry-excessive-logs

Conversation

@ernestprovo23
Copy link
Copy Markdown
Contributor

Summary

  • When retry() falls back to str(coro) for log messages, truncate the representation to 200 characters to prevent excessively large logs
  • This was observed in P2P shuffles where functools.partial repr includes serialized binary data (see Fix excessive logging on P2P retry #8511)
  • Added test verifying truncation behavior

Closes #8529

Test plan

  • New test test_retry_truncates_large_coro_repr verifies 500-char repr is truncated
  • Existing retry tests unaffected (no truncation when repr is small)
  • Explicit operation parameter bypasses truncation (preserves full user-provided descriptions)

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Feb 22, 2026

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    31 files  ± 0      31 suites  ±0   10h 53m 10s ⏱️ - 7m 48s
 4 074 tests + 2   3 968 ✅ + 2    104 💤 ±0  2 ❌ ±0 
59 131 runs  +30  56 696 ✅ +30  2 433 💤 ±0  2 ❌ ±0 

For more details on these failures, see this check.

Results for commit 630991a. ± Comparison against base commit deb1004.

♻️ This comment has been updated with latest results.

Copy link
Copy Markdown
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a recommended solution in the issue to use reprlib to help with this truncation instead of simply chopping the end of the string off #8529 (comment).

@ernestprovo23
Copy link
Copy Markdown
Contributor Author

thanks for catching that, updated to use reprlib.Repr() with maxother=200 instead of manual slicing. keeps the truncation logic cleaner and more standard.

@ernestprovo23
Copy link
Copy Markdown
Contributor Author

hey @jacobtomlinson — just checking in on this. the reprlib approach is in place per your suggestion (lines 385-390). the Windows CI failures look like pre-existing flakiness (investigating now), all Linux and macOS tests pass. happy to adjust anything else if needed.

Copy link
Copy Markdown
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple more comments

Comment thread distributed/utils_comm.py Outdated
Comment on lines +389 to +391
aRepr = reprlib.Repr()
aRepr.maxother = 200
operation = aRepr.repr(coro)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's clean this up into a simple one-liner.

Suggested change
aRepr = reprlib.Repr()
aRepr.maxother = 200
operation = aRepr.repr(coro)
operation = reprlib.Repr(maxother=200).repr(coro)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done — condensed to a one-liner.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turns out Repr() doesnt accept kwargs per mypy, maxother is an instance attribute. switched to a two-liner that sets it after init.

Comment thread distributed/tests/test_utils_comm.py Outdated
jitter_fraction=0,
)

import logging
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move imports to the top of the file. Claude loves importing stuff randomly in the middle of code for some reason.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to the top, good catch.

Comment thread distributed/tests/test_utils_comm.py Outdated
Comment on lines +259 to +268
handler = logging.Handler()
handler.emit = lambda record: log_messages.append(record.getMessage())

logger = logging.getLogger("distributed.utils_comm")
logger.addHandler(handler)
try:
with pytest.raises(MyEx):
asyncio_run(f(), loop_factory=get_loop_factory())
finally:
logger.removeHandler(handler)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just use pytest's caplog instead of doing all this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, switched to caplog — much cleaner.

@ernestprovo23
Copy link
Copy Markdown
Contributor Author

Hi @fjetter — just checking in on this. The reprlib approach from the Mar 26 commit addresses the earlier feedback (using reprlib.Repr() with maxother set post-init to satisfy mypy, plus caplog in tests). The "not ci1" failures appear pre-existing and unrelated. Let me know if there's anything else you'd like adjusted.

@crusaderky
Copy link
Copy Markdown
Collaborator

@ernestprovo23 CI is failing; could you look into it before the next round of review?

@ernestprovo23
Copy link
Copy Markdown
Contributor Author

np. i will check out today

When retry() falls back to str(coro) for the log message, truncate
the representation to 200 characters to prevent excessively large
logs. This was observed in P2P shuffles where functools.partial repr
includes serialized binary data.

Closes dask#8529
Replace manual str[:200] + "..." truncation with stdlib reprlib.Repr()
per reviewer feedback. reprlib handles truncation consistently and is
a well-known stdlib pattern for this use case.
- Condense reprlib setup to single line per suggestion
- Replace manual logging handler with pytest caplog fixture
- Move logging import to top of test file
distributed/config.py sets propagate=False on the top-level 'distributed'
logger and attaches its own handler. Records from distributed.utils_comm
propagate up to 'distributed' but stop there, so pytest's caplog
(attached to root) never sees them. Use monkeypatch to flip propagate=True
on the 'distributed' logger for the test duration.
@ernestprovo23 ernestprovo23 force-pushed the fix-retry-excessive-logs branch from 40472e5 to ac79e3e Compare May 19, 2026 01:55
@ernestprovo23
Copy link
Copy Markdown
Contributor Author

@crusaderky CI fixed. The caplog capture was silently dropping records because distributed/config.py sets propagate=False on the top-level distributed logger (and attaches its own stderr handler there), so records from distributed.utils_comm stop at the distributed logger and never reach caplog's root handler. Used monkeypatch.setattr(logging.getLogger('distributed'), 'propagate', True) to flip it for the test duration. Verified locally; also rebased on main. Ready for another look.

@crusaderky
Copy link
Copy Markdown
Collaborator

@crusaderky CI fixed. The caplog capture was silently dropping records because distributed/config.py sets propagate=False on the top-level distributed logger (and attaches its own stderr handler there), so records from distributed.utils_comm stop at the distributed logger and never reach caplog's root handler. Used monkeypatch.setattr(logging.getLogger('distributed'), 'propagate', True) to flip it for the test duration. Verified locally; also rebased on main. Ready for another look.

We use captured_logger to avoid this issue. I fixed it for you.

@crusaderky crusaderky merged commit c813e6d into dask:main May 19, 2026
33 of 37 checks passed
@ernestprovo23 ernestprovo23 deleted the fix-retry-excessive-logs branch May 19, 2026 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

retry can produce excessively large logs

3 participants