Skip to content

Use streaming in all assertion comparisons consumers#14523

Open
Pierre-Sassoulas wants to merge 18 commits into
pytest-dev:mainfrom
Pierre-Sassoulas:stream-comparisons
Open

Use streaming in all assertion comparisons consumers#14523
Pierre-Sassoulas wants to merge 18 commits into
pytest-dev:mainfrom
Pierre-Sassoulas:stream-comparisons

Conversation

@Pierre-Sassoulas
Copy link
Copy Markdown
Member

@Pierre-Sassoulas Pierre-Sassoulas commented May 26, 2026

Follow-up to #14521, if we push the concept of generator for comparators to all the consumers of comparator, we can avoid computing big diff when they are going to be truncated anyway, and ultimately make pytest faster (maybe also make the output with -vvv more fluid).

I didn't do it because the truncation footer could no longer report an exact hidden-line count. It would change from "...Full output truncated (499992 lines hidden), use '-vv' to show" to something like "...Full output truncated, use '-vv' to show", but on 500k element list/dict/set it makes pytest 2x faster.

Ways to claim that speedup:

  • drop the count entirely so the footer becomes ``...Full output truncated, use '-vv' to show`
  • gate the count on -v.
  • "we have at least 3 lines mores" (the same regardless of whether the diff is 12 or 12 million lines).
    Fishing for opinions here :)

@Pierre-Sassoulas Pierre-Sassoulas added type: performance performance or memory problem/improvement type: refactoring internal improvements to the code labels May 26, 2026
@Pierre-Sassoulas Pierre-Sassoulas marked this pull request as draft May 26, 2026 10:17
Pierre-Sassoulas added a commit to Pierre-Sassoulas/pytest that referenced this pull request May 26, 2026
@psf-chronographer psf-chronographer Bot added the bot:chronographer:provided (automation) changelog entry is part of PR label May 26, 2026
@Pierre-Sassoulas Pierre-Sassoulas marked this pull request as ready for review May 26, 2026 11:39
Copy link
Copy Markdown
Member

@nicoddemus nicoddemus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

@Pierre-Sassoulas
Copy link
Copy Markdown
Member Author

Thank you ! Do you have an opinion about the next step ? (3 options in the PR description)

Copy link
Copy Markdown
Member

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my last comment, I think it might affect the plan, so I'll wait before reviewing the rest.

Comment thread src/_pytest/assertion/util.py Outdated
try:
if op == "==":
explanation = _compare_eq_any(
source: Iterator[str] = _compare_eq_any(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the annotation needed?

Comment thread src/_pytest/assertion/util.py Outdated
elif op == "not in" and istext(left) and istext(right):
source = _notin_text(left, right, verbose)
elif op in {"!=", ">=", "<=", ">", "<"} and isset(left) and isset(right):
source = iter(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change SetComparisonFunction to return Iterator instead of Iterable, can avoid the iter here. Alternatively, change source to Iterable[str].

Comment thread src/_pytest/assertion/__init__.py Outdated
Comment on lines +220 to +228
) -> list[str] | None:
) -> Iterator[str]:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we can't change the return type here, it is stable API (see hookspec.py).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can truncate in util.assertrepr_compare and still get the performance improvements without modifying the return type of the stable API ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the truncation part, but if you can avoid the API break I'd be happy to take a look.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I moved the truncation to keep the performance without touching the API, but had to hack truncation a little.

First, we have to truncate twice because plugin maintainer expect the truncation to be handled where it was before I suppose.

Second, if we choose to drop the detail in the truncation message ("x lines hidden") then we can also drop the hacky check to see if there's the truncation header in the string for the second truncation. (I would favor that, small price to pay, imo), but right now I don't see an elegant way to do that as we have multiple way to truncate and it's not idempotent.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we choose to drop the detail in the truncation message ("x lines hidden")

I'm OK with dropping that detail; it is a small price to pay comparing to the benefits.

Pierre-Sassoulas added a commit to Pierre-Sassoulas/pytest that referenced this pull request May 27, 2026
Addresses review feedback on PR pytest-dev#14523:
* drop the redundant ``: Iterator[str]`` annotation on ``source`` —
  every branch already produces an ``Iterator[str]``.
* return ``Iterator[str]`` from ``SetComparisonFunction`` instead of
  ``Iterable[str]`` so the call site no longer needs ``iter(...)``;
  the ``!=`` branch is promoted from a list-returning lambda to a
  named generator so the new contract holds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pierre-Sassoulas added a commit to Pierre-Sassoulas/pytest that referenced this pull request May 28, 2026
Adds two regression tests to close the patch-coverage gaps in
``callbinrepr`` reported by codecov on PR pytest-dev#14523:
* a plugin returning a truthy-but-empty iterator (``iter([])``) to
  exercise the second ``if not new_expl: continue`` after
  ``materialize_with_truncation``.
* a ``--assert=plain`` run to exercise the false branch of the
  ``assertmode == "rewrite"`` guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pierre-Sassoulas added a commit to Pierre-Sassoulas/pytest that referenced this pull request May 28, 2026
When every ``pytest_assertrepr_compare`` impl returns ``None`` (e.g.
``assert 1 == 2`` — no specialised comparator applies), the dispatcher
exhausts ``hook_result``, exits the loop normally, and returns
``None``. The previously-uncovered ``continue → loop exit`` arc on
the first ``if not new_expl: continue`` line was the last patch
coverage gap on PR pytest-dev#14523.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Pierre-Sassoulas Pierre-Sassoulas requested a review from bluetech May 28, 2026 06:19
Pierre-Sassoulas and others added 17 commits May 30, 2026 20:42
Following Ronny's review comment on pytest-dev#13762, switch the set comparison
helpers in ``_compare_set.py`` to return ``Iterator[str]`` so the
composition is direct: ``_set_one_sided_diff`` ``yield``s, and the
other helpers ``yield from`` it. This avoids the manual
``explanation = []; .append/.extend`` boilerplate.

The "equal sets" branch of ``_compare_gt_set`` / ``_compare_lt_set``
used to peek at the diff for emptiness; replace that with a direct
``left == right`` check so the generator form stays idiomatic.

``SET_COMPARISON_FUNCTIONS`` and ``_compare_eq_set`` now return
``Iterable[str]`` / ``Iterator[str]``; the consumers in
``_compare_eq_any`` materialise with ``list(...)``.
Drop the ``list(...)`` wraps around each per-type comparator call in
the match dispatch and ``yield from`` instead. ``_compare_eq_any``
becomes an ``Iterator[str]`` that yields nothing when no specialised
explanation applies (replaces the previous ``list[str] | None``
sentinel).

The two callers materialise:

* ``util.assertrepr_compare`` does
  ``list(_compare_eq_any(...))`` before its empty/summary check.
* ``_compare_eq_cls`` iterates the generator directly via
  ``for line in _compare_eq_any(...)``.

No behavior change yet — this is the stepping stone for letting the
truncator upstream consume the iterator lazily so huge diffs don't
materialise just to be thrown away.
Turn ``assertrepr_compare`` into a generator. The first line yielded is
the summary; subsequent lines are the explanation produced by
``_compare_eq_any``. Yields nothing when no specialised explanation
applies — the consumer maps an empty iterator to ``None``.

The ``pytest_assertrepr_compare`` hook impl in ``assertion/__init__``
materialises the iterator and returns ``list[str] | None`` so the
public hook contract is unchanged. A follow-up commit replaces the
``list(...)`` call with a streaming truncator so an enormous diff
doesn't have to be built in full just to be discarded.

Behaviour change: previously, if an exception was raised while
building the explanation (e.g. a faulty ``__repr__``), the partial
output was discarded and only the failure notice was returned. The
generator can't unyield lines it has already produced, so the new
form preserves the partial output and appends the failure notice
after it. This is arguably more useful — the reader sees what was
being compared at the point the comparison failed.

``test_list_bad_repr`` is updated to assert that the failure notice
appears at the end of the explanation instead of replacing the body.
…ions

The existing ``truncate_if_required`` takes a ``list[str]`` — it can
only trim *after* the full explanation has been built. Add a streaming
counterpart that takes an ``Iterable[str]`` and stops pulling lines as
soon as the truncation threshold is reached, so a huge comparison
doesn't have to materialise its entire output just to be discarded.

The remaining lines are still iterated past the cap (without storing)
so the truncation footer can report the exact hidden-line count, and
``_truncate_explanation`` gains an ``extra_hidden`` argument to fold
that count into the message.

``_get_truncation_parameters`` is also refactored to take a ``Config``
directly (it never used anything else from ``Item``), so the new
streaming helper can be called from places that don't have an item
handy.

The new helper isn't wired up yet — that's the next commit.
Wire the built-in ``pytest_assertrepr_compare`` hook to return the
iterator produced by ``util.assertrepr_compare`` directly, and update
``callbinrepr`` to consume it through ``materialize_with_truncation``.
The result: a comparison that would produce millions of explanation
lines stops at the truncation threshold (default 8 lines / 640 chars)
without materialising the rest, only counting the remaining lines so
the truncation footer still reports the exact hidden-line count.

The ``callbinrepr`` dispatcher's ``materialize_with_truncation`` call
accepts both lists (returned by third-party plugins implementing the
hook) and iterators (returned by the built-in impl), so the change is
transparent to plugin authors.

``callop`` in ``test_assertion`` now materialises the iterator so
tests keep comparing against literal lists.
* Drop ``truncate.truncate_if_required`` — all callers migrated to
  ``materialize_with_truncation`` and the function had no remaining
  users.

* Add ``TestMaterializeWithTruncation`` covering:
    - iterator within limits returns all lines
    - iterator past limits is bounded and contains a truncation marker
    - sized and unsized inputs produce equivalent shapes
    - truncation is skipped at ``-vv``
    - the lines that survive truncation start with the original input

  Assertions check behaviour (the presence of a "truncated" marker,
  the length being bounded, the first lines being preserved), never
  the literal footer wording — so the tests survive a future decision
  to drop the ``(N lines hidden)`` count from the message.

* Add ``test_plugin_hook_returning_none_is_skipped`` to cover the
  ``if new_expl is None: continue`` branch in ``callbinrepr``.

* Add ``test_exception_before_first_yield_emits_summary_and_notice``
  to cover the ``summary_yielded is False`` arm of
  ``assertrepr_compare``'s exception handler — when the comparator
  raises before yielding anything, the summary is still produced so
  the reader sees what was compared.
… None``

The hookspec advertises ``list[str] | None`` as the stable return type;
the streaming refactor had changed the built-in impl to ``Iterator[str]``.
Restore the spec'd shape by materialising inside the hook through
``materialize_with_truncation`` — the iterator from
``util.assertrepr_compare`` is still consumed lazily, so a huge diff
short-circuits at the truncation threshold without being fully built.

To avoid double-truncation between the hook and ``callbinrepr`` (which
still truncates plugin-supplied lists), make
``materialize_with_truncation`` idempotent on inputs that already end in
our truncation footer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses review feedback on PR pytest-dev#14523:
* drop the redundant ``: Iterator[str]`` annotation on ``source`` —
  every branch already produces an ``Iterator[str]``.
* return ``Iterator[str]`` from ``SetComparisonFunction`` instead of
  ``Iterable[str]`` so the call site no longer needs ``iter(...)``;
  the ``!=`` branch is promoted from a list-returning lambda to a
  named generator so the new contract holds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two regression tests to close the patch-coverage gaps in
``callbinrepr`` reported by codecov on PR pytest-dev#14523:
* a plugin returning a truthy-but-empty iterator (``iter([])``) to
  exercise the second ``if not new_expl: continue`` after
  ``materialize_with_truncation``.
* a ``--assert=plain`` run to exercise the false branch of the
  ``assertmode == "rewrite"`` guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When every ``pytest_assertrepr_compare`` impl returns ``None`` (e.g.
``assert 1 == 2`` — no specialised comparator applies), the dispatcher
exhausts ``hook_result``, exits the loop normally, and returns
``None``. The previously-uncovered ``continue → loop exit`` arc on
the first ``if not new_expl: continue`` line was the last patch
coverage gap on PR pytest-dev#14523.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Nest the truthiness checks instead of using ``continue`` to skip to
the next ``hook_result`` entry. Behaviourally identical, but each
``continue`` was being reported as a partial branch by codecov even
when both arcs were hit by tests — pytester-driven in-process tests
don't always show the ``continue → loop exit`` arc, so the partials
were sticky. With the nested form the for-loop has a single fall-
through edge and the partial disappears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The streaming truncator previously drained the input iterator just to
count how many lines it was discarding, so the footer could read
``...Full output truncated (N lines hidden), use '-vv' to show``. With
the count gone the footer is a constant string, the iterator is
released as soon as the budget is reached, and the idempotency check
on already-truncated lists collapses to "the message is a constant,
so re-emitting it is harmless".

Benchmark — ``materialize_with_truncation`` over a lazy generator
whose per-line cost is non-trivial (upper bound of the change's
value):

  n         before        after       speedup
  1,000     0.39 ms       0.03 ms     ~13x
  10,000    4.18 ms       0.03 ms     ~140x
  100,000   41.29 ms      0.04 ms     ~1,000x
  1,000,000 419.90 ms     0.04 ms     ~10,000x

Before is O(n) in iterator length, after is O(1) — confirms the rest
of the generator is dropped instead of drained.

Benchmark — full ``util.assertrepr_compare`` path on two large sets
(today's reality, where ``_compare_eq_iterable`` still builds the
diff up-front via ``pformat().splitlines()`` + ``difflib.ndiff``):

  n         before        after       delta
  1,000     2.91 ms       2.68 ms     -8%
  10,000    31.09 ms      28.26 ms    -9%
  100,000   376.91 ms     361.03 ms   -4%
  500,000   2299 ms       2194 ms     -5%

Marginal today because the upstream isn't lazy yet; the large win
will land when ``_compare_eq_iterable`` is made to yield instead of
pre-building.
…_eq_iterable``

The "Full diff:" block built the entire diff as a single joined string
(``"\n".join(difflib.ndiff(...))``), passed it through the syntax
highlighter, and then split it back into lines. That eagerly drained
``difflib.ndiff`` even when the streaming truncator only intended to
consume the first handful of lines.

Yield each ndiff line through the highlighter individually instead.
The diff lexer is line-oriented, so per-line highlighting produces
equivalent output — each line just starts with a redundant
``\x1b[0m`` reset (invisible to the terminal). One
``test_comparisons_handle_colors`` parametrisation was updated to
match the new byte stream.

Benchmark — ``util.assertrepr_compare`` on two large sets with a
small symmetric difference (the realistic huge-diff case), measured
through ``materialize_with_truncation``:

  n         before        after       delta
  1,000     2.68 ms       2.44 ms     -9%
  10,000    28.26 ms      25.44 ms    -10%
  100,000   361.03 ms     316.03 ms   -12%
  500,000   2194 ms       1993 ms     -9%

Modest because ``PrettyPrinter().pformat(...)`` and
``SequenceMatcher``'s opcode computation still run up-front; the
saving is just the bookkeeping for the ndiff output lines past the
truncator's budget. A streaming pformat was prototyped (a
line-collecting stream class) but reverted: a pure-Python
``write``-handler can't keep up with C-level ``StringIO`` +
``splitlines``, and the slowdown outweighed the memory saving.
…ast path

Two related changes unlock ``_compare_eq_iterable`` for huge
collections, where it was previously O(N) in input size regardless
of how few lines the truncator wanted downstream.

1. ``_LineBudgetStream`` + ``PrettyPrinter.pformat_lines(max_lines=...)``:
   a stream whose ``write`` raises ``_LineBudgetExceeded`` once enough
   ``\n``-terminated lines have accumulated. The formatter's recursion
   unwinds at the next write and the caller keeps the partial output.
   PrettyPrinter writes one chunk per element, so the abort fires on
   element boundaries — a 500,000-element set's pformat becomes O(100)
   lines instead of O(500,000).

2. ``_pprint_set`` now tries ``sorted(object)`` first and only falls
   back to ``sorted(object, key=_safe_key)`` on ``TypeError``. The
   ``_safe_key`` wrap exists for heterogeneous-type sets, but it
   allocates a wrapper per element — for 500,000 ints, plain
   ``sorted`` is ~16x faster (15 ms vs 252 ms).

``_compare_eq_iterable`` uses ``pformat_lines(max_lines=100)`` only
when the truncator will be active downstream (``verbose < 2`` and
not running under CI). At ``-vv`` or under CI the user expects the
full diff and the truncator is disabled, so the cap is dropped.

Benchmark — ``util.assertrepr_compare`` on two large sets with a
small symmetric difference, measured through
``materialize_with_truncation``:

  n         before        after         speedup
  1,000     2.44 ms       0.44 ms       ~6x
  10,000    25.44 ms      1.00 ms       ~25x
  100,000   316 ms        8.41 ms       ~38x
  500,000   1993 ms       42.5 ms       ~47x

Scaling went from near-linear to sub-linear — the per-element work
in pformat and ``difflib.ndiff`` no longer dominates. The remaining
floor is the inherent cost of computing the set difference and the
``saferepr`` calls on the symmetric-diff items.

A previous attempt to make ``pformat_lines`` non-bailing (just
streaming the writes) was reverted because a pure-Python ``write``
handler can't compete with C ``StringIO`` + C ``str.splitlines``;
the budget-aware variant only pays Python overhead until the abort
fires, by which point it has already saved far more than it cost.
…``'s pformat cap

The previous commit hardcoded a 100-line ``pformat_cap`` inside
``_compare_eq_iterable``. The dispatcher already knows the real budget
— ``truncate._get_truncation_parameters(config)`` returns it — so
thread that value through ``util.assertrepr_compare`` and
``_compare_eq_any`` instead. The cap is now ``truncation_limit_lines
+ 3`` (matching the truncator's own ``line_cap``: 2 lines for the
truncation message it appends + 1 for overshoot detection), or
``None`` when the truncator is disabled. Users who bump
``truncation_limit_lines`` or run ``-vv`` get the diff at the budget
they asked for, not a hardcoded one.

The verbose/CI heuristic moves out of ``_compare_eq_iterable`` —
``pformat_cap=None`` from the dispatcher already encodes "truncator
off, give me everything", so the helper just respects the cap.

Full test suite: 4111 passed, 107 skipped, 12 xfailed, 1 xpassed.

Cumulative benchmark for this branch's four-commit perf series
(precision-drop + ndiff streaming + lazy PrettyPrinter + cap
plumbing) — synthetic lazy generator with non-trivial per-yield
work, upper bound of the streaming-truncation effect:

  n           before       after      speedup
  1,000       0.39 ms      0.04 ms    ~10x
  10,000      3.65 ms      0.04 ms    ~90x
  100,000     37.35 ms     0.04 ms    ~930x
  1,000,000   389.67 ms    0.04 ms    ~9700x

Before is O(n) in iterator length, after is O(1).

Real ``util.assertrepr_compare`` on two large sets with a small
symmetric difference (the realistic huge-diff case), measured
through ``materialize_with_truncation``:

  n         before       after        speedup
  1,000     5.66 ms      0.23 ms      ~25x
  10,000    40.28 ms     1.02 ms      ~40x
  100,000   395.61 ms    11.53 ms     ~34x
  500,000   2422 ms      58.50 ms     ~41x

Scaling went from near-linear to sub-linear. The remaining floor is
the inherent O(N) cost of the set difference and the ``saferepr``
calls on the symmetric-diff items themselves.
…udget

The lazy-PrettyPrinter commit routed every ``pformat_lines(max_lines=K)``
call through the pure-Python ``_LineBudgetStream``, which is ~1.3x
slower than C ``StringIO`` + C ``splitlines`` on tiny inputs (per-
write overhead dominates). Most assertion diffs that hit
``_compare_eq_iterable`` are small lists/dicts, so the worst-case
path was being taken even when there was nothing to bound.

Skip the budget stream when ``len(object) <= max_lines``: one pformat
line per element is the lower bound for any container, so this is a
sufficient condition for "the budget cap will never fire" — the
plain ``pformat().splitlines()`` path produces the same output more
cheaply. Inputs without ``__len__`` (generators, etc.) still go
through the budget stream.

Doesn't touch the huge-set fast path: at ``len > max_lines`` we still
take the budget-aware path that bounds work to O(max_lines).

Realistic benchmark — pylint's own test suite (2,287 tests, 278
assertion failures, ~115 s per run), interleaved 4 rounds upstream
vs HEAD to control for thermal/cache drift:

  metric    upstream/main   HEAD       delta
  min       114.14 s        106.43 s   -6.8%
  median    115.69 s        115.84 s   +0.1%
  mean      115.95 s        114.20 s   -1.5%

Statistically indistinguishable — the macro workload doesn't exercise
the streaming-truncation path enough to measure. (Earlier sequential
benches showed a 5-9% "regression" that turned out to be a thermal
artifact: when one side runs first and the other second, the second
side runs hotter; interleaving cancels it.) Same 278/1747/2 outcome
on every run, confirming no behavioral change.

The micro-benchmark on huge-set comparisons stays ~40-50x faster —
that's the case this perf series was designed for.
Tighten the entry to two short paragraphs: a one-line summary of the
change, the micro vs realistic-suite numbers side by side, and the
user-visible footer / per-line reset side-effects. Drops the
implementation walk-through that earlier drafts carried.
@Pierre-Sassoulas
Copy link
Copy Markdown
Member Author

I removed the number of removed line in the message. Benchmark on pathological case with very big set show big improvement, but I benchmarked on pylint test suite for a realistic perf change:

round1-upstream: 118.29s
round1-HEAD : 115.39s
round2-upstream: 114.87s
round2-HEAD : 106.43s
round3-upstream: 116.51s
round3-HEAD : 118.69s
round4-upstream: 114.14s
round4-HEAD : 116.28s

Look like it's within noise. pylint does not compare big data structure (the bigger checks are probably the functional test output ~=80 lines).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bot:chronographer:provided (automation) changelog entry is part of PR type: performance performance or memory problem/improvement type: refactoring internal improvements to the code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants