[LEADS-415] Responses streaming support by xmican10 · Pull Request #255 · lightspeed-core/lightspeed-evaluation

xmican10 · 2026-06-12T14:34:49Z

Description

Followup PR with supporting responses stream=True parameter

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Related Issue #
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

New Features
- Added support for streaming responses in the API's /responses endpoint, enabling real-time response streaming for improved latency.
Documentation
- Updated all example documentation to demonstrate explicit output directory configuration using the --output-dir flag for evaluation results.

coderabbitai · 2026-06-12T14:34:57Z

Warning

Review limit reached

@xmican10, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 54 minutes and 27 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 36c2f477-c1f4-4308-89c3-7d2e1063e73f

📥 Commits

Reviewing files that changed from the base of the PR and between 8c03464 and 54e60be.

📒 Files selected for processing (4)

src/lightspeed_evaluation/core/api/client.py
src/lightspeed_evaluation/core/api/streaming_parser.py
tests/unit/core/api/test_client_responses.py
tests/unit/core/api/test_streaming_parser.py

Walkthrough

The PR refactors the streaming parser from Pydantic models to dataclasses and replaces the monolithic event switch with handler-dispatch tables. It adds a new /responses-specific SSE parser (parse_responses_streaming) and wires a streaming branch into _responses_query in APIClient. Unit tests and example README --output-dir flags are updated accordingly.

Changes

Responses Endpoint Streaming Support

Layer / File(s)	Summary
Streaming parser refactor and responses SSE parser `src/lightspeed_evaluation/core/api/streaming_parser.py`	Converts `_PerformanceTracker` and `StreamingContext` from Pydantic `BaseModel` to dataclasses; replaces `_process_event` switch with `_STREAMING_EVENT_HANDLERS` dispatch; removes `_process_tool_result`; updates `_parse_tool_call` to support `name/args` and legacy `tool_name/arguments` shapes; adds `ResponsesStreamingContext` dataclass, `/responses` event handlers, `parse_responses_streaming` with `[DONE]` termination, and `_validate_streaming_response`/`_validate_responses_response` helpers.
APIClient streaming dispatch for `/responses` `src/lightspeed_evaluation/core/api/client.py`	Imports `parse_responses_streaming`; adds a streaming branch in `_responses_query` that uses `httpx.Client.stream`, validates HTTP status via `_handle_response_errors`, invokes `parse_responses_streaming`, and returns `APIResponse` directly when `responses_request.stream` is truthy.
Unit tests: client responses and streaming parser `tests/unit/core/api/test_client_responses.py`, `tests/unit/core/api/test_streaming_parser.py`	Adds `TestResponsesEndpoint` covering success mapping, `extra_request_params` merging, `file_search_call`/`mcp_call` extraction, error-field propagation, missing output, timeout, HTTP errors, streaming dispatch, and 429 retry; updates existing tool-call fixtures for `name/args` format; adds `TestParseResponsesStreaming` for basic response, MCP tools, file search, missing fields, and `[DONE]` sentinel behavior.

Example README Updates

Layer / File(s)	Summary
Add `--output-dir` to all example run commands `examples/01_getting_started/basic_setup/README.md`, `examples/02_metrics/*/README.md`	Adds `--output-dir` pointing to each example's local `eval_output/` directory in the `uv run lightspeed-eval` command snippet across seven READMEs.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

lightspeed-core/lightspeed-evaluation#254: Adds initial non-streaming /responses endpoint support in APIClient and response/tool-call mapping that this PR extends with streaming dispatch and parse_responses_streaming.

Suggested reviewers

asamal4
VladimirKadlec

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title '[LEADS-415] Responses streaming support' directly and clearly describes the main change: implementing streaming support for the responses endpoint.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

asamal4 · 2026-06-17T00:35:24Z

@coderabbitai full review

coderabbitai · 2026-06-17T00:35:32Z

✅ Action performed

Full review finished.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/lightspeed_evaluation/core/api/streaming_parser.py (1)

356-358: 💤 Low value

Unreachable code: arguments can never be None after line 350.

Line 350 uses or {} as a fallback, so arguments will always be at least an empty dict. The condition if arguments is None on line 356 will never be true.

Consider removing this unreachable block:

Proposed fix

         if not tool_name:
             logger.debug("Tool call missing name/tool_name field")
             return None
 
-        if arguments is None:
-            logger.debug("Tool call missing args/arguments field for %s", tool_name)
-            return None
-
         tool_call: dict[str, Any] = {"tool_name": tool_name, "arguments": arguments}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lightspeed_evaluation/core/api/streaming_parser.py` around lines 356 -
358, Remove the unreachable `if arguments is None:` condition and its associated
debug logging block. Since the arguments variable is assigned with an `or {}`
fallback pattern earlier in the code, it will always contain at least an empty
dictionary and can never be None, making this entire conditional block dead code
that should be deleted.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/core/api/test_client_responses.py`:
- Line 1: Remove the module-level pylint disable comment at the top of
test_client_responses.py that suppresses protected-access and duplicate-code
warnings. Instead of disabling these checks globally, identify the specific code
locations that trigger these warnings (likely places where protected members are
accessed with underscore prefixes or where code is duplicated) and either
refactor the code to avoid the issue, or if necessary, apply targeted localized
lint suppressions directly to those specific lines. This ensures code quality
standards are maintained and warnings are addressed rather than hidden.

---

Nitpick comments:
In `@src/lightspeed_evaluation/core/api/streaming_parser.py`:
- Around line 356-358: Remove the unreachable `if arguments is None:` condition
and its associated debug logging block. Since the arguments variable is assigned
with an `or {}` fallback pattern earlier in the code, it will always contain at
least an empty dictionary and can never be None, making this entire conditional
block dead code that should be deleted.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 00d2cc99-273a-4e6c-8365-056cd39844d9

📥 Commits

Reviewing files that changed from the base of the PR and between 454d871 and 5bb24c3.

📒 Files selected for processing (19)

config/system.yaml
examples/01_getting_started/basic_setup/README.md
examples/02_metrics/context_quality/README.md
examples/02_metrics/conversation_quality/README.md
examples/02_metrics/keywords_evaluation/README.md
examples/02_metrics/nlp_metrics/README.md
examples/02_metrics/response_quality/README.md
examples/02_metrics/tool_evaluation/README.md
examples/03_endpoints/responses/README.md
examples/03_endpoints/responses/eval_data.yaml
examples/03_endpoints/responses/system.yaml
src/lightspeed_evaluation/core/api/client.py
src/lightspeed_evaluation/core/api/streaming_parser.py
src/lightspeed_evaluation/core/constants.py
src/lightspeed_evaluation/core/metrics/custom/tool_eval.py
tests/unit/core/api/conftest.py
tests/unit/core/api/test_client_responses.py
tests/unit/core/api/test_streaming_parser.py
tests/unit/core/metrics/custom/test_tool_eval.py

coderabbitai · 2026-06-17T00:40:51Z

@@ -0,0 +1,386 @@
+# pylint: disable=protected-access,duplicate-code


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove module-level lint suppression and fix the underlying warnings.

Line 1 disables pylint checks for the whole module. Please remove this directive and resolve the concrete warnings directly (or localize/refactor the triggering code) instead of suppressing them globally.

As per coding guidelines, "**/*.py: Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/core/api/test_client_responses.py` at line 1, Remove the module-level pylint disable comment at the top of test_client_responses.py that suppresses protected-access and duplicate-code warnings. Instead of disabling these checks globally, identify the specific code locations that trigger these warnings (likely places where protected members are accessed with underscore prefixes or where code is duplicated) and either refactor the code to avoid the issue, or if necessary, apply targeted localized lint suppressions directly to those specific lines. This ensures code quality standards are maintained and warnings are addressed rather than hidden.

Source: Coding guidelines

xmican10 · 2026-06-17T07:40:43Z

I need to rebase..

xmican10 · 2026-06-17T09:18:59Z

@coderabbitai full review

asamal4

Thanks !! Some minor comments - applicable in multiples places

asamal4 · 2026-06-17T09:44:28Z

+        handler(ctx, item)
+
+
+def _responses_handle_mcp_call_item(


didn't we add similar logic in _parse_mcp_call_item

asamal4 · 2026-06-17T09:45:34Z

        Dictionary with tool_name, arguments, and optional result, or None if invalid.
    """
    try:
-        # Support both "name"/"args" (new format) and "tool_name"/"arguments" (legacy)


Why are we removing these comments ?

asamal4 · 2026-06-17T09:47:29Z

            'data: {"event": "start", "data": {"conversation_id": "conv_new"}}',
            'data: {"event": "tool_call", "data": '
-            '{"id": "tc_1", "name": "pods_list", "args": {"namespace": "default"}}}',
+            '{"id": "tc_1", "name": "get_weather", "args": {"city": "London"}}}',


Why are we changing this ? This is just test data - but still I would prefer to keep it more realistic + there is no technical reason for chaning this.

Suggested change

'{"id": "tc_1", "name": "get_weather", "args": {"city": "London"}}}',

'{"id": "tc_1", "name": "pods_list", "args": {"namespace": "default"}}}',

xmican10 force-pushed the LEADS-415-responses-streaming-support branch 3 times, most recently from e8a6dd9 to 5bb24c3 Compare June 12, 2026 14:55

coderabbitai Bot reviewed Jun 17, 2026

View reviewed changes

Responses endpoint support

131cd39

xmican10 force-pushed the LEADS-415-responses-streaming-support branch from 5bb24c3 to 8c03464 Compare June 17, 2026 07:43

Adding streaming support for responses

54e60be

xmican10 force-pushed the LEADS-415-responses-streaming-support branch from 8c03464 to 54e60be Compare June 17, 2026 07:49

asamal4 reviewed Jun 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LEADS-415] Responses streaming support#255

[LEADS-415] Responses streaming support#255
xmican10 wants to merge 2 commits into
lightspeed-core:mainfrom
xmican10:LEADS-415-responses-streaming-support

xmican10 commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading

Review limit reached

❌ Failed checks (1 warning)

Uh oh!

asamal4 commented Jun 17, 2026

Uh oh!

coderabbitai Bot commented Jun 17, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Uh oh!

xmican10 commented Jun 17, 2026

Uh oh!

xmican10 commented Jun 17, 2026

Uh oh!

asamal4 left a comment

Uh oh!

asamal4 Jun 17, 2026

Uh oh!

asamal4 Jun 17, 2026

Uh oh!

asamal4 Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		@@ -0,0 +1,386 @@
		# pylint: disable=protected-access,duplicate-code

	'{"id": "tc_1", "name": "get_weather", "args": {"city": "London"}}}',
	'{"id": "tc_1", "name": "pods_list", "args": {"namespace": "default"}}}',

Conversation

xmican10 commented Jun 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

asamal4 commented Jun 17, 2026

Uh oh!

coderabbitai Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

xmican10 commented Jun 17, 2026

Uh oh!

xmican10 commented Jun 17, 2026

Uh oh!

asamal4 left a comment

Choose a reason for hiding this comment

Uh oh!

asamal4 Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

asamal4 Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

asamal4 Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

xmican10 commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading

coderabbitai Bot commented Jun 17, 2026 •

edited

Loading