Skip to content

[LEADS-415] Responses streaming support#255

Open
xmican10 wants to merge 2 commits into
lightspeed-core:mainfrom
xmican10:LEADS-415-responses-streaming-support
Open

[LEADS-415] Responses streaming support#255
xmican10 wants to merge 2 commits into
lightspeed-core:mainfrom
xmican10:LEADS-415-responses-streaming-support

Conversation

@xmican10

@xmican10 xmican10 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Description

Followup PR with supporting responses stream=True parameter

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Added support for streaming responses in the API's /responses endpoint, enabling real-time response streaming for improved latency.
  • Documentation

    • Updated all example documentation to demonstrate explicit output directory configuration using the --output-dir flag for evaluation results.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@xmican10, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 54 minutes and 27 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 36c2f477-c1f4-4308-89c3-7d2e1063e73f

📥 Commits

Reviewing files that changed from the base of the PR and between 8c03464 and 54e60be.

📒 Files selected for processing (4)
  • src/lightspeed_evaluation/core/api/client.py
  • src/lightspeed_evaluation/core/api/streaming_parser.py
  • tests/unit/core/api/test_client_responses.py
  • tests/unit/core/api/test_streaming_parser.py

Walkthrough

The PR refactors the streaming parser from Pydantic models to dataclasses and replaces the monolithic event switch with handler-dispatch tables. It adds a new /responses-specific SSE parser (parse_responses_streaming) and wires a streaming branch into _responses_query in APIClient. Unit tests and example README --output-dir flags are updated accordingly.

Changes

Responses Endpoint Streaming Support

Layer / File(s) Summary
Streaming parser refactor and responses SSE parser
src/lightspeed_evaluation/core/api/streaming_parser.py
Converts _PerformanceTracker and StreamingContext from Pydantic BaseModel to dataclasses; replaces _process_event switch with _STREAMING_EVENT_HANDLERS dispatch; removes _process_tool_result; updates _parse_tool_call to support name/args and legacy tool_name/arguments shapes; adds ResponsesStreamingContext dataclass, /responses event handlers, parse_responses_streaming with [DONE] termination, and _validate_streaming_response/_validate_responses_response helpers.
APIClient streaming dispatch for /responses
src/lightspeed_evaluation/core/api/client.py
Imports parse_responses_streaming; adds a streaming branch in _responses_query that uses httpx.Client.stream, validates HTTP status via _handle_response_errors, invokes parse_responses_streaming, and returns APIResponse directly when responses_request.stream is truthy.
Unit tests: client responses and streaming parser
tests/unit/core/api/test_client_responses.py, tests/unit/core/api/test_streaming_parser.py
Adds TestResponsesEndpoint covering success mapping, extra_request_params merging, file_search_call/mcp_call extraction, error-field propagation, missing output, timeout, HTTP errors, streaming dispatch, and 429 retry; updates existing tool-call fixtures for name/args format; adds TestParseResponsesStreaming for basic response, MCP tools, file search, missing fields, and [DONE] sentinel behavior.

Example README Updates

Layer / File(s) Summary
Add --output-dir to all example run commands
examples/01_getting_started/basic_setup/README.md, examples/02_metrics/*/README.md
Adds --output-dir pointing to each example's local eval_output/ directory in the uv run lightspeed-eval command snippet across seven READMEs.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • lightspeed-core/lightspeed-evaluation#254: Adds initial non-streaming /responses endpoint support in APIClient and response/tool-call mapping that this PR extends with streaming dispatch and parse_responses_streaming.

Suggested reviewers

  • asamal4
  • VladimirKadlec
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title '[LEADS-415] Responses streaming support' directly and clearly describes the main change: implementing streaming support for the responses endpoint.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@xmican10 xmican10 force-pushed the LEADS-415-responses-streaming-support branch 3 times, most recently from e8a6dd9 to 5bb24c3 Compare June 12, 2026 14:55
@asamal4

asamal4 commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/lightspeed_evaluation/core/api/streaming_parser.py (1)

356-358: 💤 Low value

Unreachable code: arguments can never be None after line 350.

Line 350 uses or {} as a fallback, so arguments will always be at least an empty dict. The condition if arguments is None on line 356 will never be true.

Consider removing this unreachable block:

Proposed fix
         if not tool_name:
             logger.debug("Tool call missing name/tool_name field")
             return None
 
-        if arguments is None:
-            logger.debug("Tool call missing args/arguments field for %s", tool_name)
-            return None
-
         tool_call: dict[str, Any] = {"tool_name": tool_name, "arguments": arguments}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lightspeed_evaluation/core/api/streaming_parser.py` around lines 356 -
358, Remove the unreachable `if arguments is None:` condition and its associated
debug logging block. Since the arguments variable is assigned with an `or {}`
fallback pattern earlier in the code, it will always contain at least an empty
dictionary and can never be None, making this entire conditional block dead code
that should be deleted.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/core/api/test_client_responses.py`:
- Line 1: Remove the module-level pylint disable comment at the top of
test_client_responses.py that suppresses protected-access and duplicate-code
warnings. Instead of disabling these checks globally, identify the specific code
locations that trigger these warnings (likely places where protected members are
accessed with underscore prefixes or where code is duplicated) and either
refactor the code to avoid the issue, or if necessary, apply targeted localized
lint suppressions directly to those specific lines. This ensures code quality
standards are maintained and warnings are addressed rather than hidden.

---

Nitpick comments:
In `@src/lightspeed_evaluation/core/api/streaming_parser.py`:
- Around line 356-358: Remove the unreachable `if arguments is None:` condition
and its associated debug logging block. Since the arguments variable is assigned
with an `or {}` fallback pattern earlier in the code, it will always contain at
least an empty dictionary and can never be None, making this entire conditional
block dead code that should be deleted.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 00d2cc99-273a-4e6c-8365-056cd39844d9

📥 Commits

Reviewing files that changed from the base of the PR and between 454d871 and 5bb24c3.

📒 Files selected for processing (19)
  • config/system.yaml
  • examples/01_getting_started/basic_setup/README.md
  • examples/02_metrics/context_quality/README.md
  • examples/02_metrics/conversation_quality/README.md
  • examples/02_metrics/keywords_evaluation/README.md
  • examples/02_metrics/nlp_metrics/README.md
  • examples/02_metrics/response_quality/README.md
  • examples/02_metrics/tool_evaluation/README.md
  • examples/03_endpoints/responses/README.md
  • examples/03_endpoints/responses/eval_data.yaml
  • examples/03_endpoints/responses/system.yaml
  • src/lightspeed_evaluation/core/api/client.py
  • src/lightspeed_evaluation/core/api/streaming_parser.py
  • src/lightspeed_evaluation/core/constants.py
  • src/lightspeed_evaluation/core/metrics/custom/tool_eval.py
  • tests/unit/core/api/conftest.py
  • tests/unit/core/api/test_client_responses.py
  • tests/unit/core/api/test_streaming_parser.py
  • tests/unit/core/metrics/custom/test_tool_eval.py

@@ -0,0 +1,386 @@
# pylint: disable=protected-access,duplicate-code

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove module-level lint suppression and fix the underlying warnings.

Line 1 disables pylint checks for the whole module. Please remove this directive and resolve the concrete warnings directly (or localize/refactor the triggering code) instead of suppressing them globally.

As per coding guidelines, "**/*.py: Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/core/api/test_client_responses.py` at line 1, Remove the
module-level pylint disable comment at the top of test_client_responses.py that
suppresses protected-access and duplicate-code warnings. Instead of disabling
these checks globally, identify the specific code locations that trigger these
warnings (likely places where protected members are accessed with underscore
prefixes or where code is duplicated) and either refactor the code to avoid the
issue, or if necessary, apply targeted localized lint suppressions directly to
those specific lines. This ensures code quality standards are maintained and
warnings are addressed rather than hidden.

Source: Coding guidelines

@xmican10

Copy link
Copy Markdown
Collaborator Author

I need to rebase..

@xmican10 xmican10 force-pushed the LEADS-415-responses-streaming-support branch from 5bb24c3 to 8c03464 Compare June 17, 2026 07:43
@xmican10 xmican10 force-pushed the LEADS-415-responses-streaming-support branch from 8c03464 to 54e60be Compare June 17, 2026 07:49
@xmican10

Copy link
Copy Markdown
Collaborator Author

@coderabbitai full review

@asamal4 asamal4 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !! Some minor comments - applicable in multiples places

handler(ctx, item)


def _responses_handle_mcp_call_item(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't we add similar logic in _parse_mcp_call_item

Dictionary with tool_name, arguments, and optional result, or None if invalid.
"""
try:
# Support both "name"/"args" (new format) and "tool_name"/"arguments" (legacy)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing these comments ?

'data: {"event": "start", "data": {"conversation_id": "conv_new"}}',
'data: {"event": "tool_call", "data": '
'{"id": "tc_1", "name": "pods_list", "args": {"namespace": "default"}}}',
'{"id": "tc_1", "name": "get_weather", "args": {"city": "London"}}}',

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we changing this ? This is just test data - but still I would prefer to keep it more realistic + there is no technical reason for chaning this.

Suggested change
'{"id": "tc_1", "name": "get_weather", "args": {"city": "London"}}}',
'{"id": "tc_1", "name": "pods_list", "args": {"namespace": "default"}}}',

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants