Skip to content

feat: batch event ingestion on POST /track via { type: 'batch' } envelope#377

Open
mraj602-tohands wants to merge 7 commits into
Openpanel-dev:mainfrom
Tohands-Private-Limited:feat/batch-v2-upstream
Open

feat: batch event ingestion on POST /track via { type: 'batch' } envelope#377
mraj602-tohands wants to merge 7 commits into
Openpanel-dev:mainfrom
Tohands-Private-Limited:feat/batch-v2-upstream

Conversation

@mraj602-tohands
Copy link
Copy Markdown

@mraj602-tohands mraj602-tohands commented May 27, 2026

Reworked per review (2026-06-04): POST /track/batch is gone — batch ingestion now rides on the existing POST /track as a { "type": "batch", "payload": [...] } envelope, and all worker/session changes have been dropped (deferring to the session-management rework). This PR is now transport-only: 4 files, all API-side.

Closes #373. Supersedes #374.

Summary

POST /track accepts a new body shape alongside the existing single-event one:

{ "type": "batch", "payload": [ { "type": "track", "payload": { ... } }, ... ] }

Up to 2000 events / 10 MB per request. Each item is dispatched through the exact same per-event pipeline as a single-event request. Built so offline-capable SDKs (mobile, IoT, edge) can drain a local buffer in one round-trip.

What changed per review

Review ask Done
Drop the separate /track/batch endpoint, piggyback on /track with a batch type zTrackBatchHandlerPayload joins the body union on POST /track; no new route
Remove the worker incoming-events/session changes apps/worker, packages/queue, apps/api/src/utils/ids.ts, event.controller.ts are all untouched vs main. getTimestamp / isTimestampFromThePast semantics unchanged.

Diff is now: packages/validation/src/track.validation.ts, apps/api/src/controllers/track.controller.ts, apps/api/src/routes/track.router.ts, plus a test file.

Behavior

Response: single events keep their 200 { deviceId, sessionId }. Batch envelopes return 202 { accepted, rejected[] }.

Per-row rejection instead of whole-batch failure. The envelope validates loosely (payload: z.unknown()[], 1–2000 items); each item is safeParsed through zTrackHandlerPayload in the controller. Failures become { index, reason: 'validation' | 'internal', error } entries — a 2000-event batch with one malformed row doesn't fail the other 1999, and the caller can re-send just the bad indices. alias is rejected per-row with the same message as the single-event 400.

Salts + geo fetched once per request. buildSharedRequestContext does the salts query and request-IP geo lookup once; buildEventContext reuses them per item. Only events overriding __ip trigger a second geo lookup. For a 2000-event batch that's 1 PG round-trip + 1 geo lookup instead of 2000 of each.

Bounded concurrency. Items are processed 50 at a time — each event touches Redis (and possibly geo), so an unbounded Promise.all over 2000 events would spike connection-pool/rate budgets on smaller self-hosted instances.

Caps. 2000 events / 10 MB mirrors Mixpanel /import. The 10 MB body cap uses Fastify's existing bodyLimit on the route — no new mechanism.

Refactor note: the single-event switch in handler moved into a shared dispatchEvent (with a never exhaustiveness check) so single and batch paths can't drift apart.

Out of scope (intentionally)

Historical events with __timestamp still get their session derived at processing time, exactly as on main — this PR doesn't touch session handling. Once the session-management rework lands, batched historical events should benefit from it automatically since they flow through the same per-event pipeline.

Testing

apps/api/src/routes/track-batch.router.test.ts (449 lines): envelope happy path, per-index rejection (validation + internal), mixed valid/invalid rows, alias-in-batch, nested-batch rejection, empty/oversized payload bounds, __ip per-event geo override, non-object items, all dispatch arms (group, assign_group, increment, decrement, replay), single-event regression (200 shape unchanged), and chunk-boundary index alignment. Full apps/api suite passing.

Summary by CodeRabbit

  • New Features
    • Tracking API now supports batch event ingestion, accepting up to 2,000 events per request with a 10 MB payload limit.
    • Batch requests return per-item acceptance and rejection details.
    • Failed validation for individual items no longer blocks processing of valid items in the batch.

Reimplements the batch ingestion feature from PR Openpanel-dev#374 on top of current
upstream/main (post buffer-perf, kafka client, ClickHouse round-robin).
Adds __syncedAt property to all worker-processed events.

See PR description for full architectural details.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 27, 2026

Review Change Stack

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0fd25152-7171-48d5-a4a0-65a06d2b317f

📥 Commits

Reviewing files that changed from the base of the PR and between 6149a20 and 74c229f.

📒 Files selected for processing (4)
  • apps/api/src/controllers/track.controller.ts
  • apps/api/src/routes/track-batch.router.test.ts
  • apps/api/src/routes/track.router.ts
  • packages/validation/src/track.validation.ts
💤 Files with no reviewable changes (1)
  • packages/validation/src/track.validation.ts

📝 Walkthrough

Walkthrough

This PR adds POST /track/batch for buffered multi-event ingestion. Batch requests now share one geo lookup and device salt per request, validate each event independently, and dispatch up to 2000 events with bounded concurrency (50 concurrent). Per-item validation failures aggregate into a 202 response without failing the whole batch.

Changes

Batch event ingestion with per-event validation and bounded concurrency

Layer / File(s) Summary
Batch envelope validation schema
packages/validation/src/track.validation.ts
Added TRACK_BATCH_MAX_EVENTS = 2000, zTrackBatchHandlerPayload enforcing { type: 'batch', payload: unknown[] [1..2000] }, and inferred ITrackBatchHandlerPayload type.
Per-request and per-event context refactor
apps/api/src/controllers/track.controller.ts
Introduced SharedRequestContext (projectId/geo/salts) pooled once per request via buildSharedRequestContext, and buildEventContext to derive validated timestamp and identity/device/session per event with optional __ip override triggering conditional geo re-fetch.
Event routing and single-event handler
apps/api/src/controllers/track.controller.ts
Added internal dispatchEvent to route validated event types to handlers, rejecting alias and unknown types with 400. Refactored exported handler to build shared context once, dispatch via dispatchEvent, and return { deviceId, sessionId }.
Batch handler with bounded concurrency
apps/api/src/controllers/track.controller.ts
Implemented handleBatch with BATCH_CONCURRENCY = 50, per-item zTrackHandlerPayload.safeParse validation, per-item buildEventContext + dispatchEvent execution, and error aggregation into 202 response with accepted count and rejected[index, reason, error] array (reason: 'validation' for 4xx, 'internal' otherwise).
Route schema, body limit, and response types
apps/api/src/routes/track.router.ts
Updated POST / to accept union of single-event and batch payloads with optional client credentials, set bodyLimit: 10 MB, and extended OpenAPI responses to include 202 for batches with accepted and per-item rejected[] schema.
Test infrastructure
apps/api/src/routes/track-batch.router.test.ts
Added hoisted mocks (db/queue/geo/common-server/redis), test lifecycle with cleanup, and request/payload helpers (postTrack, postBatch, factories).
Batch endpoint and validation tests
apps/api/src/routes/track-batch.router.test.ts
Auth/envelope validation (401 missing clientId, 400 on invalid payloads), happy path 202 responses, queue dispatch for track, profile upserts for identify, mixed event types, error handling, geo re-fetch, timestamp validation, device ID pass-through, and non-object rejection.
Per-item validation and chunk-boundary regression
apps/api/src/routes/track-batch.router.test.ts
Per-index validation failures don't fail batch, alias rejected per-item, nested batches rejected per-item, all-invalid batches skip queue, and 200-event batch regression asserts exact rejected index preservation across concurrency chunks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Behold, the batch arrives in one swift hop,
Two thousand events skip the network hop!
Each validated, dispatched, no whole-batch flop,
With geo pooled once—efficiency at the top!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 29.41% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: batch event ingestion on POST /track via { type: 'batch' } envelope' accurately reflects the main change: adding batch event support to the existing POST /track endpoint using a batch envelope structure.
Linked Issues check ✅ Passed The PR implementation addresses all primary coding requirements from #373: batch ingestion supporting 2000 events/10MB, per-item validation without whole-batch failure (202 with rejected[]), timestamp-based session derivation, alias rejection, bounded concurrency (50), 5-day historical window enforcement, and preservation of single-event /track behavior.
Out of Scope Changes check ✅ Passed All changes are directly related to batch event ingestion scope: controller refactoring for shared context and event dispatch, batch validation schemas, router updates for union payload types, and comprehensive test coverage. No extraneous changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
apps/worker/src/jobs/events.incoming-events.test.ts (1)

553-596: 💤 Low value

Consider adding test coverage for historical events when lock is not acquired.

This test verifies historical event handling when the lock is acquired (session_start emitted). Consider adding a companion test where getLock returns false to verify that:

  1. session_start is skipped
  2. The event itself is still created
  3. sessionEnd is still not scheduled (historical path)

This would complete coverage of the historical event path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/worker/src/jobs/events.incoming-events.test.ts` around lines 553 - 596,
Add a new test alongside the existing historical-event test that simulates the
lock NOT being acquired by mocking getLock to return false; call incomingEvent
with the same historical payload and assert that createEvent is still called for
the event (but there is no 'session_start' createEvent call), sessionsQueue.add
is not called (no sessionEnd scheduled), and that the event createEvent call
preserves deviceId/sessionId; reference the incomingEvent handler, getLock mock,
createEvent mock, and sessionsQueue.add to locate and update the test harness.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/worker/src/jobs/events.incoming-event.ts`:
- Around line 217-219: The referrer fields (enrichment.referrer,
enrichment.referrer_name, enrichment.referrer_type) use "?? undefined" which can
yield undefined instead of falling back to the base event; change them to use
the same fallback pattern as the other fields by replacing "enrichment.referrer
?? undefined" with "enrichment.referrer ?? baseEvent.referrer" and likewise
"enrichment.referrer_name ?? baseEvent.referrer_name" and
"enrichment.referrer_type ?? baseEvent.referrer_type" so string-type
expectations are preserved.

---

Nitpick comments:
In `@apps/worker/src/jobs/events.incoming-events.test.ts`:
- Around line 553-596: Add a new test alongside the existing historical-event
test that simulates the lock NOT being acquired by mocking getLock to return
false; call incomingEvent with the same historical payload and assert that
createEvent is still called for the event (but there is no 'session_start'
createEvent call), sessionsQueue.add is not called (no sessionEnd scheduled),
and that the event createEvent call preserves deviceId/sessionId; reference the
incomingEvent handler, getLock mock, createEvent mock, and sessionsQueue.add to
locate and update the test harness.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 91718804-8af2-4a6e-9663-17127061acf7

📥 Commits

Reviewing files that changed from the base of the PR and between c470b8b and 4a8a8c4.

📒 Files selected for processing (9)
  • apps/api/src/controllers/event.controller.ts
  • apps/api/src/controllers/track.controller.ts
  • apps/api/src/routes/track-batch.router.test.ts
  • apps/api/src/routes/track.router.ts
  • apps/api/src/utils/ids.ts
  • apps/worker/src/jobs/events.incoming-event.ts
  • apps/worker/src/jobs/events.incoming-events.test.ts
  • packages/queue/src/queues.ts
  • packages/validation/src/track.validation.ts
💤 Files with no reviewable changes (1)
  • packages/queue/src/queues.ts

Comment thread apps/worker/src/jobs/events.incoming-event.ts Outdated
mraj602 and others added 2 commits May 28, 2026 05:19
Tests use toStrictEqual, so the new __syncedAt property in event
properties needs to be included in assertions. Uses expect.any(String)
since the exact ISO timestamp varies per run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hment

When session enrichment has null referrer fields, fall back to
baseEvent values instead of undefined. Consistent with all other
enrichment fields (path, os, browser, geo, etc.).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mraj602-tohands
Copy link
Copy Markdown
Author

Hi @lindesvard, just following up on this PR in case it was missed. Happy to make any changes or adjustments if needed. Would appreciate a review when you get a chance 🙂

@lindesvard
Copy link
Copy Markdown
Contributor

Hey, today our session management is very time dependent, so batching events might cause issues here. I have started working on improving this locally and will probably need to merge that before I can merge this (this will avoid your changes in the incoming-events file etc)

Secondly I think we should just piggy back on our existing /track endpoint. It takes a generic body

{
  "type": "batch",
  "payload": [{event}, {event}, {event}]
}

This would make proxying etc easier since we dont need to add yet another endpoint to proxy.

So what I would like you todo is to:

  1. Remove the batch endpoint and use our existing track endpoint
  2. Remove changes in the incoming events file (that will be solved in my local work to handle sessions better)

mraj602 and others added 4 commits June 4, 2026 18:46
…changes

Per review feedback on Openpanel-dev#377:
- Remove the separate /track/batch endpoint; POST /track now accepts
  { "type": "batch", "payload": [event, ...] } alongside single events,
  so proxies only ever deal with one endpoint.
- Revert all worker/session changes (events.incoming-event.ts, queue
  payload, ids.ts, event.controller.ts) back to upstream behavior —
  session handling improvements will land separately.
- Batch items reuse the existing per-event pipeline unchanged, including
  the isTimestampFromThePast flag semantics; per-item validation errors
  are reported per-index in rejected[] without failing the whole batch.
- Raise the /track body limit to 10 MB to fit max-size batches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tch arms

Adds batch coverage for group/assign_group/increment/decrement/replay
dispatch, per-event __ip geo override, non-object items, single-event
alias 400, and queue-failure internal rejections.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mraj602-tohands mraj602-tohands changed the title feat: bulk event ingestion (POST /track/batch) for offline-first SDKs feat: batch event ingestion on POST /track via { type: 'batch' } envelope Jun 5, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/api/src/controllers/track.controller.ts (2)

594-606: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Map rejected[] to the public response shape instead of returning internal state.

This currently pushes the full BatchItemResult objects into the response, so each rejected item includes status: 'rejected'. apps/api/src/routes/track.router.ts documents the 202 body as { index, reason, error }, so the controller should emit that exact shape instead of relying on serializer behavior.

Suggested fix
   let accepted = 0;
-  const rejected: Extract<BatchItemResult, { status: 'rejected' }>[] = [];
+  const rejected: Array<{
+    index: number;
+    reason: 'validation' | 'internal';
+    error: string;
+  }> = [];
   for (const result of results) {
     if (result.status === 'accepted') {
       accepted += 1;
     } else {
-      rejected.push(result);
+      rejected.push({
+        index: result.index,
+        reason: result.reason,
+        error: result.error,
+      });
     }
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/controllers/track.controller.ts` around lines 594 - 606, The
current loop pushes full BatchItemResult objects into the rejected array and
returns them in the 202 response; update the controller (the loop that iterates
over results and the rejected array sent in reply.status(202).send) to map each
rejected item to the public shape { index, reason, error } instead of the
internal object (i.e., when result.status !== 'accepted' push or collect {
index: result.index, reason: result.reason, error: result.error }), then send
that mapped array in the response so it matches the documented contract.

156-158: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the “no user-agent => no identity” behavior.

apps/api/src/utils/ids.ts explicitly returns empty deviceId/sessionId when ua is missing. Replacing a missing header with 'unknown/1.0' here changes that contract and can collapse unrelated server-side traffic behind the same IP into the same device/session.

Suggested fix
 interface SharedRequestContext {
   projectId: string;
   requestIp: string;
-  requestUa: string;
+  requestUa?: string;
   requestHeaders: Record<string, string | undefined>;
   requestGeo: GeoLocation;
   salts: Salts;
 }
@@
-  const requestIp = request.clientIp;
-  const requestUa = request.headers['user-agent'] ?? 'unknown/1.0';
-  const requestHeaders = getStringHeaders(request.headers);
+  const requestIp = request.clientIp;
+  const requestHeaders = getStringHeaders(request.headers);
+  const requestUa = requestHeaders['user-agent'];
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/controllers/track.controller.ts` around lines 156 - 158, The
code currently forces a default user-agent string by setting requestUa =
request.headers['user-agent'] ?? 'unknown/1.0', which breaks the contract in
ids.ts that expects a missing UA to yield empty deviceId/sessionId; change this
to preserve "no UA => no identity" by leaving requestUa undefined/null when the
header is absent (e.g., requestUa = request.headers['user-agent'] as string |
undefined) and ensure downstream uses of requestUa (the identity generation code
that calls into ids.ts) continue to treat undefined/empty as "no UA" rather than
a placeholder.
🧹 Nitpick comments (2)
apps/api/src/routes/track.router.ts (1)

37-37: 💤 Low value

Consider splitting the long description for readability.

The description string is comprehensive but spans a very long single line. While this works fine for OpenAPI documentation, breaking it into a template literal with multiple lines would improve source readability.

♻️ Optional refactor for readability
-      description: `Ingest a tracking event (track, identify, group, increment, decrement, replay) or a batch of events ({ "type": "batch", "payload": [event, ...] }). Batch requests accept up to ${TRACK_BATCH_MAX_EVENTS} events and 10MB uncompressed per request; each event is dispatched through the same pipeline as a single-event request. Per-event validation failures are returned in the rejected[] array — the whole batch does not fail on a single bad row.`,
+      description: `Ingest a tracking event (track, identify, group, increment, decrement, replay) or a batch of events.
+        Batch requests use { "type": "batch", "payload": [event, ...] } and accept up to ${TRACK_BATCH_MAX_EVENTS} events and 10MB uncompressed per request.
+        Each event is dispatched through the same pipeline as a single-event request.
+        Per-event validation failures are returned in the rejected[] array — the whole batch does not fail on a single bad row.`,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/routes/track.router.ts` at line 37, The long single-line OpenAPI
description on the route config should be split across multiple template-literal
lines for readability: update the description property in track.router.ts (the
route's description string that references TRACK_BATCH_MAX_EVENTS) to a
multi-line template literal or joined string segments so the content flows
across lines while preserving the embedded ${TRACK_BATCH_MAX_EVENTS}
interpolation and exact wording; locate the description key in the route
definition and break the string into readable lines (or an array .join(''))
without changing the text or interpolation semantics.
apps/api/src/routes/track-batch.router.test.ts (1)

204-359: ⚡ Quick win

Consider adding test coverage for the 5-day historical floor.

The test suite covers historical timestamps (two hours ago), but the PR objectives state: "Enforce a 5-day historical acceptance window" with "older events rejected per-row." There's no test verifying that events with timestamps older than 5 days are rejected with a per-item validation error.

📋 Suggested test case
it('rejects events older than 5 days per-item', async () => {
  const sixDaysAgo = Date.now() - 6 * 24 * 60 * 60 * 1000;
  const res = await postBatch([
    validTrack('recent'),
    {
      type: 'track' as const,
      payload: {
        name: 'ancient_event',
        properties: { __timestamp: new Date(sixDaysAgo).toISOString() },
      },
    },
  ]);
  expect(res.statusCode).toBe(202);
  const body = res.json();
  expect(body.accepted).toBe(1);
  expect(body.rejected).toHaveLength(1);
  expect(body.rejected[0]).toMatchObject({
    index: 1,
    reason: 'validation',
  });
  expect(body.rejected[0].error).toMatch(/timestamp|past|day/i);
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/routes/track-batch.router.test.ts` around lines 204 - 359, Add a
test that verifies the 5-day historical floor by calling postBatch with one
valid track (use validTrack) and one track whose payload.properties.__timestamp
is older than 5 days (e.g., Date.now() - 6*24*60*60*1000), then assert the
response is 202, accepted is 1, rejected length is 1, and the rejected item
matches { index: 1, reason: 'validation' } and its error message matches a
timestamp/past/day regex; name the test "rejects events older than 5 days
per-item" and place it alongside the other batch tests so it exercises the same
pipeline (postBatch, validTrack).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/api/src/controllers/track.controller.ts`:
- Around line 109-128: The change removed the 5-day cutoff and now accepts
arbitrarily old __timestamp values; reintroduce a FIVE_DAYS_MS constant (e.g., 5
* 24 * 60 * 60 * 1000) and add an early check using clientTimestampNumber and
safeTimestamp: if clientTimestampNumber < safeTimestamp - FIVE_DAYS_MS, return
the per-item rejection path required by the batch contract (the same
rejection/validation response your controller uses elsewhere) before the
existing ONE_MINUTE_MS and FIFTEEN_MINUTES_MS logic so old events are rejected
instead of persisted.

---

Outside diff comments:
In `@apps/api/src/controllers/track.controller.ts`:
- Around line 594-606: The current loop pushes full BatchItemResult objects into
the rejected array and returns them in the 202 response; update the controller
(the loop that iterates over results and the rejected array sent in
reply.status(202).send) to map each rejected item to the public shape { index,
reason, error } instead of the internal object (i.e., when result.status !==
'accepted' push or collect { index: result.index, reason: result.reason, error:
result.error }), then send that mapped array in the response so it matches the
documented contract.
- Around line 156-158: The code currently forces a default user-agent string by
setting requestUa = request.headers['user-agent'] ?? 'unknown/1.0', which breaks
the contract in ids.ts that expects a missing UA to yield empty
deviceId/sessionId; change this to preserve "no UA => no identity" by leaving
requestUa undefined/null when the header is absent (e.g., requestUa =
request.headers['user-agent'] as string | undefined) and ensure downstream uses
of requestUa (the identity generation code that calls into ids.ts) continue to
treat undefined/empty as "no UA" rather than a placeholder.

---

Nitpick comments:
In `@apps/api/src/routes/track-batch.router.test.ts`:
- Around line 204-359: Add a test that verifies the 5-day historical floor by
calling postBatch with one valid track (use validTrack) and one track whose
payload.properties.__timestamp is older than 5 days (e.g., Date.now() -
6*24*60*60*1000), then assert the response is 202, accepted is 1, rejected
length is 1, and the rejected item matches { index: 1, reason: 'validation' }
and its error message matches a timestamp/past/day regex; name the test "rejects
events older than 5 days per-item" and place it alongside the other batch tests
so it exercises the same pipeline (postBatch, validTrack).

In `@apps/api/src/routes/track.router.ts`:
- Line 37: The long single-line OpenAPI description on the route config should
be split across multiple template-literal lines for readability: update the
description property in track.router.ts (the route's description string that
references TRACK_BATCH_MAX_EVENTS) to a multi-line template literal or joined
string segments so the content flows across lines while preserving the embedded
${TRACK_BATCH_MAX_EVENTS} interpolation and exact wording; locate the
description key in the route definition and break the string into readable lines
(or an array .join('')) without changing the text or interpolation semantics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0fd25152-7171-48d5-a4a0-65a06d2b317f

📥 Commits

Reviewing files that changed from the base of the PR and between 6149a20 and 74c229f.

📒 Files selected for processing (4)
  • apps/api/src/controllers/track.controller.ts
  • apps/api/src/routes/track-batch.router.test.ts
  • apps/api/src/routes/track.router.ts
  • packages/validation/src/track.validation.ts
💤 Files with no reviewable changes (1)
  • packages/validation/src/track.validation.ts

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/api/src/controllers/track.controller.ts (2)

594-606: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Map rejected[] to the public response shape instead of returning internal state.

This currently pushes the full BatchItemResult objects into the response, so each rejected item includes status: 'rejected'. apps/api/src/routes/track.router.ts documents the 202 body as { index, reason, error }, so the controller should emit that exact shape instead of relying on serializer behavior.

Suggested fix
   let accepted = 0;
-  const rejected: Extract<BatchItemResult, { status: 'rejected' }>[] = [];
+  const rejected: Array<{
+    index: number;
+    reason: 'validation' | 'internal';
+    error: string;
+  }> = [];
   for (const result of results) {
     if (result.status === 'accepted') {
       accepted += 1;
     } else {
-      rejected.push(result);
+      rejected.push({
+        index: result.index,
+        reason: result.reason,
+        error: result.error,
+      });
     }
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/controllers/track.controller.ts` around lines 594 - 606, The
current loop pushes full BatchItemResult objects into the rejected array and
returns them in the 202 response; update the controller (the loop that iterates
over results and the rejected array sent in reply.status(202).send) to map each
rejected item to the public shape { index, reason, error } instead of the
internal object (i.e., when result.status !== 'accepted' push or collect {
index: result.index, reason: result.reason, error: result.error }), then send
that mapped array in the response so it matches the documented contract.

156-158: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the “no user-agent => no identity” behavior.

apps/api/src/utils/ids.ts explicitly returns empty deviceId/sessionId when ua is missing. Replacing a missing header with 'unknown/1.0' here changes that contract and can collapse unrelated server-side traffic behind the same IP into the same device/session.

Suggested fix
 interface SharedRequestContext {
   projectId: string;
   requestIp: string;
-  requestUa: string;
+  requestUa?: string;
   requestHeaders: Record<string, string | undefined>;
   requestGeo: GeoLocation;
   salts: Salts;
 }
@@
-  const requestIp = request.clientIp;
-  const requestUa = request.headers['user-agent'] ?? 'unknown/1.0';
-  const requestHeaders = getStringHeaders(request.headers);
+  const requestIp = request.clientIp;
+  const requestHeaders = getStringHeaders(request.headers);
+  const requestUa = requestHeaders['user-agent'];
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/controllers/track.controller.ts` around lines 156 - 158, The
code currently forces a default user-agent string by setting requestUa =
request.headers['user-agent'] ?? 'unknown/1.0', which breaks the contract in
ids.ts that expects a missing UA to yield empty deviceId/sessionId; change this
to preserve "no UA => no identity" by leaving requestUa undefined/null when the
header is absent (e.g., requestUa = request.headers['user-agent'] as string |
undefined) and ensure downstream uses of requestUa (the identity generation code
that calls into ids.ts) continue to treat undefined/empty as "no UA" rather than
a placeholder.
🧹 Nitpick comments (2)
apps/api/src/routes/track.router.ts (1)

37-37: 💤 Low value

Consider splitting the long description for readability.

The description string is comprehensive but spans a very long single line. While this works fine for OpenAPI documentation, breaking it into a template literal with multiple lines would improve source readability.

♻️ Optional refactor for readability
-      description: `Ingest a tracking event (track, identify, group, increment, decrement, replay) or a batch of events ({ "type": "batch", "payload": [event, ...] }). Batch requests accept up to ${TRACK_BATCH_MAX_EVENTS} events and 10MB uncompressed per request; each event is dispatched through the same pipeline as a single-event request. Per-event validation failures are returned in the rejected[] array — the whole batch does not fail on a single bad row.`,
+      description: `Ingest a tracking event (track, identify, group, increment, decrement, replay) or a batch of events.
+        Batch requests use { "type": "batch", "payload": [event, ...] } and accept up to ${TRACK_BATCH_MAX_EVENTS} events and 10MB uncompressed per request.
+        Each event is dispatched through the same pipeline as a single-event request.
+        Per-event validation failures are returned in the rejected[] array — the whole batch does not fail on a single bad row.`,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/routes/track.router.ts` at line 37, The long single-line OpenAPI
description on the route config should be split across multiple template-literal
lines for readability: update the description property in track.router.ts (the
route's description string that references TRACK_BATCH_MAX_EVENTS) to a
multi-line template literal or joined string segments so the content flows
across lines while preserving the embedded ${TRACK_BATCH_MAX_EVENTS}
interpolation and exact wording; locate the description key in the route
definition and break the string into readable lines (or an array .join(''))
without changing the text or interpolation semantics.
apps/api/src/routes/track-batch.router.test.ts (1)

204-359: ⚡ Quick win

Consider adding test coverage for the 5-day historical floor.

The test suite covers historical timestamps (two hours ago), but the PR objectives state: "Enforce a 5-day historical acceptance window" with "older events rejected per-row." There's no test verifying that events with timestamps older than 5 days are rejected with a per-item validation error.

📋 Suggested test case
it('rejects events older than 5 days per-item', async () => {
  const sixDaysAgo = Date.now() - 6 * 24 * 60 * 60 * 1000;
  const res = await postBatch([
    validTrack('recent'),
    {
      type: 'track' as const,
      payload: {
        name: 'ancient_event',
        properties: { __timestamp: new Date(sixDaysAgo).toISOString() },
      },
    },
  ]);
  expect(res.statusCode).toBe(202);
  const body = res.json();
  expect(body.accepted).toBe(1);
  expect(body.rejected).toHaveLength(1);
  expect(body.rejected[0]).toMatchObject({
    index: 1,
    reason: 'validation',
  });
  expect(body.rejected[0].error).toMatch(/timestamp|past|day/i);
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/routes/track-batch.router.test.ts` around lines 204 - 359, Add a
test that verifies the 5-day historical floor by calling postBatch with one
valid track (use validTrack) and one track whose payload.properties.__timestamp
is older than 5 days (e.g., Date.now() - 6*24*60*60*1000), then assert the
response is 202, accepted is 1, rejected length is 1, and the rejected item
matches { index: 1, reason: 'validation' } and its error message matches a
timestamp/past/day regex; name the test "rejects events older than 5 days
per-item" and place it alongside the other batch tests so it exercises the same
pipeline (postBatch, validTrack).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/api/src/controllers/track.controller.ts`:
- Around line 109-128: The change removed the 5-day cutoff and now accepts
arbitrarily old __timestamp values; reintroduce a FIVE_DAYS_MS constant (e.g., 5
* 24 * 60 * 60 * 1000) and add an early check using clientTimestampNumber and
safeTimestamp: if clientTimestampNumber < safeTimestamp - FIVE_DAYS_MS, return
the per-item rejection path required by the batch contract (the same
rejection/validation response your controller uses elsewhere) before the
existing ONE_MINUTE_MS and FIFTEEN_MINUTES_MS logic so old events are rejected
instead of persisted.

---

Outside diff comments:
In `@apps/api/src/controllers/track.controller.ts`:
- Around line 594-606: The current loop pushes full BatchItemResult objects into
the rejected array and returns them in the 202 response; update the controller
(the loop that iterates over results and the rejected array sent in
reply.status(202).send) to map each rejected item to the public shape { index,
reason, error } instead of the internal object (i.e., when result.status !==
'accepted' push or collect { index: result.index, reason: result.reason, error:
result.error }), then send that mapped array in the response so it matches the
documented contract.
- Around line 156-158: The code currently forces a default user-agent string by
setting requestUa = request.headers['user-agent'] ?? 'unknown/1.0', which breaks
the contract in ids.ts that expects a missing UA to yield empty
deviceId/sessionId; change this to preserve "no UA => no identity" by leaving
requestUa undefined/null when the header is absent (e.g., requestUa =
request.headers['user-agent'] as string | undefined) and ensure downstream uses
of requestUa (the identity generation code that calls into ids.ts) continue to
treat undefined/empty as "no UA" rather than a placeholder.

---

Nitpick comments:
In `@apps/api/src/routes/track-batch.router.test.ts`:
- Around line 204-359: Add a test that verifies the 5-day historical floor by
calling postBatch with one valid track (use validTrack) and one track whose
payload.properties.__timestamp is older than 5 days (e.g., Date.now() -
6*24*60*60*1000), then assert the response is 202, accepted is 1, rejected
length is 1, and the rejected item matches { index: 1, reason: 'validation' }
and its error message matches a timestamp/past/day regex; name the test "rejects
events older than 5 days per-item" and place it alongside the other batch tests
so it exercises the same pipeline (postBatch, validTrack).

In `@apps/api/src/routes/track.router.ts`:
- Line 37: The long single-line OpenAPI description on the route config should
be split across multiple template-literal lines for readability: update the
description property in track.router.ts (the route's description string that
references TRACK_BATCH_MAX_EVENTS) to a multi-line template literal or joined
string segments so the content flows across lines while preserving the embedded
${TRACK_BATCH_MAX_EVENTS} interpolation and exact wording; locate the
description key in the route definition and break the string into readable lines
(or an array .join('')) without changing the text or interpolation semantics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0fd25152-7171-48d5-a4a0-65a06d2b317f

📥 Commits

Reviewing files that changed from the base of the PR and between 6149a20 and 74c229f.

📒 Files selected for processing (4)
  • apps/api/src/controllers/track.controller.ts
  • apps/api/src/routes/track-batch.router.test.ts
  • apps/api/src/routes/track.router.ts
  • packages/validation/src/track.validation.ts
💤 Files with no reviewable changes (1)
  • packages/validation/src/track.validation.ts
🛑 Comments failed to post (1)
apps/api/src/controllers/track.controller.ts (1)

109-128: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore the 5-day rejection path for historical events.

This now accepts any past __timestamp and only marks it as historical after 15 minutes. That breaks the batch contract in this PR and will persist arbitrarily old events instead of rejecting them per item.

Suggested fix
   const ONE_MINUTE_MS = 60 * 1000;
+  const FIVE_DAYS_MS = 5 * 24 * 60 * ONE_MINUTE_MS;
   const FIFTEEN_MINUTES_MS = 15 * ONE_MINUTE_MS;

   // Use safeTimestamp if invalid or more than 1 minute in the future
   if (
     Number.isNaN(clientTimestampNumber) ||
     clientTimestampNumber > safeTimestamp + ONE_MINUTE_MS
   ) {
     return { timestamp: safeTimestamp, isTimestampFromThePast: false };
   }
+
+  if (clientTimestampNumber < safeTimestamp - FIVE_DAYS_MS) {
+    throw new HttpError('Timestamp is older than 5 days', { status: 400 });
+  }

   // isTimestampFromThePast is true only if timestamp is older than 15 minutes
   const isTimestampFromThePast =
     clientTimestampNumber < safeTimestamp - FIFTEEN_MINUTES_MS;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

  // Constants for time validation
  const ONE_MINUTE_MS = 60 * 1000;
  const FIVE_DAYS_MS = 5 * 24 * 60 * ONE_MINUTE_MS;
  const FIFTEEN_MINUTES_MS = 15 * ONE_MINUTE_MS;

  // Use safeTimestamp if invalid or more than 1 minute in the future
  if (
    Number.isNaN(clientTimestampNumber) ||
    clientTimestampNumber > safeTimestamp + ONE_MINUTE_MS
  ) {
    return { timestamp: safeTimestamp, isTimestampFromThePast: false };
  }

  if (clientTimestampNumber < safeTimestamp - FIVE_DAYS_MS) {
    throw new HttpError('Timestamp is older than 5 days', { status: 400 });
  }

  // isTimestampFromThePast is true only if timestamp is older than 15 minutes
  const isTimestampFromThePast =
    clientTimestampNumber < safeTimestamp - FIFTEEN_MINUTES_MS;

  return {
    timestamp: clientTimestampNumber,
    isTimestampFromThePast,
  };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/controllers/track.controller.ts` around lines 109 - 128, The
change removed the 5-day cutoff and now accepts arbitrarily old __timestamp
values; reintroduce a FIVE_DAYS_MS constant (e.g., 5 * 24 * 60 * 60 * 1000) and
add an early check using clientTimestampNumber and safeTimestamp: if
clientTimestampNumber < safeTimestamp - FIVE_DAYS_MS, return the per-item
rejection path required by the batch contract (the same rejection/validation
response your controller uses elsewhere) before the existing ONE_MINUTE_MS and
FIFTEEN_MINUTES_MS logic so old events are rejected instead of persisted.

@mraj602-tohands
Copy link
Copy Markdown
Author

@lindesvard Reworked as requested:

  1. /track/batch is gone — batch now piggybacks on POST /track as { "type": "batch", "payload": [...] }. Single-event requests are untouched (same 200 response); envelopes return 202 with { accepted, rejected[] } for per-row errors.
  2. All worker/incoming-events changes removed — the diff is now 4 API-side files (validation, controller, router, tests). getTimestamp/session semantics are exactly as on main, so your session rework won't collide with this; batched historical events will pick it up automatically since they go through the same per-event pipeline.

Happy to adjust the caps (2000 events / 10 MB) or the response shape if you'd prefer different numbers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bulk event ingestion (POST /track/batch) for offline-first SDKs — IoT, mobile, edge

3 participants