Skip to content

feat(kiloclaw) Admin instance actions scheduler#3000

Open
St0rmz1 wants to merge 9 commits intomainfrom
kiloclaw-scheduled-actions-pr2
Open

feat(kiloclaw) Admin instance actions scheduler#3000
St0rmz1 wants to merge 9 commits intomainfrom
kiloclaw-scheduled-actions-pr2

Conversation

@St0rmz1
Copy link
Copy Markdown
Contributor

@St0rmz1 St0rmz1 commented May 1, 2026

Summary

Adds a scheduling primitive for kiloclaw admin actions covering two action types: scheduled_restart (the worker DO redeploys on its current image at the chosen time) and version_change (the worker DO redeploys on a chosen image tag, with optional pin override). Three new tables back the feature: kiloclaw_scheduled_actions (parent), kiloclaw_scheduled_action_stages, and kiloclaw_scheduled_action_targets. The DO alarm() handler gains a wedge that queries Postgres for due targets and dispatches per action_type. Admin UI surfaces are: a new Scheduler tab on the kiloclaw dashboard, an upcoming action indicator on the per instance page, a Schedule for later toggle on the bulk Change Version dialog, and the same toggle on the per instance Change Version dialog. The Upgrade to Latest button keeps its immediate semantics (renamed "Upgrade to Latest Now") and now opens a confirmation dialog. Notifications are intentionally not in this PR; the admin tooling says so where it matters.

Architectural notes worth flagging:

  • The DO apply path mirrors the tracked_image_tag sync wedge from PR Kiloclaw tracked image tag #2942: runScheduledActionApply runs inside alarm() via waitUntil, so reconciliation is unaffected when Postgres is unreachable.
  • No new DO storage field, no scheduleAlarm() math. The existing reconcile cadence (5 minutes for running instances, longer for hibernated) is the only timing source. The scheduled time is a "no earlier than" bound, not an exact fire time. Drift is up to one alarm interval and is surfaced to admins in the UI.
  • Wake on schedule push: a new DO method notifyScheduledActionPending() calls scheduleAlarm() to refresh the alarm to a fresh near future timestamp. Exposed via worker route POST /api/platform/scheduled-action/wake. The web scheduleAction mutation fires this best effort after the inserts, batched at 20 concurrent so a 500 instance schedule does not open 500 outbound sockets at once.
  • Concurrency hardening: the conflict check and inserts in scheduleAction run inside one db.transaction(..., { isolationLevel: 'serializable' }) so two admins racing the same instance cannot both succeed. Outcome recording (recordScheduledActionTargetOutcome) wraps the target write and both counter writes in a single transaction so a partial failure rolls back cleanly. The DO apply path claims the target with a pending to running CAS before invoking the side effect so two concurrent waitUntil passes cannot both fire restartMachine.
  • One pending scheduled action per instance: the API rejects with CONFLICT when any selected instance already has a pending action under a parent in scheduled or running status. Bulk schedules silently filter destroyed instances (mirrors the bulkChangeVersion Apply Now path) and reject only if every selected instance is destroyed.
  • Per target cancel: a new endpoint cancelScheduledActionTarget drops one instance from a bulk schedule via atomic CAS on the target row and runs the same promotion sweep as the alarm path so an action whose targets are all individually cancelled does not stay scheduled forever.
  • GDPR: kiloclaw_scheduled_action_targets.user_id references kilocode_users.id. The targets table is retained operational state; softDeleteUser is updated and a coverage test confirms it succeeds when the user owns scheduled action targets.

Verification

  • Scheduled a restart on a dev instance from the Scheduler tab. Watched the wake call land, the alarm fire, the wedge run, the worker actually redeploy.
  • Scheduled a version change on a dev instance from the per instance Change Version dialog (Scheduled tab). Verified the source and target tag rendering, the override pins behavior, and that Apply Now still works alongside.
  • Bulk scheduled from the instances toolbar dialog in Schedule for later mode. Verified every selected instance appears as a target in the Scheduler tab and the parent shows the right total count.
  • Cancelled a single instance schedule from the upcoming action indicator. Confirmed the dialog shows just one option for single instance schedules.
  • Cancelled one instance from a bulk schedule (the dialog offered "Cancel only this instance" and "Cancel entire batch (N instances)"). Confirmed the parent stays running for the remaining targets.
  • Confirm dialog on Upgrade to Latest Now (the previous one click action was easy to fire by accident).
  • Local DB rebuild (docker compose rm -s -f -v postgres, docker volume rm dev_postgres_data, pnpm drizzle migrate) applied every migration cleanly with the regenerated 0109 migration after rebase.
  • 79 router tests pass. Format check, typecheck, and lint all clean.

Visual Changes

Before After
(no Scheduler tab) New Scheduler tab at /admin/kiloclaw?tab=scheduler with a sortable Recent actions table, separate Applied / Skipped / Failed columns, a Run at column, and a per row Detail dialog showing every target with source and target tags.
(no scheduled action awareness on the instance page) Instance Information card always shows an "Upcoming scheduled action" row. Empty state is "None"; otherwise the action type renders in yellow, with run time, source and target tags labelled with OpenClaw versions, and an inline Cancel button.
(immediate only) Change Version dialog gains a Now / Scheduled tab toggle. Bulk Change Version dialog gains Apply now / Schedule for later. Both warn that notifications are not implemented yet.
Upgrade to Latest fires on click Renamed Upgrade to Latest Now and opens a confirmation dialog before firing.
Screenshot 2026-05-01 at 2 10 03 PM Screenshot 2026-05-01 at 2 10 32 PM Screenshot 2026-05-01 at 2 11 01 PM

Reviewer Notes

  • The schedule path relies on the existing reconcile alarm cadence (no new alarm slot) and accepts up to one alarm interval of drift past the chosen time. The Scheduler tab copy says so. Tightening it would require pushing the actual scheduled timestamp into DO storage and modifying scheduleAlarm to pick min(reconcileNextAt, scheduledActionNextAt). Deferred until drift becomes operational pain.
  • The wake fan out to up to 500 DOs is best effort and concurrency capped at 20. A wake failure does not block the schedule; the next user activity will eventually rearm the alarm.
  • Notifications are not in this PR. End user sessions are interrupted with no warning at the scheduled time. The existing immediate bulkChangeVersion path (PR feat(kiloclaw) Admin tools for bulk version deployments #2975) has the same property; a follow up PR will add a notifications framework that makes both paths polite.

Comment thread apps/web/src/routers/admin-kiloclaw-instances-router.ts Outdated
Comment thread services/kiloclaw/src/db/index.ts Outdated
}

try {
const outcome = await dispatchByActionType(target, ctx, db);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Dispatch runs before the target is claimed

Because the external side effect starts before the pending target is atomically claimed, overlapping alarm/waitUntil passes can both process the same due row. One pass can kick off restartMachine, while another records failed or skipped first, leaving the database outcome inconsistent with the real restart/version change. Claim the target before dispatching, or otherwise serialize the apply path per target so only the claimed pass can perform the side effect.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in db68801. Apply path now claims the target with an atomic pending to running CAS via the new claimScheduledActionTarget helper before invoking the dispatcher. If two concurrent waitUntil passes find the same due row, only one wins the claim and proceeds to dispatch; the other gets 0 rows back and skips. Added running to the target status enum (TS only annotation, no migration since the column is text) and relaxed the CAS in recordScheduledActionTargetOutcome to accept either pending or running so both code paths still work.

@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot Bot commented May 1, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Resolved Previous Findings (click to expand)
File Previous Issue Resolution
apps/web/src/routers/admin-kiloclaw-instances-router.ts Concurrent schedule requests could bypass the pending-action conflict check. Fixed by moving the conflict check and inserts into a serializable transaction.
services/kiloclaw/src/db/index.ts Outcome recording updated target status before counters without a transaction. Fixed by recording target outcome and counter updates in one transaction.
services/kiloclaw/src/durable-objects/kiloclaw-instance/scheduled-action-apply.ts Scheduled action dispatch ran before the target was claimed. Fixed by claiming the target with a pending-to-running CAS before dispatch.
Files Reviewed (20 files)
  • apps/web/src/app/admin/components/KiloclawDashboard.tsx
  • apps/web/src/app/admin/components/KiloclawInstances/BulkChangeVersionDialog.tsx
  • apps/web/src/app/admin/components/KiloclawInstances/KiloclawInstanceDetail.tsx
  • apps/web/src/app/admin/components/KiloclawScheduler/KiloclawSchedulerTab.tsx
  • apps/web/src/lib/kiloclaw/kiloclaw-internal-client.ts
  • apps/web/src/lib/user.test.ts
  • apps/web/src/lib/user.ts
  • apps/web/src/routers/admin-kiloclaw-instances-router.test.ts
  • apps/web/src/routers/admin-kiloclaw-instances-router.ts
  • packages/db/src/migrations/0109_conscious_kingpin.sql
  • packages/db/src/migrations/meta/0109_snapshot.json
  • packages/db/src/migrations/meta/_journal.json
  • packages/db/src/schema-types.ts
  • packages/db/src/schema.ts
  • services/kiloclaw/src/db/index.ts
  • services/kiloclaw/src/durable-objects/kiloclaw-instance.test.ts
  • services/kiloclaw/src/durable-objects/kiloclaw-instance/index.ts
  • services/kiloclaw/src/durable-objects/kiloclaw-instance/scheduled-action-apply.ts
  • services/kiloclaw/src/durable-objects/kiloclaw-instance/version-change-apply.ts
  • services/kiloclaw/src/routes/platform.ts

Reviewed by gpt-5.5-20260423 · 3,686,939 tokens

@St0rmz1 St0rmz1 changed the title Kiloclaw scheduled actions pr2 feat(kiloclaw) Admin instance actions scheduler May 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant