
feat(agent-studio): Phase 1 scaffold — Playbooks + Blueprints #329

Draft
padak wants to merge 9 commits into main from feat/personal-ai-agents

padak commented May 22, 2026

Draft. First reviewable chunk of Agent Studio, the Playbook-first
agentic surface for kbagent serve. Ships the documentation foundation
plus four vertical UI/API slices. Run loop, Tool Broker, and governance
are deliberately out of scope here and tracked in #327.

Documentation foundation

  • docs/agents-v2.md — Playbook-first PRD that supersedes the
    heavyweight Team/Role/WorkItem v1. Closes every blocking finding from
    docs/agents-review.md (budget caps in MVP, scoped per-run JWTs,
    stable /v1 API contract, body_hash + 5s undo, untrusted wrapping,
    approval expires_at/scope).
  • docs/agent-studio-design-system.md — canonical NERD UI spec
    (light mode primary, dark secondary). Single source of truth for the
    visual contract; the new pages reuse only existing .nerd-*
    primitives.
  • docs/mockups/ — 6 light primary screens + 6 dark backups.
  • docs/agent-studio-progress.md — cross-session build tracker.

Code slices

1 — Scaffold

  • agent_studio/models/playbook.py: minimal Playbook shape (§ 7).
  • agent_studio/storage.py: YAML persistence under
    <config_dir>/playbooks/, 0600 files / 0700 dir, atomic
    temp-then-rename, corrupt-YAML-tolerant list.
  • /v1/agent-studio/playbooks CRUD router, registered in create_app.
  • Playbooks library page (sidebar entry under AI / Tools), 3-col
    .nerd-card grid, TwoPathEmpty empty state, New Playbook modal.
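The storage bullet combines owner-only permissions with atomic replacement. A minimal stdlib sketch of that pattern follows, assuming POSIX permission semantics; `atomic_write_text` is a hypothetical helper name, not the actual storage.py API, and YAML serialisation is left out (the function just writes already-rendered text).

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Hypothetical helper: write `text` atomically, 0600 file / 0700 dir."""
    directory = path.parent
    # Create the directory owner-only; chmod again because it may already
    # exist or the process umask may have stripped bits.
    directory.mkdir(mode=0o700, parents=True, exist_ok=True)
    os.chmod(directory, 0o700)

    # The temp file lives in the same directory, so the final rename never
    # crosses a filesystem boundary. mkstemp creates it with 0600 perms.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(directory), prefix=f".{path.name}.", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())
        os.chmod(tmp_name, 0o600)
        # os.replace is atomic on POSIX: readers see the old file or the
        # new one, never a half-written file.
        os.replace(tmp_name, path)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Writing to a sibling temp file and renaming it is what lets a crash mid-write leave the previous Playbook intact.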

1.2 — Detail Drawer

  • Click a card → right-side Drawer with description / connections /
    skills / plugins / triggers (JSON) / timestamps. Two-step Delete.

2.a — PlaybookRun stub

  • PlaybookRun model + runs/ storage. POST /{id}/run creates a run
    and marks it done immediately (no real execution yet — proves the
    data flow). GET /runs[?playbook_id=X] + GET /runs/{id}. Drawer
    gains a Run button + Recent Runs section.

2.5 — Blueprints catalogue

  • Read-only catalogue (9 designed cards) from a static in-code seed.
    GET /blueprints[?category=X], GET /{id}, POST /{id}/fork
    (mints a draft Playbook). Blueprints page with category filter +
    search; "Use this blueprint" forks and navigates to the library.
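Forking, as described, copies only what the current Playbook model can carry. The sketch below assumes dict-based records. The seed entry, `fork_blueprint`, `enabled=False` and `revision=1` on the draft are all hypothetical. Keys such as `systems` are intentionally not copied.

```python
import copy
import uuid
from datetime import datetime, timezone

# Hypothetical seed entry; the real catalogue holds 9 designed cards.
BLUEPRINT = {
    "id": "bp_example",
    "name": "Example blueprint",
    "category": "example",
    "description": "Placeholder card.",
    "systems": ["keboola"],
    "connections": ["example-connection"],
    "skills": ["example-skill"],
    "plugins": ["example-plugin"],
}

# Only these keys map onto the current Playbook model.
FORKABLE_KEYS = ("connections", "skills", "plugins")


def fork_blueprint(blueprint: dict) -> dict:
    """Mint a new draft Playbook dict prefilled from a blueprint."""
    now = datetime.now(timezone.utc).isoformat()
    playbook = {
        "id": f"pb_{uuid.uuid4().hex[:12]}",  # server-minted
        "name": blueprint["name"],
        "description": blueprint.get("description"),
        "revision": 1,
        "enabled": False,
        "status": "draft",
        "created_at": now,
        "updated_at": now,
    }
    for key in FORKABLE_KEYS:
        # Deep-copy so editing the draft never mutates the shared seed.
        playbook[key] = copy.deepcopy(blueprint.get(key, []))
    return playbook
```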

Out of scope (tracked in #327)

Real run loop (subprocess via agent_runner.py), Tool Broker + scoped
JWTs, budget enforcer, approval queue + body_hash + 5s undo,
untrusted-content wrapping, skill loader, connection auto-discovery,
data-cleanup native plugin. Also: live browser QA of the three
new surfaces (so far verified via HTTP TestClient + tsc + vite build
only).

Test plan

Closes nothing yet — keeps #327 open as the umbrella for the rest of
Phase 1.


apps/ scaffolding (use-case apps over kbagent serve)

Infrastructure for AI-generated, use-case-specific apps that run inside
kbagent serve --ui, separate from the Playbook runtime:

  • OpenAPI type pipeline: scripts/dump_openapi.py builds the
    FastAPI app in-memory and dumps the schema; make web-gen-types feeds
    it to openapi-typescript → web/frontend/src/api/generated.ts.
    make web-types-check guards drift.
  • apps/ convention — drop an app under web/frontend/src/apps/<slug>/,
    export an AppManifest; _registry.tsx (import.meta.glob) wires it
    into Router + Sidebar automatically. No manual routing.
  • Local-AI helper: api/ai.ts askLocalAi() wraps
    POST /ai/chat/stream (claude/codex/gemini). Apps default to the
    user's local CLI, so no master token is needed, unlike hosted Kai.
  • Two reference apps: morning-brief (Dashboard archetype —
    cross-project job cost outliers) and type-inspector (Inspector
    archetype — per-column profiling + AI type proposals + Playbook stub).
  • Skill build-app-over-kbagent-serve.md — guide for an AI agent to
    scaffold a new app (conventions, NERD UI primitives, gotchas).
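A drift guard like make web-types-check only works if the schema dump is deterministic: regenerate, compare, fail on any difference. The stdlib sketch below shows that idea on a plain schema dict. `render_schema` and `has_drift` are hypothetical names. In the real pipeline the dict would come from the in-memory FastAPI app, and the comparison may well run on the generated TypeScript instead.

```python
import json


def render_schema(schema: dict) -> str:
    """Serialise an OpenAPI dict deterministically.

    Sorted keys and a fixed indent mean the same app always yields
    byte-identical output, so any diff reflects a real API change.
    """
    return json.dumps(schema, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def has_drift(schema: dict, committed_text: str) -> bool:
    """True when the committed artefact no longer matches a fresh dump."""
    return render_schema(schema) != committed_text
```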

Tests: morning-brief compute (8) + type-inspector profile/ai_parse (18).

padak added 9 commits May 18, 2026 09:10
Adds docs/agents-review.md -- structured review of the Agent Teams PRD
(docs/agents.md, merged in PR #305) with findings classified by
severity (blocking / non-blocking / nit), a "Stable API surface" gap
analysis against the current 24-router serve layer, and a recommended
edit sequence. Companion document to the PRD, used as the basis for
the upcoming personal-AI-agents feature work on this branch.

…ss tracker

Lays the documentation foundation for the Agent Studio Phase 1 effort:

- docs/agents-v2.md — Playbook-first PRD that supersedes the heavyweight
  Team/Role/WorkItem v1. Closes every blocking finding from
  docs/agents-review.md: budget caps in MVP, scoped per-run JWTs,
  stable API contract, body_hash + 5s undo on external_send,
  untrusted-content wrapping, expires_at + scope on approvals.

- docs/agent-studio-design-system.md — canonical NERD UI spec, single
  source of truth for visual contract. Light mode primary, dark
  secondary. Reference implementation runs at http://127.0.0.1:8001/
  via `kbagent serve --ui`.

- docs/mockups/ — 6 light primary screens (conditioning approach
  via Playwright reference + nano-banana edit mode) + 6 dark
  secondary backups. README documents the regen workflow.

- docs/agent-studio-progress.md — persistent cross-session tracker for
  the Phase 1 build. New chat sessions can pick up from this file.

Customer-validated workflow (product-cost-allocation Solution) drove
five v2 updates:
- §9.3 xlsx-renderer added to first-party tools
- §18 6th Solution product-cost-allocation (Finance Ops) with full spec
- §21 Phase 2 promoted basic view scoping (created_by + allowed_users)
- §21 Phase 1 acceptance criterion now includes the controller-handoff
  scenario
- §24 Open Q #5 split (view scoping = Phase 2 done, approval routing
  still Phase 5+)
- §26 Appendix E "Deployment Patterns" added (local / single-server
  shared-team / future SaaS)

…+ /v1 router + UI library page

First vertical slice of the Agent Studio Phase 1 plan from
docs/agents-v2.md § 21. Goal of the slice: user opens
`kbagent serve --ui`, clicks "Playbooks" in the sidebar, sees a
library of Playbook cards loaded from real YAML files on disk, can
create a new draft. No run loop yet, no Tool Broker yet — those
land in their own slices.

Backend (src/keboola_agent_cli/agent_studio/):
- models/playbook.py: minimal Pydantic shape per § 7 (id, name,
  description, revision, enabled, status, timestamps + opaque
  placeholders for connections/skills/plugins/triggers so the
  on-disk YAML stays forward-compatible with later slices).
- storage.py: YAML load/save with 0600 file perms + 0700 dir,
  atomic temp-then-rename writes, corrupt-YAML tolerant list().
- routers/agent_studio_playbooks.py: GET list / GET detail / POST
  create / DELETE under `/v1/agent-studio/playbooks` (the stable
  surface defined in § 19.2). Server stamps id/timestamps so the
  client cannot smuggle them in.
- server/__init__.py: register the router in `create_app`.

Tests (tests/test_playbook_*.py — 27 new tests):
- test_playbook_model.py: Pydantic validation + status enum + Summary
  projection.
- test_playbook_storage.py: 0600 perms, 0700 dir, round-trip,
  atomic write, corrupt-YAML skip, deterministic sort.
- test_playbook_router.py: auth, CRUD round-trip, 404s, OpenAPI
  registration guard.

Frontend (web/frontend/src/):
- pages/Playbooks.tsx: library page wired to /v1/agent-studio/playbooks
  with the 3-col `.nerd-card` grid + TwoPathEmpty empty state +
  "New Playbook" modal. Renders the design system primitives only
  (.nerd-card, .nerd-btn, .nerd-input, .nerd-pill-*) — no new CSS.
- state.tsx: extend BuiltinPageId with "playbooks".
- layout/Sidebar.tsx: add Playbooks under AI / Tools (BookOpen icon).
- App.tsx: add `case "playbooks"` to the Router switch.

Also (related but coupled):
- web/frontend/index.html: anti-FOUC bootstrap defaults to light
  per the design system pivot (a user whose OS pref is dark still
  lands in dark).
- web/frontend/src/apps/: lands the in-flight dynamic-apps registry
  that Sidebar.tsx + App.tsx + state.tsx now depend on — required
  to compile, would otherwise break tsc with missing AppPageId /
  findApp / isAppPageId / slugFromAppPageId symbols. Sample app
  (morning-brief) keeps the registry exercised.

What's NOT in this commit (deliberately):
- Tool Broker, scoped JWTs, budget enforcer, approval queue,
  untrusted-content wrapping — all queued for follow-up slices.
- A "Blueprints" page — phase 2.
- Sample Playbook YAMLs — the empty-state TwoPathEmpty is the
  on-ramp; pre-shipping fake data felt worse for first-run UX.
- Migration of existing AgentTask state — `AgentTask` keeps the
  full bearer token (§ 23), Playbook runs will get scoped JWTs
  when the run loop wires up.

ruff check + ruff format + ty check + pytest all green
(27 new tests + existing server smoke).

The progress tracker now reflects what landed in the two prior commits
(scaffold + design docs) and explicitly enumerates the next 9 slices
that turn the scaffold into "Phase 1 acceptance criteria met" per
docs/agents-v2.md § 21.

Order of the next 9 is "what unblocks the most downstream work": the
Playbook detail Drawer (frontend-only, easy first follow-up) ahead of
the run loop ahead of Tool Broker → budget enforcer → approval queue →
untrusted wrapping → skill loader → connection discovery →
data-cleanup plugin.

Wires PlaybookCard onto a right-side Drawer that fetches the full
Playbook from GET /v1/agent-studio/playbooks/{id}. Body sections
mirror the Pydantic shape from docs/agents-v2.md § 7:

- Status pill + enabled/disabled pill
- Description (italic placeholder when null)
- Connections / Skills / Plugins lists rendered as outlined
  mono pills, with an italic "None — set in a later slice"
  empty state.
- Triggers list rendered as .nerd-code JSON blocks so the opaque
  config shape (typed in Phase 2) is still legible.
- Created / Updated timestamps localised via toLocaleString,
  keeping UTC ISO-8601 on disk for audit consumers.
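The timestamp rule (UTC ISO-8601 on disk, localised only at render time) looks like this in Python; the drawer does the equivalent in the browser with toLocaleString. `to_disk`, `to_display` and the display format are hypothetical.

```python
from datetime import datetime, timezone, tzinfo


def to_disk(ts: datetime) -> str:
    """Serialise as UTC ISO-8601 with a 'Z' suffix for audit consumers."""
    if ts.tzinfo is None:
        raise ValueError("refusing to store a naive timestamp")
    return ts.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


def to_display(stored: str, tz: tzinfo) -> str:
    """Convert a stored UTC string to the viewer's zone, for display only."""
    # fromisoformat only accepts a 'Z' suffix from Python 3.11, so normalise.
    parsed = datetime.fromisoformat(stored.replace("Z", "+00:00"))
    return parsed.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S %Z")
```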

Drawer actions surface a Delete button (red-on-hover) that pops a
two-step confirm modal — the modal calls out the on-disk path so
the user knows exactly what the destructive operation touches.

Creating a new Playbook now auto-opens its drawer (previously the user
was dropped back on the library and had to find the new row).

No backend changes — GET detail + DELETE endpoints already exist
from the Phase 1 scaffold. tsc clean, 27 backend tests still green.

Progress tracker now points at slice 2 (run loop tied into
server/agent_runner.py) as the next priority.

…s + UI Run button

Slice 2.a of the run loop. The "Run" button now produces a real
(stub) run record the user can see; real subprocess execution lands
in slice 2.b.

Backend:
- models/playbook_run.py: minimal PlaybookRun (id, playbook_id,
  playbook_revision, status, started_at, ended_at, summary,
  objective_override). Cost/token/workspace/SSE-log fields per § 7
  arrive with the real run loop.
- storage.py: generalised the YAML load/save helpers (_safe_load is
  now generic over the model type via PEP 695 `[T: BaseModel]`),
  added runs_dir / list_runs (newest-first, optional playbook_id
  filter) / get_run / save_run. Same 0600 file + 0700 dir perms.
- routers/agent_studio_playbooks.py: POST /{id}/run — stub that
  creates a run, marks it `done` immediately with a clear summary,
  propagates an optional objective_override from the body.
- routers/agent_studio_runs.py: GET /v1/agent-studio/runs
  [?playbook_id=X] + GET /v1/agent-studio/runs/{run_id}.
- server/__init__.py: register the runs router.
- agent_studio + models __init__: export PlaybookRun (the exports
  were missed in the scaffold commit because the Write hit a
  not-yet-read guard; functionality was unaffected since storage +
  routers import the concrete module path directly).

Frontend (Playbooks.tsx):
- Drawer header gains a Run button (keboola-hover) beside Delete.
- New Recent Runs section in the drawer body: status pill + short
  run id + start time + computed duration, truncated to the last 5
  with a "+ N earlier runs" marker pointing at the future Past Jobs
  tab. Polls every 10s like the library.
- Running a Playbook invalidates both the run list and the library
  query so the card status pill stays in sync.

Tests: +21 (model 4, storage 7, router 10). Full Playbook + run +
server smoke suite = 63 green. ruff + ty + tsc clean.

Infrastructure for use-case-specific apps that run inside
`kbagent serve --ui`, alongside the morning-brief reference app.

- OpenAPI type pipeline: scripts/dump_openapi.py builds the FastAPI app
  in-memory (no uvicorn boot) and dumps the schema; `make web-gen-types`
  feeds it to openapi-typescript -> web/frontend/src/api/generated.ts.
  `make web-types-check` guards drift the same way skill-check does.
- web/frontend/src/api/ai.ts: askLocalAi() wraps POST /ai/chat/stream.
  Apps default to the user's local claude/codex/gemini install, so they
  need no master token (which most users lack), unlike hosted Kai (/kai/ask).
- apps/type-inspector: reference Inspector-archetype app. Profiles a
  Storage table per column (null %, distinct, inferred type, samples),
  asks the local AI for native-type proposals, approve/edit per column.
  The destructive table-swap step is a documented Playbook stub -- apps
  produce the typed column list; a Playbook executes the swap.
- build-app-over-kbagent-serve.md skill: the guide an AI agent reads to
  scaffold a new app -- apps/ convention, NERD UI primitives, typed
  client usage, local-AI invocation patterns, and the gotchas hit while
  building (response envelopes, vite-env.d.ts, --ui-dist override).

Tests: 18 vitest cases (profile + ai_parse). TypeScript clean.

Slice 1.4. Second visible surface, matching
docs/mockups/02-blueprints-catalog.png.

Backend:
- models/blueprint.py: Blueprint shape (id, name, category,
  description, systems, connections, skills, plugins) +
  BLUEPRINT_CATEGORIES tuple driving the filter chips.
- blueprints_catalog.py: static in-code seed of the 9 designed
  cards. v2 §11/§12 wants these as YAML data files for a
  marketplace eventually; in-code keeps Phase 1 dependency-free
  and immune to on-disk corruption. list_blueprints(category) + get_blueprint(id).
- routers/agent_studio_blueprints.py:
  GET /v1/agent-studio/blueprints[?category=X],
  GET /{id}, POST /{id}/fork. Fork mints a draft Playbook prefilled
  with the blueprint's connections/skills/plugins (the parts the
  current Playbook model can carry; SOP/budget/approval arrive when
  those substructures exist).
- server/__init__.py: register the blueprints router.

Frontend:
- pages/Blueprints.tsx: category filter row (active chip =
  keboola-green outline) + search + 3-col card grid. "Use this
  blueprint" forks then navigates to the Playbooks library.
- state.tsx / Sidebar.tsx / App.tsx: new "blueprints" PageId,
  sidebar entry (LayoutGrid icon, under AI / Tools after Playbooks),
  route.
- Playbooks.tsx: the empty-state "Browse Blueprints" button is now
  wired to navigate (was disabled "Phase 2" placeholder).

Tests: +16 (catalogue 8, router/fork 8). Full agent-studio + smoke
suite = 79 green. ruff + ty + tsc + vite build all clean.