feat(agent-studio): Phase 1 scaffold — Playbooks + Blueprints by padak · Pull Request #329 · keboola/cli

padak · 2026-05-22T12:04:43Z

Draft. First reviewable chunk of Agent Studio — the Playbook-first
agentic surface for kbagent serve. Ships the documentation foundation
plus four vertical UI/API slices. Run loop, Tool Broker, and governance
are deliberately out of scope here and tracked in #327.

Documentation foundation

docs/agents-v2.md — Playbook-first PRD that supersedes the
heavyweight Team/Role/WorkItem v1. Closes every blocking finding from
docs/agents-review.md (budget caps in MVP, scoped per-run JWTs,
stable /v1 API contract, body_hash + 5s undo, untrusted wrapping,
approval expires_at/scope).
docs/agent-studio-design-system.md — canonical NERD UI spec
(light mode primary, dark secondary). Single source of truth for the
visual contract; the new pages reuse only existing .nerd-*
primitives.
docs/mockups/ — 6 light primary screens + 6 dark backups.
docs/agent-studio-progress.md — cross-session build tracker.

Code slices

1 — Scaffold

agent_studio/models/playbook.py: minimal Playbook shape (§ 7).
agent_studio/storage.py: YAML persistence under
<config_dir>/playbooks/, 0600 files / 0700 dir, atomic
temp-then-rename, corrupt-YAML-tolerant list.
/v1/agent-studio/playbooks CRUD router, registered in create_app.
Playbooks library page (sidebar entry under AI / Tools), 3-col
.nerd-card grid, TwoPathEmpty empty state, New Playbook modal.
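The storage guarantees listed for storage.py (0600 files, 0700 directory, atomic temp-then-rename, corrupt-file-tolerant listing) could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code: it uses JSON from the standard library in place of YAML so it runs without dependencies, and the names `save_record` and `list_records` are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def save_record(directory: Path, record_id: str, data: dict) -> Path:
    """Write one record atomically with owner-only permissions (hypothetical helper)."""
    directory.mkdir(parents=True, exist_ok=True)
    os.chmod(directory, 0o700)  # mkdir's mode is masked by umask, so set it explicitly
    target = directory / f"{record_id}.json"
    # mkstemp creates the temp file with mode 0600 in the same directory,
    # so the final os.replace stays on one filesystem and is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
        os.replace(tmp_name, target)
    except BaseException:
        # Never leave a half-written temp file behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def list_records(directory: Path) -> list[dict]:
    """Load every readable record, skipping files that fail to parse."""
    records = []
    for path in sorted(directory.glob("*.json")):
        try:
            records.append(json.loads(path.read_text(encoding="utf-8")))
        except (OSError, json.JSONDecodeError):
            continue  # one corrupt file must not break the whole listing
    return records
```

Writing the temp file next to the target matters: a rename across filesystems is not atomic, so a temp file in a system-wide temp directory would weaken the guarantee.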

1.2 — Detail Drawer

Click a card → right-side Drawer with description / connections /
skills / plugins / triggers (JSON) / timestamps. Two-step Delete.

2.a — PlaybookRun stub

PlaybookRun model + runs/ storage. POST /{id}/run creates a run
and marks it done immediately (no real execution yet — proves the
data flow). GET /runs[?playbook_id=X] + GET /runs/{id}. Drawer
gains a Run button + Recent Runs section.
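To make the stub behaviour concrete, here is a minimal sketch of a run record that is created and marked done in one step. The field names follow the PlaybookRun fields listed in the slice 2.a commit message; the field types, the summary text, and the helper name `create_stub_run` are assumptions, and a plain dataclass stands in for the Pydantic model.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PlaybookRun:
    id: str
    playbook_id: str
    playbook_revision: int  # assumed to be an integer counter
    status: str
    started_at: str
    ended_at: Optional[str] = None
    summary: Optional[str] = None
    objective_override: Optional[str] = None


def create_stub_run(
    playbook_id: str,
    playbook_revision: int,
    objective_override: Optional[str] = None,
) -> PlaybookRun:
    """Create a run and mark it done immediately; nothing is executed."""
    now = datetime.now(timezone.utc).isoformat()
    return PlaybookRun(
        id=uuid.uuid4().hex,
        playbook_id=playbook_id,
        playbook_revision=playbook_revision,
        status="done",
        started_at=now,
        ended_at=now,  # the stub finishes the instant it starts
        summary="Stub run: no execution performed.",
        objective_override=objective_override,
    )
```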

2.5 — Blueprints catalogue

Read-only catalogue (9 designed cards) from a static in-code seed.
GET /blueprints[?category=X], GET /{id}, POST /{id}/fork
(mints a draft Playbook). Blueprints page with category filter +
search; "Use this blueprint" forks and navigates to the library.
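The fork step can be pictured as copying the fields a Playbook can already carry out of a read-only Blueprint into a new draft. The sketch below is a simplified illustration, not the router's code: plain dataclasses replace the Pydantic models, only a subset of fields is shown, and `fork_blueprint` and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Blueprint:
    id: str
    name: str
    category: str
    connections: tuple = ()
    skills: tuple = ()
    plugins: tuple = ()


@dataclass
class Playbook:
    id: str
    name: str
    status: str
    connections: list = field(default_factory=list)
    skills: list = field(default_factory=list)
    plugins: list = field(default_factory=list)


def fork_blueprint(blueprint: Blueprint, new_id: str) -> Playbook:
    """Mint a draft Playbook prefilled from the blueprint."""
    return Playbook(
        id=new_id,
        name=blueprint.name,
        status="draft",
        # Copy into fresh lists so editing the Playbook never touches the catalogue.
        connections=list(blueprint.connections),
        skills=list(blueprint.skills),
        plugins=list(blueprint.plugins),
    )
```

Keeping the Blueprint frozen and tuple-based is one way to keep the catalogue read-only while forks stay freely editable.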

Out of scope (tracked in #327)

Real run loop (subprocess via agent_runner.py), Tool Broker + scoped
JWTs, budget enforcer, approval queue + body_hash + 5s undo,
untrusted-content wrapping, skill loader, connection auto-discovery,
data-cleanup native plugin. Also: live browser QA of the three
new surfaces (so far verified via HTTP TestClient + tsc + vite build
only).

Test plan

79 backend tests green (47 new agent-studio: model / storage /
router / run / blueprint / fork)
ruff check + ruff format --check + ty check clean
tsc --noEmit + vite build clean
Live browser QA — create → run → fork click-through (issue #327, item 0)
Wire the real run loop before this leaves Draft

Closes nothing yet — keeps #327 open as the umbrella for the rest of
Phase 1.

apps/ scaffolding (use-case apps over kbagent serve)

Infrastructure for AI-generated, use-case-specific apps that run inside
kbagent serve --ui, separate from the Playbook runtime:

OpenAPI type pipeline — scripts/dump_openapi.py builds the
FastAPI app in-memory and dumps the schema; make web-gen-types feeds
it to openapi-typescript → web/frontend/src/api/generated.ts.
make web-types-check guards drift.
apps/ convention — drop an app under web/frontend/src/apps/<slug>/,
export an AppManifest; _registry.tsx (import.meta.glob) wires it
into Router + Sidebar automatically. No manual routing.
Local-AI helper — api/ai.ts askLocalAi() wraps
POST /ai/chat/stream (claude/codex/gemini). Apps default to the
user's local CLI — no master token, unlike hosted Kai.
Two reference apps: morning-brief (Dashboard archetype —
cross-project job cost outliers) and type-inspector (Inspector
archetype — per-column profiling + AI type proposals + Playbook stub).
Skill build-app-over-kbagent-serve.md — guide for an AI agent to
scaffold a new app (conventions, NERD UI primitives, gotchas).

Tests: morning-brief compute (8) + type-inspector profile/ai_parse (18).
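A drift guard such as `make web-types-check` generally follows a "regenerate and compare" pattern. The sketch below shows that pattern for a schema dump only; the real pipeline also runs openapi-typescript, which is not reproduced here, and the function names `render_schema` and `has_drift` are hypothetical.

```python
import json


def render_schema(schema: dict) -> str:
    """Serialise a schema deterministically so unchanged input gives identical text."""
    # sort_keys removes dict-ordering noise that would otherwise show up as false drift.
    return json.dumps(schema, indent=2, sort_keys=True) + "\n"


def has_drift(schema: dict, committed_text: str) -> bool:
    """Return True when the committed artefact no longer matches a fresh render."""
    return render_schema(schema) != committed_text
```

A check target would typically exit non-zero when `has_drift` returns True, so stale generated files fail CI instead of reaching main.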

Adds docs/agents-review.md -- structured review of the Agent Teams PRD (docs/agents.md, merged in PR #305) with findings classified by severity (blocking / non-blocking / nit), a "Stable API surface" gap analysis against the current 24-router serve layer, and a recommended edit sequence. Companion document to the PRD, used as the basis for the upcoming personal-AI-agents feature work on this branch.

…ss tracker

Lays the documentation foundation for the Agent Studio Phase 1 effort:

- docs/agents-v2.md — Playbook-first PRD that supersedes the heavyweight Team/Role/WorkItem v1. Closes every blocking finding from docs/agents-review.md: budget caps in MVP, scoped per-run JWTs, stable API contract, body_hash + 5s undo on external_send, untrusted-content wrapping, expires_at + scope on approvals.
- docs/agent-studio-design-system.md — canonical NERD UI spec, single source of truth for visual contract. Light mode primary, dark secondary. Reference implementation runs at http://127.0.0.1:8001/ via `kbagent serve --ui`.
- docs/mockups/ — 6 light primary screens (conditioning approach via Playwright reference + nano-banana edit mode) + 6 dark secondary backups. README documents the regen workflow.
- docs/agent-studio-progress.md — persistent cross-session tracker for the Phase 1 build. New chat sessions can pick up from this file.

Customer-validated workflow (product-cost-allocation Solution) drove six v2 updates:

- §9.3 xlsx-renderer added to first-party tools
- §18 6th Solution product-cost-allocation (Finance Ops) with full spec
- §21 Phase 2 promoted basic view scoping (created_by + allowed_users)
- §21 Phase 1 acceptance criterion now includes the controller-handoff scenario
- §24 Open Q #5 split (view scoping = Phase 2 done, approval routing still Phase 5+)
- §26 Appendix E "Deployment Patterns" added (local / single-server shared-team / future SaaS)

…+ /v1 router + UI library page

First vertical slice of the Agent Studio Phase 1 plan from docs/agents-v2.md § 21. Goal of the slice: user opens `kbagent serve --ui`, clicks "Playbooks" in the sidebar, sees a library of Playbook cards loaded from real YAML files on disk, can create a new draft. No run loop yet, no Tool Broker yet — those land in their own slices.

Backend (src/keboola_agent_cli/agent_studio/):

- models/playbook.py: minimal Pydantic shape per § 7 (id, name, description, revision, enabled, status, timestamps + opaque placeholders for connections/skills/plugins/triggers so the on-disk YAML stays forward-compatible with later slices).
- storage.py: YAML load/save with 0600 file perms + 0700 dir, atomic temp-then-rename writes, corrupt-YAML tolerant list().
- routers/agent_studio_playbooks.py: GET list / GET detail / POST create / DELETE under `/v1/agent-studio/playbooks` (the stable surface defined in § 19.2). Server stamps id/timestamps so the client cannot smuggle them in.
- server/__init__.py: register the router in `create_app`.

Tests (tests/test_playbook_*.py — 27 new tests):

- test_playbook_model.py: Pydantic validation + status enum + Summary projection.
- test_playbook_storage.py: 0600 perms, 0700 dir, round-trip, atomic write, corrupt-YAML skip, deterministic sort.
- test_playbook_router.py: auth, CRUD round-trip, 404s, OpenAPI registration guard.

Frontend (web/frontend/src/):

- pages/Playbooks.tsx: library page wired to /v1/agent-studio/playbooks with the 3-col `.nerd-card` grid + TwoPathEmpty empty state + "New Playbook" modal. Renders the design system primitives only (.nerd-card, .nerd-btn, .nerd-input, .nerd-pill-*) — no new CSS.
- state.tsx: extend BuiltinPageId with "playbooks".
- layout/Sidebar.tsx: add Playbooks under AI / Tools (BookOpen icon).
- App.tsx: add `case "playbooks"` to the Router switch.

Also (related but coupled):

- web/frontend/index.html: anti-FOUC bootstrap defaults to light per the design system pivot (a user whose OS pref is dark still lands in dark).
- web/frontend/src/apps/: lands the in-flight dynamic-apps registry that Sidebar.tsx + App.tsx + state.tsx now depend on — required to compile, would otherwise break tsc with missing AppPageId / findApp / isAppPageId / slugFromAppPageId symbols. Sample app (morning-brief) keeps the registry exercised.

What's NOT in this commit (deliberately):

- Tool Broker, scoped JWTs, budget enforcer, approval queue, untrusted-content wrapping — all queued for follow-up slices.
- A "Blueprints" page — phase 2.
- Sample Playbook YAMLs — the empty-state TwoPathEmpty is the on-ramp; pre-shipping fake data felt worse for first-run UX.
- Migration of existing AgentTask state — `AgentTask` keeps the full bearer token (§ 23), Playbook runs will get scoped JWTs when the run loop wires up.

ruff check + ruff format + ty check + pytest all green (27 new tests + existing server smoke).
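The "server stamps id/timestamps" rule can be illustrated with an allow-list: only fields the client is permitted to set are read from the request body, and everything else is generated server-side. This is a hedged sketch using plain dicts rather than the PR's Pydantic models; `CLIENT_FIELDS`, `build_playbook`, and the starting revision of 1 are assumptions.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical allow-list of fields a client may supply on create.
CLIENT_FIELDS = ("name", "description")


def build_playbook(payload: dict) -> dict:
    """Build a new draft record, ignoring any server-owned fields in the payload."""
    now = datetime.now(timezone.utc).isoformat()
    record = {key: payload.get(key) for key in CLIENT_FIELDS}
    if not record["name"]:
        raise ValueError("name is required")
    # Server-owned fields are always generated here, never copied from the client.
    record.update(
        id=uuid.uuid4().hex,
        revision=1,  # assumed starting value
        status="draft",
        created_at=now,
        updated_at=now,
    )
    return record
```

Silently dropping unknown fields is one option; rejecting the request with a validation error is another, and this sketch does not claim to match which one the router chose.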

The progress tracker now reflects what landed in the two prior commits (scaffold + design docs) and explicitly enumerates the next 9 slices that turn the scaffold into "Phase 1 acceptance criteria met" per docs/agents-v2.md § 21. Order of the next 9 is "what unblocks the most downstream work": the Playbook detail Drawer (frontend-only, easy first follow-up) ahead of the run loop ahead of Tool Broker → budget enforcer → approval queue → untrusted wrapping → skill loader → connection discovery → data-cleanup plugin.

Wires PlaybookCard onto a right-side Drawer that fetches the full Playbook from GET /v1/agent-studio/playbooks/{id}. Body sections mirror the Pydantic shape from docs/agents-v2.md § 7:

- Status pill + enabled/disabled pill
- Description (italic placeholder when null)
- Connections / Skills / Plugins lists rendered as outlined mono pills, with an italic "None — set in a later slice" empty state.
- Triggers list rendered as .nerd-code JSON blocks so the opaque config shape (typed in Phase 2) is still legible.
- Created / Updated timestamps localised via toLocaleString, keeping UTC ISO-8601 on disk for audit consumers.

Drawer actions surface a Delete button (red-on-hover) that pops a two-step confirm modal — the modal calls out the on-disk path so the user knows exactly what the destructive operation touches.

Creating a new Playbook now auto-opens its drawer (was: drop user back on the library and make them find the new row).

No backend changes — GET detail + DELETE endpoints already exist from the Phase 1 scaffold. tsc clean, 27 backend tests still green.

Progress tracker now points at slice 2 (run loop tied into server/agent_runner.py) as the next priority.

…s + UI Run button

Slice 2.a of the run loop. The "Run" button now produces a real (stub) run record the user can see; real subprocess execution lands in slice 2.b.

Backend:

- models/playbook_run.py: minimal PlaybookRun (id, playbook_id, playbook_revision, status, started_at, ended_at, summary, objective_override). Cost/token/workspace/SSE-log fields per § 7 arrive with the real run loop.
- storage.py: generalised the YAML load/save helpers (_safe_load is now generic over the model type via PEP 695 `[T: BaseModel]`), added runs_dir / list_runs (newest-first, optional playbook_id filter) / get_run / save_run. Same 0600 file + 0700 dir perms.
- routers/agent_studio_playbooks.py: POST /{id}/run — stub that creates a run, marks it `done` immediately with a clear summary, propagates an optional objective_override from the body.
- routers/agent_studio_runs.py: GET /v1/agent-studio/runs [?playbook_id=X] + GET /v1/agent-studio/runs/{run_id}.
- server/__init__.py: register the runs router.
- agent_studio + models __init__: export PlaybookRun (the exports were missed in the scaffold commit because the Write hit a not-yet-read guard; functionality was unaffected since storage + routers import the concrete module path directly).

Frontend (Playbooks.tsx):

- Drawer header gains a Run button (keboola-hover) beside Delete.
- New Recent Runs section in the drawer body: status pill + short run id + start time + computed duration, truncated to the last 5 with a "+ N earlier runs" marker pointing at the future Past Jobs tab. Polls every 10s like the library.
- Running a Playbook invalidates both the run list and the library query so the card status pill stays in sync.

Tests: +21 (model 4, storage 7, router 10). Full Playbook + run + server smoke suite = 63 green. ruff + ty + tsc clean.
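The newest-first listing with an optional playbook_id filter, and the "last 5 plus N earlier runs" truncation shown in the drawer, can be sketched as below. This works on plain dicts instead of PlaybookRun models, and `recent_runs` is a hypothetical helper (the actual truncation lives in the frontend); the sort assumes every `started_at` is a UTC ISO-8601 string in one consistent format, which orders correctly as text.

```python
from typing import Optional


def list_runs(runs: list, playbook_id: Optional[str] = None) -> list:
    """Return runs newest-first, optionally restricted to one Playbook."""
    selected = [
        run for run in runs
        if playbook_id is None or run["playbook_id"] == playbook_id
    ]
    # Same-format ISO-8601 UTC timestamps sort chronologically as strings.
    return sorted(selected, key=lambda run: run["started_at"], reverse=True)


def recent_runs(runs: list, limit: int = 5) -> tuple:
    """Split runs into the visible newest `limit` and a count of earlier ones."""
    ordered = list_runs(runs)
    return ordered[:limit], max(len(ordered) - limit, 0)
```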

Infrastructure for use-case-specific apps that run inside `kbagent serve --ui`, alongside the morning-brief reference app.

- OpenAPI type pipeline: scripts/dump_openapi.py builds the FastAPI app in-memory (no uvicorn boot) and dumps the schema; `make web-gen-types` feeds it to openapi-typescript -> web/frontend/src/api/generated.ts. `make web-types-check` guards drift the same way skill-check does.
- web/frontend/src/api/ai.ts: askLocalAi() wraps POST /ai/chat/stream. Apps default to the user's local claude/codex/gemini install -- no master token, unlike hosted Kai (/kai/ask), which most users lack.
- apps/type-inspector: reference Inspector-archetype app. Profiles a Storage table per column (null %, distinct, inferred type, samples), asks the local AI for native-type proposals, approve/edit per column. The destructive table-swap step is a documented Playbook stub -- apps produce the typed column list; a Playbook executes the swap.
- build-app-over-kbagent-serve.md skill: the guide an AI agent reads to scaffold a new app -- apps/ convention, NERD UI primitives, typed client usage, local-AI invocation patterns, and the gotchas hit while building (response envelopes, vite-env.d.ts, --ui-dist override).

Tests: 18 vitest cases (profile + ai_parse). TypeScript clean.

Slice 1.4. Second visible surface, matching docs/mockups/02-blueprints-catalog.png.

Backend:

- models/blueprint.py: Blueprint shape (id, name, category, description, systems, connections, skills, plugins) + BLUEPRINT_CATEGORIES tuple driving the filter chips.
- blueprints_catalog.py: static in-code seed of the 9 designed cards. v2 §11/§12 wants these as YAML data files for a marketplace eventually; in-code keeps Phase 1 dependency-free and uncorruptable. list_blueprints(category) + get_blueprint(id).
- routers/agent_studio_blueprints.py: GET /v1/agent-studio/blueprints[?category=X], GET /{id}, POST /{id}/fork. Fork mints a draft Playbook prefilled with the blueprint's connections/skills/plugins (the parts the current Playbook model can carry; SOP/budget/approval arrive when those substructures exist).
- server/__init__.py: register the blueprints router.

Frontend:

- pages/Blueprints.tsx: category filter row (active chip = keboola-green outline) + search + 3-col card grid. "Use this blueprint" forks then navigates to the Playbooks library.
- state.tsx / Sidebar.tsx / App.tsx: new "blueprints" PageId, sidebar entry (LayoutGrid icon, under AI / Tools after Playbooks), route.
- Playbooks.tsx: the empty-state "Browse Blueprints" button is now wired to navigate (was disabled "Phase 2" placeholder).

Tests: +16 (catalogue 8, router/fork 8). Full agent-studio + smoke suite = 79 green. ruff + ty + tsc + vite build all clean.
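A static in-code seed with a category filter might be shaped like the sketch below. The function names `list_blueprints` and `get_blueprint` and the `BLUEPRINT_CATEGORIES` tuple come from the commit message; the category values, the two seed entries, and the choice to raise on an unknown category are hypothetical, and dicts stand in for the Blueprint model.

```python
from typing import Optional

# Hypothetical category values; the real tuple drives the filter chips.
BLUEPRINT_CATEGORIES = ("finance", "data-quality")

# Hypothetical seed entries standing in for the designed cards.
_SEED = (
    {"id": "bp-example-1", "name": "Example One", "category": "finance"},
    {"id": "bp-example-2", "name": "Example Two", "category": "data-quality"},
)


def list_blueprints(category: Optional[str] = None) -> list:
    """Return copies of seed entries, optionally filtered by category."""
    if category is not None and category not in BLUEPRINT_CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    # Hand out copies so callers cannot mutate the in-code catalogue.
    return [dict(item) for item in _SEED if category in (None, item["category"])]


def get_blueprint(blueprint_id: str) -> Optional[dict]:
    """Return a copy of one seed entry, or None when the id is unknown."""
    for item in _SEED:
        if item["id"] == blueprint_id:
            return dict(item)
    return None
```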

padak added 9 commits May 18, 2026 09:10

docs(agent-studio): mark slice 1.2 (Playbook detail Drawer) done

0ea46fe

padak wants to merge 9 commits into main from feat/personal-ai-agents
