docgen demo-function renders one purpose-built tutorial MP4 per scenario
— the short-form counterpart to docgen generate-all, which stitches
multi-segment demos from docgen.yaml. Inputs are Playwright-shaped (Node
spec, declarative url + actions, or a Python @pytest.mark.docgen marker
parsed with ast only). demonstration.kind: cli (VHS) is legacy;
see AGENTS.md (north star: story mode vs truth-from-tests).
For the CLI quick start and copy-paste invocation, see README § Per-function video docs. This page is the schema, pipeline, and behavior reference.
| What | Where |
|---|---|
| Canonical manifest + HTML fixture | docs/demos/per-function/lesson-compile.docgen.yaml and lesson-compile.html (file://__FIXTURE__/… is rewritten to a real path by the rebuild script) |
| Rebuild all per-function demos | docs/demos/_rebuild-per-function.sh (requires OPENAI_API_KEY; outputs under docs/demos/recordings/per-function/) |
| Copies for readers / docs links | examples/lesson_compile.docgen.yaml and examples/sample_test.py, regenerated by docs/demos/_seed-examples.sh (also run at the end of _full-reset-regenerate.sh) |
| Long-form segment explaining the same subcommand | Segment 08 in docs/demos/docgen.yaml, rendered by generate-all |
Do not edit the examples/ copies by hand; change docs/demos/per-function/ and re-run _seed-examples.sh.
End-to-end tests: tests/e2e/test_demo_function_e2e.py.
The main goal of per-function work is to turn Playwright-shaped behavior into tutorial / demo video clips: one test (or one manifest that mirrors the same steps as a test) becomes one watchable walkthrough. That keeps the demo tied to something you already trust — green Playwright coverage — instead of a hand-maintained script that drifts from reality.
Typical authoring paths (all Playwright under the hood unless noted):
- Recorded @playwright/test spec: a sibling *.docgen.yaml or an in-test docgen annotation; docgen runs npx playwright test with --grep, so the real Node test is the source of the recording.
- Declarative (demonstration.kind: playwright): the YAML lists url + actions (click, type, wait, …); Chromium records the same flow without a separate capture script. Use this for static HTML fixtures (file://...) or when you want the manifest to read like a test outline.
- @pytest.mark.docgen: Python tests with the marker are read statically (ast); Playwright still drives capture for browser demos.
demonstration.kind: cli (VHS .tape inside demo-function) is legacy and faces the same possible deprecation as docgen vhs in long-form demos. Do not build new product docs on VHS if you can use Playwright (UI) or Manim (explainer). Migrate existing tapes when practical.
- In this repository
- Pipeline overview
- Manifest schema
- Per-action narration sync (say)
- Slowdown (playback_speed_factor)
- Captured timeline shape
- Action kinds
- Output artifacts
- Caching
- Fail modes & exit codes
- CLI reference
For kind: playwright, capture is always Playwright (Chromium) —
whether the manifest came from a recorded Node test, declarative actions,
or a pytest marker. The diagram below is the Playwright path;
kind: cli swaps the first stage for VHS (legacy; see above).
```text
manifest (YAML / Playwright spec / @pytest.mark.docgen)
   │
   ▼
Playwright Chromium ──── records visual.webm + timeline.json
   │                     (one entry per action: kind, say,
   │                      t_start_ms, t_end_ms — relative to
   │                      the recording's t=0)
   ▼
ffmpeg -filter:v setpts=…    (retime by playback_speed_factor)
   │
   ▼
ffmpeg subtitles=…vtt        (burn timed `say` cues — scaled times)
   │
   ▼
OpenAI gpt-4o-mini-tts       (one MP3 per action.say,
   │                          placed at t_action / speed_factor;
   │                          overlapping clips are pushed past the
   │                          predecessor's tail so they never mix)
   ▼
ffmpeg adelay+amix+apad      (compose narration to exact video length)
   │
   ▼
ffmpeg padded mux            (audio padded with silence so video
   │                          length wins — never `-shortest`)
   ▼
rendered.mp4 + poster.png + manifest.json + fragment.txt + cache-status.txt
```
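To make the audio-composition stage of the diagram concrete, here is a minimal Python sketch that builds an adelay + amix + apad filter graph for N narration clips. It is an illustration under assumptions, not docgen's code: the function names are hypothetical, the per-clip delays are assumed to be already computed in milliseconds, and the trailing atrim (which cuts a last clip that would overrun the video) is an addition of this sketch. It also assumes an FFmpeg build recent enough to support adelay's all option and apad's whole_dur option.

```python
def narration_filter_graph(delays_ms: list[int], video_len_s: float) -> str:
    """Build an ffmpeg -filter_complex string that places narration clips.

    Input i (the i-th "-i clip.mp3") is delayed by delays_ms[i], the delayed
    clips are mixed, and the mix is padded with silence (then trimmed) to
    exactly video_len_s so the video length always wins.
    """
    if not delays_ms:
        raise ValueError("need at least one narration clip")
    # adelay takes milliseconds; all=1 applies the delay to every channel.
    chains = [f"[{i}:a]adelay={ms}:all=1[d{i}]" for i, ms in enumerate(delays_ms)]
    mixed = "".join(f"[d{i}]" for i in range(len(delays_ms)))
    chains.append(
        f"{mixed}amix=inputs={len(delays_ms)}:duration=longest,"
        # apad pads with silence up to the video length; atrim (an assumption
        # of this sketch) cuts anything that would run past it.
        f"apad=whole_dur={video_len_s},atrim=duration={video_len_s}[narration]"
    )
    return ";".join(chains)


def compose_cmd(clips: list[str], delays_ms: list[int], video_len_s: float, out: str) -> list[str]:
    """argv that renders the composed narration track to `out`."""
    cmd = ["ffmpeg", "-y"]
    for clip in clips:
        cmd += ["-i", clip]
    cmd += ["-filter_complex", narration_filter_graph(delays_ms, video_len_s),
            "-map", "[narration]", out]
    return cmd


print(narration_filter_graph([759, 2859], 6.5))
```

By default, amix rescales its inputs, so a real pipeline may need a volume adjustment to keep narration at a natural level.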
A manifest comes in two equivalent shapes, a YAML file or a Python test marker; pick whichever one lives next to the function.
```yaml
identifier: "owner/repo/src/path.ts:functionName"   # required; drives fragment_id
intent: "One-sentence summary (or block scalar) — used when no per-action say."
setup:
  fixtures:                          # optional — files staged into render work-dir
    - tests/fixtures/sample.md
demonstration:
  kind: playwright                   # primary (or `cli` + VHS — legacy)
  url: "http://127.0.0.1:3000/path"  # or file://… for static HTML fixtures
  actions:
    - kind: click
      selector: '[data-testid="compile"]'
      say: "Clicking compile runs the generator."
output_budget:
  duration_seconds: 30
  resolution: "1280x720"
  playback_speed_factor: 0.7         # optional; default 1.0; range [0.25, 4.0]
assertions_to_surface:               # fallback captions if no action has `say`
  - "result.status === 'compiled'"
```

Runnable copy in this repo: examples/lesson_compile.docgen.yaml, synced from docs/demos/per-function/lesson-compile.docgen.yaml via docs/demos/_seed-examples.sh.
In the Python shape, the @pytest.mark.docgen marker is read statically via ast; the test module is never imported or exec'd.
Keyword args mirror the YAML keys. See
examples/sample_test.py (generated beside the YAML by _seed-examples.sh).
For a *.spec.ts, drop a sibling <spec>.docgen.yaml. The renderer
records via npx playwright test --grep "<title>" instead of driving
declarative actions. See the
tests/e2e/ entries that exercise this path.
Adding a say: string to any action turns on per-action narration
mode:
- _drive_playwright wraps each action in time.monotonic() calls and writes a timeline.json of {kind, say, t_start_ms, t_end_ms} entries against the recorded video clock.
- Each say is sent to OpenAI gpt-4o-mini-tts (voice coral, one-sentence narration).
- Audio clips are placed at (t_start_ms / 1000) / playback_speed_factor in the slowed timeline; a clip whose desired start would land before the previous clip finishes is pushed forward (with 0.1s breathing room) so two close-together actions never overlap audibly.
- A WebVTT track is built from the same scaled timestamps and burned in as captions; each cue ends no later than the next captioned action's start, so captions never stack.
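The placement and caption rules above fit in a few lines of Python. place_clips, build_vtt and the 2 s / 3 s clip lengths are hypothetical. The sketch also assumes that each cue reuses its clip's placed start time and length, which the text does not spell out.

```python
GAP_S = 0.1  # breathing room between consecutive narration clips


def place_clips(timeline: list[dict], clip_durations_s: list[float],
                speed_factor: float) -> list[float]:
    """Playback start time (seconds) of each narration clip.

    timeline: captured entries; only those with a non-empty "say" get a clip.
    clip_durations_s: TTS clip lengths, one per narrated entry, in order.
    """
    narrated = [e for e in timeline if e.get("say")]
    starts: list[float] = []
    prev_end = None
    for entry, dur in zip(narrated, clip_durations_s):
        start = (entry["t_start_ms"] / 1000) / speed_factor  # scaled position
        if prev_end is not None and start < prev_end:
            start = prev_end + GAP_S  # push past the predecessor's tail
        starts.append(start)
        prev_end = start + dur
    return starts


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT hh:mm:ss.mmm timestamp."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def build_vtt(texts: list[str], starts: list[float], durations_s: list[float]) -> str:
    """One cue per narrated action, each capped at the next cue's start."""
    lines = ["WEBVTT", ""]
    for i, (text, start, dur) in enumerate(zip(texts, starts, durations_s)):
        end = start + dur
        if i + 1 < len(starts):
            end = min(end, starts[i + 1])  # no caption stacking
        lines += [f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}", text, ""]
    return "\n".join(lines)


timeline = [
    {"kind": "click", "say": "We focus the topic input.", "t_start_ms": 531, "t_end_ms": 578},
    {"kind": "type", "say": "And type a lesson topic.", "t_start_ms": 578, "t_end_ms": 1896},
]
durations = [2.0, 3.0]  # hypothetical TTS clip lengths
starts = place_clips(timeline, durations, speed_factor=0.7)
print(build_vtt([e["say"] for e in timeline], starts, durations))
```

The second click-to-type gap is only 47 ms, so the second clip is pushed to just after the first one ends (about 2.86 s) instead of starting at 0.83 s.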
If no action has say, the renderer falls back to single-clip mode:
one TTS clip from intent plays over the whole video, and
assertions_to_surface strings are spread evenly across the timeline as
captions.
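Here is a minimal sketch of the fallback caption spread. It assumes that "evenly" means equal-width slots, one per assertions_to_surface string. The helper name and the second assertion string are hypothetical.

```python
def spread_captions(assertions: list[str], video_len_s: float) -> list[tuple[float, float, str]]:
    """Evenly divide the video into one caption slot per assertion string."""
    if not assertions:
        return []
    slot = video_len_s / len(assertions)
    return [(i * slot, (i + 1) * slot, text) for i, text in enumerate(assertions)]


print(spread_captions(["result.status === 'compiled'", "lesson has 3 sections"], 10.0))
# [(0.0, 5.0, "result.status === 'compiled'"), (5.0, 10.0, 'lesson has 3 sections')]
```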
actions[*].say participates in the cache key (via the actions array
hash), so editing narration text invalidates the cache.
output_budget.playback_speed_factor (default 1.0, range [0.25, 4.0])
retimes the captured visual via ffmpeg setpts=1/factor*PTS:
| Factor | Behavior | Use when |
|---|---|---|
| 1.0 | passthrough | the recording is already legible |
| 0.7 | ~1.43× longer (sweet spot) | clicks feel rushed; a TTS clip needs room to breathe |
| 0.5 | 2× longer | viewers need to read a complex form mid-action |
| 1.5 | ~0.67× the original length (about a third shorter) | the recording has long uneventful gaps |
Audio is not re-pitched — narration clips remain at natural pace and are placed at scaled timestamps.
output_budget.duration_seconds is interpreted against the recorded
timeline, not the slowed playback. With duration_seconds: 25 and
playback_speed_factor: 0.7, the trim cap effectively becomes
25 / 0.7 ≈ 35.7s of slowed clip, so slowed videos are never chopped in
half.
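The retime filter and the trim-cap arithmetic reduce to two small helpers. The helper names are hypothetical. setpts=PTS/f is written here as an equivalent form of setpts=1/f*PTS.

```python
def setpts_filter(speed_factor: float) -> str:
    """ffmpeg video filter that retimes by playback_speed_factor.

    setpts=PTS/f equals setpts=1/f*PTS: f < 1 stretches the clip.
    """
    if not 0.25 <= speed_factor <= 4.0:
        raise ValueError("playback_speed_factor must be in [0.25, 4.0]")
    return f"setpts=PTS/{speed_factor}"


def slowed_trim_cap_s(duration_seconds: float, speed_factor: float) -> float:
    """duration_seconds is measured on the recorded timeline, so the cap on
    the retimed clip scales by 1 / playback_speed_factor."""
    return duration_seconds / speed_factor


print(setpts_filter(0.7))                    # setpts=PTS/0.7
print(round(slowed_trim_cap_s(25, 0.7), 1))  # 35.7
```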
Written to manifest.json's timeline field on every Playwright run:
```json
{
  "timeline": [
    {
      "kind": "click",
      "say": "We focus the topic input.",
      "t_start_ms": 531,
      "t_end_ms": 578
    },
    {
      "kind": "type",
      "say": "And type a lesson topic — async iterators in this case.",
      "t_start_ms": 578,
      "t_end_ms": 1896
    }
  ]
}
```

Times are milliseconds of elapsed time measured with time.monotonic(), counted from the moment the Playwright action loop began (just before page.goto). They
are not scaled by playback_speed_factor; consumers that want
playback-aligned times divide by playback_speed_factor.
t_end_ms - t_start_ms is the duration of the action call (e.g. how long
page.click() took). For near-instant actions (e.g. wait_for against
an already-present element), the difference is only a few milliseconds.
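Following the note on scaling, a downstream consumer can convert the captured timeline to playback seconds by dividing by playback_speed_factor. This minimal sketch assumes that playback_speed_factor and timeline are top-level fields of manifest.json, as in its snapshot. The function names are hypothetical.

```python
import json
from pathlib import Path


def playback_aligned(manifest: dict) -> list[dict]:
    """Convert manifest["timeline"] (recorded clock, ms) to playback seconds."""
    factor = manifest.get("playback_speed_factor", 1.0)
    return [
        {
            "kind": entry["kind"],
            "say": entry.get("say"),
            "start_s": entry["t_start_ms"] / 1000 / factor,
            "end_s": entry["t_end_ms"] / 1000 / factor,
        }
        for entry in manifest["timeline"]
    ]


def load_playback_timeline(output_dir: str) -> list[dict]:
    """Read manifest.json from a demo-function --output directory."""
    manifest = json.loads(Path(output_dir, "manifest.json").read_text())
    return playback_aligned(manifest)


example = {
    "playback_speed_factor": 0.7,
    "timeline": [
        {"kind": "click", "say": "We focus the topic input.",
         "t_start_ms": 531, "t_end_ms": 578},
    ],
}
print(round(playback_aligned(example)[0]["start_s"], 3))  # 0.759
```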
| Kind | Required params | Optional params | Notes |
|---|---|---|---|
| goto | url | — | navigate; uses wait_until="networkidle" |
| click | selector | — | |
| fill | selector, value | — | sets value directly |
| type | selector, value | delay_ms (default 40) | clicks, then keyboard-types char-by-char |
| wait_for | selector | timeout_ms (default 10000) | wait for element to attach |
| wait_for_text | selector, text | timeout_ms (default 10000) | wait for visible text match |
| wait | ms | — | hard wait, no DOM dependency |
| screenshot | path | — | writes PNG; rarely needed |
All action kinds accept say as an optional field for per-action
narration; see Per-action narration sync.
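To show how the table maps onto Playwright, here is a sketch of an action driver written against Playwright's Python sync API (Page.goto, Page.click, Page.fill, Keyboard.type, Page.wait_for_selector, Locator.wait_for, Page.wait_for_timeout, Page.screenshot). run_action, drive and FakePage are hypothetical and are not docgen's _drive_playwright. The sketch makes two assumptions: the initial navigation also waits for networkidle, and timeline times are rounded to whole milliseconds. FakePage only records calls, so the example runs without a browser.

```python
import time
from typing import Any


def run_action(page: Any, action: dict) -> None:
    """Perform one declarative action on a Playwright (sync API) Page."""
    kind = action["kind"]
    if kind == "goto":
        page.goto(action["url"], wait_until="networkidle")
    elif kind == "click":
        page.click(action["selector"])
    elif kind == "fill":
        page.fill(action["selector"], action["value"])  # sets value directly
    elif kind == "type":
        page.click(action["selector"])  # focus, then type char-by-char
        page.keyboard.type(action["value"], delay=action.get("delay_ms", 40))
    elif kind == "wait_for":
        page.wait_for_selector(action["selector"], state="attached",
                               timeout=action.get("timeout_ms", 10000))
    elif kind == "wait_for_text":
        page.locator(action["selector"], has_text=action["text"]).first.wait_for(
            state="visible", timeout=action.get("timeout_ms", 10000))
    elif kind == "wait":
        page.wait_for_timeout(action["ms"])  # hard wait, no DOM dependency
    elif kind == "screenshot":
        page.screenshot(path=action["path"])
    else:
        raise ValueError(f"unknown action kind: {kind!r}")


def drive(page: Any, url: str, actions: list[dict]) -> list[dict]:
    """Run the actions and return timeline entries relative to t=0."""
    t0 = time.monotonic()  # t=0 is taken just before the first navigation
    page.goto(url, wait_until="networkidle")
    timeline = []
    for action in actions:
        start = time.monotonic()
        run_action(page, action)
        end = time.monotonic()
        timeline.append({
            "kind": action["kind"],
            "say": action.get("say"),
            "t_start_ms": round((start - t0) * 1000),
            "t_end_ms": round((end - t0) * 1000),
        })
    return timeline


class _FakeKeyboard:
    def __init__(self, calls: list):
        self.calls = calls

    def type(self, text, delay=0):
        self.calls.append(("keyboard.type", text, delay))


class FakePage:
    """Records method calls so the sketch runs without a browser."""

    def __init__(self):
        self.calls = []
        self.keyboard = _FakeKeyboard(self.calls)

    def __getattr__(self, name):
        # goto, click, fill, ... all just log (name, *positional args).
        return lambda *args, **kwargs: self.calls.append((name, *args))


page = FakePage()
tl = drive(page, "file:///tmp/lesson-compile.html", [
    {"kind": "click", "selector": "#topic", "say": "We focus the topic input."},
    {"kind": "type", "selector": "#topic", "value": "async iterators"},
])
print([call[0] for call in page.calls])  # ['goto', 'click', 'click', 'keyboard.type']
```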
Five files in the directory passed to --output:
| File | Purpose |
|---|---|
| rendered.mp4 | real ISO MP4 (h264 + aac), captioned + narrated |
| poster.png | last frame, suitable for <video poster=…> |
| fragment.txt | fn-<slug> derived from identifier (no trailing newline) |
| manifest.json | snapshot: identifier, intent, fragment_id, cache_key, duration_seconds, resolution, playback_speed_factor, assertions_to_surface, actions, timeline, narration |
| cache-status.txt | hit\n or miss\n |
The snapshot is the stable contract for downstream tooling (CI, doc sites,
aggregators) that ingest manifest.json without re-running Playwright.
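Here is a minimal sketch of such a consumer. It relies only on the file names and manifest fields listed in the artifacts table. load_demo, video_tag and the fake file contents (including the fn-lesson-compile fragment) are hypothetical.

```python
import json
import tempfile
from html import escape
from pathlib import Path


def load_demo(output_dir: str) -> dict:
    """Collect what a doc site needs from one demo-function --output directory."""
    out = Path(output_dir)
    manifest = json.loads((out / "manifest.json").read_text())
    return {
        "fragment": (out / "fragment.txt").read_text(),  # no trailing newline
        "cache_hit": (out / "cache-status.txt").read_text().strip() == "hit",
        "silent": manifest.get("narration") is None,  # null with --no-narration
        "video": str(out / "rendered.mp4"),
        "poster": str(out / "poster.png"),
    }


def video_tag(demo: dict) -> str:
    """A linkable <video> element that shows poster.png before playback."""
    return (f'<video id="{escape(demo["fragment"])}" controls '
            f'poster="{escape(demo["poster"])}" src="{escape(demo["video"])}"></video>')


# Usage against a throwaway directory with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "fragment.txt").write_text("fn-lesson-compile")
    Path(tmp, "cache-status.txt").write_text("hit\n")
    Path(tmp, "manifest.json").write_text(json.dumps({"narration": None}))
    demo = load_demo(tmp)
print(demo["fragment"], demo["cache_hit"], demo["silent"])  # fn-lesson-compile True True
```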
When --cache-dir is provided, the renderer keys on
sha256(fn_source_sha + intent_sha + fixture_sha + speed=<factor>) and
reuses the previous output bytes when the key matches. The cache key
naturally invalidates when:
- The function's source file (.ts / .py / .tape / YAML) changes.
- intent changes.
- Any staged fixtures file changes.
- playback_speed_factor changes.
- Any actions[*] field changes (including say, since the YAML hash changes).
A cache hit writes cache-status.txt: hit\n and skips the entire render
pipeline (Playwright launch, TTS calls, ffmpeg passes).
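The key formula can be sketched with hashlib. This is an illustration, not docgen's implementation. Three details are assumptions: how fixture hashes are combined, the encoding of each part, and plain string concatenation. The helper names and inputs are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def cache_key(source: bytes, intent: str, fixtures: list[bytes], speed_factor: float) -> str:
    """sha256(fn_source_sha + intent_sha + fixture_sha + "speed=<factor>")."""
    fn_source_sha = sha256_hex(source)  # .ts / .py / .tape / YAML bytes
    intent_sha = sha256_hex(intent.encode("utf-8"))
    # Assumption: fixtures are hashed individually, then hashed together in order.
    fixture_sha = sha256_hex("".join(sha256_hex(f) for f in fixtures).encode())
    return sha256_hex(
        (fn_source_sha + intent_sha + fixture_sha + f"speed={speed_factor}").encode()
    )


yaml_src = b"identifier: owner/repo/src/path.ts:functionName\n"
k_slow = cache_key(yaml_src, "Compile a lesson.", [b"# sample\n"], 0.7)
k_norm = cache_key(yaml_src, "Compile a lesson.", [b"# sample\n"], 1.0)
print(k_slow != k_norm)  # True: playback_speed_factor is part of the key
```

Because the actions (including every say) live in the hashed source, editing narration text changes fn_source_sha and therefore the key.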
The renderer never ships silent or partial demos masquerading as success. The default is fail-loud.
| Code | Constant | Trigger |
|---|---|---|
| 0 | EXIT_OK | success: rendered.mp4 exists with both video and audio streams (or --no-narration was set) |
| 1 | EXIT_INVALID | invalid manifest, render failure, or transient OpenAI network error |
| 2 | EXIT_TOOLING_MISSING | missing ffmpeg / playwright / Chromium / OPENAI_API_KEY (or key rejected by OpenAI with 401 / 403) |
| 78 | EXIT_NEUTRAL_SKIP | placeholder manifest (kind: playwright with no url); useful in CI |
| Condition | Exit | Output dir |
|---|---|---|
| OPENAI_API_KEY unset, no --no-narration | 2 | not created |
| OPENAI_API_KEY rejected by OpenAI (401/403) | 2 | not created |
| Transient network error during TTS | 1 | partial; clean up and retry |
| --no-narration (explicit silent opt-in) | 0 | full artifacts; narration: null in snapshot |
| Working key + connectivity | 0 | full artifacts including audio |
The fail-loud behavior is enforced at the top of render() before
any Chromium launch or ffmpeg pass — so a missing key fails in
milliseconds, not after a 10s capture.
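Below is a minimal sketch of such a preflight check, using the exit-code constants from the table. preflight and its injectable env / which parameters are hypothetical, and the order of the checks is an assumption. Only ffmpeg is probed; Playwright and Chromium checks are omitted. Detecting a rejected key (401/403) would need a network call, which is also left out.

```python
import os
import shutil
from collections.abc import Callable, Mapping
from typing import Optional

EXIT_OK = 0
EXIT_INVALID = 1
EXIT_TOOLING_MISSING = 2
EXIT_NEUTRAL_SKIP = 78


def preflight(manifest: dict, no_narration: bool,
              env: Mapping[str, str] = os.environ,
              which: Callable[[str], Optional[str]] = shutil.which) -> Optional[int]:
    """Return an exit code to stop with, or None to continue to capture.

    Every check is local (no Chromium launch, no ffmpeg pass), so a missing
    key fails in milliseconds.
    """
    demo = manifest.get("demonstration", {})
    if demo.get("kind") == "playwright" and not demo.get("url"):
        return EXIT_NEUTRAL_SKIP  # placeholder manifest
    if which("ffmpeg") is None:
        return EXIT_TOOLING_MISSING  # Playwright/Chromium probes omitted here
    if not no_narration and not env.get("OPENAI_API_KEY"):
        return EXIT_TOOLING_MISSING  # never render a silent demo by accident
    return None


print(preflight({"demonstration": {"kind": "playwright"}}, no_narration=False))  # 78
```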
Required: --output (directory for artifacts). --output-dir exists only as a deprecated hidden alias — prefer --output.
```text
docgen demo-function \
  --manifest '<PATH | path.py::test_name | spec.ts | spec.ts::title>' \
  --output <DIR> \
  [--cache-dir <DIR>] \
  [--grep <SUBSTRING>] \
  [--no-narration]
```

--grep <SUBSTRING> is a Playwright spec / title filter; --no-narration is the explicit silent opt-in.

--manifest accepts:
- *.docgen.yaml: declarative manifest.
- path/to/test.py::test_function: @pytest.mark.docgen kwargs, read as literals.
- spec.ts: Playwright TypeScript spec (sibling <spec>.docgen.yaml or inline annotation; --grep picks one test).
- spec.ts::Test title: same as spec.ts, with an implicit grep on the title.
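Here is a sketch of how the four --manifest forms could be told apart. parse_manifest_arg and ManifestRef are hypothetical. The sketch makes three assumptions: a .py path always needs ::test_name, a .docgen.yaml path takes no selector, and only the first :: separates the path from the title.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManifestRef:
    kind: str                # "yaml", "pytest" or "spec"
    path: str
    selector: Optional[str]  # test function name or spec test title


def parse_manifest_arg(arg: str) -> ManifestRef:
    """Classify a --manifest value into one of the accepted forms."""
    path, sep, selector = arg.partition("::")  # split on the first :: only
    if path.endswith(".docgen.yaml"):
        if sep:
            raise ValueError("a .docgen.yaml manifest takes no ::selector")
        return ManifestRef("yaml", path, None)
    if path.endswith(".py"):
        if not selector:
            raise ValueError("the pytest form needs path.py::test_name")
        return ManifestRef("pytest", path, selector)
    if path.endswith(".spec.ts"):
        # A ::title acts like an implicit --grep on the test title.
        return ManifestRef("spec", path, selector or None)
    raise ValueError(f"unrecognized --manifest value: {arg!r}")


print(parse_manifest_arg("examples/sample_test.py::test_compile_lesson"))
# ManifestRef(kind='pytest', path='examples/sample_test.py', selector='test_compile_lesson')
```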