docgen demo-function — reference

docgen demo-function renders one purpose-built tutorial MP4 per scenario — the short-form counterpart to docgen generate-all, which stitches multi-segment demos from docgen.yaml. Inputs are Playwright-shaped (Node spec, declarative url + actions, or a Python @pytest.mark.docgen marker parsed with ast only). demonstration.kind: cli (VHS) is legacy; see AGENTS.md (north star: story mode vs truth-from-tests).

For the CLI quick start and copy-paste invocation, see README § Per-function video docs. This page is the schema, pipeline, and behavior reference.

In this repository

| What | Where |
| --- | --- |
| Canonical manifest + HTML fixture | docs/demos/per-function/lesson-compile.docgen.yaml and lesson-compile.html (file://__FIXTURE__/… is rewritten to a real path by the rebuild script) |
| Rebuild all per-function demos | docs/demos/_rebuild-per-function.sh (requires OPENAI_API_KEY; outputs under docs/demos/recordings/per-function/) |
| Copies for readers / docs links | examples/lesson_compile.docgen.yaml and examples/sample_test.py, regenerated by docs/demos/_seed-examples.sh (also at the end of _full-reset-regenerate.sh) |
| Long-form segment explaining the same subcommand | Segment 08 in docs/demos/docgen.yaml, rendered by generate-all |

Do not edit the examples/ copies by hand; change docs/demos/per-function/ and re-run _seed-examples.sh.

End-to-end tests: tests/e2e/test_demo_function_e2e.py.

Playwright-first tutorial demos

The main goal of per-function work is to turn Playwright-shaped behavior into tutorial / demo video clips: one test (or one manifest that mirrors the same steps as a test) becomes one watchable walkthrough. That keeps the demo tied to something you already trust — green Playwright coverage — instead of a hand-maintained script that drifts from reality.

Typical authoring paths (all Playwright under the hood unless noted):

  1. Recorded @playwright/test spec — sibling *.docgen.yaml or in-test docgen annotation; docgen runs npx playwright test with --grep so the real Node test is the source of the recording.
  2. Declarative demonstration.kind: playwright — YAML lists url + actions (click, type, wait, …); Chromium records the same flow without a separate capture script. Use this for static HTML fixtures (file://...) or when you want the manifest to read like a test outline.
  3. @pytest.mark.docgen — Python tests with the marker are read statically (ast); Playwright still drives capture for browser demos.

demonstration.kind: cli (a VHS .tape inside demo-function) is legacy and may be deprecated alongside docgen vhs in long-form demos. Do not build new product docs on VHS if you can use Playwright (UI) or Manim (explainer). Migrate existing tapes when practical.

Pipeline overview

For kind: playwright, capture is always Playwright (Chromium) — whether the manifest came from a recorded Node test, declarative actions, or a pytest marker. The diagram below is the Playwright path; kind: cli swaps the first stage for VHS (legacy; see above).

manifest (YAML / Playwright spec / @pytest.mark.docgen)
    │
    ▼
Playwright Chromium  ──── records visual.webm + timeline.json
    │                     (one entry per action: kind, say,
    │                      t_start_ms, t_end_ms — relative to
    │                      the recording's t=0)
    ▼
ffmpeg -filter:v setpts=…  (retime by playback_speed_factor)
    │
    ▼
ffmpeg subtitles=…vtt    (burn timed `say` cues — scaled times)
    │
    ▼
OpenAI gpt-4o-mini-tts   (one MP3 per action.say,
    │                     placed at t_action / speed_factor;
    │                     overlapping clips are pushed past the
    │                     predecessor's tail so they never mix)
    ▼
ffmpeg adelay+amix+apad  (compose narration to exact video length)
    │
    ▼
ffmpeg padded mux         (audio padded with silence so video
    │                     length wins — never `-shortest`)
    ▼
rendered.mp4 + poster.png + manifest.json + fragment.txt + cache-status.txt
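
The adelay + amix + apad stage in the diagram can be pictured as one ffmpeg filter graph. The sketch below only builds such a graph string; it is an assumption about how the stage could be wired, not the renderer's actual graph. The function name, the `[narration]` output label, and the choice of one input per MP3 are hypothetical.

```python
def narration_filter_graph(start_times_s: list[float], video_s: float) -> str:
    """Build an ffmpeg -filter_complex string for N narration inputs.

    Assumes input i of the ffmpeg command is the i-th narration MP3.
    """
    if not start_times_s:
        raise ValueError("need at least one narration clip")
    chains = []
    labels = []
    for i, start in enumerate(start_times_s):
        delay_ms = round(start * 1000)
        # adelay takes one delay per channel, separated by '|' (milliseconds).
        chains.append(f"[{i}:a]adelay={delay_ms}|{delay_ms}[a{i}]")
        labels.append(f"[a{i}]")
    # amix sums the delayed clips; apad extends the result with silence so the
    # audio is at least as long as the video.
    mix = (
        "".join(labels)
        + f"amix=inputs={len(labels)}:duration=longest,"
        + f"apad=whole_dur={video_s:.3f}[narration]"
    )
    return ";".join(chains + [mix])


graph = narration_filter_graph([0.758, 2.858], video_s=5.0)
print(graph)
```

Padding the audio up to the video length, rather than trimming with `-shortest`, matches the rule in the diagram that video length wins.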

Manifest schema

Two equivalent shapes — pick whichever the function lives next to.

YAML sidecar (*.docgen.yaml)

identifier: "owner/repo/src/path.ts:functionName"   # required; drives fragment_id
intent: "One-sentence summary (or block scalar) — used when no per-action say."
setup:
  fixtures:                                          # optional — files staged into render work-dir
    - tests/fixtures/sample.md
demonstration:
  kind: playwright                                   # primary (or `cli` + VHS — legacy)
  url: "http://127.0.0.1:3000/path"                  # or file://… for static HTML fixtures
  actions:
    - kind: click
      selector: '[data-testid="compile"]'
      say: "Clicking compile runs the generator."
output_budget:
  duration_seconds: 30
  resolution: "1280x720"
  playback_speed_factor: 0.7                         # optional; default 1.0; range [0.25, 4.0]
assertions_to_surface:                               # fallback captions if no action has `say`
  - "result.status === 'compiled'"

Runnable copy in this repo: examples/lesson_compile.docgen.yaml, synced from docs/demos/per-function/lesson-compile.docgen.yaml via docs/demos/_seed-examples.sh.

Python @pytest.mark.docgen(...)

The marker is read statically via ast — never imported or exec'd. Keyword args mirror the YAML keys. See examples/sample_test.py (generated beside the YAML by _seed-examples.sh).
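
To illustrate what "read statically via ast" means in practice, here is a minimal sketch of extracting marker keyword arguments from source text without importing the test module. The helper names and the sample test are hypothetical; the real parser in docgen may differ.

```python
import ast

# Hypothetical test source; in practice this would be read from a .py file.
SOURCE = '''
import pytest

@pytest.mark.docgen(
    identifier="owner/repo/src/path.py:compile_lesson",
    intent="Compile a lesson from a topic.",
    demonstration={"kind": "playwright", "url": "http://127.0.0.1:3000/path"},
)
def test_compile_lesson():
    pass
'''


def _dotted_name(node: ast.AST) -> str:
    """Return a dotted name such as 'pytest.mark.docgen', or '' if not a plain chain."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""


def read_docgen_marker(source: str, test_name: str) -> dict:
    """Extract @pytest.mark.docgen kwargs from source text; nothing is executed."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == test_name:
            for deco in node.decorator_list:
                if isinstance(deco, ast.Call) and _dotted_name(deco.func) == "pytest.mark.docgen":
                    # literal_eval accepts only literals, so arbitrary code is rejected.
                    return {kw.arg: ast.literal_eval(kw.value) for kw in deco.keywords if kw.arg}
    raise LookupError(f"no @pytest.mark.docgen marker on {test_name!r}")


marker = read_docgen_marker(SOURCE, "test_compile_lesson")
```

Because only literals are evaluated, a marker that computes its arguments at runtime would be rejected in this sketch, so marker arguments need to be written as literals.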

TypeScript Playwright spec sidecar

For a *.spec.ts, drop a sibling <spec>.docgen.yaml. The renderer records via npx playwright test --grep "<title>" instead of driving declarative actions. See the tests/e2e/ entries that exercise this path.
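
As a rough picture of the recording command, the sketch below assembles an argv list for `npx playwright test` with `--grep`. Passing the spec path as a positional argument and the example path and title are assumptions for illustration; the renderer's real invocation may add other options.

```python
import shlex


def playwright_grep_argv(spec_path: str, title: str) -> list[str]:
    """Build the argv for running a single Playwright test selected by title.

    Playwright treats the --grep value as a regular expression, so titles
    containing regex metacharacters may need escaping by the caller.
    """
    return ["npx", "playwright", "test", spec_path, "--grep", title]


# Hypothetical spec path and test title.
argv = playwright_grep_argv("tests/e2e/lesson.spec.ts", "compiles a lesson")
print(shlex.join(argv))
```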

Per-action narration sync (say)

Adding a say: string to any action turns on per-action narration mode:

  • _drive_playwright wraps each action in time.monotonic() and writes a timeline.json of {kind, say, t_start_ms, t_end_ms} entries against the recorded video clock.
  • Each say is sent to OpenAI gpt-4o-mini-tts (voice coral, one-sentence narration).
  • Audio clips are placed at (t_start_ms / 1000) / playback_speed_factor in the slowed timeline; a clip whose desired start would land before the previous clip finishes is pushed forward (with 0.1s breathing room) so two close-together actions never overlap audibly.
  • A WebVTT track is built from the same scaled timestamps and burned in as captions; cues are capped at the next captioned action's start so there is no caption stacking.
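
The placement and capping rules above can be sketched as follows. This is an illustration under stated assumptions, not the renderer's code: `Clip`, `place_clips`, `cap_cues`, and the `hold_s` parameter are hypothetical names, clip durations are assumed to be known after synthesis, and a clip is pushed only when its desired start falls strictly before the previous clip's end.

```python
from dataclasses import dataclass

BREATHING_ROOM_S = 0.1  # gap inserted after the previous clip when pushing


@dataclass
class Clip:
    say: str
    t_start_ms: int    # from timeline.json, on the recorded clock
    duration_s: float  # length of the synthesized MP3 (assumed known)


def place_clips(clips: list[Clip], speed_factor: float) -> list[float]:
    """Return each clip's start time in seconds on the slowed timeline."""
    starts: list[float] = []
    prev_end = None
    for clip in clips:
        desired = (clip.t_start_ms / 1000) / speed_factor
        if prev_end is not None and desired < prev_end:
            # Would overlap the predecessor: push past its tail.
            start = prev_end + BREATHING_ROOM_S
        else:
            start = desired
        starts.append(start)
        prev_end = start + clip.duration_s
    return starts


def cap_cues(cue_starts: list[float], hold_s: float) -> list[tuple[float, float]]:
    """Give each caption cue up to hold_s seconds, capped at the next cue's start."""
    cues = []
    for i, start in enumerate(cue_starts):
        end = start + hold_s
        if i + 1 < len(cue_starts):
            end = min(end, cue_starts[i + 1])
        cues.append((start, end))
    return cues


clips = [Clip("We focus the topic input.", 531, 2.0), Clip("And type a lesson topic.", 578, 3.0)]
starts = place_clips(clips, speed_factor=0.7)
cues = cap_cues([0.0, 1.0, 5.0], hold_s=2.0)
```

In the example, the second action begins only 47 ms after the first, so its narration is deferred until the first clip has finished plus the 0.1 s gap.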

If no action has say, the renderer falls back to single-clip mode: one TTS clip from intent plays over the whole video, and assertions_to_surface strings are spread evenly across the timeline as captions.
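
One plausible reading of "spread evenly" is equal-width, back-to-back caption slots. The sketch below assumes exactly that; the function name and the slot layout are hypothetical, and the renderer may space cues differently.

```python
def spread_captions(captions: list[str], video_s: float) -> list[tuple[float, float, str]]:
    """Assign each caption an equal, contiguous slice of the video duration."""
    if not captions:
        return []
    slot = video_s / len(captions)
    return [(i * slot, (i + 1) * slot, text) for i, text in enumerate(captions)]


fallback_cues = spread_captions(
    ["result.status === 'compiled'", "output is non-empty", "no errors logged"],
    video_s=30.0,
)
```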

actions[*].say participates in the cache key (via the actions array hash), so editing narration text invalidates the cache.

Slowdown (playback_speed_factor)

output_budget.playback_speed_factor (default 1.0, range [0.25, 4.0]) retimes the captured visual via ffmpeg setpts=1/factor*PTS:

| Factor | Behavior | Use when |
| --- | --- | --- |
| 1.0 | passthrough | the recording is already legible |
| 0.7 | ~1.43× longer (sweet spot) | clicks feel rushed; a TTS clip needs room to breathe |
| 0.5 | 2× longer | viewers need to read a complex form mid-action |
| 1.5 | ~0.67× the original length | the recording has long uneventful gaps |

Audio is not re-pitched — narration clips remain at natural pace and are placed at scaled timestamps.
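
The relationship between the factor, the setpts expression, and the resulting clip length can be checked with a few lines of Python. This is a sketch: the function names are hypothetical, and rejecting out-of-range factors (rather than clamping them) is an assumption.

```python
def length_multiplier(speed_factor: float) -> float:
    """How much longer (>1) or shorter (<1) the video becomes."""
    return 1 / speed_factor


def setpts_filter(speed_factor: float) -> str:
    """Return the ffmpeg video filter string that retimes by speed_factor."""
    if not 0.25 <= speed_factor <= 4.0:
        raise ValueError(f"playback_speed_factor out of range: {speed_factor}")
    return f"setpts={length_multiplier(speed_factor):.6f}*PTS"


for factor in (1.0, 0.7, 0.5, 1.5):
    print(factor, setpts_filter(factor), round(length_multiplier(factor), 2))
```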

output_budget.duration_seconds is interpreted against the recorded timeline, not the slowed playback. With duration_seconds: 25 and playback_speed_factor: 0.7, the trim cap effectively becomes 25 / 0.7 ≈ 35.7s of slowed clip, so slowed videos are never chopped in half.
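
The trim arithmetic from the paragraph above, as a minimal sketch (the function name is hypothetical):

```python
def slowed_trim_cap_s(duration_seconds: float, playback_speed_factor: float = 1.0) -> float:
    """Convert a budget on the recorded timeline into seconds of slowed playback."""
    if playback_speed_factor <= 0:
        raise ValueError("playback_speed_factor must be positive")
    return duration_seconds / playback_speed_factor


cap = slowed_trim_cap_s(25, 0.7)
print(round(cap, 1))  # 35.7
```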

Captured timeline shape

Written to manifest.json's timeline field on every Playwright run:

{
  "timeline": [
    {
      "kind": "click",
      "say": "We focus the topic input.",
      "t_start_ms": 531,
      "t_end_ms": 578
    },
    {
      "kind": "type",
      "say": "And type a lesson topic — async iterators in this case.",
      "t_start_ms": 578,
      "t_end_ms": 1896
    }
  ]
}

Times are elapsed milliseconds, measured with time.monotonic(), since the Playwright action loop began (just before page.goto). They are not scaled by playback_speed_factor; consumers that want playback-aligned times divide by playback_speed_factor.

t_end_ms - t_start_ms is the duration of the action call (e.g. how long page.click() took). For near-instant actions (e.g. wait_for against an already-present element) the difference is only a few milliseconds.
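
A downstream consumer might convert the recorded timeline to playback-aligned seconds like this. The sketch uses an inline snapshot with shortened `say` strings; the function name and the added output keys (`t_start_s`, `t_end_s`, `action_ms`) are hypothetical.

```python
import json

# Trimmed stand-in for manifest.json; only the fields used below are included.
MANIFEST_JSON = """
{
  "playback_speed_factor": 0.7,
  "timeline": [
    {"kind": "click", "say": "We focus the topic input.", "t_start_ms": 531, "t_end_ms": 578},
    {"kind": "type", "say": "And type a lesson topic.", "t_start_ms": 578, "t_end_ms": 1896}
  ]
}
"""


def playback_timeline(manifest: dict) -> list[dict]:
    """Add playback-aligned seconds and per-action duration to each entry."""
    factor = manifest.get("playback_speed_factor", 1.0)
    scaled = []
    for entry in manifest["timeline"]:
        scaled.append({
            **entry,
            "t_start_s": entry["t_start_ms"] / 1000 / factor,
            "t_end_s": entry["t_end_ms"] / 1000 / factor,
            "action_ms": entry["t_end_ms"] - entry["t_start_ms"],
        })
    return scaled


scaled = playback_timeline(json.loads(MANIFEST_JSON))
```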

Action kinds

| Kind | Required params | Optional params | Notes |
| --- | --- | --- | --- |
| goto | url | | navigate; uses wait_until="networkidle" |
| click | selector | | |
| fill | selector, value | | sets value directly |
| type | selector, value | delay_ms (default 40) | clicks then keyboard-types char-by-char |
| wait_for | selector | timeout_ms (default 10000) | wait for element to attach |
| wait_for_text | selector, text | timeout_ms (default 10000) | wait for visible text match |
| wait | ms | | hard wait, no DOM dependency |
| screenshot | path | | writes PNG; rarely needed |

All action kinds accept say as an optional field for per-action narration; see Per-action narration sync.
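
The required-parameter column lends itself to a small pre-flight check before a manifest is handed to the renderer. The sketch below is a hypothetical helper, not part of docgen; it only checks for unknown kinds and missing required params.

```python
# Required params per action kind, transcribed from the table above.
REQUIRED_PARAMS = {
    "goto": {"url"},
    "click": {"selector"},
    "fill": {"selector", "value"},
    "type": {"selector", "value"},
    "wait_for": {"selector"},
    "wait_for_text": {"selector", "text"},
    "wait": {"ms"},
    "screenshot": {"path"},
}


def validate_action(action: dict) -> list[str]:
    """Return a list of problems for one action dict; empty means it looks fine."""
    kind = action.get("kind")
    if kind not in REQUIRED_PARAMS:
        return [f"unknown action kind: {kind!r}"]
    missing = sorted(REQUIRED_PARAMS[kind] - set(action))
    return [f"{kind}: missing required param {name!r}" for name in missing]


ok = validate_action({"kind": "click", "selector": '[data-testid="compile"]', "say": "Clicking compile."})
bad = validate_action({"kind": "fill", "selector": "#topic"})
```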

Output artifacts

Five files in the directory passed to --output:

| File | Purpose |
| --- | --- |
| rendered.mp4 | real ISO MP4 (h264 + aac), captioned + narrated |
| poster.png | last frame, suitable for <video poster=…> |
| fragment.txt | fn-<slug> derived from identifier (no trailing newline) |
| manifest.json | snapshot: identifier, intent, fragment_id, cache_key, duration_seconds, resolution, playback_speed_factor, assertions_to_surface, actions, timeline, narration |
| cache-status.txt | hit\n or miss\n |

The snapshot is the stable contract for downstream tooling (CI, doc sites, aggregators) that ingest manifest.json without re-running Playwright.
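
A consumer that ingests the output directory might start with a completeness check and a small summary. The sketch below runs against a throwaway directory with stand-in files; the helper names and the `fn-example` fragment id are hypothetical.

```python
import json
import tempfile
from pathlib import Path

EXPECTED_ARTIFACTS = (
    "rendered.mp4", "poster.png", "fragment.txt", "manifest.json", "cache-status.txt",
)


def missing_artifacts(output_dir: Path) -> list[str]:
    """List expected artifact names that are absent from output_dir."""
    return [name for name in EXPECTED_ARTIFACTS if not (output_dir / name).is_file()]


def read_summary(output_dir: Path) -> dict:
    """Read the small text artifacts and one field of the snapshot."""
    manifest = json.loads((output_dir / "manifest.json").read_text(encoding="utf-8"))
    return {
        "fragment_id": (output_dir / "fragment.txt").read_text(encoding="utf-8"),
        # cache-status.txt ends with a newline, so strip it.
        "cache_status": (output_dir / "cache-status.txt").read_text(encoding="utf-8").strip(),
        "narrated": manifest.get("narration") is not None,
    }


# Demo with stand-in files (not real render output).
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "fragment.txt").write_text("fn-example", encoding="utf-8")
    (out / "cache-status.txt").write_text("miss\n", encoding="utf-8")
    (out / "manifest.json").write_text(
        json.dumps({"fragment_id": "fn-example", "narration": None}), encoding="utf-8"
    )
    missing = missing_artifacts(out)
    summary = read_summary(out)
```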

Caching

When --cache-dir is provided, the renderer keys on sha256(fn_source_sha + intent_sha + fixture_sha + speed=<factor>) and reuses the previous output bytes when the key matches. The key changes, and the cached output is bypassed, when:

  • The function's source file (.ts / .py / .tape / YAML) changes.
  • intent changes.
  • Any staged fixtures file changes.
  • playback_speed_factor changes.
  • Any actions[*] field changes (including say, since the YAML hash changes).

A cache hit writes cache-status.txt: hit\n and skips the entire render pipeline (Playwright launch, TTS calls, ffmpeg passes).
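
The key construction can be sketched with hashlib. The component names follow the formula above, but the exact byte layout (hex digests concatenated as text, fixtures hashed in list order) is an assumption; keys produced by this sketch will not necessarily match docgen's.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def cache_key(fn_source: bytes, intent: str, fixtures: list[bytes], speed_factor: float) -> str:
    """Combine per-component digests into one key (layout is an assumption)."""
    fn_source_sha = sha256_hex(fn_source)
    intent_sha = sha256_hex(intent.encode("utf-8"))
    # Hash each fixture, then hash the concatenated digests.
    fixture_sha = sha256_hex("".join(sha256_hex(f) for f in fixtures).encode("ascii"))
    material = fn_source_sha + intent_sha + fixture_sha + f"speed={speed_factor}"
    return sha256_hex(material.encode("utf-8"))


base = cache_key(b"kind: playwright\n", "Compile a lesson.", [b"# sample"], 0.7)
slower = cache_key(b"kind: playwright\n", "Compile a lesson.", [b"# sample"], 0.5)
reworded = cache_key(b"kind: playwright\n", "Compile one lesson.", [b"# sample"], 0.7)
```

Since the YAML sidecar is itself the hashed source file, any edit to an action (including its `say` text) changes `fn_source_sha` and therefore the key.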

Fail modes & exit codes

The renderer never ships silent or partial demos masquerading as success. The default is fail-loud.

| Code | Constant | Trigger |
| --- | --- | --- |
| 0 | EXIT_OK | success: rendered.mp4 exists with both video and audio streams (or --no-narration was set) |
| 1 | EXIT_INVALID | invalid manifest, render failure, or transient OpenAI network error |
| 2 | EXIT_TOOLING_MISSING | missing ffmpeg / playwright / Chromium / OPENAI_API_KEY (or key rejected by OpenAI with 401 / 403) |
| 78 | EXIT_NEUTRAL_SKIP | placeholder manifest (kind: playwright with no url); useful in CI |

Behavior matrix

| Condition | Exit | Output dir |
| --- | --- | --- |
| OPENAI_API_KEY unset, no --no-narration | 2 | not created |
| OPENAI_API_KEY rejected by OpenAI (401/403) | 2 | not created |
| Transient network error during TTS | 1 | partial; clean up and retry |
| --no-narration (explicit silent opt-in) | 0 | full artifacts; narration: null in snapshot |
| Working key + connectivity | 0 | full artifacts including audio |

The fail-loud behavior is enforced at the top of render() before any Chromium launch or ffmpeg pass — so a missing key fails in milliseconds, not after a 10s capture.
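
In CI, a wrapper might translate these exit codes into job outcomes. The sketch below mirrors the constants from the table; the verdict strings and both function names are hypothetical, and `run_demo_function` is defined but not called here.

```python
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    EXIT_OK = 0
    EXIT_INVALID = 1
    EXIT_TOOLING_MISSING = 2
    EXIT_NEUTRAL_SKIP = 78


# Hypothetical CI outcomes for each documented code.
VERDICTS = {
    ExitCode.EXIT_OK: "pass",
    ExitCode.EXIT_INVALID: "fix-or-retry",
    ExitCode.EXIT_TOOLING_MISSING: "fix-environment",
    ExitCode.EXIT_NEUTRAL_SKIP: "skip",
}


def ci_verdict(returncode: int) -> str:
    """Map a process return code to a CI outcome; undocumented codes are 'unknown'."""
    try:
        return VERDICTS[ExitCode(returncode)]
    except ValueError:
        return "unknown"


def run_demo_function(manifest: str, output_dir: str) -> str:
    """Invoke the CLI with the two documented flags and classify the result."""
    proc = subprocess.run(
        ["docgen", "demo-function", "--manifest", manifest, "--output", output_dir]
    )
    return ci_verdict(proc.returncode)
```

Treating code 1 as "fix-or-retry" reflects that it covers both invalid manifests and transient network errors, which the exit code alone cannot tell apart.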

CLI reference

Required: --output (directory for artifacts). --output-dir exists only as a deprecated hidden alias — prefer --output.

docgen demo-function \
  --manifest '<PATH | path.py::test_name | spec.ts | spec.ts::title>' \
  --output <DIR> \
  [--cache-dir <DIR>] \
  [--grep <SUBSTRING>]              # Playwright spec / title filter
  [--no-narration]                  # explicit silent opt-in

--manifest accepts:

  • *.docgen.yaml — declarative manifest.
  • path/to/test.py::test_function (a pytest test; its @pytest.mark.docgen kwargs are read as literals).
  • spec.ts — Playwright TypeScript spec (sibling <spec>.docgen.yaml or inline annotation; --grep picks one test).
  • spec.ts::Test title — same as spec.ts with implicit grep.
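
The four accepted forms can be told apart by file suffix and an optional `::` selector. The sketch below is a hypothetical classifier for illustration; the kind labels and the error cases are assumptions, and the real CLI may resolve arguments differently.

```python
from typing import NamedTuple, Optional


class ManifestRef(NamedTuple):
    kind: str                # "yaml", "pytest", or "playwright-spec"
    path: str
    selector: Optional[str]  # test function name or Playwright test title


def classify_manifest_arg(arg: str) -> ManifestRef:
    """Split a --manifest value into its path and optional '::' selector."""
    path, _, selector = arg.partition("::")
    if path.endswith(".docgen.yaml"):
        return ManifestRef("yaml", path, None)
    if path.endswith(".py"):
        if not selector:
            raise ValueError("pytest manifests need path.py::test_name")
        return ManifestRef("pytest", path, selector)
    if path.endswith(".ts"):
        # A title after '::' acts as an implicit grep; without it, --grep picks the test.
        return ManifestRef("playwright-spec", path, selector or None)
    raise ValueError(f"unrecognized manifest form: {arg!r}")


ref = classify_manifest_arg("spec.ts::Test title")
```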