feat: cache generated Playwright scripts so repeat runs skip the LLM loop by mvanhorn · Pull Request #6 · microsoft/Webwright

mvanhorn · 2026-05-26T14:00:44Z

Summary

Webwright now supports an opt-in script cache that lets repeat runs of the same task skip the agent loop entirely and execute the cached Playwright script directly. Cache hits cost zero LLM tokens.

Why this matters

Webwright's pitch in the Microsoft Research blog post is that the agent's persistent artifact is code: a final_script.py per task that "can be rerun, adapted, and shared across tasks rather than rediscovered from scratch." Today that artifact is saved but never reused — running the same task twice regenerates final_script.py from scratch and burns the full LLM budget twice.

Stagehand built its v3 wedge against browser-use on exactly this property: its auto-caching, combined with self-healing, remembers previous actions and replays them without LLM inference. Independent reviews note that "for repeated workflows (same sites, same forms, many times a day), Stagehand's caching means your costs approach zero after the first run." This change gives Webwright the same property for full-script replay.

Changes

New src/webwright/cache/script_cache.py adds make_fingerprint(config) (SHA-256 over task, start_url, model.model_name, environment.environment_class by default; fields configurable) and a ScriptCache class that reads and writes entries under ~/.cache/webwright/<fingerprint>/. Each entry holds metadata.json, final_script.py, and trajectory.json.
CacheConfig is added to config/__init__.py with enabled: false as the default. The feature is fully opt-in.
run/cli.py checks the cache before constructing the model + agent. On hit, it runs the cached script via the local workspace environment and returns a result shaped like a normal run (with cached: true). On any failure during replay (selector drift, exception, 4xx/5xx on start_url when validate_url: true), the entry is invalidated and the agent loop runs normally.
agents/default.py writes a new cache entry after every successful run when cache.enabled is set.
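
As a rough illustration of the fingerprinting described above, the sketch below hashes the four default fields with SHA-256. It is not the code in script_cache.py: treating config as a nested dict, the dotted-path lookup helper, and the canonical JSON serialization are all assumptions made for the example.

```python
import hashlib
import json

# Default fields named in the PR description; the dotted-path form is an assumption.
DEFAULT_FIELDS = (
    "task",
    "start_url",
    "model.model_name",
    "environment.environment_class",
)


def _lookup(config: dict, dotted: str):
    """Resolve a dotted path such as 'model.model_name' in a nested dict (hypothetical helper)."""
    value = config
    for part in dotted.split("."):
        value = value[part]
    return value


def make_fingerprint(config: dict, fields=DEFAULT_FIELDS) -> str:
    """Return a SHA-256 hex digest over the selected config fields."""
    payload = {field: _lookup(config, field) for field in fields}
    # Sorted keys and fixed separators keep the serialization stable across runs.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under these assumptions, two configs that agree on the four fields map to the same cache directory, and changing any one of them (for example the model name) yields a different entry.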

No new runtime dependencies — uses stdlib hashlib/json/pathlib and the existing httpx for the URL precheck.
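
The hit, replay, and invalidate flow in run/cli.py can be pictured roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: replay_or_invalidate, run_script, and url_ok are hypothetical names, and the script runner and URL precheck are injected as callables so the example stays stdlib-only (the PR itself uses httpx for the precheck).

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def replay_or_invalidate(
    entry_dir: Path,
    run_script: Callable[[Path], dict],
    url_ok: Callable[[], bool] = lambda: True,
) -> Optional[dict]:
    """Replay a cached script; return None on a miss or after invalidating a bad entry."""
    script = entry_dir / "final_script.py"
    if not script.is_file():
        return None  # cache miss: the caller falls back to the agent loop
    try:
        if not url_ok():
            raise RuntimeError("start_url precheck failed")
        result = run_script(script)
    except Exception:
        # Any replay failure (selector drift, exception, bad status) drops the entry.
        shutil.rmtree(entry_dir, ignore_errors=True)
        return None
    # Shape the result like a normal run, flagged as served from the cache.
    return {**result, "cached": True}
```

A None return signals the caller to construct the model and agent as usual, so a stale entry costs one failed replay before the normal path takes over.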

Testing

tests/unit/test_script_cache.py covers fingerprint stability (same inputs → same hash, one input change → different hash), script-error invalidation (cached script raises → entry is removed), and start-URL invalidation (HEAD returns 5xx → entry is removed). 3 tests, all passing.

A meaningful end-to-end test requires a real LLM run on both ends of the cache window, so the integration test is deferred to a follow-up PR; the unit-level invariants are what would block re-merges.

Usage

Opt in via config:

# base.yaml
cache:
  enabled: true
  directory: ~/.cache/webwright
  ttl_seconds: 604800
  validate_url: true

Or via CLI: webwright -t "..." --start-url "..." --cache.enabled=true.

A cache hit prints Cache hit: skipping model loop to the console; a miss prints Cache miss and runs normally.
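
To make the config keys above concrete, here is a minimal sketch of a config object and a TTL check. Only enabled: false is stated as a default in this PR; the other defaults simply mirror the example YAML. The is_expired helper, the created_at timestamp, and the reading of ttl_seconds as a maximum entry age are assumptions, not confirmed behavior.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheConfig:
    enabled: bool = False                  # default stated in the PR: opt-in
    directory: str = "~/.cache/webwright"  # assumed default, taken from the example YAML
    ttl_seconds: int = 604800              # assumed default: 7 days, as in the example YAML
    validate_url: bool = True              # assumed default, taken from the example YAML


def is_expired(created_at: float, cfg: CacheConfig, now: Optional[float] = None) -> bool:
    """Treat an entry as stale once it is older than ttl_seconds (assumed semantics)."""
    current = time.time() if now is None else now
    return (current - created_at) > cfg.ttl_seconds
```

With these semantics an expired entry would be handled like a miss, so the agent loop runs and a fresh entry replaces it.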

…loop Today, running the same task twice burns the full LLM budget twice: the agent loop regenerates final_script.py from scratch even when the prior run succeeded against an unchanged site. Stagehand caches selectors so repeat runs skip LLM inference; webwright already saves final_script.py per task but never indexes it for retrieval. This change adds an opt-in file-backed cache (cache.enabled: false by default) keyed on a stable fingerprint of (task, start_url, model_name, env_class). On hit, the agent loop is skipped and the cached script executes directly. On script failure (selector drift) or 4xx/5xx on the start_url, the entry is invalidated and the agent loop runs normally. Cache lives at ~/.cache/webwright/<fingerprint>/ with metadata.json, final_script.py, trajectory.json. No new runtime deps; uses stdlib hashlib and the existing httpx for URL validation.

mvanhorn wants to merge 1 commit into microsoft:main from mvanhorn:fix/webwright-action-cache
