
feat: cache generated Playwright scripts so repeat runs skip the LLM loop#6

Open
mvanhorn wants to merge 1 commit into microsoft:main from mvanhorn:fix/webwright-action-cache


Summary

Webwright now supports an opt-in script cache that lets repeat runs of the same task skip the agent loop entirely and execute the cached Playwright script directly. Cache hits cost zero LLM tokens.

Why this matters

Webwright's pitch in the Microsoft Research blog post is that the agent's persistent artifact is code: a final_script.py per task that "can be rerun, adapted, and shared across tasks rather than rediscovered from scratch." Today that artifact is saved but never reused: running the same task twice regenerates final_script.py from scratch and burns the full LLM budget twice.

Stagehand built its v3 wedge against browser-use on exactly this property: auto-caching combined with self-healing remembers previous actions and runs without LLM inference. Independent reviews note that "for repeated workflows (same sites, same forms, many times a day), Stagehand's caching means your costs approach zero after the first run." This change brings Webwright the same property for full-script replay.

Changes

  • New src/webwright/cache/script_cache.py adds make_fingerprint(config) (SHA-256 over task, start_url, model.model_name, environment.environment_class by default; fields configurable) and a ScriptCache class that reads and writes entries under ~/.cache/webwright/<fingerprint>/. Each entry holds metadata.json, final_script.py, and trajectory.json.
  • CacheConfig is added to config/__init__.py with enabled: false as the default. The feature is fully opt-in.
  • run/cli.py checks the cache before constructing the model + agent. On hit, it runs the cached script via the local workspace environment and returns a result shaped like a normal run (with cached: true). On any failure during replay (selector drift, exception, 4xx/5xx on start_url when validate_url: true), the entry is invalidated and the agent loop runs normally.
  • agents/default.py writes a new cache entry after every successful run when cache.enabled is set.
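
As a rough illustration of the fingerprinting step described above, here is a minimal sketch. It assumes the config is a plain nested dict and that fields are addressed by dotted paths; the helper `_lookup`, the `DEFAULT_FIELDS` tuple, the JSON canonicalization, and the example values are hypothetical and may not match what script_cache.py actually does.

```python
import hashlib
import json

# Hypothetical default fields, taken from the list in the PR description.
DEFAULT_FIELDS = (
    "task",
    "start_url",
    "model.model_name",
    "environment.environment_class",
)


def _lookup(config: dict, dotted: str):
    """Resolve a dotted path such as 'model.model_name' in a nested dict."""
    node = config
    for key in dotted.split("."):
        node = node[key]
    return node


def make_fingerprint(config: dict, fields=DEFAULT_FIELDS) -> str:
    """Return a SHA-256 hex digest over the selected config fields."""
    selected = {name: _lookup(config, name) for name in fields}
    # sort_keys plus fixed separators give a canonical byte string,
    # so the digest does not depend on dict insertion order.
    payload = json.dumps(selected, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative config; every value here is made up.
example_config = {
    "task": "find the cheapest flight",
    "start_url": "https://example.com",
    "model": {"model_name": "some-model", "temperature": 0.2},
    "environment": {"environment_class": "local"},
}
fingerprint = make_fingerprint(example_config)
```

In this sketch, fields outside the selected set (such as `temperature`) do not affect the digest, which is one plausible reading of "fields configurable".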

No new runtime dependencies: the change uses stdlib hashlib, json, and pathlib, plus the existing httpx for the URL precheck.

Testing

tests/unit/test_script_cache.py covers fingerprint stability (same inputs → same hash, one input change → different hash), script-error invalidation (cached script raises → entry is removed), and start-URL invalidation (HEAD returns 5xx → entry is removed). 3 tests, all passing.

A meaningful end-to-end test requires a real LLM run on both sides of the cache window, so an integration test is deferred to a follow-up rather than included in this PR; the unit-level invariants covered here are the ones that would catch regressions.

Usage

Opt in via config:

# base.yaml
cache:
  enabled: true
  directory: ~/.cache/webwright
  ttl_seconds: 604800
  validate_url: true

Or via CLI: webwright -t "..." --start-url "..." --cache.enabled=true.

A cache hit prints Cache hit: skipping model loop to the console; a miss prints Cache miss and runs normally.
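
To make the ttl_seconds setting concrete, the following sketch shows one way an expiry check could work. It assumes each entry records a creation timestamp in seconds since the epoch (for instance in metadata.json); the function name `is_fresh` and its parameters are hypothetical, and the PR does not state how expiry is actually evaluated.

```python
import time
from typing import Optional

WEEK = 7 * 24 * 60 * 60  # 604800 seconds, the ttl_seconds value in the config above


def is_fresh(created_at: float, ttl_seconds: int, now: Optional[float] = None) -> bool:
    """Return True while the entry is strictly younger than ttl_seconds."""
    if now is None:
        now = time.time()
    return (now - created_at) < ttl_seconds
```

Under this assumption, an entry written at time 0 with the default TTL would still be served at second 604799 and treated as expired from second 604800 onward.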

…loop

Today, running the same task twice burns the full LLM budget twice: the
agent loop regenerates final_script.py from scratch even when the prior
run succeeded against an unchanged site. Stagehand caches selectors so
repeat runs skip LLM inference; webwright already saves final_script.py
per task but never indexes it for retrieval.

This change adds an opt-in file-backed cache (cache.enabled: false by
default) keyed on a stable fingerprint of (task, start_url, model_name,
env_class). On hit, the agent loop is skipped and the cached script
executes directly. On script failure (selector drift) or 4xx/5xx on the
start_url, the entry is invalidated and the agent loop runs normally.

Cache lives at ~/.cache/webwright/<fingerprint>/ with metadata.json,
final_script.py, trajectory.json. No new runtime deps; uses stdlib
hashlib and the existing httpx for URL validation.
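
The hit, invalidate, and fallback behavior described above can be sketched as follows. This is an illustration only: `run_with_cache`, `replay`, and `run_agent` are hypothetical names, the URL precheck is omitted, and the real logic in run/cli.py may be structured differently.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_with_cache(
    entry_dir: Path,
    replay: Callable[[Path], dict],
    run_agent: Callable[[], dict],
) -> dict:
    """Replay a cached script if present; otherwise fall back to the agent."""
    script = entry_dir / "final_script.py"
    if script.exists():
        try:
            result = replay(script)
            # Mark the result so callers can tell it came from the cache.
            return {**result, "cached": True}
        except Exception:
            # Any replay failure (for example selector drift) invalidates
            # the entry, and control falls through to the agent loop.
            shutil.rmtree(entry_dir, ignore_errors=True)
    return run_agent()


def _failing_replay(script: Path) -> dict:
    """Stand-in for a cached script that no longer works."""
    raise RuntimeError("selector not found")


# Demo: a stale entry is removed and the agent fallback supplies the result.
root = Path(tempfile.mkdtemp())
entry = root / "abc123"  # placeholder for a real fingerprint directory
entry.mkdir()
(entry / "final_script.py").write_text("print('hello')\n")
outcome = run_with_cache(entry, _failing_replay, lambda: {"status": "ok"})
```

Passing the replay and agent steps as callables keeps the sketch independent of Playwright and of any model client, so the invalidation rule can be exercised without a browser.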