feat: cache generated Playwright scripts so repeat runs skip the LLM loop #6
Open
mvanhorn wants to merge 1 commit into
…loop

Today, running the same task twice burns the full LLM budget twice: the agent loop regenerates final_script.py from scratch even when the prior run succeeded against an unchanged site. Stagehand caches selectors so repeat runs skip LLM inference; webwright already saves final_script.py per task but never indexes it for retrieval.

This change adds an opt-in file-backed cache (cache.enabled: false by default) keyed on a stable fingerprint of (task, start_url, model_name, env_class). On a hit, the agent loop is skipped and the cached script executes directly. On script failure (selector drift) or a 4xx/5xx response from the start_url, the entry is invalidated and the agent loop runs normally.

The cache lives at ~/.cache/webwright/<fingerprint>/ with metadata.json, final_script.py, and trajectory.json. No new runtime deps; it uses stdlib hashlib and the existing httpx for URL validation.
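The fingerprint described above can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes the config is a nested dict, and the `DEFAULT_FIELDS` tuple and `_lookup` helper are hypothetical names. Canonical JSON (sorted keys, fixed separators) is what keeps the hash stable from run to run.

```python
import hashlib
import json
from collections.abc import Mapping, Sequence
from typing import Any

# Hypothetical default field list, mirroring the fields this PR names.
DEFAULT_FIELDS: Sequence[str] = (
    "task",
    "start_url",
    "model.model_name",
    "environment.environment_class",
)


def _lookup(config: Mapping[str, Any], dotted: str) -> Any:
    """Resolve a dotted path like "model.model_name"; missing keys yield None."""
    node: Any = config
    for part in dotted.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return None
        node = node[part]
    return node


def make_fingerprint(config: Mapping[str, Any], fields: Sequence[str] = DEFAULT_FIELDS) -> str:
    """Return a SHA-256 hex digest over only the selected config fields.

    Fields not listed in `fields` do not affect the hash, so unrelated
    config changes do not cause spurious cache misses.
    """
    payload = {field: _lookup(config, field) for field in fields}
    # sort_keys + fixed separators give one canonical byte string per payload.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Changing any fingerprinted field (for example switching `model.model_name`) produces a new key, so a script generated by one model is never replayed for another.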
Summary
Webwright now supports an opt-in script cache that lets repeat runs of the same task skip the agent loop entirely and execute the cached Playwright script directly. Cache hits cost zero LLM tokens.
Why this matters
Webwright's pitch in the Microsoft Research blog post is that the agent's persistent artifact is code: a `final_script.py` per task that "can be rerun, adapted, and shared across tasks rather than rediscovered from scratch." Today that artifact is saved but never reused: running the same task twice regenerates `final_script.py` from scratch and burns the full LLM budget twice.

Stagehand built its v3 wedge against browser-use on exactly this property: auto-caching combined with self-healing remembers previous actions and runs without LLM inference. Independent reviews note that "for repeated workflows (same sites, same forms, many times a day), Stagehand's caching means your costs approach zero after the first run." This change brings webwright the same property for full-script replay.
Changes
- `src/webwright/cache/script_cache.py` adds `make_fingerprint(config)` (SHA-256 over `task`, `start_url`, `model.model_name`, and `environment.environment_class` by default; the fields are configurable) and a `ScriptCache` class that reads and writes entries under `~/.cache/webwright/<fingerprint>/`. Each entry holds `metadata.json`, `final_script.py`, and `trajectory.json`.
- `CacheConfig` is added to `config/__init__.py` with `enabled: false` as the default, so the feature is fully opt-in.
- `run/cli.py` checks the cache before constructing the model and agent. On a hit, it runs the cached script via the local workspace environment and returns a result shaped like a normal run (with `cached: true`). On any failure during replay (selector drift, an exception, or a 4xx/5xx on `start_url` when `validate_url: true`), the entry is invalidated and the agent loop runs normally.
- `agents/default.py` writes a new cache entry after every successful run when `cache.enabled` is set.

No new runtime dependencies: the cache uses stdlib `hashlib`/`json`/`pathlib` and the existing `httpx` for the URL precheck.

Testing
`tests/unit/test_script_cache.py` covers fingerprint stability (same inputs → same hash; changing one input → different hash), script-error invalidation (cached script raises → entry is removed), and start-URL invalidation (HEAD returns 5xx → entry is removed). 3 tests, all passing.

A meaningful end-to-end test requires a real LLM run on both sides of the cache window, so a follow-up integration test is left out of this PR; the unit-level invariants are what should block a regression from being merged.
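The replay-and-invalidate behavior these tests exercise can be sketched with the standard library alone. Everything below is hypothetical rather than the PR's API: `ScriptCache.put`/`get`/`invalidate`, the `replay` helper, and its `head_status` callable (a stand-in for the `httpx` HEAD precheck that returns a status code) are illustrative names, and running the script in a `subprocess` stands in for the local workspace environment.

```python
import json
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Callable, Optional


class ScriptCache:
    """Hypothetical file-backed cache: one directory per fingerprint under root."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _entry(self, fingerprint: str) -> Path:
        return self.root / fingerprint

    def put(self, fingerprint: str, script: str, metadata: dict, trajectory: list) -> None:
        # Write the three files the PR lists for each entry.
        entry = self._entry(fingerprint)
        entry.mkdir(parents=True, exist_ok=True)
        (entry / "final_script.py").write_text(script, encoding="utf-8")
        (entry / "metadata.json").write_text(json.dumps(metadata), encoding="utf-8")
        (entry / "trajectory.json").write_text(json.dumps(trajectory), encoding="utf-8")

    def get(self, fingerprint: str) -> Optional[Path]:
        # A hit requires the script file to exist; anything else is a miss.
        script = self._entry(fingerprint) / "final_script.py"
        return script if script.is_file() else None

    def invalidate(self, fingerprint: str) -> None:
        # Drop the whole entry directory; ignore it if it is already gone.
        shutil.rmtree(self._entry(fingerprint), ignore_errors=True)


def replay(
    cache: ScriptCache,
    fingerprint: str,
    head_status: Optional[Callable[[], int]] = None,
) -> bool:
    """Run a cached script. Returns False on a miss or after invalidating a bad entry."""
    script = cache.get(fingerprint)
    if script is None:
        return False
    # URL precheck (skipped when head_status is None): any 4xx/5xx invalidates.
    if head_status is not None and head_status() >= 400:
        cache.invalidate(fingerprint)
        return False
    # A non-zero exit, e.g. an uncaught exception from selector drift, also invalidates.
    result = subprocess.run([sys.executable, str(script)], capture_output=True)
    if result.returncode != 0:
        cache.invalidate(fingerprint)
        return False
    return True
```

Passing `head_status=None` models `validate_url: false`; injecting the status check this way also keeps unit tests offline.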
Usage
Opt in via config:
Or via CLI: `webwright -t "..." --start-url "..." --cache.enabled=true`

A cache hit prints `Cache hit: skipping model loop` to the console; a miss prints `Cache miss` and runs normally.
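The hit/miss decision in `run/cli.py` can be sketched as a small function with the cache operations injected. The names `run_task`, `lookup`, `replay`, `invalidate`, and `run_agent` are hypothetical; the sketch only illustrates the ordering described in this PR: the cache is consulted before any model or agent is built, and any replay failure invalidates the entry and falls through to the normal loop.

```python
from typing import Callable, Optional


def run_task(
    fingerprint: str,
    lookup: Callable[[str], Optional[str]],
    replay: Callable[[str], bool],
    invalidate: Callable[[str], None],
    run_agent: Callable[[], dict],
) -> dict:
    """Try the cached script first; on a miss or failed replay, run the agent loop."""
    script = lookup(fingerprint)
    if script is not None:
        print("Cache hit: skipping model loop")
        try:
            if replay(script):
                # Result shaped like a normal run, flagged as cached.
                return {"cached": True, "script": script}
        except Exception:
            pass  # treat any replay exception like selector drift
        # Replay failed: drop the stale entry before falling back.
        invalidate(fingerprint)
    else:
        print("Cache miss")
    # Only now would the model and agent be constructed.
    result = run_agent()
    result["cached"] = False
    return result
```

This sketch prints the hit message as soon as an entry is found, so a failed replay shows the hit message followed by a normal agent run.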