Fix HuggingFace API rate limiting in CI (#1291) #1296
ak91456 wants to merge 5 commits into TransformerLensOrg:dev
* Fix type of HookedTransformerConfig.device

  This is typed as `Optional[str]` but sometimes returns `torch.device`. Updated the code to just return the `str` instead of wrapping with a device. I'm not confident that every function which takes a device will always be passed a string, so I didn't change functions like warn_if_mps. Found while working on TransformerLensOrg#1219
* more cleanup
* 3.0 CI Bugs (TransformerLensOrg#1261)
* Fixing `utils` imports
* skip gated notebooks on PR from forks
* Updating notebooks
* Ensure LLaMA only runs when HF_TOKEN is available

---------

Co-authored-by: jlarson4 <jonahalarson@comcast.net>
TransformerLens 3.1.0
            break
        except Exception as exc:
            if "429" in str(exc) and attempt < max_retries:
                wait = 10 * (attempt + 1)
If many workers hit a 429 at the same time, they will all retry at the same time and fail again, repeating the cycle. Can we add some random variance to the wait duration?
Additionally, we might want to increase the base wait duration in this situation. HF's API timeout window is 5 minutes, so if a large volume of requests arrives at once, we will want to spread out the calls on each retry using longer waits plus variance to increase our chances of success.
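One possible shape for that wait calculation, as a rough sketch rather than a drop-in patch. The helper name `jittered_wait`, the 30-second `base`, and the 30-second `jitter` range are all illustrative assumptions, not values taken from this PR or from HF's documentation:

```python
import random
from typing import Optional


def jittered_wait(
    attempt: int,
    base: float = 30.0,
    jitter: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a wait in seconds for a 0-indexed retry attempt.

    Hypothetical helper: the name and default values are assumptions.
    The wait grows linearly with the attempt (as in the diff above) and
    adds a random offset so parallel workers do not retry in lockstep.
    """
    source = rng if rng is not None else random
    return base * (attempt + 1) + source.uniform(0.0, jitter)


if __name__ == "__main__":
    for attempt in range(3):
        print(f"attempt {attempt}: wait {jittered_wait(attempt):.1f}s")
```

With these defaults the three retries land somewhere in 30-60s, 60-90s, and 90-120s, so workers that were rate limited together drift apart on each retry.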
This is a solid start to fixing 1291. We will still need to do additional work on bundling and creating fixtures for model requests to reduce the request load further, but this will definitely help reduce the number of false failures caused by timeout. Thank you for contributing!
Description
Fixes #1291 — CI jobs were hitting HTTP 429 (Too Many Requests) from HuggingFace Hub when multiple PRs or pushes triggered simultaneous workflow runs.
Even when model files are cached locally, `huggingface_hub` makes lightweight "resolve" API calls by default to check freshness. With 3 matrix Python versions + coverage + notebook jobs all running in parallel across multiple PRs, these calls exceeded HF's rate limit.
Changes:
* `checks.yml`: Cancels the stale in-progress run on the same PR branch when a new push arrives. Push-to-main, tags, and `workflow_call` (release) events are exempt and never cancelled.
* `hf_utils.py`: `download_file_from_hf` now retries up to 3 times (10s / 20s / 30s waits) on HTTP 429, matching the pattern already used in `hf_scraper.py`.

Fixes #1291
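For readers who want the retry behaviour in one place, here is a minimal sketch of the pattern described in the changes above. It is not the code from `hf_utils.py`: `with_429_retry`, `fetch`, and the injectable `sleep` are hypothetical names used for illustration, and only the 3 retries and the 10s / 20s / 30s waits come from the description.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_429_retry(
    fetch: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying when the error message mentions HTTP 429.

    Hypothetical wrapper, assumed for illustration only.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception as exc:
            # Re-raise anything that is not a rate limit, and re-raise
            # the rate-limit error itself once the retries are used up.
            if "429" not in str(exc) or attempt >= max_retries:
                raise
            sleep(10 * (attempt + 1))  # waits of 10s, 20s, 30s
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter is a design choice made for this sketch so the loop can be exercised in tests without actually waiting.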
Type of change
Checklist: