Fix HuggingFace API rate limiting in CI (#1291) #1296

Open
ak91456 wants to merge 5 commits into TransformerLensOrg:dev from ak91456:main

Conversation

@ak91456 commented May 9, 2026

Description

Fixes #1291 — CI jobs were hitting HTTP 429 (Too Many Requests) from HuggingFace Hub when multiple PRs or pushes triggered simultaneous workflow runs.

Even when model files are cached locally, huggingface_hub makes lightweight "resolve" API calls by default to check freshness. With 3 matrix Python versions plus coverage and notebook jobs all running in parallel across multiple PRs, these calls exceeded HF's rate limit.

Changes:

  • Concurrency group (checks.yml): cancels the stale in-progress run on the same PR branch when a new push arrives. Push-to-main, tag, and workflow_call (release) events are exempt and never cancelled (see the YAML sketch below).
  • Retry with backoff (hf_utils.py): download_file_from_hf now retries up to 3 times (10s / 20s / 30s waits) on HTTP 429, matching the pattern already used in hf_scraper.py.
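
A minimal sketch of the concurrency stanza described above, as added to checks.yml; the group key and the exact cancellation condition here are assumptions rather than a copy of the diff:

```yaml
# Sketch only: key names in the actual checks.yml may differ.
concurrency:
  # One group per workflow + ref, so a new push to the same PR branch
  # supersedes the run already in progress.
  group: ${{ github.workflow }}-${{ github.ref }}
  # Cancel only pull request runs; pushes to main, tags, and
  # workflow_call (release) events are left to finish.
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
```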

Fixes #1291

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

brendanlong and others added 4 commits April 20, 2026 14:50
* Fix type of HookedTransformerConfig.device

This is typed as `Optional[str]` but sometimes returns `torch.device`.
Updated the code to just return the `str` instead of wrapping with a
device.

I'm not confident that every function which takes a device will
always be passed a string, so I didn't change functions like
warn_if_mps.

Found while working on TransformerLensOrg#1219
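
A minimal sketch of the type fix this commit describes, using a hypothetical normalize_device helper; the actual change simply returns the str form instead of wrapping it in a device:

```python
from typing import Optional, Union

import torch

# Hypothetical helper illustrating the fix: HookedTransformerConfig.device
# is typed Optional[str], so a value held as torch.device is converted
# back to its string form rather than returned as a device object.
def normalize_device(device: Optional[Union[str, torch.device]]) -> Optional[str]:
    if isinstance(device, torch.device):
        return str(device)  # e.g. torch.device("cuda:0") -> "cuda:0"
    return device
```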

* more cleanup

* 3.0 CI Bugs (TransformerLensOrg#1261)

* Fixing `utils` imports

* skip gated notebooks on PR from forks

* Updating notebooks

* Ensure LLaMA only runs when HF_TOKEN is available

---------

Co-authored-by: jlarson4 <jonahalarson@comcast.net>
ak91456 marked this pull request as draft May 9, 2026 17:42
ak91456 marked this pull request as ready for review May 9, 2026 17:42
        break
    except Exception as exc:
        if "429" in str(exc) and attempt < max_retries:
            wait = 10 * (attempt + 1)  # 10s, 20s, 30s across retries
Collaborator


If many workers hit a 429 at the same time, they will all retry at the same time, creating a repeated loop of failures. Can we add some random variance to the wait duration?

Additionally, we might want to increase the base wait duration here. HF's API timeout window is 5 minutes; if we have a large volume of requests at the same time, we will want to spread out the calls on each retry using longer waits plus variance to increase our chances of success.
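
A minimal sketch of what jittered backoff could look like here; the function and parameter names (download, base_wait, max_retries) are illustrative, not the actual hf_utils.py code:

```python
import random
import time

# Illustrative only: download() stands in for the real HF download call,
# and base_wait / max_retries are made-up defaults.
def download_with_jittered_backoff(download, max_retries=3, base_wait=30.0):
    for attempt in range(max_retries + 1):
        try:
            return download()
        except Exception as exc:
            if "429" not in str(exc) or attempt >= max_retries:
                raise
            # Linear backoff plus random jitter, so workers that hit a
            # 429 together do not all retry at the same instant.
            time.sleep(base_wait * (attempt + 1) + random.uniform(0, base_wait))
```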

@jlarson4
Collaborator

This is a solid start toward fixing #1291. We will still need to do additional work on bundling and creating fixtures for model requests to reduce the request load further, but this will definitely help reduce the number of false failures caused by timeouts. Thank you for contributing!
