Fix HuggingFace API rate limiting in CI (#1291) #1296

Open
ak91456 wants to merge 5 commits into TransformerLensOrg:dev from ak91456:main

Conversation

@ak91456 commented May 9, 2026

Description

Fixes #1291 — CI jobs were hitting HTTP 429 (Too Many Requests) from HuggingFace Hub when multiple PRs or pushes triggered simultaneous workflow runs.

Even when model files are cached locally, huggingface_hub makes lightweight "resolve" API calls by default to check freshness. With 3 matrix Python versions plus coverage and notebook jobs all running in parallel across multiple PRs, these calls exceeded HF's rate limit.

Changes:

  • Concurrency group (checks.yml): cancels the stale in-progress run on the same PR branch when a new push arrives. Push-to-main, tag, and workflow_call (release) events are exempt and never cancelled (see the YAML sketch below).
  • Retry with backoff (hf_utils.py): download_file_from_hf now retries up to 3 times (10s / 20s / 30s waits) on HTTP 429, matching the pattern already used in hf_scraper.py.
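
A minimal sketch of the concurrency stanza described above, as added to checks.yml; the group key and the exact cancellation condition here are assumptions rather than a copy of the diff:

```yaml
# Sketch only: key names in the actual checks.yml may differ.
concurrency:
  # One group per workflow + ref, so a new push to the same PR branch
  # supersedes the run already in progress.
  group: ${{ github.workflow }}-${{ github.ref }}
  # Cancel only pull request runs; pushes to main, tags, and
  # workflow_call (release) events are left to finish.
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
```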

Fixes #1291

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

brendanlong and others added 4 commits April 20, 2026 14:50
* Fix type of HookedTransformerConfig.device

This is typed as `Optional[str]` but sometimes returns `torch.device`.
Updated the code to just return the `str` instead of wrapping with a
device.

I'm not confident that every function which takes a device will
always be passed a string, so I didn't change functions like
warn_if_mps.

Found while working on TransformerLensOrg#1219
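
A minimal sketch of the type fix this commit describes, using a hypothetical normalize_device helper; the actual change simply returns the str form instead of wrapping it in a device:

```python
from typing import Optional, Union

import torch

# Hypothetical helper illustrating the fix: HookedTransformerConfig.device
# is typed Optional[str], so a value held as torch.device is converted
# back to its string form rather than returned as a device object.
def normalize_device(device: Optional[Union[str, torch.device]]) -> Optional[str]:
    if isinstance(device, torch.device):
        return str(device)  # e.g. torch.device("cuda:0") -> "cuda:0"
    return device
```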

* more cleanup

* 3.0 CI Bugs (TransformerLensOrg#1261)

* Fixing `utils` imports

* skip gated notebooks on PR from forks

* Updating notebooks

* Ensure LLaMA only runs when HF_TOKEN is available

---------

Co-authored-by: jlarson4 <jonahalarson@comcast.net>
ak91456 marked this pull request as draft May 9, 2026 17:42
ak91456 marked this pull request as ready for review May 9, 2026 17:42
        break
    except Exception as exc:
        if "429" in str(exc) and attempt < max_retries:
            wait = 10 * (attempt + 1)  # 10s, 20s, 30s across retries
Collaborator


If many workers hit a 429 at the same time, they will all retry at the same time, creating a repeated loop of failures. Can we add some random variance to the wait duration?

Additionally, we might want to increase the base wait duration here. HF's API timeout window is 5 minutes; if we have a large volume of requests at the same time, we will want to spread out the calls on each retry using longer waits plus variance to increase our chances of success.
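
A minimal sketch of what jittered backoff could look like here; the function and parameter names (download, base_wait, max_retries) are illustrative, not the actual hf_utils.py code:

```python
import random
import time

# Illustrative only: download() stands in for the real HF download call,
# and base_wait / max_retries are made-up defaults.
def download_with_jittered_backoff(download, max_retries=3, base_wait=30.0):
    for attempt in range(max_retries + 1):
        try:
            return download()
        except Exception as exc:
            if "429" not in str(exc) or attempt >= max_retries:
                raise
            # Linear backoff plus random jitter, so workers that hit a
            # 429 together do not all retry at the same instant.
            time.sleep(base_wait * (attempt + 1) + random.uniform(0, base_wait))
```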

@jlarson4
Collaborator

This is a solid start toward fixing #1291. We will still need to do additional work on bundling and creating fixtures for model requests to reduce the request load further, but this will definitely help reduce the number of false failures caused by timeouts. Thank you for contributing!
