Add OCI-compatible registry support (ghcr.io, quay.io, Docker Hub) to create_singularities#161
Add OCI-compatible registry support (ghcr.io, quay.io, Docker Hub) to create_singularities#161
Conversation
- Add `get_last_ghcr_version_tag` static method on `Builder` that uses the OCI registry API at ghcr.io with anonymous Bearer auth to list tags for public GitHub Container Registry images - Modify `retry_get` to accept optional `headers` dict for auth support - Fix `get_familyname` to use the last path segment (`re.sub(r'.*/','',...)`) so that 3-segment GHCR IDs like `ghcr.io/owner/image` are handled correctly alongside the existing 2-segment Docker Hub IDs - Modify `generate_singularity_for_docker_image` to detect `ghcr.io/` prefix: route tag listing to GHCR API, and auto-extract family name from the owner segment (e.g. `unfmontreal` from `ghcr.io/unfmontreal/skullduggery`) - Add `ghcr.io/unfmontreal/skullduggery` to the repronim section in main() Agent-Logs-Url: https://github.com/ReproNim/containers/sessions/b13bea98-5812-4d1c-8b7c-e4b16460e8d2 Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ReproNim/containers/sessions/b13bea98-5812-4d1c-8b7c-e4b16460e8d2 Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
yarikoptic
left a comment
There was a problem hiding this comment.
Review of PR #161 — GHCR support for `create_singularities`
Tested the branch in a worktree (.git-meta/worktrees/ghcr-test) against live
registries. The code is clean and the refactor of _select_best_version out of
get_last_docker_version_tag is a nice touch, but there are three findings worth
addressing.
1. Correctness: ghcr.io/unfmontreal/skullduggery silently builds nothing
Exercised the new path directly (not via singularity, which wasn't installed):
Builder.get_last_ghcr_version_tag('unfmontreal/skullduggery') # → NoneThe repo only publishes two tags:
GET https://ghcr.io/v2/unfmontreal/skullduggery/tags/list
→ {"tags": ["dev", "main"]}
_select_best_version requires a match against
^([Vv]|version-|release-|)[0-9]{1,10}\..*, so neither dev nor main qualify.
The PR adds this image to the repronim build group (main() call at
scripts/create_singularities:621-622) but the builder will log
no version and silently skip it on every run.
Options:
- Point at a different image/tag that actually has semver releases.
- Accept a
version_regex=r'^(dev|main)$'and teach_select_best_versionto
treat the whole tag as "version" when no semver pattern is available (i.e.
whenonly_good_versions=Falseand a regex was explicitly given). - Add a narrow fallback: if
version_regexmatches but_select_best_version
rejects because it's non-semver, return(tag, tag)unchanged.
The third is the smallest change and keeps intent explicit at the call site:
generate_singularity_for_docker_image('ghcr.io/unfmontreal/skullduggery', 'repronim', version_regex=r'^(dev|main)$').
2. API design: Docker Hub and GHCR are both OCI — they can share one path
You're right to suspect this. Both registries implement the OCI Distribution
Spec v2. The PR currently uses two different APIs:
| Current endpoint | Format | |
|---|---|---|
| Docker Hub | registry.hub.docker.com/v2/repositories/{name}/tags |
Docker Hub's custom web API (non-standard) |
| GHCR | ghcr.io/v2/{name}/tags/list + bearer token |
OCI distribution spec |
Both registry-1.docker.io and ghcr.io serve /v2/{name}/tags/list behind a
token handshake — the exact same shape. Verified on
nipreps/fmriprep:
Custom API result → ('25.2.5', '25.2.5')
Unified OCI API → ('25.2.5', '25.2.5') # 104 tags returned
A unified abstraction worth considering:
@dataclass
class OCIRegistry:
host: str # "registry-1.docker.io" / "ghcr.io" / "quay.io"
auth_host: str # "auth.docker.io" / "ghcr.io" / "quay.io"
service: str # "registry.docker.io" / "ghcr.io" / "quay.io"
@classmethod
def parse(cls, image_id: str) -> tuple["OCIRegistry", str]:
if image_id.startswith("ghcr.io/"):
return cls("ghcr.io", "ghcr.io", "ghcr.io"), image_id[8:]
if image_id.startswith("quay.io/"):
return cls("quay.io", "quay.io", "quay.io"), image_id[8:]
repo = image_id if "/" in image_id else f"library/{image_id}"
return cls("registry-1.docker.io", "auth.docker.io", "registry.docker.io"), repo
def list_tags(self, repo: str) -> list[str]:
tok = retry_get(
f"https://{self.auth_host}/token?service={self.service}"
f"&scope=repository:{repo}:pull"
).json()
token = tok.get("token") or tok.get("access_token", "")
r = retry_get(
f"https://{self.host}/v2/{repo}/tags/list",
headers={"Authorization": f"Bearer {token}"},
)
return r.json().get("tags") or []Then get_last_version_tag becomes one function that calls list_tags and
delegates to _select_best_version.
Trade-off: the Docker Hub custom API returns extra metadata (upload date,
size, last_updated) that the OCI API does not. The current code doesn't use any
of it, so there's no regression from switching. The loss is purely theoretical
— if tag-selection logic ever wants "most recently pushed" instead of "highest
semver", the custom API would be handy. The OCI API returns tags sorted
lexicographically, not chronologically, so that's a real difference — but again,
today's selector only uses max(version_key), not upload time.
Recommendation: refactor to the unified OCI path. Reasoning:
- One code path instead of two → easier to test and to extend (quay.io, GitLab
registry, self-hosted Harbor all fall out for free). - Closer to what
singularity pull docker://…does under the hood. - Sidesteps the Docker Hub custom-API pagination quirk (see #3 below).
If you'd rather land the PR as-is, please at minimum rename
get_last_docker_version_tag / get_last_ghcr_version_tag into a single
dispatching entry point, e.g. get_last_version_tag(image_id), so call sites
stop duplicating the is_ghcr branching currently in
generate_singularity_for_docker_image:403-411.
3. Pagination is unhandled in both paths
Neither method handles paginated tag lists.
- Docker Hub custom API defaults to 10 results per page (max 100). For
repos likelibrary/python(3752 tags) only a tiny window is seen. Today it
survives because results are sorted newest-first, so the highest semver is
usually on page 1 — but this is fragile and silently wrong for
version_regex-filtered lookups against historical tags (see
nipreps/fmriprep20.2.*LTS atmain():591, which already returns a
cut-off list of 10 on every run). - GHCR OCI returns everything in one shot for the repos I tested, but the
distribution spec allows the server to paginate via theLink: …; rel="next"
header, and GHCR is documented to do this for very large repos.
Not a blocker, but worth a # TODO: handle Link-header pagination comment at
minimum.
4. Minor: auto-family for ghcr differs from Docker Hub
At generate_singularity_for_docker_image:390-393, when family isn't passed,
ghcr uses the owner segment (ghcr.io/unfmontreal/skullduggery →
unfmontreal) but Docker Hub uses the same (nipreps/fmriprep → nipreps).
Consistent, so fine — but note that this means plain CLI use of
scripts/create_singularities ghcr.io/foo/bar will create an images/foo/
subdirectory, whereas for the main() entry the explicit family="repronim"
override is what keeps skullduggery under images/repronim/. Documenting that
in the docstring or a comment at the call site would help future contributors.
Suggested next steps
- Decide on option (1) — either drop the skullduggery inclusion, or add
non-semver tag handling. Without this the PR merges a no-op. - Pick between (2a) ship as-is with a renamed dispatcher, or (2b) land the OCI
unification. - Add at least one unit test exercising the GHCR path with a mocked response
(scripts/tests/test_create_singularities.pyhas none forBuilder.*).
Happy to draft the OCI refactor if you want to take that direction.
|
@copilot , please review and address the review (by claude code) I just submitted requesting changes |
…O, unit tests Per review by @yarikoptic: 1. Unified OCI API: replace separate get_last_docker_version_tag and get_last_ghcr_version_tag with a new OCIRegistry dataclass + single get_last_version_tag dispatcher. All three registries (Docker Hub, GHCR, Quay.io) share one /v2/{repo}/tags/list code path with a bearer token handshake. 2. Non-semver fallback: when version_regex is explicitly provided but no semver tag matches, _select_best_version now returns the first regex-matched tag verbatim (unless only_good_versions=True). This makes version_regex=r'^main$' work for skullduggery. 3. Skullduggery call updated to version_regex=r'^main$' so it actually builds the main-branch image until versioned releases are published. 4. Pagination TODO comment added inside OCIRegistry.list_tags. 5. generate_singularity_for_docker_image simplified: removed is_ghcr branching, unified family extraction via parts[-2]. 6. Unit tests added in scripts/tests/test_create_singularities.py (17 tests covering OCIRegistry.for_image, list_tags, _select_best_version, and get_last_version_tag with mocked HTTP). 7. noxfile.py: new 'tests' session to run pytest tests. Agent-Logs-Url: https://github.com/ReproNim/containers/sessions/43f3a2b5-3c7c-40ed-85a7-6b575e55a061 Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
Addressed all four points in commit
|
- OCIRegistry.list_tags: raise ValueError with descriptive message when bearer token is missing from auth response instead of silently using an empty string that would cause cryptic auth failures - OCIRegistry.list_tags: raise ValueError with context info when 'tags' field is missing from the tags/list API response instead of silently returning [] - test: use mock_get.call_args.kwargs (safe attribute access) instead of tuple-unpacking call_args Agent-Logs-Url: https://github.com/ReproNim/containers/sessions/43f3a2b5-3c7c-40ed-85a7-6b575e55a061 Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
All four points from the review were addressed in commits
|
create_singularitieswas hardcoded to fetch images and list tags only from Docker Hub via its custom web API. This extends it to support any OCI-compatible registry (GitHub Container Registry, Quay.io, Docker Hub, and others) through a unified code path.Changes
OCIRegistrydataclass — new class encapsulating the per-registry token/tags parameters. All registries share the same OCI Distribution Spec v2GET /v2/{repo}/tags/listcode path with a bearer-token handshake. Supportsghcr.io,quay.io, Docker Hub, and any other OCI-compliant registry out of the box.retry_get— accepts an optionalheadersdict to support Bearer-token authenticated requestsBuilder.get_last_version_tag— new unified dispatcher replacing the old registry-specific methods; accepts any image ID (owner/image,ghcr.io/owner/image,quay.io/org/image) and delegates toOCIRegistryBuilder._select_best_version— enhanced to fall back to returning the first regex-matched tag verbatim whenversion_regexis explicitly provided but no semver tag matches (andonly_good_versions=False). This enables tracking non-semver tags likemainordev.generate_singularity_for_docker_image— simplified by removing registry-specific branching; uses the unifiedget_last_version_tagdispatcher for all registriesget_familyname— regex changed from^[^/]*/to.*/(greedy, take last path segment), soghcr.io/owner/imagecorrectly yieldsimage; Docker Hub two-segment IDs are unaffectedmain()— addsghcr.io/unfmontreal/skullduggeryto therepronimbuild group, tracking themainbranch image viaversion_regex=r"^main$"until versioned releases are publishedscripts/tests/test_create_singularities.py— new pytest test module with 17 tests coveringOCIRegistry.for_image,OCIRegistry.list_tags(mocked HTTP),_select_best_version, andget_last_version_tagnoxfile.py— newtestssession to run the pytest suiteUsage remains identical; pass a full registry-prefixed URL anywhere a Docker Hub ID was accepted: