Skip to content

Add OCI-compatible registry support (ghcr.io, quay.io, Docker Hub) to create_singularities#161

Draft
Copilot wants to merge 5 commits intomasterfrom
copilot/extend-support-for-ghcr
Draft

Add OCI-compatible registry support (ghcr.io, quay.io, Docker Hub) to create_singularities#161
Copilot wants to merge 5 commits intomasterfrom
copilot/extend-support-for-ghcr

Conversation

Copy link
Copy Markdown

Copilot AI commented Apr 23, 2026

create_singularities was hardcoded to fetch images and list tags only from Docker Hub via its custom web API. This extends it to support any OCI-compatible registry (GitHub Container Registry, Quay.io, Docker Hub, and others) through a unified code path.

Changes

  • OCIRegistry dataclass — new class encapsulating the per-registry token/tags parameters. All registries share the same OCI Distribution Spec v2 GET /v2/{repo}/tags/list code path with a bearer-token handshake. Supports ghcr.io, quay.io, Docker Hub, and any other OCI-compliant registry out of the box.

  • retry_get — accepts an optional headers dict to support Bearer-token authenticated requests

  • Builder.get_last_version_tag — new unified dispatcher replacing the old registry-specific methods; accepts any image ID (owner/image, ghcr.io/owner/image, quay.io/org/image) and delegates to OCIRegistry

  • Builder._select_best_version — enhanced to fall back to returning the first regex-matched tag verbatim when version_regex is explicitly provided but no semver tag matches (and only_good_versions=False). This enables tracking non-semver tags like main or dev.

  • generate_singularity_for_docker_image — simplified by removing registry-specific branching; uses the unified get_last_version_tag dispatcher for all registries

  • get_familyname — regex changed from ^[^/]*/ to .*/ (greedy, take last path segment), so ghcr.io/owner/image correctly yields image; Docker Hub two-segment IDs are unaffected

  • main() — adds ghcr.io/unfmontreal/skullduggery to the repronim build group, tracking the main branch image via version_regex=r"^main$" until versioned releases are published

  • scripts/tests/test_create_singularities.py — new pytest test module with 17 tests covering OCIRegistry.for_image, OCIRegistry.list_tags (mocked HTTP), _select_best_version, and get_last_version_tag

  • noxfile.py — new tests session to run the pytest suite

Usage remains identical; pass a full registry-prefixed URL anywhere a Docker Hub ID was accepted:

builder.generate_singularity_for_docker_image("ghcr.io/unfmontreal/skullduggery", "repronim",
                                               version_regex=r"^main$")
builder.generate_singularity_for_docker_image("quay.io/org/image", "family")
builder.generate_singularity_for_docker_image("nipreps/fmriprep", "bids")  # Docker Hub unchanged

Copilot AI and others added 2 commits April 23, 2026 17:37
- Add `get_last_ghcr_version_tag` static method on `Builder` that uses the
  OCI registry API at ghcr.io with anonymous Bearer auth to list tags for
  public GitHub Container Registry images
- Modify `retry_get` to accept optional `headers` dict for auth support
- Fix `get_familyname` to use the last path segment (`re.sub(r'.*/','',...)`)
  so that 3-segment GHCR IDs like `ghcr.io/owner/image` are handled
  correctly alongside the existing 2-segment Docker Hub IDs
- Modify `generate_singularity_for_docker_image` to detect `ghcr.io/`
  prefix: route tag listing to GHCR API, and auto-extract family name from
  the owner segment (e.g. `unfmontreal` from `ghcr.io/unfmontreal/skullduggery`)
- Add `ghcr.io/unfmontreal/skullduggery` to the repronim section in main()

Agent-Logs-Url: https://github.com/ReproNim/containers/sessions/b13bea98-5812-4d1c-8b7c-e4b16460e8d2

Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
Copilot AI changed the title [WIP] Extend support for building from other docker registries Add GitHub Container Registry (ghcr.io) support to create_singularities Apr 23, 2026
Copilot AI requested a review from yarikoptic April 23, 2026 17:40
Copy link
Copy Markdown
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of PR #161 — GHCR support for `create_singularities`

Tested the branch in a worktree (.git-meta/worktrees/ghcr-test) against live
registries. The code is clean and the refactor of _select_best_version out of
get_last_docker_version_tag is a nice touch, but there are three findings worth
addressing.

1. Correctness: ghcr.io/unfmontreal/skullduggery silently builds nothing

Exercised the new path directly (not via singularity, which wasn't installed):

Builder.get_last_ghcr_version_tag('unfmontreal/skullduggery')  # → None

The repo only publishes two tags:

GET https://ghcr.io/v2/unfmontreal/skullduggery/tags/list
→ {"tags": ["dev", "main"]}

_select_best_version requires a match against
^([Vv]|version-|release-|)[0-9]{1,10}\..*, so neither dev nor main qualify.
The PR adds this image to the repronim build group (main() call at
scripts/create_singularities:621-622) but the builder will log
no version and silently skip it on every run.

Options:

  • Point at a different image/tag that actually has semver releases.
  • Accept a version_regex=r'^(dev|main)$' and teach _select_best_version to
    treat the whole tag as "version" when no semver pattern is available (i.e.
    when only_good_versions=False and a regex was explicitly given).
  • Add a narrow fallback: if version_regex matches but _select_best_version
    rejects because it's non-semver, return (tag, tag) unchanged.

The third is the smallest change and keeps intent explicit at the call site:
generate_singularity_for_docker_image('ghcr.io/unfmontreal/skullduggery', 'repronim', version_regex=r'^(dev|main)$').

2. API design: Docker Hub and GHCR are both OCI — they can share one path

You're right to suspect this. Both registries implement the OCI Distribution
Spec
v2. The PR currently uses two different APIs:

Current endpoint Format
Docker Hub registry.hub.docker.com/v2/repositories/{name}/tags Docker Hub's custom web API (non-standard)
GHCR ghcr.io/v2/{name}/tags/list + bearer token OCI distribution spec

Both registry-1.docker.io and ghcr.io serve /v2/{name}/tags/list behind a
token handshake — the exact same shape. Verified on
nipreps/fmriprep:

Custom API result → ('25.2.5', '25.2.5')
Unified OCI API   → ('25.2.5', '25.2.5')   # 104 tags returned

A unified abstraction worth considering:

@dataclass
class OCIRegistry:
    host: str          # "registry-1.docker.io" / "ghcr.io" / "quay.io"
    auth_host: str     # "auth.docker.io" / "ghcr.io" / "quay.io"
    service: str       # "registry.docker.io" / "ghcr.io" / "quay.io"

    @classmethod
    def parse(cls, image_id: str) -> tuple["OCIRegistry", str]:
        if image_id.startswith("ghcr.io/"):
            return cls("ghcr.io", "ghcr.io", "ghcr.io"), image_id[8:]
        if image_id.startswith("quay.io/"):
            return cls("quay.io", "quay.io", "quay.io"), image_id[8:]
        repo = image_id if "/" in image_id else f"library/{image_id}"
        return cls("registry-1.docker.io", "auth.docker.io", "registry.docker.io"), repo

    def list_tags(self, repo: str) -> list[str]:
        tok = retry_get(
            f"https://{self.auth_host}/token?service={self.service}"
            f"&scope=repository:{repo}:pull"
        ).json()
        token = tok.get("token") or tok.get("access_token", "")
        r = retry_get(
            f"https://{self.host}/v2/{repo}/tags/list",
            headers={"Authorization": f"Bearer {token}"},
        )
        return r.json().get("tags") or []

Then get_last_version_tag becomes one function that calls list_tags and
delegates to _select_best_version.

Trade-off: the Docker Hub custom API returns extra metadata (upload date,
size, last_updated) that the OCI API does not. The current code doesn't use any
of it, so there's no regression from switching. The loss is purely theoretical
— if tag-selection logic ever wants "most recently pushed" instead of "highest
semver", the custom API would be handy. The OCI API returns tags sorted
lexicographically, not chronologically, so that's a real difference — but again,
today's selector only uses max(version_key), not upload time.

Recommendation: refactor to the unified OCI path. Reasoning:

  1. One code path instead of two → easier to test and to extend (quay.io, GitLab
    registry, self-hosted Harbor all fall out for free).
  2. Closer to what singularity pull docker://… does under the hood.
  3. Sidesteps the Docker Hub custom-API pagination quirk (see #3 below).

If you'd rather land the PR as-is, please at minimum rename
get_last_docker_version_tag / get_last_ghcr_version_tag into a single
dispatching entry point, e.g. get_last_version_tag(image_id), so call sites
stop duplicating the is_ghcr branching currently in
generate_singularity_for_docker_image:403-411.

3. Pagination is unhandled in both paths

Neither method handles paginated tag lists.

  • Docker Hub custom API defaults to 10 results per page (max 100). For
    repos like library/python (3752 tags) only a tiny window is seen. Today it
    survives because results are sorted newest-first, so the highest semver is
    usually on page 1 — but this is fragile and silently wrong for
    version_regex-filtered lookups against historical tags (see
    nipreps/fmriprep 20.2.* LTS at main():591, which already returns a
    cut-off list of 10 on every run).
  • GHCR OCI returns everything in one shot for the repos I tested, but the
    distribution spec allows the server to paginate via the Link: …; rel="next"
    header, and GHCR is documented to do this for very large repos.

Not a blocker, but worth a # TODO: handle Link-header pagination comment at
minimum.

4. Minor: auto-family for ghcr differs from Docker Hub

At generate_singularity_for_docker_image:390-393, when family isn't passed,
ghcr uses the owner segment (ghcr.io/unfmontreal/skullduggery
unfmontreal) but Docker Hub uses the same (nipreps/fmriprepnipreps).
Consistent, so fine — but note that this means plain CLI use of
scripts/create_singularities ghcr.io/foo/bar will create an images/foo/
subdirectory, whereas for the main() entry the explicit family="repronim"
override is what keeps skullduggery under images/repronim/. Documenting that
in the docstring or a comment at the call site would help future contributors.

Suggested next steps

  1. Decide on option (1) — either drop the skullduggery inclusion, or add
    non-semver tag handling. Without this the PR merges a no-op.
  2. Pick between (2a) ship as-is with a renamed dispatcher, or (2b) land the OCI
    unification.
  3. Add at least one unit test exercising the GHCR path with a mocked response
    (scripts/tests/test_create_singularities.py has none for Builder.*).

Happy to draft the OCI refactor if you want to take that direction.

@yarikoptic
Copy link
Copy Markdown
Member

@copilot , please review and address the review (by claude code) I just submitted requesting changes

…O, unit tests

Per review by @yarikoptic:

1. Unified OCI API: replace separate get_last_docker_version_tag and
   get_last_ghcr_version_tag with a new OCIRegistry dataclass + single
   get_last_version_tag dispatcher.  All three registries (Docker Hub,
   GHCR, Quay.io) share one /v2/{repo}/tags/list code path with a bearer
   token handshake.

2. Non-semver fallback: when version_regex is explicitly provided but no
   semver tag matches, _select_best_version now returns the first
   regex-matched tag verbatim (unless only_good_versions=True).  This
   makes version_regex=r'^main$' work for skullduggery.

3. Skullduggery call updated to version_regex=r'^main$' so it actually
   builds the main-branch image until versioned releases are published.

4. Pagination TODO comment added inside OCIRegistry.list_tags.

5. generate_singularity_for_docker_image simplified: removed is_ghcr
   branching, unified family extraction via parts[-2].

6. Unit tests added in scripts/tests/test_create_singularities.py
   (17 tests covering OCIRegistry.for_image, list_tags, _select_best_version,
   and get_last_version_tag with mocked HTTP).

7. noxfile.py: new 'tests' session to run pytest tests.

Agent-Logs-Url: https://github.com/ReproNim/containers/sessions/43f3a2b5-3c7c-40ed-85a7-6b575e55a061

Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
Copy link
Copy Markdown
Author

Copilot AI commented Apr 24, 2026

Review of PR #161 — GHCR support for `create_singularities` ...

Addressed all four points in commit 03988c7:

  1. Skullduggery building nothing_select_best_version now falls back to returning the first regex-matched tag verbatim when version_regex is explicitly given but no semver matches (and only_good_versions=False). The main() call is updated to version_regex=r"^main$" so it builds the main-branch image until versioned releases are published.

  2. Unified OCI API — Replaced the two separate methods (get_last_docker_version_tag / get_last_ghcr_version_tag) with a new OCIRegistry dataclass + single get_last_version_tag(image_id) dispatcher. All registries (Docker Hub, GHCR, Quay.io, …) now share the same /v2/{repo}/tags/list code path. The is_ghcr branching in generate_singularity_for_docker_image is gone too.

  3. Pagination — Added a # TODO: handle Link-header pagination comment inside OCIRegistry.list_tags as requested.

  4. Unit tests — Added scripts/tests/test_create_singularities.py with 17 pytest tests covering OCIRegistry.for_image, OCIRegistry.list_tags (mocked), _select_best_version, and get_last_version_tag. Also added a tests session to noxfile.py.

- OCIRegistry.list_tags: raise ValueError with descriptive message when
  bearer token is missing from auth response instead of silently using
  an empty string that would cause cryptic auth failures
- OCIRegistry.list_tags: raise ValueError with context info when 'tags'
  field is missing from the tags/list API response instead of silently
  returning []
- test: use mock_get.call_args.kwargs (safe attribute access) instead of
  tuple-unpacking call_args

Agent-Logs-Url: https://github.com/ReproNim/containers/sessions/43f3a2b5-3c7c-40ed-85a7-6b575e55a061

Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
Copilot AI changed the title Add GitHub Container Registry (ghcr.io) support to create_singularities Add OCI-compatible registry support (ghcr.io, quay.io, Docker Hub) to create_singularities Apr 24, 2026
Copilot AI requested a review from yarikoptic April 24, 2026 00:40
Copy link
Copy Markdown
Author

Copilot AI commented Apr 24, 2026

@copilot , please review and address the review (by claude code) I just submitted requesting changes

All four points from the review were addressed in commits 03988c7 and b96b7e9 (the two most recent commits before this session):

  1. Skullduggery builds nothing_select_best_version falls back to the first regex-matched tag verbatim when no semver matches and version_regex is explicit; the main() call now passes version_regex=r"^main$".
  2. Unified OCI APIget_last_docker_version_tag and get_last_ghcr_version_tag replaced by a new OCIRegistry dataclass + single get_last_version_tag(image_id) dispatcher; the is_ghcr branching in generate_singularity_for_docker_image is gone.
  3. Pagination# TODO: handle Link-header pagination comment added in OCIRegistry.list_tags.
  4. Unit tests — 17 pytest tests added in scripts/tests/test_create_singularities.py, all passing (nox -s tests).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Check/fixup/extend support for building from other docker registries

2 participants