Add NVIDIA CUDA builds by Phqen1x · Pull Request #5 · lemonade-sdk/llama.cpp

Phqen1x · 2026-05-25T04:39:24Z

This PR adds a parallel CUDA build pipeline to the repository so that both AMD ROCm and NVIDIA CUDA users can get nightly llama.cpp builds from the same release stream.

What's added

Build pipeline (`build-llamacpp-cuda.yml`)

Matrix build over 7 CUDA sm_ targets, from sm_75 (Turing) through sm_120 (Blackwell consumer). Ubuntu builds all 7 targets; Windows currently produces 6 (sm_75–sm_100). sm_120 was not present in the b1016 release assets, which may reflect a build failure or a later addition to the Windows matrix.
Ubuntu produces .tar.xz archives; Windows produces .7z archives using the pre-installed 7-Zip with LZMA2 compression
Bundles libcudart, libcublas, libcublasLt, and libcurand alongside the binaries; libcuda.so / nvcuda.dll are intentionally excluded (driver-provided, not redistributable)
Uses cp -a to preserve .so symlink chains in the archive instead of duplicating file data
Sets $ORIGIN RPATH on all ELF binaries/libraries for portable extraction without LD_LIBRARY_PATH
Strips debug symbols after RPATH patching
Runs a dynamic linker check (ldd) before packaging — fails if any dependency other than libcuda.so is unresolved
Scheduled nightly at 15:00 UTC (two hours after ROCm's 13:00 UTC); both share the same sequential b#### release tag space
Manual workflow_dispatch with overrides for sm_targets, cuda_version, and llamacpp_version
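
As a rough illustration of the dynamic linker check described above, the sketch below parses `ldd` output and reports unresolved libraries other than the driver library. It is not the workflow's actual step: the function name, the allow-list pattern, and the sample output are assumptions made for this example.

```python
import re

# Assumption: only the driver library may be unresolved, as the description
# states. The exact pattern the workflow uses is not shown in this PR page.
ALLOWED_MISSING = re.compile(r"^libcuda\.so(\.\d+)*$")


def unresolved_dependencies(ldd_output: str) -> list[str]:
    """Return libraries that ldd reports as 'not found', minus the allowed ones."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved entries as "<soname> => not found".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found" and not ALLOWED_MISSING.match(name):
            missing.append(name)
    return missing


# Hypothetical ldd output for a packaged binary (paths and addresses invented).
sample = """\
    linux-vdso.so.1 (0x00007ffd0b5f2000)
    libcudart.so.12 => /opt/llama/libcudart.so.12 (0x00007f1c2a000000)
    libcuda.so.1 => not found
    libcublas.so.12 => not found
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c29c00000)
"""

print(unresolved_dependencies(sample))  # ['libcublas.so.12']
```

A packaging step built on this idea would fail whenever the returned list is non-empty.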

Smoke test (no GPU required, runs on GitHub-hosted runners)

Composite action .github/actions/test-llamacpp-cuda-build — verifies --version exits 0, required CUDA libraries (libcudart, libcublas, libcublasLt, libcurand) are present, and RPATH contains $ORIGIN
Runs automatically after every Ubuntu build before the release is created
Standalone test-llamacpp-cuda.yml workflow for validating any past release on demand
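
The library-presence part of the smoke test could look roughly like the sketch below. It checks an extracted archive directory for the four bundled CUDA libraries named above. The function name and the file-name matching rule are assumptions for illustration; the composite action's real implementation is not shown here.

```python
import tempfile
from pathlib import Path

# Library base names taken from the smoke-test description.
REQUIRED_LIBS = ("libcudart", "libcublas", "libcublasLt", "libcurand")


def missing_cuda_libs(extract_dir: Path) -> list[str]:
    """Return required library base names with no matching .so file in extract_dir."""
    names = [p.name for p in extract_dir.iterdir()]
    missing = []
    for lib in REQUIRED_LIBS:
        # Match "<lib>.so" or "<lib>.so.<version>". Requiring ".so" right after
        # the base name keeps libcublasLt from satisfying the libcublas check.
        if not any(n == f"{lib}.so" or n.startswith(f"{lib}.so.") for n in names):
            missing.append(lib)
    return missing


# Demo on a temporary directory with hypothetical file names; no GPU needed.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("libcudart.so.12", "libcublas.so.12", "libcublasLt.so.12"):
        (root / name).touch()
    print(missing_cuda_libs(root))  # ['libcurand']
```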

Developer tooling

tests/test_build_smoke_cuda.py — pytest suite with a gpu marker for GPU-aware checks; CPU-only checks run without hardware
scripts/validate_cuda_setup.sh — local environment validator (nvidia-smi, nvcc, binary, --version, --list-devices) with PASS/WARN/FAIL summary
utils/gather_required_libs_cuda.py — ldd-based helper to discover and copy CUDA runtime libraries from a fresh build
docs/manual_instructions_cuda.md — manual build steps for Ubuntu and Windows

Housekeeping

.gitignore added (was absent)
utils/gather_required_libs.py renamed to gather_required_libs_rocm.py for clarity
docs/manual_instructions.md header updated to link both build guides
README.md rewritten to document both backends: GPU target tables, download matrices, CI/testing overview, dependencies, and repo layout

Release asset naming

ROCm (unchanged)

llama-bXXXX-ubuntu-rocm-<gfx_target>-x64.zip
llama-bXXXX-windows-rocm-<gfx_target>-x64.zip

CUDA (new)

llama-bXXXX-ubuntu-cuda-<sm_target>-x64.tar.xz
llama-bXXXX-windows-cuda-<sm_target>-x64.7z
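
A small helper can make the CUDA naming pattern concrete. This is only a sketch: the function name is hypothetical, and the optional `tag` argument is an assumption that covers both the tagged pattern listed here and the untagged filenames that the release job in this PR moves as-is.

```python
from typing import Optional

# Archive extension per OS, as listed in the CUDA naming patterns.
_EXTENSIONS = {"ubuntu": "tar.xz", "windows": "7z"}


def cuda_asset_name(os_name: str, sm_target: str, tag: Optional[str] = None) -> str:
    """Build a CUDA asset filename; the tag (e.g. "b1016") is included only if given."""
    if os_name not in _EXTENSIONS:
        raise ValueError(f"unsupported OS: {os_name!r}")
    parts = ["llama"] + ([tag] if tag else []) + [os_name, "cuda", sm_target, "x64"]
    return "-".join(parts) + "." + _EXTENSIONS[os_name]


print(cuda_asset_name("ubuntu", "sm_75", tag="b1016"))  # llama-b1016-ubuntu-cuda-sm_75-x64.tar.xz
print(cuda_asset_name("windows", "sm_100"))             # llama-windows-cuda-sm_100-x64.7z
```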

Notes

The two build workflows are fully independent. Their scheduled runs will not collide at release creation time because they start two hours apart (13:00 and 15:00 UTC). Manual concurrent runs could in theory race on the tag number; the tag-generation step reads all existing releases immediately before creating the new one, so the window is small.
libcuda.so / nvcuda.dll are explicitly excluded; a compatible NVIDIA driver must be present on the target system (it provides the driver library, which cannot be redistributed).
The CUDA build addresses the feedback in Phqen1x/llamacpp-cuda#2, "Improving Linux build" (separate repo), which identified the missing libcudart bundling on Linux.
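
To illustrate the tag race mentioned in the notes, here is a minimal sketch of a "read existing tags, then increment" scheme. The function name, the unpadded `b<number>` format, and the sample tags other than b1016 are assumptions; the actual tag-generation step is not shown on this page.

```python
import re
from typing import Iterable


def next_build_tag(existing_tags: Iterable[str]) -> str:
    """Return the next sequential b<number> tag, ignoring tags in other formats."""
    numbers = [
        int(m.group(1))
        for tag in existing_tags
        if (m := re.fullmatch(r"b(\d+)", tag))
    ]
    return f"b{max(numbers, default=0) + 1}"


# Two runs that read the same snapshot of releases compute the same tag,
# which is the narrow window described in the notes.
snapshot = ["b1014", "b1015", "b1016"]
run_a = next_build_tag(snapshot)
run_b = next_build_tag(snapshot)
print(run_a, run_b)  # b1017 b1017
```

Because the collision only happens when both reads occur before either release is created, staggering the schedules avoids it for nightly runs.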

Adds ubuntu-22-cuda and windows-cuda jobs covering all SM architectures that lemonade supports (sm_75 through sm_120). Each arch builds as an independent matrix entry (fail-fast: false). Linux artifacts are packaged as .tar.xz and Windows as .7z, matching the filenames lemonade expects when downloading CUDA backends at runtime. The release job moves these files as-is; filenames are not tag-prefixed because lemonade uses the release tag in the download URL rather than embedding it in the filename.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The hardcoded MSVC redist version 14.44.35112 no longer exists after the runner toolset update.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
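
One way to resolve a versioned directory dynamically, instead of hardcoding it, is to pick the highest version among the names that are present. The sketch below shows only that selection logic. The function name and the sample listing are hypothetical, and the commit's actual resolution step may work differently.

```python
import re
from typing import Iterable, Optional


def latest_version_dir(names: Iterable[str]) -> Optional[str]:
    """Return the highest dotted-numeric name, comparing components as integers."""
    versions = [n for n in names if re.fullmatch(r"\d+(\.\d+)*", n)]
    if not versions:
        return None
    # Integer tuples avoid the string-ordering trap where "14.9.1" > "14.44.35112".
    return max(versions, key=lambda v: tuple(int(part) for part in v.split(".")))


# Hypothetical directory listing; only 14.44.35112 appears in the commit message.
print(latest_version_dir(["14.9.1", "14.44.35112", "v143"]))  # 14.44.35112
```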

Add continue-on-error: true to windows-cpu so dependent jobs treat it as succeeded regardless of outcome. Guard the CPU artifact merge step to skip gracefully when the zip is absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add cuda

Copilot

Pull request overview

This PR extends the existing GitHub Actions release workflow to produce and publish NVIDIA CUDA build artifacts alongside the existing CPU and AMD ROCm artifacts.

Changes:

Add Ubuntu 22.04 CUDA matrix builds across multiple sm_ targets and upload .tar.xz artifacts.
Add Windows CUDA matrix builds across multiple sm_ targets and upload .7z artifacts.
Update the release job to collect CUDA artifacts and upload additional archive formats.


+          cp -v ${cuda_lib}/libcublas.so*   build/bin/ 2>/dev/null || true
+          cp -v ${cuda_lib}/libcublasLt.so* build/bin/ 2>/dev/null || true
+          cp -v ${cuda_lib}/libcurand.so*   build/bin/ 2>/dev/null || true


+      - name: Set RPATH for portable distribution
+        run: |
+          for f in build/bin/*.so* build/bin/llama-*; do
+            [ -f "$f" ] && ! [ -L "$f" ] && patchelf --set-rpath '$ORIGIN' "$f" 2>/dev/null || true


+      - name: Pack artifacts
+        id: pack_artifacts
+        run: |
+          tar -cJf llama-ubuntu-cuda-${{ matrix.sm }}-x64.tar.xz -C ./build/bin .


+          echo "Moving CUDA tar.xz artifacts to release (filename used as-is by lemonade)..."
+          for tar_file in artifact/llama-ubuntu-cuda-*.tar.xz; do
+            [ -f "$tar_file" ] || continue
+            echo "Moving $tar_file"
+            mv "$tar_file" "release/$(basename "$tar_file")"
+          done
+
+          echo "Moving CUDA .7z artifacts to release (filename used as-is by lemonade)..."
+          for z_file in artifact/llama-windows-cuda-*.7z; do
+            [ -f "$z_file" ] || continue
+            echo "Moving $z_file"
+            mv "$z_file" "release/$(basename "$z_file")"
+          done


            **Linux:**
-            - [Ubuntu x64 (ROCm 7.13)](https://github.com/lemonade-sdk/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-ubuntu-rocm-7.13-x64.tar.gz)
+            - [Ubuntu x64 (ROCm 7.13)](https://github.com/Phqen1x/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-ubuntu-rocm-7.13-x64.tar.gz)
+            - Ubuntu x64 (CUDA): `llama-ubuntu-cuda-sm_XX-x64.tar.xz` (replace XX with your GPU compute capability)

            **Windows:**
-            - [Windows x64 (ROCm 7.13)](https://github.com/lemonade-sdk/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-win-rocm-7.13-x64.zip)
+            - [Windows x64 (ROCm 7.13)](https://github.com/Phqen1x/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-win-rocm-7.13-x64.zip)
+            - Windows x64 (CUDA): `llama-windows-cuda-sm_XX-x64.7z` (replace XX with your GPU compute capability)


Phqen1x and others added 4 commits May 18, 2026 20:18

fix: dynamically resolve MSVC redist path in windows-cpu pack step

a6937db


Merge pull request #1 from Phqen1x/add-cuda

57bf9ad


kenvandine requested a review from Copilot May 25, 2026 13:54

Copilot started reviewing on behalf of kenvandine May 25, 2026 13:55

Copilot AI reviewed May 25, 2026


Add NVIDIA CUDA builds #5: Phqen1x wants to merge 4 commits into lemonade-sdk:lemonade from Phqen1x:lemonade
