Add NVIDIA CUDA builds #5

Open
Phqen1x wants to merge 4 commits into lemonade-sdk:lemonade from Phqen1x:lemonade
Phqen1x commented May 25, 2026

This PR adds a parallel CUDA build pipeline to the repository so that both AMD ROCm and NVIDIA CUDA users can get nightly llama.cpp builds from the same release stream.

What's added

Build pipeline (build-llamacpp-cuda.yml)

  • Matrix build over 7 CUDA sm_ targets: sm_75 (Turing) through sm_120 (Blackwell consumer). Ubuntu builds all 7 targets; Windows currently produces 6 (sm_75–sm_100; sm_120 was not present in the b1016 release assets and may reflect a build failure or a later addition to the Windows matrix)
  • Ubuntu produces .tar.xz archives; Windows produces .7z archives via the pre-installed 7-Zip for LZMA2 compression
  • Bundles libcudart, libcublas, libcublasLt, and libcurand alongside the binaries; libcuda.so / nvcuda.dll are intentionally excluded (driver-provided, not redistributable)
  • Uses cp -a to preserve .so symlink chains in the archive instead of duplicating file data
  • Sets $ORIGIN RPATH on all ELF binaries/libraries for portable extraction without LD_LIBRARY_PATH
  • Strips debug symbols after RPATH patching
  • Runs a dynamic linker check (ldd) before packaging — fails if any dependency other than libcuda.so is unresolved
  • Scheduled nightly at 15:00 UTC (two hours after ROCm's 13:00 UTC); both share the same sequential b#### release tag space
  • Manual workflow_dispatch with overrides for sm_targets, cuda_version, and llamacpp_version
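The dynamic linker check described above can be sketched in Python. This is a minimal illustration, not the workflow's actual step: the function name `unresolved_dependencies` and the sample `ldd` text are hypothetical, and the only rule taken from this PR is that `libcuda.so` may remain unresolved.

```python
import re

# Assumption from the PR description: only the driver library may be unresolved.
ALLOWED_MISSING = ("libcuda.so",)


def unresolved_dependencies(ldd_output: str) -> list:
    """Return libraries that ldd reports as 'not found', minus the allowed ones."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved entries as "<name> => not found".
        match = re.match(r"\s*(\S+)\s+=>\s+not found", line)
        if match is None:
            continue
        name = match.group(1)
        if not name.startswith(ALLOWED_MISSING):
            missing.append(name)
    return missing


# Hypothetical ldd output for illustration only.
sample = (
    "\tlinux-vdso.so.1 (0x00007ffd)\n"
    "\tlibcudart.so.12 => /opt/bin/libcudart.so.12 (0x00007f00)\n"
    "\tlibcuda.so.1 => not found\n"
    "\tlibcublas.so.12 => not found\n"
)

if __name__ == "__main__":
    print(unresolved_dependencies(sample))
```

A packaging step built on this idea would fail the job whenever the returned list is non-empty.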

Smoke test (no GPU required, runs on GitHub-hosted runners)

  • Composite action .github/actions/test-llamacpp-cuda-build — verifies --version exits 0, required CUDA libraries (libcudart, libcublas, libcublasLt, libcurand) are present, and RPATH contains $ORIGIN
  • Runs automatically after every Ubuntu build before the release is created
  • Standalone test-llamacpp-cuda.yml workflow for validating any past release on demand
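As a rough sketch of two of the GPU-free checks listed above (required libraries present, RPATH contains `$ORIGIN`), the logic might look like the following. The helper names are hypothetical and the composite action's real implementation is not shown in this PR. The RPATH string is assumed to have been read beforehand, for example with `patchelf --print-rpath`.

```python
# Library stems named in the smoke-test description.
REQUIRED_LIBS = ("libcudart", "libcublas", "libcublasLt", "libcurand")


def missing_cuda_libs(filenames):
    """Return required library stems that have no '<stem>.so' or '<stem>.so.*' file."""
    missing = []
    for stem in REQUIRED_LIBS:
        found = any(
            name == f"{stem}.so" or name.startswith(f"{stem}.so.")
            for name in filenames
        )
        if not found:
            missing.append(stem)
    return missing


def rpath_has_origin(rpath):
    """True if any entry of a colon-separated RPATH string starts with $ORIGIN."""
    return any(entry.startswith("$ORIGIN") for entry in rpath.split(":"))


if __name__ == "__main__":
    # Hypothetical directory listing: libcublas itself is absent.
    listing = ["libcudart.so.12", "libcublasLt.so.12", "libcurand.so.10"]
    print(missing_cuda_libs(listing))
    print(rpath_has_origin("$ORIGIN"))
```

Matching on `<stem>.so.` rather than on the bare stem keeps `libcublasLt` from satisfying the `libcublas` requirement.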

Developer tooling

  • tests/test_build_smoke_cuda.py — pytest suite with a gpu marker for GPU-aware checks; CPU-only checks run without hardware
  • scripts/validate_cuda_setup.sh — local environment validator (nvidia-smi, nvcc, binary, --version, --list-devices) with PASS/WARN/FAIL summary
  • utils/gather_required_libs_cuda.py: ldd-based helper to discover and copy CUDA runtime libraries from a fresh build
  • docs/manual_instructions_cuda.md — manual build steps for Ubuntu and Windows

Housekeeping

  • .gitignore added (was absent)
  • utils/gather_required_libs.py renamed to gather_required_libs_rocm.py for clarity
  • docs/manual_instructions.md header updated to link both build guides
  • README.md rewritten to document both backends: GPU target tables, download matrices, CI/testing overview, dependencies, and repo layout

Release asset naming

ROCm (unchanged)

llama-bXXXX-ubuntu-rocm-<gfx_target>-x64.zip
llama-bXXXX-windows-rocm-<gfx_target>-x64.zip

CUDA (new)

llama-bXXXX-ubuntu-cuda-<sm_target>-x64.tar.xz
llama-bXXXX-windows-cuda-<sm_target>-x64.7z
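
For consumers that build download URLs, the naming patterns above can be expressed as a small helper. This is a sketch only: the function `asset_name` and its parameters are hypothetical, and the extension table simply mirrors the four patterns listed in this section.

```python
# Archive extension per (backend, OS), copied from the patterns listed above.
EXTENSIONS = {
    ("rocm", "ubuntu"): "zip",
    ("rocm", "windows"): "zip",
    ("cuda", "ubuntu"): "tar.xz",
    ("cuda", "windows"): "7z",
}


def asset_name(tag, os_name, backend, target):
    """Build a release asset filename such as llama-b1016-ubuntu-cuda-sm_75-x64.tar.xz."""
    ext = EXTENSIONS[(backend, os_name)]
    return f"llama-{tag}-{os_name}-{backend}-{target}-x64.{ext}"


if __name__ == "__main__":
    print(asset_name("b1016", "ubuntu", "cuda", "sm_75"))
    print(asset_name("b1016", "windows", "cuda", "sm_100"))
```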

Notes

  • The two build workflows are fully independent and will not collide at release creation time since they run at different scheduled hours. Manual concurrent runs could theoretically race on the tag number; the tag-generation step reads all existing releases immediately before creating, so the window is small.
  • libcuda.so / nvcuda.dll are explicitly excluded; a compatible NVIDIA driver must be present on the target system (it provides the driver library, which cannot be redistributed).
  • The CUDA build addresses the feedback in Phqen1x/llamacpp-cuda#2 ("Improving Linux build", in a separate repo), which identified the missing libcudart bundling on Linux.
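
To make the tag race in the first note concrete, here is a minimal sketch of a "read all tags, then take the next number" step. It is an assumption about how such a step could work, not the workflow's actual code: `next_release_tag` is a hypothetical name, and the sketch assumes tags are `b` followed by a decimal number with no zero padding.

```python
import re


def next_release_tag(existing_tags):
    """Return the next sequential b#### tag given the tags of all existing releases."""
    numbers = []
    for tag in existing_tags:
        match = re.fullmatch(r"b(\d+)", tag)
        if match:
            numbers.append(int(match.group(1)))
    # Tags that do not look like b#### are ignored; an empty list starts at b1.
    return f"b{max(numbers, default=0) + 1}"


if __name__ == "__main__":
    print(next_release_tag(["b1015", "b1016", "nightly"]))
```

Two runs that both read the tag list before either has created its release would compute the same value, which is the small window the note refers to.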

Phqen1x and others added 4 commits May 18, 2026 20:18
Adds ubuntu-22-cuda and windows-cuda jobs covering all SM architectures
that lemonade supports (sm_75 through sm_120). Each arch builds as an
independent matrix entry (fail-fast: false).

Linux artifacts are packaged as .tar.xz and Windows as .7z, matching the
filenames lemonade expects when downloading CUDA backends at runtime.
The release job moves these files as-is — filenames are not tag-prefixed
because lemonade uses the release tag in the download URL, not embedded
in the filename.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Hardcoded version 14.44.35112 no longer exists after runner toolset update.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add continue-on-error: true to windows-cpu so dependent jobs treat it
as succeeded regardless of outcome. Guard the CPU artifact merge step
to skip gracefully when the zip is absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR extends the existing GitHub Actions release workflow to produce and publish NVIDIA CUDA build artifacts alongside the existing CPU and AMD ROCm artifacts.

Changes:

  • Add Ubuntu 22.04 CUDA matrix builds across multiple sm_ targets and upload .tar.xz artifacts.
  • Add Windows CUDA matrix builds across multiple sm_ targets and upload .7z artifacts.
  • Update the release job to collect CUDA artifacts and upload additional archive formats.

Comment on lines +191 to +193
    cp -v ${cuda_lib}/libcublas.so* build/bin/ 2>/dev/null || true
    cp -v ${cuda_lib}/libcublasLt.so* build/bin/ 2>/dev/null || true
    cp -v ${cuda_lib}/libcurand.so* build/bin/ 2>/dev/null || true

- name: Set RPATH for portable distribution
  run: |
    # Patch regular files only; symlinks are skipped and patchelf errors are ignored.
    for f in build/bin/*.so* build/bin/llama-*; do
      [ -f "$f" ] && ! [ -L "$f" ] && patchelf --set-rpath '$ORIGIN' "$f" 2>/dev/null || true
    done

- name: Pack artifacts
  id: pack_artifacts
  run: |
    tar -cJf llama-ubuntu-cuda-${{ matrix.sm }}-x64.tar.xz -C ./build/bin .

Comment on lines +531 to +543
echo "Moving CUDA tar.xz artifacts to release (filename used as-is by lemonade)..."
for tar_file in artifact/llama-ubuntu-cuda-*.tar.xz; do
  [ -f "$tar_file" ] || continue
  echo "Moving $tar_file"
  mv "$tar_file" "release/$(basename "$tar_file")"
done

echo "Moving CUDA .7z artifacts to release (filename used as-is by lemonade)..."
for z_file in artifact/llama-windows-cuda-*.7z; do
  [ -f "$z_file" ] || continue
  echo "Moving $z_file"
  mv "$z_file" "release/$(basename "$z_file")"
done

Comment on lines 583 to +589
 **Linux:**
-- [Ubuntu x64 (ROCm 7.13)](https://github.com/lemonade-sdk/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-ubuntu-rocm-7.13-x64.tar.gz)
+- [Ubuntu x64 (ROCm 7.13)](https://github.com/Phqen1x/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-ubuntu-rocm-7.13-x64.tar.gz)
+- Ubuntu x64 (CUDA): `llama-ubuntu-cuda-sm_XX-x64.tar.xz` (replace XX with your GPU compute capability)
 
 **Windows:**
-- [Windows x64 (ROCm 7.13)](https://github.com/lemonade-sdk/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-win-rocm-7.13-x64.zip)
+- [Windows x64 (ROCm 7.13)](https://github.com/Phqen1x/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-win-rocm-7.13-x64.zip)
+- Windows x64 (CUDA): `llama-windows-cuda-sm_XX-x64.7z` (replace XX with your GPU compute capability)