Add NVIDIA CUDA builds #5
Open
Phqen1x wants to merge 4 commits into
Adds ubuntu-22-cuda and windows-cuda jobs covering all SM architectures that lemonade supports (sm_75 through sm_120). Each arch builds as an independent matrix entry (fail-fast: false). Linux artifacts are packaged as .tar.xz and Windows as .7z, matching the filenames lemonade expects when downloading CUDA backends at runtime. The release job moves these files as-is; filenames are not tag-prefixed because lemonade puts the release tag in the download URL rather than in the filename.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hardcoded version 14.44.35112 no longer exists after runner toolset update.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add continue-on-error: true to windows-cpu so dependent jobs treat it as succeeded regardless of outcome. Guard the CPU artifact merge step to skip gracefully when the zip is absent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
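The guard described in this commit message could look roughly like the sketch below. The workflow step itself is not quoted in this conversation, so the function name `merge_cpu_artifact`, its arguments, and the use of `unzip` are assumptions made for illustration only.

```shell
# Hypothetical sketch: merge the Windows CPU zip into a target directory only
# when the zip exists. Names and paths are illustrative, not from the workflow.
merge_cpu_artifact() {
    zip_file=$1   # path to the CPU build zip (assumed layout)
    dest_dir=$2   # directory the CPU binaries are merged into

    if [ ! -f "$zip_file" ]; then
        # windows-cpu runs with continue-on-error: true, so the zip may be
        # missing; that case must not fail the dependent step.
        echo "CPU artifact $zip_file not found; skipping merge"
        return 0
    fi

    mkdir -p "$dest_dir"
    unzip -o -q "$zip_file" -d "$dest_dir"
}
```

Returning 0 on the missing-file path keeps the dependent job green, which matches the intent of treating windows-cpu as succeeded regardless of outcome.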
Add cuda
Pull request overview
This PR extends the existing GitHub Actions release workflow to produce and publish NVIDIA CUDA build artifacts alongside the existing CPU and AMD ROCm artifacts.
Changes:
- Add Ubuntu 22.04 CUDA matrix builds across multiple `sm_` targets and upload `.tar.xz` artifacts.
- Add Windows CUDA matrix builds across multiple `sm_` targets and upload `.7z` artifacts.
- Update the release job to collect CUDA artifacts and upload additional archive formats.
Comment on lines +191 to +193
    cp -v ${cuda_lib}/libcublas.so* build/bin/ 2>/dev/null || true
    cp -v ${cuda_lib}/libcublasLt.so* build/bin/ 2>/dev/null || true
    cp -v ${cuda_lib}/libcurand.so* build/bin/ 2>/dev/null || true
    - name: Set RPATH for portable distribution
      run: |
        for f in build/bin/*.so* build/bin/llama-*; do
          [ -f "$f" ] && ! [ -L "$f" ] && patchelf --set-rpath '$ORIGIN' "$f" 2>/dev/null || true

    - name: Pack artifacts
      id: pack_artifacts
      run: |
        tar -cJf llama-ubuntu-cuda-${{ matrix.sm }}-x64.tar.xz -C ./build/bin .
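The RPATH step quoted above chains `&&` and `|| true` on one line, so any `patchelf` failure is silently ignored. As a sketch of one alternative, not the workflow's actual code, the same loop can be written with explicit conditions and a failure count. The function name `set_origin_rpath` is hypothetical; the glob patterns and the `patchelf --set-rpath '$ORIGIN'` call are taken from the quoted step.

```shell
# Sketch: set an $ORIGIN RPATH on every regular file (not symlink) matching the
# patterns used in the quoted step, and count failures instead of hiding them.
set_origin_rpath() {
    bin_dir=$1
    failed=0
    for f in "$bin_dir"/*.so* "$bin_dir"/llama-*; do
        # Skip unmatched globs, directories, and symlinks; a symlinked library
        # is patched once through its real file.
        if [ ! -f "$f" ] || [ -L "$f" ]; then
            continue
        fi
        # Single quotes keep $ORIGIN literal so the dynamic loader expands it.
        if ! patchelf --set-rpath '$ORIGIN' "$f"; then
            echo "warning: could not set RPATH on $f" >&2
            failed=$((failed + 1))
        fi
    done
    # Print the number of files patchelf could not process.
    echo "$failed"
}
```

Whether a non-zero count should fail the job is a design choice: `patchelf` also fails on non-ELF files, so a warning may be enough when `build/bin` contains anything other than ELF binaries and libraries.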
Comment on lines +531 to +543
    echo "Moving CUDA tar.xz artifacts to release (filename used as-is by lemonade)..."
    for tar_file in artifact/llama-ubuntu-cuda-*.tar.xz; do
      [ -f "$tar_file" ] || continue
      echo "Moving $tar_file"
      mv "$tar_file" "release/$(basename "$tar_file")"
    done

    echo "Moving CUDA .7z artifacts to release (filename used as-is by lemonade)..."
    for z_file in artifact/llama-windows-cuda-*.7z; do
      [ -f "$z_file" ] || continue
      echo "Moving $z_file"
      mv "$z_file" "release/$(basename "$z_file")"
    done
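The two loops above differ only in their glob pattern. A minimal sketch of a combined version follows; the function name `move_cuda_artifacts` and its directory arguments are hypothetical, while the filename patterns and the `[ -f ... ] || continue` guard are taken from the quoted lines.

```shell
# Sketch: move every CUDA archive from the artifact directory into the release
# directory, keeping each filename unchanged.
move_cuda_artifacts() {
    src_dir=$1
    dst_dir=$2
    mkdir -p "$dst_dir"
    for archive in "$src_dir"/llama-ubuntu-cuda-*.tar.xz \
                   "$src_dir"/llama-windows-cuda-*.7z; do
        # An unmatched glob stays as a literal pattern; -f filters it out.
        [ -f "$archive" ] || continue
        echo "Moving $archive"
        mv "$archive" "$dst_dir/$(basename "$archive")"
    done
}
```

In the release job this would be called as `move_cuda_artifacts artifact release`, assuming the same directory layout as the quoted step.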
Comment on lines 583 to +589
    **Linux:**
    - [Ubuntu x64 (ROCm 7.13)](https://github.com/lemonade-sdk/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-ubuntu-rocm-7.13-x64.tar.gz)
    - [Ubuntu x64 (ROCm 7.13)](https://github.com/Phqen1x/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-ubuntu-rocm-7.13-x64.tar.gz)
    - Ubuntu x64 (CUDA): `llama-ubuntu-cuda-sm_XX-x64.tar.xz` (replace XX with your GPU compute capability)

    **Windows:**
    - [Windows x64 (ROCm 7.13)](https://github.com/lemonade-sdk/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-win-rocm-7.13-x64.zip)
    - [Windows x64 (ROCm 7.13)](https://github.com/Phqen1x/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-win-rocm-7.13-x64.zip)
    - Windows x64 (CUDA): `llama-windows-cuda-sm_XX-x64.7z` (replace XX with your GPU compute capability)
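To make the "replace XX with your GPU compute capability" instruction concrete, here is a small sketch that derives the asset filename from a compute capability string. The function name `cuda_asset_name` is hypothetical; the filename patterns come from the release notes quoted above, and the mapping of a capability such as `8.6` to `sm_86` follows the usual CUDA naming convention.

```shell
# Sketch: build the CUDA asset filename from an OS name and a compute
# capability such as "8.6" (dropping the dot gives the sm_ suffix).
cuda_asset_name() {
    os=$1    # "ubuntu" or "windows"
    cap=$2   # compute capability, e.g. "7.5", "8.6", "12.0"
    sm="sm_$(printf '%s' "$cap" | tr -d '.')"
    case "$os" in
        ubuntu)  echo "llama-ubuntu-cuda-${sm}-x64.tar.xz" ;;
        windows) echo "llama-windows-cuda-${sm}-x64.7z" ;;
        *)       echo "unsupported OS: $os" >&2; return 1 ;;
    esac
}
```

For example, `cuda_asset_name ubuntu 8.6` prints `llama-ubuntu-cuda-sm_86-x64.tar.xz`. On systems whose driver supports the `compute_cap` query field, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` should report the value to pass in; treat that as an assumption about the local driver rather than something this PR relies on.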
This PR adds a parallel CUDA build pipeline to the repository so that both AMD ROCm and NVIDIA CUDA users can get nightly llama.cpp builds from the same release stream.
What's added
Build pipeline (`build-llamacpp-cuda.yml`)
- Linux produces `.tar.xz` archives; Windows produces `.7z` archives via the pre-installed 7-Zip for LZMA2 compression
- `libcudart`, `libcublas`, `libcublasLt`, and `libcurand` alongside the binaries; `libcuda.so`/`nvcuda.dll` are intentionally excluded (driver-provided, not redistributable)
- `cp -a` to preserve `.so` symlink chains in the archive instead of duplicating file data
- `$ORIGIN` RPATH on all ELF binaries/libraries for portable extraction without `LD_LIBRARY_PATH`
- … (`ldd`) before packaging; fails if any dependency other than `libcuda.so` is unresolved
- … `b####` release tag space
- `workflow_dispatch` with overrides for `sm_targets`, `cuda_version`, and `llamacpp_version`

Smoke test (no GPU required, runs on GitHub-hosted runners)
- `.github/actions/test-llamacpp-cuda-build`: verifies `--version` exits 0, required CUDA libraries (`libcudart`, `libcublas`, `libcublasLt`, `libcurand`) are present, and RPATH contains `$ORIGIN`
- `test-llamacpp-cuda.yml` workflow for validating any past release on demand

Developer tooling
- `tests/test_build_smoke_cuda.py`: pytest suite with a `gpu` marker for GPU-aware checks; CPU-only checks run without hardware
- `scripts/validate_cuda_setup.sh`: local environment validator (`nvidia-smi`, `nvcc`, binary, `--version`, `--list-devices`) with PASS/WARN/FAIL summary
- `utils/gather_required_libs_cuda.py`: `ldd`-based helper to discover and copy CUDA runtime libraries from a fresh build
- `docs/manual_instructions_cuda.md`: manual build steps for Ubuntu and Windows

Housekeeping
- `.gitignore` added (was absent)
- `utils/gather_required_libs.py` renamed to `gather_required_libs_rocm.py` for clarity
- `docs/manual_instructions.md` header updated to link both build guides
- `README.md` rewritten to document both backends: GPU target tables, download matrices, CI/testing overview, dependencies, and repo layout

Release asset naming
ROCm (unchanged)
CUDA (new)
Notes
- `libcuda.so`/`nvcuda.dll` are explicitly excluded; a compatible NVIDIA driver must be present on the target system (it provides the driver library, which cannot be redistributed).
- … `libcudart` bundling on Linux.
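
The pre-packaging `ldd` check mentioned in the build pipeline list, which fails on any unresolved dependency other than `libcuda.so`, might be sketched as follows. This is an illustration under assumptions, not the workflow's actual step: the function names `unresolved_deps` and `check_binary_deps` are hypothetical, and the parsing relies on the usual `name => not found` format of `ldd` output.

```shell
# Sketch: read `ldd` output on stdin and print unresolved dependencies,
# ignoring libcuda.so, which the NVIDIA driver provides on the target system.
# The pattern is anchored so that libcudart.so is still reported.
unresolved_deps() {
    awk '/not found/ && $1 !~ /^libcuda\.so/ { print $1 }'
}

# Return non-zero if the binary has unresolved dependencies other than
# libcuda.so. Assumes the argument is a dynamically linked ELF file.
check_binary_deps() {
    missing=$(ldd "$1" | unresolved_deps)
    if [ -n "$missing" ]; then
        echo "unresolved dependencies for $1:" >&2
        echo "$missing" >&2
        return 1
    fi
}
```

Splitting the filter from the `ldd` call keeps the parsing testable on a runner without a GPU or CUDA install, in the same spirit as the smoke test described above.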