Add MLX backend for Apple Silicon #1199
Draft
ChinChangYang wants to merge 1 commit into
Introduces a new neural-net backend (USE_BACKEND=MLX) targeting Apple
Silicon via Apple's MLX framework. The backend implements the full
nninterface contract (model load, batched evaluation, FP16/FP32 paths)
and ships with a Winograd 3x3 convolution path plus an adaptive
per-shape tuner that picks the fastest implementation for each
conv-3x3 shape at model load.
Backend
- cpp/neuralnet/mlxbackend.cpp: backend implementation. Supports
variable board sizes via input masking (same nnXLen/nnYLen
contract as other backends; the global COMPILE_MAX_BOARD_LEN
bound still applies). FP16/FP32 selected by the mlxUseFP16 config
(default auto -> fp16); same input feature layout as the other
backends. Mish activation runs FP16-safe (asserts on
ACTIVATION_MISH_SCALE8 so out-of-range variants are caught
explicitly rather than silently truncated).
- cpp/neuralnet/mlxwinograd.h: F(4x4, 3x3) Winograd transform with
fused activation + residual add.
- cpp/neuralnet/mlxwinotuner.{cpp,h}: per-shape Winograd tuner with
adaptive scoring (rotates the candidate set per shape, scores by
median-time delta against a baked-default baseline). Logs the
conv-3x3 shape distribution at model load.
- cpp/neuralnet/mlxtests.cpp: unit tests for the Winograd path
and tuner numeric-consistency, gated under runnnlayertests.
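As a rough illustration of why an F(4x4, 3x3) Winograd path is attractive for 3x3 convolutions, the sketch below counts element-wise multiplies per (input channel, output channel) pair. It is not taken from mlxwinograd.h: the WinogradCost struct and winogradCost function are hypothetical names, and the cost of the input, filter, and output transforms is ignored.

```cpp
#include <cassert>

// Hypothetical helper for illustration; not part of mlxwinograd.h.
struct WinogradCost {
  int tilesPerBoard;       // number of m x m output tiles covering the board
  long long winogradMuls;  // element-wise multiplies in the transformed domain
  long long directMuls;    // multiplies for direct convolution over the same padded tiles
};

// Multiply counts per (input channel, output channel) pair for F(m x m, r x r).
// Transform overhead is deliberately left out of this count.
inline WinogradCost winogradCost(int xLen, int yLen, int m = 4, int r = 3) {
  assert(xLen > 0 && yLen > 0 && m > 0 && r > 0);
  const int tilesX = (xLen + m - 1) / m;  // round up so edge tiles are padded
  const int tilesY = (yLen + m - 1) / m;
  const int alpha = m + r - 1;            // input tile edge: 6 for F(4x4, 3x3)
  WinogradCost c;
  c.tilesPerBoard = tilesX * tilesY;
  c.winogradMuls = static_cast<long long>(c.tilesPerBoard) * alpha * alpha;
  c.directMuls = static_cast<long long>(c.tilesPerBoard) * m * m * r * r;
  return c;
}
```

For a 19x19 board this gives 25 tiles, with 900 multiplies in the Winograd domain against 3600 for direct convolution over the same padded tiles (36 versus 144 per tile). Whether that 4x reduction turns into wall-clock gains depends on the transform overhead, which is presumably why a per-shape tuner is paired with it.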
Build / wiring
- cpp/CMakeLists.txt: USE_BACKEND=MLX target. MLX requires CMake
3.27 (cmake_minimum_required stays at 3.18.2 so other backends
keep building on older CMake). Links Homebrew's prebuilt
libmlx.dylib; OSX deployment target intentionally not pinned so
the executable's minos matches the dylib it was linked against.
- cpp/main.cpp, cpp/program/setup.cpp, cpp/command/benchmark.cpp:
wire MLX into backend selection / benchmark.
- cpp/configs/{gtp,analysis,match,contribute}_example.cfg: document
mlxUseFP16 (auto / true / false), default auto -> fp16.
- Compiling.md: build instructions for the MLX backend.
Validation
- Cross-backend validation against an Eigen reference (testgpuerror)
for b18c384nbt, b40v8, and humanv0 nets shows FP32 max winrate
error 0.00095% and FP16 max 2.63%, well within the existing
backend tolerances.
This is the squash of 130 commits from feature/mlx-backend.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This PR adds a new neural-net backend (USE_BACKEND=MLX) targeting
Apple Silicon via Apple's MLX framework. It implements the full
nninterface.h contract (model load, batched evaluation, FP16/FP32
paths) and ships with a Winograd 3x3 convolution path plus an
adaptive per-shape tuner that picks the fastest implementation for
each conv-3x3 shape at model load.
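One way to read "picks the fastest implementation per shape", combined with the median-time-delta scoring mentioned for the tuner, is sketched below. This is an assumption about the general idea, not the code in mlxwinotuner.cpp: medianOf and pickFastest are hypothetical names, timings are plain millisecond samples, and the per-shape rotation of the candidate set is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty sample set. Takes a copy so the caller's data is untouched.
inline double medianOf(std::vector<double> v) {
  assert(!v.empty());
  std::sort(v.begin(), v.end());
  const std::size_t n = v.size();
  return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Returns the index of the candidate whose median time beats the baseline
// median by the largest margin, or -1 if no candidate is faster than the
// baseline (in which case the baked-in default would be kept).
inline int pickFastest(const std::vector<double>& baselineMs,
                       const std::vector<std::vector<double>>& candidateMs) {
  const double base = medianOf(baselineMs);
  int best = -1;
  double bestDelta = 0.0;  // negative delta means faster than the baseline
  for (std::size_t i = 0; i < candidateMs.size(); i++) {
    const double delta = medianOf(candidateMs[i]) - base;
    if (delta < bestDelta) {
      bestDelta = delta;
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

Scoring on medians rather than means keeps a single slow outlier sample (for example a first-run compile or a scheduling hiccup) from deciding the winner.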
What's new
Backend (cpp/neuralnet/)
- mlxbackend.cpp: backend implementation. Supports variable board
sizes via input masking (same nnXLen/nnYLen contract as other
backends; the global COMPILE_MAX_BOARD_LEN bound still applies),
FP16/FP32 selected by mlxUseFP16 (default auto → fp16), and the
same input feature layout as the other backends. Mish runs
FP16-safe; the code asserts on ACTIVATION_MISH_SCALE8 so
out-of-range variants fail loudly rather than truncate silently.
- mlxwinograd.h: F(4×4, 3×3) Winograd transform with fused
activation + residual add.
- mlxwinotuner.{cpp,h}: per-shape Winograd tuner with adaptive
scoring (rotates the candidate set per shape, scores by median-time
delta against a baked-default baseline). Logs the conv-3x3 shape
distribution at model load.
- mlxtests.cpp: Winograd + tuner numeric-consistency tests, gated
under runnnlayertests.
Build / wiring
- cpp/CMakeLists.txt: USE_BACKEND=MLX target. MLX needs CMake 3.27;
cmake_minimum_required stays at 3.18.2 so other backends keep
building on older CMake. Links Homebrew's prebuilt libmlx.dylib; OSX
deployment target is intentionally not pinned so the executable's
minos matches the linked dylib.
- cpp/main.cpp, cpp/program/setup.cpp, cpp/command/benchmark.cpp:
wire MLX into backend selection / benchmark.
- cpp/configs/{gtp,analysis,match,contribute}_example.cfg: document
mlxUseFP16 (auto/true/false, default auto → fp16).
- Compiling.md: build instructions.
How to build
    cd cpp
    cmake -G Ninja -DUSE_BACKEND=MLX
    ninja
Requires CMake ≥ 3.27 and brew install mlx.
Validation
Cross-backend validation against an Eigen reference (testgpuerror)
on b18c384nbt, b40v8, and humanv0 nets:
- FP32: max winrate error 0.00095%
- FP16: max winrate error 2.63%
Both are well within the existing tolerances used by other backends.
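To make "max winrate error" concrete, here is a minimal sketch of one plausible reading of the metric: the largest absolute difference between the reference backend's winrate and the tested backend's winrate over a set of positions, expressed in percent. This is an assumption for illustration, not a description of how testgpuerror computes its report; maxWinrateErrorPercent is a hypothetical name and winrates are assumed to lie in [0, 1].

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute winrate difference between two backends, in percent.
// reference[i] and candidate[i] are winrates in [0, 1] for the same position.
inline double maxWinrateErrorPercent(const std::vector<double>& reference,
                                     const std::vector<double>& candidate) {
  assert(reference.size() == candidate.size());
  double maxErr = 0.0;
  for (std::size_t i = 0; i < reference.size(); i++) {
    maxErr = std::max(maxErr, std::fabs(reference[i] - candidate[i]));
  }
  return 100.0 * maxErr;
}
```

Under this reading, an FP16 figure in the low single-digit percent range means the worst position in the sample differed from the Eigen reference by a few winrate points, while FP32 stays at the level of floating-point noise.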
Status
Draft — opening for early feedback on the backend's structure and
the tuner approach before promoting to ready-for-review.