Add MLX backend for Apple Silicon by ChinChangYang · Pull Request #1199 · lightvector/KataGo

ChinChangYang · 2026-05-23T04:27:34Z

This PR adds a new neural-net backend (USE_BACKEND=MLX) targeting Apple
Silicon via Apple's MLX framework.
It implements the full nninterface.h contract (model load, batched
evaluation, FP16/FP32 paths) and ships with a Winograd 3x3 convolution
path plus an adaptive per-shape tuner that picks the fastest
implementation for each conv-3x3 shape at model load.
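The selection step can be pictured with a small sketch. It assumes the timing samples have already been collected, and the names medianOf and pickCandidate are hypothetical, not taken from mlxwinotuner. It scores each candidate by its median time minus the baseline median and keeps the baked default unless a candidate is strictly faster. The per-shape candidate rotation the PR mentions is not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a sample set (upper median for even counts).
// Taking the vector by value lets nth_element reorder a private copy.
double medianOf(std::vector<double> samples) {
  assert(!samples.empty());
  const std::size_t mid = samples.size() / 2;
  std::nth_element(samples.begin(), samples.begin() + mid, samples.end());
  return samples[mid];
}

// Hypothetical scoring rule: delta = median(candidate) - median(baseline).
// Returns the index of the candidate with the most negative delta,
// or -1 when no candidate beats the baseline (keep the default).
int pickCandidate(const std::vector<double>& baselineMs,
                  const std::vector<std::vector<double>>& candidateMs) {
  const double base = medianOf(baselineMs);
  int best = -1;
  double bestDelta = 0.0;
  for (std::size_t i = 0; i < candidateMs.size(); i++) {
    const double delta = medianOf(candidateMs[i]) - base;
    if (delta < bestDelta) {
      bestDelta = delta;
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

Using the median rather than the mean keeps a single slow outlier run (for example, a first-call warm-up) from deciding the result.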

What's new

Backend (cpp/neuralnet/)

- mlxbackend.cpp: backend implementation. Supports variable board
  sizes via input masking (same nnXLen / nnYLen contract as other
  backends; the global COMPILE_MAX_BOARD_LEN bound still applies),
  FP16/FP32 selected by mlxUseFP16 (default auto → fp16), and the
  same input feature layout as the other backends. Mish runs FP16-safe;
  the code asserts on ACTIVATION_MISH_SCALE8 so out-of-range variants
  fail loudly rather than truncate silently.
- mlxwinograd.h: F(4×4, 3×3) Winograd transform with fused
  activation + residual add.
- mlxwinotuner.{cpp,h}: per-shape Winograd tuner with adaptive
  scoring (rotates the candidate set per shape, scores by median-time
  delta against a baked-default baseline). Logs the conv-3x3 shape
  distribution at model load.
- mlxtests.cpp: Winograd + tuner numeric-consistency tests,
  gated under runnnlayertests.
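For readers unfamiliar with the Winograd transform, here is a 1D sketch of F(4, 3) using the standard transform matrices from the fast-convolution literature. The 2D F(4×4, 3×3) form nests the same transforms along rows and columns. This is an illustration only: the function names are hypothetical, and the PR's version runs on MLX and fuses activation and the residual add, which this sketch does not attempt.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Tile6 = std::array<double, 6>;
using Out4 = std::array<double, 4>;
using Taps3 = std::array<double, 3>;

// Reference: y[i] = sum_k d[i+k] * g[k], the correlation a conv layer computes.
Out4 directF43(const Tile6& d, const Taps3& g) {
  Out4 y{};
  for (std::size_t i = 0; i < 4; i++)
    for (std::size_t k = 0; k < 3; k++)
      y[i] += d[i + k] * g[k];
  return y;
}

// Winograd F(4,3): y = A^T [ (G g) * (B^T d) ], with * elementwise.
Out4 winogradF43(const Tile6& d, const Taps3& g) {
  // Filter transform u = G g (computed once per filter in practice).
  const Tile6 u = {
      g[0] / 4.0,
      -(g[0] + g[1] + g[2]) / 6.0,
      -(g[0] - g[1] + g[2]) / 6.0,
      g[0] / 24.0 + g[1] / 12.0 + g[2] / 6.0,
      g[0] / 24.0 - g[1] / 12.0 + g[2] / 6.0,
      g[2]};
  // Input transform v = B^T d.
  const Tile6 v = {
      4.0 * d[0] - 5.0 * d[2] + d[4],
      -4.0 * d[1] - 4.0 * d[2] + d[3] + d[4],
      4.0 * d[1] - 4.0 * d[2] - d[3] + d[4],
      -2.0 * d[1] - d[2] + 2.0 * d[3] + d[4],
      2.0 * d[1] - d[2] - 2.0 * d[3] + d[4],
      4.0 * d[1] - 5.0 * d[3] + d[5]};
  // Six elementwise multiplies replace the twelve of the direct form.
  Tile6 m{};
  for (std::size_t i = 0; i < 6; i++) m[i] = u[i] * v[i];
  // Output transform y = A^T m.
  return {
      m[0] + m[1] + m[2] + m[3] + m[4],
      m[1] - m[2] + 2.0 * (m[3] - m[4]),
      m[1] + m[2] + 4.0 * (m[3] + m[4]),
      m[1] - m[2] + 8.0 * (m[3] - m[4]) + m[5]};
}
```

Comparing winogradF43 against directF43 on the same tile is the 1D analogue of a numeric-consistency test. In FP16 the larger transform constants widen the rounding error, so the two paths agree only to within a tolerance.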

Build / wiring

- cpp/CMakeLists.txt: USE_BACKEND=MLX target. MLX needs CMake
  3.27; cmake_minimum_required stays at 3.18.2 so other backends
  keep building on older CMake. Links Homebrew's prebuilt
  libmlx.dylib; OSX deployment target is intentionally not pinned
  so the executable's minos matches the linked dylib.
- cpp/main.cpp, cpp/program/setup.cpp, cpp/command/benchmark.cpp:
  wire MLX into backend selection / benchmark.
- cpp/configs/{gtp,analysis,match,contribute}_example.cfg:
  document mlxUseFP16 (auto / true / false, default
  auto → fp16).
- Compiling.md: build instructions.
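The tri-state mlxUseFP16 setting can be read as the following mapping. This is a sketch under assumptions: the Precision enum and resolveMlxUseFP16 are hypothetical names, and the backend's real parsing goes through KataGo's own config handling, which may differ in details such as case sensitivity.

```cpp
#include <stdexcept>
#include <string>

enum class Precision { FP16, FP32 };

// Hypothetical resolver for the documented values auto / true / false.
// Per the PR description, auto currently resolves to fp16.
Precision resolveMlxUseFP16(const std::string& value = "auto") {
  if (value == "auto" || value == "true") return Precision::FP16;
  if (value == "false") return Precision::FP32;
  throw std::invalid_argument(
      "mlxUseFP16 must be auto, true, or false; got: " + value);
}
```

Rejecting unknown values instead of falling back to a default follows the same fail-loudly stance the PR takes for unsupported Mish variants.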

How to build

cd cpp
cmake -G Ninja -DUSE_BACKEND=MLX
ninja

Requires CMake ≥ 3.27 and MLX installed via Homebrew (brew install mlx).

Validation

Cross-backend validation against an Eigen reference (testgpuerror)
on b18c384nbt, b40v8, and humanv0 nets:

FP32: max winrate error 0.00095%
FP16: max winrate error 2.63%

Both are well within the existing tolerances used by other backends.
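As a rough picture of what a "max winrate error" figure measures, the sketch below takes per-position winrates from a reference backend and from the backend under test and reports the largest absolute difference in percentage points. The function name and the aggregation are assumptions made for illustration. testgpuerror's actual metrics and tolerances are defined in the KataGo source and are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: winrates are probabilities in [0, 1], one per position.
// Returns the largest absolute difference, expressed in percentage points.
double maxWinrateErrorPercent(const std::vector<double>& reference,
                              const std::vector<double>& candidate) {
  assert(reference.size() == candidate.size());
  double maxErr = 0.0;
  for (std::size_t i = 0; i < reference.size(); i++)
    maxErr = std::max(maxErr, std::fabs(reference[i] - candidate[i]));
  return 100.0 * maxErr;
}
```

A maximum is a worst-case measure, so one unusual position can dominate it.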

Status

Draft — opening for early feedback on the backend's structure and
the tuner approach before promoting to ready-for-review.

Introduces a new neural-net backend (USE_BACKEND=MLX) targeting Apple Silicon via Apple's MLX framework. The backend implements the full nninterface contract (model load, batched evaluation, FP16/FP32 paths) and ships with a Winograd 3x3 convolution path plus an adaptive per-shape tuner that picks the fastest implementation for each conv-3x3 shape at model load.

Backend

- cpp/neuralnet/mlxbackend.cpp: backend implementation. Supports variable board sizes via input masking (same nnXLen/nnYLen contract as other backends; the global COMPILE_MAX_BOARD_LEN bound still applies). FP16/FP32 selected by the mlxUseFP16 config (default auto -> fp16); same input feature layout as the other backends. Mish activation runs FP16-safe (asserts on ACTIVATION_MISH_SCALE8 so out-of-range variants are caught explicitly rather than silently truncated).
- cpp/neuralnet/mlxwinograd.h: F(4x4, 3x3) Winograd transform with fused activation + residual add.
- cpp/neuralnet/mlxwinotuner.{cpp,h}: per-shape Winograd tuner with adaptive scoring (rotates the candidate set per shape, scores by median-time delta against a baked-default baseline). Logs the conv-3x3 shape distribution at model load.
- cpp/neuralnet/mlxtests.cpp: unit tests for the Winograd path and tuner numeric-consistency, gated under runnnlayertests.

Build / wiring

- cpp/CMakeLists.txt: USE_BACKEND=MLX target. MLX requires CMake 3.27 (cmake_minimum_required stays at 3.18.2 so other backends keep building on older CMake). Links Homebrew's prebuilt libmlx.dylib; OSX deployment target intentionally not pinned so the executable's minos matches the dylib it was linked against.
- cpp/main.cpp, cpp/program/setup.cpp, cpp/command/benchmark.cpp: wire MLX into backend selection / benchmark.
- cpp/configs/{gtp,analysis,match,contribute}_example.cfg: document mlxUseFP16 (auto / true / false), default auto -> fp16.
- Compiling.md: build instructions for the MLX backend.

Validation

- Cross-backend validation against an Eigen reference (testgpuerror) for b18c384nbt, b40v8, and humanv0 nets shows FP32 max winrate error 0.00095% and FP16 max 2.63%, well within the existing backend tolerances.

This is the squash of 130 commits from feature/mlx-backend.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ChinChangYang force-pushed the mlx-backend-squash branch from b544c66 to dcf296a · May 23, 2026 04:32

ChinChangYang mentioned this pull request May 23, 2026

Add MLX backend for Apple Silicon neural network inference ChinChangYang/KataGo#16

Closed

ChinChangYang force-pushed the mlx-backend-squash branch from dcf296a to 81b00db · May 23, 2026 08:06

ChinChangYang wants to merge 1 commit into lightvector:master from ChinChangYang:mlx-backend-squash
