Add MLX backend for Apple Silicon #1199

Draft
ChinChangYang wants to merge 1 commit into lightvector:master from ChinChangYang:mlx-backend-squash
@ChinChangYang
Contributor

This PR adds a new neural-net backend (USE_BACKEND=MLX) targeting Apple
Silicon via Apple's MLX framework.
It implements the full nninterface.h contract (model load, batched
evaluation, FP16/FP32 paths) and ships with a Winograd 3x3 convolution
path plus an adaptive per-shape tuner that picks the fastest
implementation for each conv-3x3 shape at model load.
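
For readers unfamiliar with Winograd convolution, here is a minimal CPU sketch of a single F(4×4, 3×3) tile, checked against a direct 3×3 correlation. It uses the standard transform matrices from the Winograd-convolution literature and is not taken from this PR's code: the names `winogradF4x4_3x3`, `directConv3x3`, and `maxAbsDiff` are hypothetical, and the actual backend presumably runs the transforms as batched MLX operations with fused activation rather than scalar loops.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Tile6 = std::array<std::array<double, 6>, 6>;  // input tile
using Tile4 = std::array<std::array<double, 4>, 4>;  // output tile
using Kern3 = std::array<std::array<double, 3>, 3>;  // 3x3 filter

// Standard F(4,3) transform matrices: B^T (input), G (filter), A^T (output).
static const double BT[6][6] = {
  {4,  0, -5,  0, 1, 0},
  {0, -4, -4,  1, 1, 0},
  {0,  4, -4, -1, 1, 0},
  {0, -2, -1,  2, 1, 0},
  {0,  2, -1, -2, 1, 0},
  {0,  4,  0, -5, 0, 1}};
static const double G[6][3] = {
  { 1.0 / 4,   0.0,       0.0    },
  {-1.0 / 6,  -1.0 / 6,  -1.0 / 6},
  {-1.0 / 6,   1.0 / 6,  -1.0 / 6},
  { 1.0 / 24,  1.0 / 12,  1.0 / 6},
  { 1.0 / 24, -1.0 / 12,  1.0 / 6},
  { 0.0,       0.0,       1.0    }};
static const double AT[4][6] = {
  {1, 1,  1, 1,  1, 0},
  {0, 1, -1, 2, -2, 0},
  {0, 1,  1, 4,  4, 0},
  {0, 1, -1, 8, -8, 1}};

// Y = A^T [ (G g G^T) .* (B^T d B) ] A
Tile4 winogradF4x4_3x3(const Tile6& d, const Kern3& g) {
  double Gg[6][3] = {}, U[6][6] = {}, Bd[6][6] = {}, V[6][6] = {}, AM[4][6] = {};
  for(int i = 0; i < 6; i++) for(int j = 0; j < 3; j++) for(int k = 0; k < 3; k++)
    Gg[i][j] += G[i][k] * g[k][j];
  for(int i = 0; i < 6; i++) for(int j = 0; j < 6; j++) for(int k = 0; k < 3; k++)
    U[i][j] += Gg[i][k] * G[j][k];            // filter transform G g G^T
  for(int i = 0; i < 6; i++) for(int j = 0; j < 6; j++) for(int k = 0; k < 6; k++)
    Bd[i][j] += BT[i][k] * d[k][j];
  for(int i = 0; i < 6; i++) for(int j = 0; j < 6; j++) for(int k = 0; k < 6; k++)
    V[i][j] += Bd[i][k] * BT[j][k];           // input transform B^T d B
  for(int i = 0; i < 4; i++) for(int j = 0; j < 6; j++) for(int k = 0; k < 6; k++)
    AM[i][j] += AT[i][k] * (U[k][j] * V[k][j]);  // element-wise product, then A^T
  Tile4 y{};
  for(int i = 0; i < 4; i++) for(int j = 0; j < 4; j++) for(int k = 0; k < 6; k++)
    y[i][j] += AM[i][k] * AT[j][k];           // output transform (... ) A
  return y;
}

// Reference: direct 3x3 correlation over the same 6x6 tile (valid region only).
Tile4 directConv3x3(const Tile6& d, const Kern3& g) {
  Tile4 y{};
  for(int i = 0; i < 4; i++) for(int j = 0; j < 4; j++)
    for(int a = 0; a < 3; a++) for(int b = 0; b < 3; b++)
      y[i][j] += d[i + a][j + b] * g[a][b];
  return y;
}

// Largest element-wise absolute difference between two output tiles.
double maxAbsDiff(const Tile4& a, const Tile4& b) {
  double m = 0.0;
  for(int i = 0; i < 4; i++) for(int j = 0; j < 4; j++)
    m = std::max(m, std::abs(a[i][j] - b[i][j]));
  return m;
}
```

Each 6×6 input tile yields a 4×4 output tile using 36 multiplies in the element-wise stage, versus 144 for the direct form. In a real network the filter transform would be computed once per layer and reused across tiles.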

What's new

Backend (cpp/neuralnet/)

  • mlxbackend.cpp — backend implementation. Supports variable board
    sizes via input masking (same nnXLen / nnYLen contract as other
    backends; the global COMPILE_MAX_BOARD_LEN bound still applies),
    FP16/FP32 selected by mlxUseFP16 (default auto → fp16), and the
    same input feature layout as the other backends. Mish runs FP16-safe;
    the code asserts on ACTIVATION_MISH_SCALE8 so out-of-range variants
    fail loudly rather than truncate silently.
  • mlxwinograd.h — F(4×4, 3×3) Winograd transform with fused
    activation + residual add.
  • mlxwinotuner.{cpp,h} — per-shape Winograd tuner with adaptive
    scoring (rotates the candidate set per shape, scores by median-time
    delta against a baked-default baseline). Logs the conv-3x3 shape
    distribution at model load.
  • mlxtests.cpp — Winograd + tuner numeric-consistency tests,
    gated under runnnlayertests.
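
The tuner's scoring rule (median time per candidate, compared as a delta against a default baseline) can be sketched roughly as follows. This is an illustration under assumptions, not the code in mlxwinotuner.cpp: `medianOf`, `medianTimeSeconds`, and `pickBest` are hypothetical names, the per-shape rotation of the candidate set is not modeled, and falling back to the baseline when no candidate beats it is a guess at reasonable behavior.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Median of a set of timing samples. Takes a copy because sorting reorders it.
double medianOf(std::vector<double> samples) {
  if(samples.empty())
    return std::numeric_limits<double>::infinity();
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();
  return (n % 2 == 1) ? samples[n / 2] : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
}

// Runs one candidate `reps` times and returns its median wall-clock time in seconds.
// The median is less sensitive than the mean to occasional slow outlier runs.
double medianTimeSeconds(const std::function<void()>& run, int reps) {
  std::vector<double> samples;
  for(int i = 0; i < reps; i++) {
    const auto t0 = std::chrono::steady_clock::now();
    run();
    const auto t1 = std::chrono::steady_clock::now();
    samples.push_back(std::chrono::duration<double>(t1 - t0).count());
  }
  return medianOf(samples);
}

// Returns the index of the candidate with the most negative delta against the
// baseline median, or -1 if no candidate is faster (keep the baked default).
int pickBest(const std::vector<double>& candidateMedians, double baselineMedian) {
  int best = -1;
  double bestDelta = 0.0;
  for(std::size_t i = 0; i < candidateMedians.size(); i++) {
    const double delta = candidateMedians[i] - baselineMedian;
    if(delta < bestDelta) {
      bestDelta = delta;
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

A caller would time the baseline and each candidate for a given conv-3x3 shape with `medianTimeSeconds`, then pass the results to `pickBest` and cache the winner for that shape.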

Build / wiring

  • cpp/CMakeLists.txt: USE_BACKEND=MLX target. MLX needs CMake
    3.27; cmake_minimum_required stays at 3.18.2 so other backends
    keep building on older CMake. Links Homebrew's prebuilt
    libmlx.dylib; OSX deployment target is intentionally not pinned
    so the executable's minos matches the linked dylib.
  • cpp/main.cpp, cpp/program/setup.cpp, cpp/command/benchmark.cpp
    — wire MLX into backend selection / benchmark.
  • cpp/configs/{gtp,analysis,match,contribute}_example.cfg:
    document mlxUseFP16 (auto / true / false, default
    auto → fp16).
  • Compiling.md — build instructions.
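
The documented mlxUseFP16 semantics (auto / true / false, with auto resolving to fp16) amount to a small three-way mapping. The helper below is a hypothetical illustration of that mapping only: `resolveUseFP16` is not a function from this PR, and the real backend presumably reads the value through KataGo's config parser rather than from a raw string.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: maps the documented mlxUseFP16 values to a precision choice.
// Returns true for FP16, false for FP32. "auto" resolves to FP16, per the PR text.
bool resolveUseFP16(const std::string& value) {
  if(value == "auto" || value == "true")
    return true;
  if(value == "false")
    return false;
  throw std::invalid_argument("mlxUseFP16 must be auto, true, or false, got: " + value);
}
```

Rejecting unknown values explicitly, rather than treating them as auto, is a design choice of this sketch and keeps a typo in a config file from silently changing precision.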

How to build

cd cpp
cmake . -G Ninja -DUSE_BACKEND=MLX
ninja

Requires CMake ≥ 3.27 and MLX installed via Homebrew (brew install mlx).

Validation

Cross-backend validation against an Eigen reference (testgpuerror)
on b18c384nbt, b40v8, and humanv0 nets:

  • FP32: max winrate error 0.00095%
  • FP16: max winrate error 2.63%

Both are well within the existing tolerances used by other backends.
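
For context on what a "max winrate error" figure measures, the sketch below computes the largest absolute winrate difference between a reference backend and a candidate backend over a set of positions, expressed in percentage points. It assumes winrates are given as probabilities in [0, 1]; `maxWinrateErrorPercent` is a hypothetical name, and the exact metric computed by testgpuerror may differ in detail.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Largest absolute winrate difference between two backends, in percentage points.
// reference[i] and candidate[i] are winrates in [0, 1] for the same position i.
double maxWinrateErrorPercent(const std::vector<double>& reference,
                              const std::vector<double>& candidate) {
  if(reference.size() != candidate.size())
    throw std::invalid_argument("winrate lists must have the same length");
  double maxErr = 0.0;
  for(std::size_t i = 0; i < reference.size(); i++)
    maxErr = std::max(maxErr, std::abs(reference[i] - candidate[i]));
  return 100.0 * maxErr;
}
```

Under this reading, the FP32 figure of 0.00095% would correspond to a worst-case winrate difference of about 0.0000095 as a probability, and the FP16 figure of 2.63% to about 0.0263.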

Status

Draft — opening for early feedback on the backend's structure and
the tuner approach before promoting to ready-for-review.

Introduces a new neural-net backend (USE_BACKEND=MLX) targeting Apple
Silicon via Apple's MLX framework. The backend implements the full
nninterface contract (model load, batched evaluation, FP16/FP32 paths)
and ships with a Winograd 3x3 convolution path plus an adaptive
per-shape tuner that picks the fastest implementation for each
conv-3x3 shape at model load.

Backend
- cpp/neuralnet/mlxbackend.cpp: backend implementation. Supports
  variable board sizes via input masking (same nnXLen/nnYLen
  contract as other backends; the global COMPILE_MAX_BOARD_LEN
  bound still applies). FP16/FP32 selected by the mlxUseFP16 config
  (default auto -> fp16); same input feature layout as the other
  backends. Mish activation runs FP16-safe (asserts on
  ACTIVATION_MISH_SCALE8 so out-of-range variants are caught
  explicitly rather than silently truncated).
- cpp/neuralnet/mlxwinograd.h: F(4x4, 3x3) Winograd transform with
  fused activation + residual add.
- cpp/neuralnet/mlxwinotuner.{cpp,h}: per-shape Winograd tuner with
  adaptive scoring (rotates the candidate set per shape, scores by
  median-time delta against a baked-default baseline). Logs the
  conv-3x3 shape distribution at model load.
- cpp/neuralnet/mlxtests.cpp: unit tests for the Winograd path
  and tuner numeric-consistency, gated under runnnlayertests.

Build / wiring
- cpp/CMakeLists.txt: USE_BACKEND=MLX target. MLX requires CMake
  3.27 (cmake_minimum_required stays at 3.18.2 so other backends
  keep building on older CMake). Links Homebrew's prebuilt
  libmlx.dylib; OSX deployment target intentionally not pinned so
  the executable's minos matches the dylib it was linked against.
- cpp/main.cpp, cpp/program/setup.cpp, cpp/command/benchmark.cpp:
  wire MLX into backend selection / benchmark.
- cpp/configs/{gtp,analysis,match,contribute}_example.cfg: document
  mlxUseFP16 (auto / true / false), default auto -> fp16.
- Compiling.md: build instructions for the MLX backend.

Validation
- Cross-backend validation against an Eigen reference (testgpuerror)
  for b18c384nbt, b40v8, and humanv0 nets shows FP32 max winrate
  error 0.00095% and FP16 max 2.63%, well within the existing
  backend tolerances.

This is the squash of 130 commits from feature/mlx-backend.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>