rlglab/wallzero

WallZero: Mastering the Game of WallGo with Strategic Analysis

This is the official repository for the CG 2026 paper WallZero: Mastering the Game of WallGo with Strategic Analysis.

If you use this work for research, please consider citing our paper as follows:

@inproceedings{chen2026wallzero,
  title = {WallZero: Mastering the Game of WallGo with Strategic Analysis},
  author = {Hsing-Yu Chen and Jerome Arjonilla and I-Chen Wu and Ti-Rong Wu},
  booktitle = {Computers and Games},
  year = {2026}
}

This repository is built upon MiniZero. The following sections provide instructions for reproducing the experiments in the main text.

Results

WallZero demonstrates strong performance in feature ablations and human matches. The main 1-block models are trained for 500 iterations with 2,000 self-play games and 500 optimization steps per iteration, using 200 MCTS simulations per move during self-play. WallZero also applies data augmentation through board symmetries, including rotations and reflections.
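As a quick check of the overall training budget these settings imply, the arithmetic can be sketched in shell. The variable names are illustrative only and are not MiniZero config keys.

```shell
# Values taken from the paragraph above; variable names are illustrative only.
iterations=500
games_per_iteration=2000
steps_per_iteration=500

# Totals over the whole run.
total_games=$((iterations * games_per_iteration))
total_steps=$((iterations * steps_per_iteration))

echo "self-play games: ${total_games}"
echo "optimization steps: ${total_steps}"
```

Under these settings a full run produces one million self-play games and 250,000 optimization steps.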

Feature Design Performance

The paper compares four feature configurations: WallZero-B, WallZero-BT, WallZero-BTR, and the full WallZero model (BTRH). Each model is a 1-block residual network trained for approximately 456 GPU-hours, followed by a round-robin tournament with 1,000 games per model pair.

| Mode | WallZero-B | WallZero-BT | WallZero-BTR | WallZero |
| --- | --- | --- | --- | --- |
| Empty Mode (Avg.) | 10.65 +/- 1.10% | 27.05 +/- 1.59% | 79.43 +/- 1.45% | 82.87 +/- 1.35% |
| 4-Stone Mode (Avg.) | 10.45 +/- 1.09% | 28.65 +/- 1.62% | 78.88 +/- 1.46% | 82.02 +/- 1.37% |

The results show that feature design strongly affects performance under the same training budget. The largest improvement comes from adding reachability, while the full feature set achieves the best average win rate in both modes and is used in the remaining experiments.
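The +/- values in the table are numerically consistent with a 95% normal-approximation confidence interval over 3,000 games per model (three opponents at 1,000 games each). That reading is inferred from the numbers and is not stated in this README, and the helper name ci_half_width is hypothetical. A minimal sketch:

```shell
# Hypothetical helper: half-width (in percent) of a 95% normal-approximation
# confidence interval for a win rate given in percent over n games.
ci_half_width() {
  awk -v pct="$1" -v n="$2" 'BEGIN {
    p = pct / 100
    printf "%.2f\n", 100 * 1.96 * sqrt(p * (1 - p) / n)
  }'
}

# Assumption: each model plays 3 opponents x 1,000 games = 3,000 games.
ci_half_width 82.87 3000   # full WallZero, empty mode
ci_half_width 10.65 3000   # WallZero-B, empty mode
```

Under that assumption, the two calls print 1.35 and 1.10, matching the empty-mode entries for WallZero and WallZero-B.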

First-player advantage is evaluated over 5,000 WallZero self-play games per mode:

  • Empty mode: Red win rate 53.57%.
  • 4-stone mode: Red win rate 50.37%.

This suggests that the 4-stone opening used in The Devil's Plan produces a more balanced outcome than empty mode.
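One way to put a number on "more balanced" is to compare each Red win rate with 50% using a normal-approximation z-score over the 5,000 games. This test is an illustration added here, not an analysis taken from the paper. It treats every game as an independent win/loss trial, and the helper name z_vs_half is hypothetical.

```shell
# Hypothetical helper: z-score of an observed win rate (in percent) against 50%.
z_vs_half() {
  awk -v pct="$1" -v n="$2" 'BEGIN {
    p = pct / 100
    # Standard error of a proportion under the null hypothesis p = 0.5.
    se = sqrt(0.25 / n)
    printf "%.2f\n", (p - 0.5) / se
  }'
}

z_vs_half 53.57 5000   # empty mode
z_vs_half 50.37 5000   # 4-stone mode
```

A z-score of about 5 for empty mode against roughly 0.5 for 4-stone mode would indicate that the empty-mode advantage is unlikely to be noise, while the 4-stone result is statistically compatible with an even game.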

Human-AI Evaluation

For practical playing strength, the model is extended to a 10-block residual network with a two-stage curriculum: pre-training on data generated by the 1-block model, followed by 500 training iterations. This requires approximately 49 GPU-hours for pre-training and 902 GPU-hours for subsequent training.

The trained model is tested against two Taiwanese professional Go players, Wei Huang (3-dan) and Chun-Hsun Chou (9-dan), across both modes and both colors. Before the formal matches, both players play practice games to familiarize themselves with WallGo. The formal matches use a 90-second time limit per move and 2,000 MCTS simulations per move for WallZero.

Scores are reported as Human-WallZero territory counts. Values in parentheses denote the ratio of WallZero's territory to the human's.

| Player | Empty Mode Red | Empty Mode Blue | 4-Stone Mode Red | 4-Stone Mode Blue |
| --- | --- | --- | --- | --- |
| 3-dan | 16-33 (2.06x) | 20-29 (1.45x) | 14-32 (2.29x) | 19-30 (1.58x) |
| 9-dan | 12-37 (3.08x) | 17-30 (1.76x) | 20-29 (1.45x) | 12-34 (2.83x) |

WallZero wins all eight formal games. Across the matches, it gains 1.98x as much territory as the human players on average, and its estimated win rate exceeds 90% before move 20 in every evaluated game.
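The parenthesized ratios in the table can be recomputed from the raw Human-WallZero scores. The sketch below only reproduces the per-game ratios; it does not try to reproduce the overall average, since the aggregation method is not described here.

```shell
# Human-WallZero territory counts copied from the table above
# (3-dan row first, then 9-dan row).
scores="16-33 20-29 14-32 19-30 12-37 17-30 20-29 12-34"

# Per-game ratio of WallZero's territory to the human's, one per line.
# $scores is intentionally unquoted so that each score becomes its own line.
ratios=$(printf '%s\n' $scores | awk -F- '{ printf "%.2f\n", $2 / $1 }')

echo "$ratios"
```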

Training WallZero

This section provides example commands for training WallZero with MiniZero. The examples are editable templates, so adjust GPUs, output folder names, or hyperparameters for your own runs.

Clone the Repository

git clone https://github.com/rlglab/wallzero.git
cd wallzero

Prerequisites

WallZero training requires:

  • Linux
  • at least one NVIDIA GPU
  • build tools required by MiniZero, such as cmake, make, and a C++ compiler
  • the MiniZero training dependencies described in docs/Training.md

Start the Container

All build and training commands below should be executed inside the pre-built MiniZero container, which includes all required dependencies:

scripts/start-container.sh  # requires podman or docker
pip install tensorboard

Once started, the repository is mounted at /workspace inside the container.

Usage

WallZero is an AlphaZero-based agent. You can either pass the az algorithm to tools/quick-run.sh, or pass a generated config file:

tools/quick-run.sh train wallgo az END_ITER [OPTION]...
tools/quick-run.sh train wallgo CONFIG_FILE END_ITER [OPTION]...

Common arguments:

  • END_ITER: total number of training iterations
  • -g: GPUs used for training, for example 0 or 0123
  • -n: explicit output directory name
  • CONFIG_FILE: generated config file, for example wallgo_4-stone_7x7_az_1bx256_n200.cfg
  • -conf_str: overwrite configuration values from the command line

For all supported arguments, run:

tools/quick-run.sh train -h

Generate a Config File First

If you want a config file that you can edit later, generate one from the WallGo executable first:

scripts/build.sh wallgo
build/wallgo/minizero_wallgo -gen wallgo.cfg

After that, you can edit wallgo.cfg directly or keep overriding keys through -conf_str.

Commands for Main Paper Settings

The paper trains full-feature WallZero models with the following settings:

| Setting | Config key | Value |
| --- | --- | --- |
| Board size | env_board_size | 7 |
| Empty mode | env_wallgo_init_rule | empty |
| 4-stone mode | env_wallgo_init_rule | 4-stone |
| Stones per player | env_wallgo_stone_per_player | 4 |
| MCTS simulations per move | actor_num_simulation | 200 |
| Training iterations | END_ITER / zero_end_iteration | 500 |
| Self-play games per iteration | zero_num_games_per_iteration | 2000 |
| Optimization steps per iteration | learner_training_step | 500 |
| Batch size | learner_batch_size | 1024 |
| Learning rate | learner_learning_rate | 0.02 |
| Network type | nn_type_name | alphazero |
| Residual blocks | nn_num_blocks | 1 |
| Hidden channels | nn_num_hidden_channels | 256 |
| Board-symmetry augmentation | actor_use_random_rotation_features | true |

The released WallGo environment uses the full WallZero feature set by default, so no feature-selection flags are needed in the commands below.
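The -conf_str value is a colon-separated list of KEY=VALUE pairs, which is easy to mistype in one long line. A minimal sketch for assembling it from the table above, one setting per line, is shown here. The helper name join_conf is hypothetical and not part of MiniZero.

```shell
# Hypothetical helper: join KEY=VALUE arguments with ':' for use with -conf_str.
# The function body runs in a subshell, so the IFS change does not leak out.
join_conf() (
  IFS=:
  printf '%s\n' "$*"
)

# Settings from the table above, here for 4-stone mode.
conf_str=$(join_conf \
  env_board_size=7 \
  env_wallgo_init_rule=4-stone \
  env_wallgo_stone_per_player=4 \
  actor_num_simulation=200 \
  zero_num_games_per_iteration=2000 \
  learner_training_step=500 \
  learner_batch_size=1024 \
  learner_learning_rate=0.02 \
  nn_type_name=alphazero \
  nn_num_blocks=1 \
  nn_num_hidden_channels=256 \
  actor_use_random_rotation_features=true)

echo "$conf_str"
```

The resulting string can be passed as the argument of -conf_str. Switching env_wallgo_init_rule to empty gives the empty-mode variant.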

1-block WallZero in Empty Mode

Generate the config:

build/wallgo/minizero_wallgo -gen wallgo_empty_7x7_az_1bx256_n200.cfg \
  -conf_str env_board_size=7:env_wallgo_init_rule=empty:env_wallgo_stone_per_player=4:actor_num_simulation=200:zero_num_games_per_iteration=2000:learner_training_step=500:learner_batch_size=1024:learner_learning_rate=0.02:nn_type_name=alphazero:nn_num_blocks=1:nn_num_hidden_channels=256:actor_use_random_rotation_features=true

Start training:

tools/quick-run.sh train wallgo wallgo_empty_7x7_az_1bx256_n200.cfg 500 -g 0123 -n wallgo_empty_7x7_az_1bx256_n200

1-block WallZero in 4-Stone Mode

Generate the config:

build/wallgo/minizero_wallgo -gen wallgo_4-stone_7x7_az_1bx256_n200.cfg \
  -conf_str env_board_size=7:env_wallgo_init_rule=4-stone:env_wallgo_stone_per_player=4:actor_num_simulation=200:zero_num_games_per_iteration=2000:learner_training_step=500:learner_batch_size=1024:learner_learning_rate=0.02:nn_type_name=alphazero:nn_num_blocks=1:nn_num_hidden_channels=256:actor_use_random_rotation_features=true

Start training:

tools/quick-run.sh train wallgo wallgo_4-stone_7x7_az_1bx256_n200.cfg 500 -g 0123 -n wallgo_4-stone_7x7_az_1bx256_n200
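Since the two 1-block runs differ only in env_wallgo_init_rule and in the file names, the training commands can be generated in a loop. The sketch below only prints the commands (a dry run) so they can be reviewed before anything is launched. The function name print_train_cmds is hypothetical, and the loop is an illustration rather than a script shipped with the repository.

```shell
# Hypothetical dry-run helper: print the training command for each mode
# without executing it.
print_train_cmds() {
  for mode in empty 4-stone; do
    name="wallgo_${mode}_7x7_az_1bx256_n200"
    echo "tools/quick-run.sh train wallgo ${name}.cfg 500 -g 0123 -n ${name}"
  done
}

print_train_cmds
```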

Larger Model for Human Evaluation

For the human evaluation, the paper uses a larger 10-block model and 2,000 simulations per move during play. Training spans 1,000 iterations in two stages. First, the 10-block model is pre-trained on the SGF records that the 1-block model generated during its 500 training iterations, which counts as iterations 1 to 500. Then the 10-block model continues with 500 additional AlphaZero training iterations, up to iteration 1,000.

Assume the 1-block self-play records are stored in SGF_DIR, and set the remaining variables for the run, for example:

SGF_DIR=wallgo_4-stone_7x7_az_1bx256_n200/sgf
TRAIN_DIR=wallgo_4-stone_7x7_az_10bx256_n200
PRETRAIN_ITERS=500
END_ITER=1000
PORT=30212

First, generate a 10-block configuration. You can reuse the same settings as the 1-block run above, but change nn_num_blocks from 1 to 10 and save it as a separate config file:

build/wallgo/minizero_wallgo -gen wallgo_4-stone_7x7_az_10bx256_n200.cfg \
  -conf_str env_board_size=7:env_wallgo_init_rule=4-stone:env_wallgo_stone_per_player=4:actor_num_simulation=200:zero_num_games_per_iteration=2000:learner_training_step=500:learner_batch_size=1024:learner_learning_rate=0.02:nn_type_name=alphazero:nn_num_blocks=10:nn_num_hidden_channels=256:actor_use_random_rotation_features=true

Then pre-train the 10-block model from the 1-block SGF records. The --link_sgf option is supported by scripts/zero-server.sh, not by tools/quick-run.sh, so this stage is launched manually with one optimization worker:

scripts/zero-server.sh wallgo wallgo_4-stone_7x7_az_10bx256_n200.cfg ${PRETRAIN_ITERS} -n ${TRAIN_DIR} -g 0123 \
  --link_sgf ${SGF_DIR} \
  -conf_str zero_server_port=${PORT}

In another terminal, start the optimization worker:

scripts/zero-worker.sh wallgo $(hostname) ${PORT} op -g 0123

After pre-training finishes, continue the same 10-block model with 500 additional training iterations. The paper setting uses PRETRAIN_ITERS=500 and END_ITER=1000.

tools/quick-run.sh train wallgo wallgo_4-stone_7x7_az_10bx256_n200.cfg ${END_ITER} -g 0123 -n ${TRAIN_DIR}

When prompted because ${TRAIN_DIR} already exists, choose C to continue. END_ITER is the final total iteration number, so add 500 to the number of pre-training SGF files rather than setting it to only 500.
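The END_ITER rule above (number of pre-training SGF files plus the additional iterations) can be sketched as a small shell helper. The function name final_end_iter and the throwaway demo directory are hypothetical; in a real run you would point the helper at ${SGF_DIR}.

```shell
# Hypothetical helper: END_ITER = number of *.sgf files used for pre-training
# plus the number of additional training iterations.
final_end_iter() {
  sgf_dir=$1
  extra_iters=$2
  # Count files such as 1.sgf, 2.sgf, ... directly inside the directory.
  n_sgf=$(find "$sgf_dir" -maxdepth 1 -name '*.sgf' | wc -l)
  echo $((n_sgf + extra_iters))
}

# Demo with a temporary directory holding three empty placeholder files.
demo_dir=$(mktemp -d)
for i in 1 2 3; do : > "$demo_dir/$i.sgf"; done
final_end_iter "$demo_dir" 500
rm -rf "$demo_dir"
```

With the paper setting of 500 pre-training SGF files and 500 additional iterations, the same computation gives END_ITER=1000.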

For interactive testing or evaluation with the stronger model, override the simulation count to 2000.

Other Tips

  • For empty mode and 4-stone mode, the key switch is env_wallgo_init_rule=empty and env_wallgo_init_rule=4-stone, respectively.
  • To match the paper's main full-feature model size, keep nn_num_blocks=1 and nn_num_hidden_channels=256.
  • To change the number of self-play games per iteration, modify zero_num_games_per_iteration.
  • To change the number of optimization steps per iteration, modify learner_training_step.
  • To continue a previous run, reuse the same folder name with a larger END_ITER.
  • For more training details, see docs/Training.md.

Training Results

This section summarizes the files generated by a WallZero training run.

For example, suppose you train the 1-block WallZero model in 4-stone mode as follows.

Generate the config:

build/wallgo/minizero_wallgo -gen wallgo_4-stone_7x7_az_1bx256_n200.cfg \
  -conf_str env_board_size=7:env_wallgo_init_rule=4-stone:env_wallgo_stone_per_player=4:actor_num_simulation=200:zero_num_games_per_iteration=2000:learner_training_step=500:learner_batch_size=1024:learner_learning_rate=0.02:nn_type_name=alphazero:nn_num_blocks=1:nn_num_hidden_channels=256:actor_use_random_rotation_features=true

Start training:

tools/quick-run.sh train wallgo wallgo_4-stone_7x7_az_1bx256_n200.cfg 500 -g 0123 -n wallgo_4-stone_7x7_az_1bx256_n200

You will obtain a training folder similar to:

wallgo_4-stone_7x7_az_1bx256_n200
├── wallgo_4-stone_7x7_az_1bx256_n200.cfg   # configuration file used for this training run
├── analysis/                # figures of the training process
│   ├── accuracy_policy.png  # policy prediction accuracy
│   ├── Lengths.png          # self-play game lengths
│   ├── loss_policy.png      # policy loss
│   ├── loss_value.png       # value loss
│   ├── Returns.png          # self-play returns
│   └── Time.png             # elapsed training time
├── model/                   # saved network checkpoints
│   ├── *.pkl                # full checkpoints, including optimizer states
│   └── *.pt                 # model weights only
├── sgf/                     # self-play records for each iteration
│   └── *.sgf                # 1.sgf, 2.sgf, ..., 500.sgf
├── op.log                   # optimization worker log
├── Training.log             # main training log
└── Worker.log               # worker connection log

The most important outputs are:

  • model/*.pt: use these checkpoints for evaluation, console play, or later experiments.
  • model/*.pkl: use these if you need full training checkpoints.
  • analysis/: inspect whether training is stable and whether the policy/value losses converge as expected.
  • sgf/: review self-play games generated during training.
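To pick a checkpoint for evaluation without browsing the folder by hand, one option is to take the most recently modified *.pt file. This sketch assumes the newest file corresponds to the latest iteration, which holds only if older checkpoints were not touched afterwards. The helper name latest_checkpoint and the placeholder file names old.pt and new.pt are hypothetical.

```shell
# Hypothetical helper: print the most recently modified *.pt file
# in the model/ folder of a training run.
latest_checkpoint() {
  # ls -t sorts by modification time, newest first.
  ls -t "$1"/model/*.pt 2>/dev/null | head -n 1
}

# Demo with a temporary run folder and two placeholder checkpoints.
run_dir=$(mktemp -d)
mkdir "$run_dir/model"
touch -t 202601010000 "$run_dir/model/old.pt"
touch -t 202601020000 "$run_dir/model/new.pt"
latest_checkpoint "$run_dir"
rm -rf "$run_dir"
```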

Playing via Console

MiniZero provides a built-in GTP-like console for interacting with a trained model:

tools/quick-run.sh console wallgo wallgo_4-stone_7x7_az_1bx256_n200 \
  -conf_str actor_num_simulation=2000:actor_select_action_by_count=true:actor_use_dirichlet_noise=false

For stronger play, use the 10-block model from Larger Model for Human Evaluation instead; with 2,000 simulations per move, this matches the paper's human evaluation setting.

After the message Successfully started console mode is displayed, use GTP-like commands to play:

  • genmove b / genmove w: let WallZero generate a move for the given player
  • play b C3: place a stone during the setup phase (empty mode only; 4-stone mode starts with preset stones)
  • play b C3->D4:W2: in the play phase, move the stone from C3 to D4, then build a wall in the given direction (W0: up, W1: right, W2: down, W3: left)
  • showboard: print the current board
  • clear_board: restart the game
  • final_score: show the game result based on current territory (1: Red/Black wins, -1: Blue/White wins, 0: draw)

Run list_commands for all supported commands.
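Because the wall codes W0 to W3 are easy to mix up, a small wrapper can build the play-phase command from a direction name. This is a sketch based only on the command format listed above; the helper name play_move is hypothetical and it merely prints the text to type into the console.

```shell
# Hypothetical helper: build a play-phase command such as "play b C3->D4:W2".
# Direction names map to the wall codes listed above.
play_move() {
  color=$1
  from=$2
  to=$3
  dir=$4
  case "$dir" in
    up)    wall=W0 ;;
    right) wall=W1 ;;
    down)  wall=W2 ;;
    left)  wall=W3 ;;
    *) echo "unknown wall direction: $dir" >&2; return 1 ;;
  esac
  echo "play ${color} ${from}->${to}:${wall}"
}

play_move b C3 D4 down   # prints: play b C3->D4:W2
```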

Viewing Self-Play Games

Self-play records can be replayed in the console as well. Each sgf/*.sgf file stores one game record per line, so extract a single line into a file first:

# extract the 1st game of iteration 500 into a standalone file
sed -n '1p' wallgo_4-stone_7x7_az_1bx256_n200/sgf/500.sgf > game1.sgf

# start the console
tools/quick-run.sh console wallgo wallgo_4-stone_7x7_az_1bx256_n200

Then, inside the console, load the record and inspect the board:

load_game game1.sgf
showboard
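Since each line of an sgf/*.sgf file is one game, the extraction step generalizes to any game index. The sketch below wraps the sed call in a hypothetical helper, extract_game, that also checks the index against the number of games in the file. The demo uses placeholder lines rather than real game records.

```shell
# Hypothetical helper: copy the N-th game (one game per line) of an SGF file
# into a standalone file, after checking that N is in range.
extract_game() {
  sgf_file=$1
  game_index=$2
  out_file=$3
  # Number of lines, i.e. number of games stored in the file.
  total=$(awk 'END { print NR }' "$sgf_file")
  if [ "$game_index" -lt 1 ] || [ "$game_index" -gt "$total" ]; then
    echo "game index must be between 1 and $total" >&2
    return 1
  fi
  sed -n "${game_index}p" "$sgf_file" > "$out_file"
}

# Demo with placeholder lines standing in for game records.
demo_dir=$(mktemp -d)
printf 'game-1\ngame-2\ngame-3\n' > "$demo_dir/500.sgf"
extract_game "$demo_dir/500.sgf" 2 "$demo_dir/game2.sgf"
cat "$demo_dir/game2.sgf"
rm -rf "$demo_dir"
```

The extracted file can then be loaded in the console with load_game as shown above.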

Notes

  • If you do not specify -n, MiniZero automatically generates a training folder name based on the game type and key training settings.
  • If you continue a previous run with a larger END_ITER, new checkpoints and SGF files will be appended to the same folder.
  • For more details on MiniZero training outputs, see docs/Training.md.
