This repo is the official implementation for the paper MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning, accepted at ICLR 2026.
MATA formulates visual reasoning as a hierarchical finite-state automaton. A Hyper Agent controls high-level transitions among specialized agents, oneshot reasoning, stepwise reasoning, and answering states, while all agents communicate through shared memory for transparent execution traces.
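As a rough illustration of the idea, the sketch below models a controller that picks the next state while every state writes to one shared trace. It is a toy under stated assumptions, not the code in this repo: the names `State`, `SharedMemory`, `fixed_controller`, and `run` are hypothetical, and the fixed transition table stands in for the learned Hyper Agent, which chooses transitions dynamically.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    """Illustrative state names taken from the description above."""
    SPECIALIZED = auto()
    ONESHOT = auto()
    STEPWISE = auto()
    ANSWERING = auto()
    DONE = auto()


@dataclass
class SharedMemory:
    """Hypothetical shared memory: all states append to one readable trace."""
    query: str
    trace: list = field(default_factory=list)

    def write(self, state, note):
        self.trace.append(f"{state.name}: {note}")


def fixed_controller(state, memory):
    """Assumption: a fixed table replaces the learned state controller."""
    table = {
        State.SPECIALIZED: State.ONESHOT,
        State.ONESHOT: State.STEPWISE,
        State.STEPWISE: State.ANSWERING,
        State.ANSWERING: State.DONE,
    }
    return table[state]


def run(query, controller=fixed_controller, max_steps=10):
    """Step the automaton until DONE or until max_steps is reached."""
    memory = SharedMemory(query)
    state = State.SPECIALIZED
    for _ in range(max_steps):
        if state is State.DONE:
            break
        memory.write(state, "visited")
        state = controller(state, memory)
    return memory


if __name__ == "__main__":
    for line in run("How many red cups are on the left table?").trace:
        print(line)
```

Passing a different `controller` callable changes the transition policy without touching the states or the memory, which is the separation the automaton view is meant to give.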
- [2026/02] MATA is accepted at ICLR 2026.
We're working on the following TODOs:
- Inference code.
- MATA-SFT-90K dataset.
- Training pipeline.
- Official benchmark evaluation scripts.
Clone this repository and install the pixi environment.
git clone https://github.com/ControlNet/MATA.git
cd MATA
pixi install
The default environment targets Linux with CUDA and uses the dependencies specified in pixi.toml.
Create a local .env file from the example file.
cp .env.example .env
Set your OpenAI or OpenAI-compatible endpoint credentials in .env.
OPENAI_API_KEY=your-api-key
OPENAI_BASE_URL=https://your-openai-compatible-endpoint/v1
OPENAI_BASE_URL is optional when using the official OpenAI endpoint. Local model and tool caches are controlled by TORCH_HOME in .env.
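To make the optional/required split concrete, here is a minimal sketch of how a script could read these two variables. It assumes the values from .env have already been exported into the process environment; `load_openai_settings` is a hypothetical helper for illustration, not a function from this repo.

```python
import os


def load_openai_settings(env=None):
    """Return the API key and the optional base URL from an environment mapping."""
    # Assumption: the .env values are already present in the process environment.
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    # A missing or empty base URL means "use the official OpenAI endpoint".
    base_url = env.get("OPENAI_BASE_URL") or None
    return {"api_key": api_key, "base_url": base_url}
```

Accepting the mapping as a parameter keeps the helper easy to test without touching the real environment.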
The Hyper Agent uses a local LLM State Controller to choose state transitions. By default, it uses the Qwen3-4B model, or the released checkpoint once it becomes available. You can change the model by modifying the hyper_agent section in the base config.
hyper_agent:
  model_name: "Qwen/Qwen3-4B"
Prepare the vision-language tool models used by MATA.
pixi run download_model
Run MATA on a single image and query.
pixi run python scripts/infer_once.py \
  --base_config configs/gqa.yaml \
  --image /path/to/image.jpg \
  --query "How many red cups are on the left table?"
Run inference on a dataset split.
pixi run python scripts/infer_dataset.py \
  --base_config configs/gqa.yaml \
  --data_root /path/to/data \
  --result_folder ./result
The repository is organized as follows:
src/mata/
  agent/      Hyper Automaton agents and state controller
  execution/  ImagePatch runtime and tool execution helpers
  memory/     Shared memory
  prompt/     Prompt templates
  tool/       Vision-language tool wrappers
  util/       Config, logging, and misc utilities
configs/      Task and model configs
scripts/      Inference entrypoints
Key paper terms map to the code as follows:
- Hyper Automaton: src/mata/agent/mata.py
- Hyper Agent and LLM State Controller: src/mata/agent/hyper_agent.py
- Specialized Agent: src/mata/agent/specialized
- Oneshot Reasoner: src/mata/agent/oneshot
- Stepwise Reasoner: src/mata/agent/stepwise
- Answering State: src/mata/agent/answering
- Shared Memory: src/mata/memory/shared_memory.py
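The mapping above can also be written down as data, which is handy for checking a local checkout against it. The sketch below copies the paths from the list as-is; `TERM_TO_PATH` and `missing_paths` are hypothetical names for illustration and are not part of the repo.

```python
from pathlib import Path

# Paper term -> path relative to the repository root (copied from the list above).
TERM_TO_PATH = {
    "Hyper Automaton": "src/mata/agent/mata.py",
    "Hyper Agent and LLM State Controller": "src/mata/agent/hyper_agent.py",
    "Specialized Agent": "src/mata/agent/specialized",
    "Oneshot Reasoner": "src/mata/agent/oneshot",
    "Stepwise Reasoner": "src/mata/agent/stepwise",
    "Answering State": "src/mata/agent/answering",
    "Shared Memory": "src/mata/memory/shared_memory.py",
}


def missing_paths(repo_root):
    """Return the paper terms whose mapped path does not exist under repo_root."""
    root = Path(repo_root)
    return [term for term, rel in TERM_TO_PATH.items() if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root; an empty list means every path was found.
    print(missing_paths("."))
```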
If you find this work useful for your research, please consider citing it.
@inproceedings{cai2026mata,
  title={MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning},
  author={Cai, Zhixi and Ke, Fucai and Leo, Kevin and Huang, Sukai and Garcia de la Banda, Maria and Stuckey, Peter J. and Rezatofighi, Hamid},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
This project is released under the Apache-2.0 License.