MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning

This repository is the official implementation of the paper MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning, accepted at ICLR 2026.

MATA formulates visual reasoning as a hierarchical finite-state automaton. A Hyper Agent controls high-level transitions among specialized agents, oneshot reasoning, stepwise reasoning, and answering states, while all agents communicate through shared memory for transparent execution traces.
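
To make the automaton framing concrete, here is a minimal sketch of a state machine in which a controller picks the next state and every state writes to one shared trace. It is not the repository's implementation: the State enum, the SharedMemory class, and toy_controller (a fixed rule table standing in for the LLM State Controller) are all hypothetical names chosen for illustration, and the linear transition order is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List


class State(Enum):
    # State names follow the description above; the enum itself is hypothetical.
    SPECIALIZED = auto()
    ONESHOT = auto()
    STEPWISE = auto()
    ANSWERING = auto()
    DONE = auto()


@dataclass
class SharedMemory:
    """Hypothetical shared memory: every state appends to one execution trace."""
    query: str
    trace: List[str] = field(default_factory=list)

    def write(self, state: State, note: str) -> None:
        self.trace.append(f"{state.name}: {note}")


def toy_controller(state: State, memory: SharedMemory) -> State:
    """Stand-in for the LLM State Controller: a fixed rule table, not a model."""
    order = {
        State.SPECIALIZED: State.ONESHOT,
        State.ONESHOT: State.STEPWISE,
        State.STEPWISE: State.ANSWERING,
        State.ANSWERING: State.DONE,
    }
    return order[state]


def run(query: str,
        controller: Callable[[State, SharedMemory], State],
        max_steps: int = 10) -> SharedMemory:
    """Drive the automaton until DONE or until max_steps transitions."""
    memory = SharedMemory(query=query)
    state = State.SPECIALIZED
    for _ in range(max_steps):
        if state is State.DONE:
            break
        memory.write(state, "visited")  # a real agent would record its output here
        state = controller(state, memory)
    return memory


memory = run("How many red cups are on the left table?", toy_controller)
print("\n".join(memory.trace))
```

Because the controller is passed in as a plain callable, the rule table could be swapped for a learned policy without touching the loop or the trace format.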

Release

[2026/02] MATA is accepted at ICLR 2026.

TODOs

We're working on the following TODOs:

Inference code.
MATA-SFT-90K dataset.
Training pipeline.
Official benchmark evaluation scripts.

Installation

Clone this repository and install the pixi environment.

git clone https://github.com/ControlNet/MATA.git
cd MATA
pixi install

The default environment targets Linux with CUDA and uses the dependencies specified in pixi.toml.

Environment Variables

Create a local .env file from the example file.

cp .env.example .env

Set your OpenAI or OpenAI-compatible endpoint credentials in .env.

OPENAI_API_KEY=your-api-key
OPENAI_BASE_URL=https://your-openai-compatible-endpoint/v1

OPENAI_BASE_URL is optional when using the official OpenAI endpoint. Local model and tool caches are controlled by TORCH_HOME in .env.
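
As a rough illustration of how these variables might be resolved, the sketch below parses KEY=VALUE lines and treats a missing OPENAI_BASE_URL as "use the official endpoint". The helper names parse_env and resolve_endpoint are hypothetical, and this is not how MATA itself loads .env; it only uses the two variable names documented above.

```python
from typing import Dict, Optional


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so URLs containing "=" stay intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_endpoint(env: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Require the API key; leave base_url as None when it is unset or empty."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is not set")
    return {"api_key": api_key, "base_url": env.get("OPENAI_BASE_URL") or None}


example = "OPENAI_API_KEY=your-api-key\n# OPENAI_BASE_URL left unset\n"
settings = resolve_endpoint(parse_env(example))
print(settings)
```

Failing early on a missing key gives a clearer error than a rejected request later in the run.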

Hyper Agent

The Hyper Agent uses a local LLM State Controller to choose state transitions. By default, it uses the Qwen3-4B model; the trained checkpoint will take its place once it is released. You can change the model by modifying the hyper_agent section in the base config.

hyper_agent:
  model_name: "Qwen/Qwen3-4B"
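
If you prefer to keep the base config untouched and layer a change on top, one common pattern is a recursive dictionary merge. The sketch below works on plain dicts and assumes nothing about MATA's own config loader, which is not documented here; deep_merge and the checkpoint path are hypothetical.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where nested keys in override replace those in base."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Mirrors the hyper_agent section shown above.
base_config = {"hyper_agent": {"model_name": "Qwen/Qwen3-4B"}}
# Hypothetical override pointing at a local checkpoint directory.
override = {"hyper_agent": {"model_name": "/path/to/local/checkpoint"}}

config = deep_merge(base_config, override)
print(config["hyper_agent"]["model_name"])
```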

Download Models

Prepare the vision-language tool models used by MATA.

pixi run download_model

Inference

Run MATA on a single image and query.

pixi run python scripts/infer_once.py \
  --base_config configs/gqa.yaml \
  --image /path/to/image.jpg \
  --query "How many red cups are on the left table?"
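
To run several image and query pairs in a row, the same command can be assembled from Python. This sketch only builds and prints the argument lists, using the flags shown above; build_infer_command and the second example job are hypothetical, and the script's output format is not assumed.

```python
import shlex
from typing import List


def build_infer_command(image: str, query: str,
                        base_config: str = "configs/gqa.yaml") -> List[str]:
    """Build the argv list for scripts/infer_once.py with the documented flags."""
    return [
        "pixi", "run", "python", "scripts/infer_once.py",
        "--base_config", base_config,
        "--image", image,
        "--query", query,
    ]


jobs = [
    ("/path/to/image.jpg", "How many red cups are on the left table?"),
    ("/path/to/other.jpg", "What color is the sofa?"),  # hypothetical second job
]
commands = [build_infer_command(image, query) for image, query in jobs]
for command in commands:
    # shlex.join quotes the query so the printed line can be pasted into a shell.
    print(shlex.join(command))
    # To execute instead of print: subprocess.run(command, check=True)
```

Passing the arguments as a list rather than one string avoids shell-quoting problems when a query contains quotes or question marks.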

Dataset Inference

Run inference on a dataset split.

pixi run python scripts/infer_dataset.py \
  --base_config configs/gqa.yaml \
  --data_root /path/to/data \
  --result_folder ./result

Code Structure

src/mata/
  agent/        Hyper Automaton agents and state controller
  execution/    ImagePatch runtime and tool execution helpers
  memory/       Shared memory
  prompt/       Prompt templates
  tool/         Vision-language tool wrappers
  util/         Config, logging, and misc utilities

configs/        Task and model configs
scripts/        Inference entrypoints

Key paper terms map to the code as follows:

Hyper Automaton: src/mata/agent/mata.py
Hyper Agent and LLM State Controller: src/mata/agent/hyper_agent.py
Specialized Agent: src/mata/agent/specialized
Oneshot Reasoner: src/mata/agent/oneshot
Stepwise Reasoner: src/mata/agent/stepwise
Answering State: src/mata/agent/answering
Shared Memory: src/mata/memory/shared_memory.py

Citation

If you find this work useful for your research, please consider citing it.

@inproceedings{cai2026mata,
  title={MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning},
  author={Cai, Zhixi and Ke, Fucai and Leo, Kevin and Huang, Sukai and Garcia de la Banda, Maria and Stuckey, Peter J. and Rezatofighi, Hamid},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}

License

This project is released under the Apache-2.0 License.
