
ControlNet/MATA

MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning

This repo is the official implementation for the paper MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning, accepted at ICLR 2026.

MATA formulates visual reasoning as a hierarchical finite-state automaton. A Hyper Agent controls the high-level transitions between states (specialized agents, oneshot reasoning, stepwise reasoning, and answering). All agents communicate through a shared memory, which keeps the execution trace transparent.
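To make the control flow concrete, here is a toy sketch of a top-level automaton. Everything in it is an assumption for illustration. The state names, the transition table, and the rule-based `toy_controller` (which stands in for the LLM State Controller) are all hypothetical. This is not the implementation in src/mata/agent/.

```python
from typing import Callable

# Hypothetical transition table for the top-level automaton. The state names
# follow the description above; MATA's real states and rules may differ.
TRANSITIONS: dict[str, tuple[str, ...]] = {
    "specialized_agent": ("oneshot", "stepwise", "answer"),
    "oneshot": ("stepwise", "answer"),
    "stepwise": ("stepwise", "answer"),
    "answer": (),  # terminal state
}

Controller = Callable[[str, list[dict]], str]


def run_automaton(query: str, choose: Controller,
                  start: str = "specialized_agent",
                  max_steps: int = 10) -> list[dict]:
    """Step through states until 'answer' is reached or max_steps runs out."""
    memory: list[dict] = []  # shared memory, which doubles as the execution trace
    state = start
    for step in range(max_steps):
        memory.append({"step": step, "state": state, "query": query})
        if state == "answer":
            break
        nxt = choose(state, memory)  # the Hyper Agent picks the next state
        if nxt not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition {state!r} -> {nxt!r}")
        state = nxt
    return memory


def toy_controller(state: str, memory: list[dict]) -> str:
    """Rule-based stand-in for the LLM State Controller (hypothetical)."""
    if state == "specialized_agent":
        return "oneshot"
    if state == "oneshot":
        return "stepwise"
    # Allow two stepwise rounds, then answer.
    rounds = sum(1 for m in memory if m["state"] == "stepwise")
    return "answer" if rounds >= 2 else "stepwise"


trace = run_automaton("How many red cups are on the left table?", toy_controller)
print([m["state"] for m in trace])
# ['specialized_agent', 'oneshot', 'stepwise', 'stepwise', 'answer']
```

Validating every proposed transition against an explicit table means a faulty controller fails loudly rather than silently jumping to an unreachable state.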

Release

  • [2026/02] MATA is accepted at ICLR 2026.

TODOs

We're working on the following TODOs:

  • Inference code.
  • MATA-SFT-90K dataset.
  • Training pipeline.
  • Official benchmark evaluation scripts.

Installation

Clone this repository and install the pixi environment.

git clone https://github.com/ControlNet/MATA.git
cd MATA
pixi install

The default environment targets Linux with CUDA and uses the dependencies specified in pixi.toml.

Environment Variables

Create a local .env file from the example file.

cp .env.example .env

Set your OpenAI or OpenAI-compatible endpoint credentials in .env.

OPENAI_API_KEY=your-api-key
OPENAI_BASE_URL=https://your-openai-compatible-endpoint/v1

OPENAI_BASE_URL is optional when using the official OpenAI endpoint. Local model and tool caches are controlled by TORCH_HOME in .env.
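The following sketch shows one way these variables could be read and checked. It is an assumption: the repository's own loading code is not shown here and may use a library such as python-dotenv instead. The function names `parse_env`, `resolve_openai_settings`, and `load_dotenv_file` are hypothetical. Exporting the values into `os.environ` is what lets code that reads the environment see them, for example TORCH_HOME, which PyTorch's `torch.hub` uses as its cache root.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blanks and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. OPENAI_API_KEY="sk-..."
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_openai_settings(env: Mapping[str, str]) -> tuple[str, Optional[str]]:
    """Return (api_key, base_url); base_url is None for the official endpoint."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is missing; set it in .env")
    # OPENAI_BASE_URL is optional: leave it unset to use OpenAI's default endpoint.
    return api_key, env.get("OPENAI_BASE_URL") or None


def load_dotenv_file(path: str = ".env") -> dict[str, str]:
    """Read .env and export its values; variables already set in the shell win."""
    values = parse_env(Path(path).read_text())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Using `setdefault` is a deliberate choice in this sketch: a variable exported in the shell overrides the one in .env, which is convenient for one-off runs against a different endpoint.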

Hyper Agent

The Hyper Agent uses a local LLM as its State Controller to choose state transitions. By default, this is the Qwen3-4B model, or, once released, our trained checkpoint. To use a different model, change model_name in the hyper_agent section of the base config.

hyper_agent:
  model_name: "Qwen/Qwen3-4B"
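The config only selects the model. This README does not describe how the model's output becomes a transition. With any LLM-based controller, though, free-form text has to be mapped to exactly one legal state. The following is a hypothetical sketch of that step. The function name, the state names, and the fallback policy are assumptions, not MATA's code. It relies on one known behaviour: Qwen3 can emit a `<think>...</think>` block before its answer.

```python
import re


def parse_transition(llm_output: str, allowed: tuple[str, ...], fallback: str) -> str:
    """Map free-form controller text to exactly one allowed next state.

    Assumes state names in `allowed` are lowercase.
    """
    # Qwen3 can emit a <think>...</think> block before its answer; ignore it.
    text = re.sub(r"<think>.*?</think>", "", llm_output, flags=re.DOTALL).lower()
    # Choose the allowed state name that appears earliest in the remaining text.
    hits = [(text.find(name), name) for name in allowed if name in text]
    # If nothing legal was mentioned, fall back to a safe default state.
    return min(hits)[1] if hits else fallback


print(parse_transition("<think>Maybe answer now?</think>Next state: stepwise",
                       allowed=("stepwise", "answer"), fallback="answer"))
# stepwise
```

Restricting the choice to `allowed` keeps the automaton well-formed even when the model's output is off-format. A trained controller should rarely need the fallback.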

Download Models

Prepare the vision-language tool models used by MATA.

pixi run download_model

Inference

Run MATA on a single image and query.

pixi run python scripts/infer_once.py \
  --base_config configs/gqa.yaml \
  --image /path/to/image.jpg \
  --query "How many red cups are on the left table?"
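To run the single-query script over a handful of images, you could wrap the command above in Python. This is a minimal sketch under assumptions. The helper names `infer_once_cmd` and `run_batch` are hypothetical, and only the flags shown above are used. The script is assumed to be run from the repository root.

```python
import shlex
import subprocess


def infer_once_cmd(image: str, query: str,
                   base_config: str = "configs/gqa.yaml") -> list[str]:
    """Build the documented infer_once.py invocation as an argv list."""
    return ["pixi", "run", "python", "scripts/infer_once.py",
            "--base_config", base_config,
            "--image", image,
            "--query", query]


def run_batch(jobs: list[tuple[str, str]], base_config: str = "configs/gqa.yaml",
              dry_run: bool = True) -> None:
    """Run infer_once.py for each (image, query) pair, one process per pair."""
    for image, query in jobs:
        cmd = infer_once_cmd(image, query, base_config)
        print(shlex.join(cmd))  # shell-safe form, handy for logging
        if not dry_run:
            # No shell is involved, so quotes and spaces in queries need no escaping.
            subprocess.run(cmd, check=True)  # raises on the first failed run
```

For example, `run_batch([("/path/to/image.jpg", "How many red cups are on the left table?")], dry_run=False)` runs the query shown above. Each pair starts a fresh process and presumably reloads the models. For many examples, scripts/infer_dataset.py is likely the better fit.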

Dataset Inference

Run inference on a dataset split.

pixi run python scripts/infer_dataset.py \
  --base_config configs/gqa.yaml \
  --data_root /path/to/data \
  --result_folder ./result

Code Structure

src/mata/
  agent/        Hyper Automaton agents and state controller
  execution/    ImagePatch runtime and tool execution helpers
  memory/       Shared memory
  prompt/       Prompt templates
  tool/         Vision-language tool wrappers
  util/         Config, logging, and misc utilities

configs/        Task and model configs
scripts/        Inference entrypoints
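The memory/ package holds the shared memory through which agents communicate. This README does not document its interface. The following is therefore only a hypothetical sketch of the idea: an append-only log that any agent can write to, whose entries double as the execution trace. The class and method names are ours and are not MATA's API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MemoryEntry:
    step: int
    agent: str
    key: str
    value: Any


@dataclass
class SharedMemory:
    """Hypothetical append-only store shared by all agents."""
    entries: list[MemoryEntry] = field(default_factory=list)

    def write(self, agent: str, key: str, value: Any) -> None:
        # Never overwrite: appending keeps the full history for the trace.
        self.entries.append(MemoryEntry(len(self.entries), agent, key, value))

    def read(self, key: str) -> Any:
        """Return the latest value written under key, or None if absent."""
        for entry in reversed(self.entries):
            if entry.key == key:
                return entry.value
        return None

    def trace(self) -> str:
        """Render every write in order, one line per entry."""
        return "\n".join(f"[{e.step}] {e.agent}: {e.key} = {e.value!r}"
                         for e in self.entries)


memory = SharedMemory()
memory.write("hyper_agent", "state", "stepwise")
memory.write("stepwise", "count", 3)
memory.write("hyper_agent", "state", "answer")
print(memory.read("state"))  # answer
print(memory.trace())
```

Because nothing is overwritten, reading the latest value and replaying the full history come from the same structure. That is one simple way to get the transparent execution traces described above.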

Key paper terms map to the code as follows:

Citation

If you find this work useful for your research, please consider citing it.

@inproceedings{cai2026mata,
  title={MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning},
  author={Cai, Zhixi and Ke, Fucai and Leo, Kevin and Huang, Sukai and Garcia de la Banda, Maria and Stuckey, Peter J. and Rezatofighi, Hamid},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}

License

This project is released under the Apache-2.0 License.
