Skip to content

OpenSQZ/MiniCPM-V-CookBook

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

247 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

🍳 MiniCPM-V & o Cookbook

🏠 Main Repository | 📚 Full Documentation

Cook up amazing AI applications effortlessly with MiniCPM-V, MiniCPM-o, and the MiniCPM LLM series — bringing text, vision, speech, and live-streaming capabilities right to your fingertips! For version-specific deployment instructions, see the files in the deployment directory.

✨ What Makes Our Recipes Special?

Easy Usage Documentation

Our comprehensive documentation website presents every recipe in a clear, well-organized manner. All features are displayed at a glance, making it easy for you to quickly find exactly what you need.

Broad User Spectrum

We support a wide range of users, from individuals to enterprises and researchers.

  • Individuals: Enjoy effortless inference using Ollama and Llama.cpp with minimal setup.
  • Enterprises: Achieve high-throughput, scalable performance with vLLM and SGLang.
  • Researchers: Leverage advanced frameworks including Transformers , LLaMA-Factory, SWIFT, and Align-anything to enable flexible model development and cutting-edge experimentation.

Versatile Deployment Scenarios

Our ecosystem delivers optimal solution for a variety of hardware environments and deployment demands.

  • Web demo: Launch interactive multimodal AI web demo with FastAPI.
  • Quantized deployment: Maximize efficiency and minimize resource consumption using GGUF, BNB, and AWQ.
  • Edge devices: Local multimodal demos on MiniCPM-V-Apps (iOS / Android / HarmonyOS NEXT, llama.cpp); Cookbook iOS quickstart: iPhone and iPad.

⭐️ Live Demonstrations

Explore real-world examples of MiniCPM-V deployed on edge devices using our curated recipes. These demos highlight the model’s high efficiency and robust performance in practical scenarios.

🔥 Inference Recipes

Ready-to-run examples

Recipe Description
Vision Capabilities (MiniCPM-V 4.6)
🖼️ Single-image QA Question answering on a single image
🧩 Multi-image QA Question answering with multiple images
🎬 Video QA Video-based question answering
📄 Document Parser Parse and extract content from PDFs and webpages
📝 Text Recognition Reliable OCR for photos and screenshots
🎯 Grounding Visual grounding and object localization in images
Audio Capabilities (MiniCPM-o)
🎤 Speech-to-Text Multilingual speech recognition
🗣️ Text-to-Speech Instruction-following speech synthesis
🎭 Voice Cloning Realistic voice cloning and role-play
Text Capabilities (MiniCPM LLM 4 / 4.1)
💬 Chat & Hybrid Reasoning LLM chat with optional step-by-step reasoning
🛠️ MCP Tool Agent Tool-use agent built on Model Context Protocol
📑 Survey Generation Long-form survey / report generation with citations

🏋️ Fine-tuning Recipes

Customize your model with your own ingredients

Data preparation

Follow the guidance to set up your training datasets.

Training

We provide training methods serving different needs as following:

Framework Description
Transformers Most flexible for customization
LLaMA-Factory Modular fine-tuning toolkit
SWIFT Lightweight and fast parameter-efficient tuning
Align-anything Visual instruction alignment for multimodal models

📦 Serving Recipes

Deploy your model efficiently. Pick a framework — the cookbook docs page lets you switch between V / o / LLM versions on the sidebar.

Framework Description
vLLM High-throughput GPU inference
SGLang High-throughput GPU inference (LLM series via tc-mb/sglang fork)
llama.cpp Fast CPU / GGUF inference on PC, iPhone and iPad
Ollama User-friendly one-line local run
MLX Apple Silicon inference
CPM.cu On-device CUDA inference
OpenWebUI Interactive Web demo with Open WebUI
Gradio Interactive Web demo with Gradio
FastAPI Interactive Omni Streaming demo with FastAPI
iOS MiniCPM-V-Apps — iOS quickstart (llama.cpp; Android / HarmonyOS in upstream)

🥄 Quantization Recipes

Compress your model to improve efficiency. The cookbook docs page covers all supported series — use the sidebar version switcher to pick a release.

Method Key Feature
GGUF Simplest and most portable format
BNB Simple and easy-to-use quantization method
AWQ High-performance INT4 quantization for efficient inference
GPTQ Weight-only INT4 with vLLM-compatible packaging (also supports QAT)
BitCPM4 Ternary 3-bit quantization — ~10% of original size

Awesome Works using MiniCPM-V & o

  • text-extract-api: Document extraction API using OCRs and Ollama supported models GitHub Repo stars
  • comfyui_LLM_party: Build LLM workflows and integrate into existing image workflows GitHub Repo stars
  • Ollama-OCR: OCR package uses vlms through Ollama to extract text from images and PDF GitHub Repo stars
  • comfyui-mixlab-nodes: ComfyUI node suite supports Workflow-to-APP、GPT&3D and more GitHub Repo stars
  • OpenAvatarChat: Interactive digital human conversation implementation on single PC GitHub Repo stars
  • pensieve: A privacy-focused passive recording project by recording screen content GitHub Repo stars
  • paperless-gpt: Use LLMs to handle paperless-ngx, AI-powered titles, tags and OCR GitHub Repo stars
  • Neuro: A recreation of Neuro-Sama, but running on local models on consumer hardware GitHub Repo stars

👥 Community

Contributing

We love new recipes! Please share your creative dishes:

  1. Fork the repository
  2. Create your recipe
  3. Submit a pull request

Issues & Support

Institutions

This cookbook is developed by OpenBMB and OpenSQZ.

📜 License

This cookbook is served under the Apache-2.0 License - cook freely, share generously! 🍳

Citation

If you find our model/code/paper helpful, please consider citing our papers 📝 and staring us ⭐️!

@misc{yu2025minicpmv45cookingefficient,
      title={MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe}, 
      author={Tianyu Yu and Zefan Wang and Chongyi Wang and Fuwei Huang and Wenshuo Ma and Zhihui He and Tianchi Cai and Weize Chen and Yuxiang Huang and Yuanqian Zhao and Bokai Xu and Junbo Cui and Yingjing Xu and Liqing Ruan and Luoyuan Zhang and Hanyu Liu and Jingkun Tang and Hongyuan Liu and Qining Guo and Wenhao Hu and Bingxiang He and Jie Zhou and Jie Cai and Ji Qi and Zonghao Guo and Chi Chen and Guoyang Zeng and Yuxuan Li and Ganqu Cui and Ning Ding and Xu Han and Yuan Yao and Zhiyuan Liu and Maosong Sun},
      year={2025},
      eprint={2509.18154},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2509.18154}, 
}

@article{yao2024minicpm,
  title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone},
  author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others},
  journal={Nat Commun 16, 5509 (2025)},
  year={2025}
}

@article{cui2026minicpmo45realtimefullduplex,
      title={MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction},
      author={Junbo Cui and Bokai Xu and Chongyi Wang and Tianyu Yu and Weiyue Sun and Yingjing Xu and Tianran Wang and Zhihui He and Wenshuo Ma and Tianchi Cai and Jiancheng Gui and Luoyuan Zhang and Xian Sun and Fuwei Huang and Moye Chen and Zhuo Lin and Hanyu Liu and Qingxin Gui and Qingzhe Han and Yuyang Wen and Huiping Liu and Rongkang Wang and Yaqi Zhang and Hongliang Wei and Chi Chen and You Li and Kechen Fang and Jie Zhou and Yuxuan Li and Guoyang Zeng and Chaojun Xiao and Yankai Lin and Xu Han and Maosong Sun and Zhiyuan Liu and Yuan Yao},
      year={2026},
      eprint={2604.27393},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.27393},
}

About

Cook up amazing AI applications effortlessly with MiniCPM / MiniCPM-V / MiniCPM-o

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Contributors