About Me
I am a researcher at Ant Group, working on multimodal foundation models. I received my Master’s degree in Control Science and Engineering from Zhejiang University in 2025 and my B.S. in Automation from Zhejiang University in 2022.
My research interests center on unified understanding and generation models, as well as visual perception.
News
- [Oct. 2025] We release Ming-Flash-Omni, a sparse, unified architecture for multimodal perception and generation! 🚀
- [Oct. 2025] We release Ming-UniVision, joint image understanding and generation with a unified continuous tokenizer! 🤗
- [Sep. 2025] ARGenSeg was accepted to NeurIPS 2025! 🎉
- [Dec. 2024] HomoMatcher was accepted to AAAI 2025! 🎉
- [Oct. 2024] PointLLM was accepted to ECCV 2024 with all “strong accept” reviews and selected as a Best Paper Candidate! 🎉
- [Sep. 2023] HC-Net was accepted to NeurIPS 2023! 🎉
- [Aug. 2023] We release PointLLM, a multi-modal large language model capable of understanding point clouds! 🤗
- [Aug. 2023] We release HC-Net, a SOTA fine-grained cross-view geo-localization model! 📊
Publications


Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer
Ziyuan Huang, Dandan Zheng, Cheng Zou, Rui Liu, Xiaolong Wang, Kaixiang Ji, Weilong Chai, Jianxin Sun, Libin Wang, Yongjie Lv, Taozhi Huang, Jiajia Liu, Qingpei Guo, Ming Yang, Jingdong Chen, Jun Zhou
Technical Report, 2025

A Novel Geo-Localization Method for UAV and Satellite Images Using Cross-View Consistent Attention
Zhuofan Cui, Pengwei Zhou, Xiaolong Wang, Zilun Zhang, Yingxuan Li, Hongbo Li*, Yu Zhang
Remote Sensing, 2023