About Me

I am a Researcher at Ant Group, working on multimodal foundation models. I received my Master’s degree in Control Science and Engineering from Zhejiang University in 2025 and my Bachelor of Science (B.S.) degree in Automation from the same university in 2022.

My research interests center on unified understanding and generation models, as well as visual perception.

News

  • [Oct. 2025] We release Ming-Flash-Omni, a sparse, unified architecture for multimodal perception and generation! 🚀
  • [Oct. 2025] We release Ming-UniVision, joint image understanding and generation with a unified continuous tokenizer! 🤗
  • [Sep. 2025] Our ARGenSeg is accepted to NeurIPS 2025! 🎉
  • [Dec. 2024] Our HomoMatcher is accepted to AAAI 2025! 🎉
  • [Oct. 2024] Our PointLLM is accepted to ECCV 2024 with all “strong accept” reviews and selected as a Best Paper Candidate! 🎉
  • [Sep. 2023] Our HC-Net is accepted to NeurIPS 2023! 🎉
  • [Aug. 2023] We release PointLLM, a multimodal large language model capable of understanding point clouds! 🤗
  • [Aug. 2023] We release HC-Net, a state-of-the-art fine-grained cross-view geo-localization model! 📊

Publications

Ming-Flash-Omni

Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

Ming team, Ant Group

Technical Report, 2025

[paper] [code] [hf]

Ming-UniVision

Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer

Ziyuan Huang, Dandan Zheng, Cheng Zou, Rui Liu, Xiaolong Wang, Kaixiang Ji, Weilong Chai, Jianxin Sun, Libin Wang, Yongjie Lv, Taozhi Huang, Jiajia Liu, Qingpei Guo, Ming Yang, Jingdong Chen, Jun Zhou

Technical Report, 2025

[paper] [code] [hf]

ARGenSeg

ARGenSeg: Image Segmentation with Autoregressive Image Generation Model

Xiaolong Wang*, Lixiang Ru*, Ziyuan Huang, Kaixiang Ji, Dandan Zheng, Jingdong Chen#, Jun Zhou#

Neural Information Processing Systems (NeurIPS), 2025

[paper] [code] [project]

HomoMatcher

HomoMatcher: Achieving Dense Feature Matching with Semi-Dense Efficiency by Homography Estimation

Xiaolong Wang*, Lei Yu*, Yingying Zhang, Jiangwei Lao, Lixiang Ru, Liheng Zhong, Jingdong Chen, Yu Zhang#, Ming Yang#

AAAI Conference on Artificial Intelligence (AAAI), 2025

[paper] [aaai]

PointLLM-V2

PointLLM-V2: Empowering Large Language Models to Better Understand Point Clouds

Runsen Xu*, Shuai Yang*, Xiaolong Wang, Tai Wang#, Yilun Chen, Jiangmiao Pang#, Dahua Lin

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

[paper] [code] [project]

PointLLM

PointLLM: Empowering Large Language Models to Understand Point Clouds

Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang#, Dahua Lin

European Conference on Computer Vision (ECCV), 2024

[paper] [code] [project]

HC-Net

HC-Net: Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator

Xiaolong Wang, Runsen Xu, Zhuofan Cui, Zeyu Wan, Yu Zhang#

Neural Information Processing Systems (NeurIPS), 2023

[paper] [code]

UAV Geo-Localization

A Novel Geo-Localization Method for UAV and Satellite Images Using Cross-View Consistent Attention

Zhuofan Cui, Pengwei Zhou, Xiaolong Wang, Zilun Zhang, Yingxuan Li, Hongbo Li#, Yu Zhang

Remote Sensing, 2023

[paper]