I am a senior undergraduate student with a strong interest in Multimodal LLM Post-Training and Tool-Augmented Agentic Systems. I am fortunate to collaborate with Manling Li, Zhengzhong Tu, and Yue Zhao, and more broadly with Jiacheng Zhu, Yujun Cai, Yiwei Wang, and Shuo Li. Prior to that, I worked with Zhe Liu and Victor S. Sheng on research in Generalizable Medical Vision and Multimodal Machine Learning. I am deeply grateful to them for guiding me into the world of research.
I am actively seeking a CS PhD position for Fall 2026. I am always open to collaboration; feel free to drop me an email or contact me on WeChat (ID: qiancxdotcom).
Research Interests
- Multimodal Foundation Models (VLM, VLA, Video, Spatial Intelligence, etc.) 🔥
- Multimodal Post-Training (Reasoning, Alignment, Benchmarks, Agents) 🔥🔥
- Tool-Augmented Agentic RL (Visual Tools, RAG, Search Engines, Code Interpreters, APIs, etc.) 🔥🔥
- Agent Applications (Autonomous Driving, Biomedicine, Embodied AI) 🔥🔥
🔥 News
- 2025.10: We propose Video-STAR, a powerful Tool-Augmented Agentic RL approach for Thinking with Videos. On open-vocabulary action recognition benchmarks like K-400 and HMDB-51, our 3B VLM achieves a nearly 40% accuracy improvement over base models! 🔥
- 2025.09: Our work HAIF-GS, an efficient dynamic 3D reconstruction framework that combines sparse anchors, self-supervised guidance, and hierarchical propagation to improve reconstruction quality and temporal consistency, has been accepted by NeurIPS 2025!
- 2025.09: We propose AutoDrive-R², which incentivizes reasoning and self-reflection capacity for VLA models in autonomous driving. We're also honored that our work was featured by AutoDrive Heart (自动驾驶之心)!
- 2025.08: Our work Re-Align has been accepted by the EMNLP 2025 Main Conference!
- 2025.08: I will serve as a Program Committee member for AAAI 2026!
- 2025.07: Our work on Generalizable Medical Vision has been accepted by IEEE Transactions on Medical Imaging.
- 2025.06: We propose DVP-MVS++, a multi-view stereo method that integrates depth-normal-edge priors and visibility guidance for robust 3D reconstruction, which is now available on arXiv!
- 2025.05: Our work CLIMD has been early accepted by MICCAI 2025 (Top 9%).
- 2025.03: Excited to propose my first-author work DecAlign, a novel cross-modal decoupling and alignment framework for multimodal representation learning, which is now available on arXiv!
- 2024.11: Excited to propose my first-author work DynCIM, a novel dynamic multimodal curriculum learning framework for addressing cross-modal competition and imbalance, which is now available on arXiv!
- 2024.10: We propose FASS, a novel frequency domain-enhanced approach for medical image segmentation in low-contrast environments.
- 2024.10: Our work LiMT is now under Major Revision at Medical Image Analysis.
- 2024.08: Excited to propose my first-author work ALC, a novel adaptive label correction framework for medical image segmentation with noisy labels, which is now available on arXiv!
Publications
Multimodal LLMs Post-Training

Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
Preprint
Zhenlong Yuan, Xiangyan Qu, Chengxuan Qian† (corresponding author), Rui Chen, Jing Tang, Lei Sun, Xiangxiang Chu, Dapeng Zhu, Yiwei Wang, Yujun Cai, Shuo Li.

AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
Preprint
Featured by AutoDrive Heart (自动驾驶之心)
Zhenlong Yuan, Jing Tang, Jinguo Luo, Rui Chen, Lei Sun, Chengxuan Qian, Yujun Cai, Dapeng Zhang, Shuo Li.

Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Shuo Xing, Yuping Wang, Peiran Li, Ruizheng Bai, Yueqi Wang, Chengxuan Qian, Huaxiu Yao, Zhengzhong Tu†.
Multimodal Foundation Models

DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
Chengxuan Qian, Shuo Xing, Shawn Li, Yue Zhao, Zhengzhong Tu†.

DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning
Chengxuan Qian, Kai Han, Jingchao Wang, Zhenlong Yuan, Chongwen Lyu, Jun Chen†, Zhe Liu†.

CLIMD: A Curriculum Learning Framework for Imbalanced Multimodal Diagnosis
MICCAI 2025 Early Accept (Top 9% Paper)
Kai Han, Chongwen Lyu, Chengxuan Qian, Siqi Ma, Jun Chen†, Zhe Liu†.

Contrastive Intra- and Inter-modal Clustering for Multimodal Semantic Discovery
Under Review
Zhengzhong Zhu, Pei Zhou, Chengxuan Qian, Ruohong Yang, Yixuan Ye, Jiangping Zhu
Medical Image Analysis

Adaptive Label Correction Framework for Robust Medical Image Segmentation with Noisy Labels
Chengxuan Qian, Kai Han, Siqi Ma, Chongwen Lyu, Zhenlong Yuan, Jun Chen†, Zhe Liu†.

Region Uncertainty Estimation for Medical Image Segmentation with Noisy Labels
IEEE Transactions on Medical Imaging (Published July 2025)
Kai Han, Shuhui Wang, Jun Chen†, Chengxuan Qian, Chongwen Lyu, Siqi Ma, Victor S. Sheng†, Qingming Huang†, Zhe Liu†.

Frequency Domain Unlocks New Perspectives for Abdominal Medical Image Segmentation
arXiv Preprint
Kai Han, Siqi Ma, Chengxuan Qian, Jun Chen†, Chongwen Lyu, Victor S. Sheng†, Zhe Liu†.

LiMT: A Multi-task Liver Image Benchmark Dataset
Medical Image Analysis (Major Revision)
Z Liu†, K Han, S Ma, J Chen†, …, C Qian, C Lyu, …, V S. Sheng†.
- Dataset and Benchmarking work
3D Vision

DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo
IEEE Transactions on Circuits and Systems for Video Technology (Major Revision)
Zhenlong Yuan, Dapeng Zhang, Zehao Li, Chengxuan Qian, Jianing Chen, Yinda Chen, Kehua Chen, Tianlu Mao, Zhaoxin Li, Hao Jiang and Zhaoqi Wang

HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
Jianing Chen, Zehao Li, Yujun Cai, Hao Jiang, Chengxuan Qian, Juyuan Kang, Shuqin Gao, Honglong Zhao, Tianlu Mao, Yucheng Zhang.
Academic Services
- Reviewer for IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Multimedia (TMM), IEEE International Conference on Multimedia & Expo (ICME 2025), ICCV 2025, NeurIPS 2025, AAAI 2026, and ICASSP 2026.