I am a senior undergraduate student with a strong interest in Multimodal LLMs Post-Training, Spatial Intelligence and Agent Applications. I am fortunate to collaborate with Manling Li, Han Liu, Zhengzhong Tu, and Yue Zhao, and more broadly with Jiacheng Zhu, Yujun Cai, Yiwei Wang, and Shuo Li. Prior to that, I worked with Zhe Liu and Victor S. Sheng on research in Generalizable Medical Vision and Multimodal Machine Learning. I am deeply grateful to them for guiding me into the world of research.
I am actively seeking a CS PhD position starting in Fall 2026. I am always open to collaboration; feel free to email me or contact me on WeChat (ID: qiancxdotcom).
Research Interests
- Multimodal Foundation Models (VLM, VLA, Video LLMs, etc.) 🔥
- Spatial Intelligence (Scene Understanding, Robotic Manipulation, Embodied Navigation, etc.) 🔥🔥
- Tool-Augmented Agentic RL (Thinking with Images/Videos, Deep Research, etc.) 🔥🔥
- Agent Applications (Embodied Robotics, Autonomous Driving, Biomedicine) 🔥🔥
🔥 News
- 2025.11: Our work LiMT, a unified multi-task liver image benchmark, has been accepted by the Journal of Biomedical and Health Informatics (JBHI)!
- 2025.10: Our work DVP-MVS++, a multi-view stereo method that integrates depth-normal-edge priors and visibility guidance for robust 3D reconstruction, has been accepted by IEEE Transactions on Circuits and Systems for Video Technology!
- 2024.10: My first-author work on medical image segmentation under sparse and noisy annotations has been accepted by BIBM 2025!
- 2025.10: We propose Video-STAR, a Tool-Augmented Agentic RL approach for Thinking with Videos. On open-vocabulary action recognition benchmarks such as K-400 and HMDB-51, our 3B VLM achieves a nearly 40% accuracy improvement over base models! 🔥
- 2025.09: Our work HAIF-GS, an efficient dynamic 3D reconstruction framework that combines sparse anchors, self-supervised guidance, and hierarchical propagation to improve reconstruction quality and temporal consistency, has been accepted by NeurIPS 2025!
- 2025.09: We propose AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving. We're also honored that our work was featured by AutoDrive Heart (自动驾驶之心)!
- 2025.08: Our work Re-Align has been accepted by the EMNLP 2025 Main Conference!
- 2025.07: Our work on Generalizable Medical Vision has been accepted by IEEE Transactions on Medical Imaging.
- 2025.05: Our work CLIMD has been early accepted by MICCAI 2025 (Top 9%).
- 2025.03: Excited to share my first-author work DecAlign, a novel cross-modal decoupling and alignment framework for multimodal representation learning, which is now available on arXiv!
- 2024.11: Excited to share my first-author work DynCIM, a novel dynamic multimodal curriculum learning framework for addressing cross-modal competition and imbalance, which is now available on arXiv!
- 2024.10: We propose FASS, a novel frequency-domain-enhanced approach for medical image segmentation in low-contrast environments.
- 2024.08: Excited to share my first-author work ALC, a novel adaptive label correction framework for medical image segmentation with noisy labels, which is now available on arXiv!
Publications
Multimodal LLMs Post-Training

Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
Preprint
Zhenlong Yuan, Xiangyan Qu, Chengxuan Qian† (corresponding author), Rui Chen, Jing Tang, Lei Sun, Xiangxiang Chu, Dapeng Zhu, Yiwei Wang, Yujun Cai, Shuo Li.

AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
Preprint
Featured by AutoDrive Heart (自动驾驶之心)
Zhenlong Yuan, Jing Tang, Jinguo Luo, Rui Chen, Lei Sun, Chengxuan Qian, Yujun Cai, Dapeng Zhang, Shuo Li.

Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Shuo Xing, Yuping Wang, Peiran Li, Ruizheng Bai, Yueqi Wang, Chengxuan Qian, Huaxiu Yao, Zhengzhong Tu†.

fMRI-LM: Towards a Universal Foundation Model for Language-Aligned fMRI Understanding
Yuxiang Wei, Yanteng Zhang, Xi Xiao, Chengxuan Qian, Tianyang Wang, Vince D. Calhoun†.
Multimodal Foundation Models

DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
Chengxuan Qian, Shuo Xing, Shawn Li, Yue Zhao, Zhengzhong Tu†.

DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning
Chengxuan Qian, Kai Han, Jingchao Wang, Zhenlong Yuan, Chongwen Lyu, Jun Chen†, Zhe Liu†.

CLIMD: A Curriculum Learning Framework for Imbalanced Multimodal Diagnosis
MICCAI 2025 Early Accept (Top 9% Paper)
Kai Han, Chongwen Lyu, Chengxuan Qian, Siqi Ma, Jun Chen†, Zhe Liu†.

Contrastive Intra- and Inter-modal Clustering for Multimodal Semantic Discovery
Under Review
Zhengzhong Zhu, Pei Zhou, Chengxuan Qian, Ruohong Yang, Yixuan Ye, Jiangping Zhu
Medical Image Analysis

Adaptive Label Correction Framework for Robust Medical Image Segmentation with Noisy Labels
Chengxuan Qian, Kai Han, Siqi Ma, Chongwen Lyu, Zhenlong Yuan, Jun Chen†, Zhe Liu†.

Region Uncertainty Estimation for Medical Image Segmentation with Noisy Labels
IEEE Transactions on Medical Imaging (Published July 2025)
Kai Han, Shuhui Wang, Jun Chen†, Chengxuan Qian, Chongwen Lyu, Siqi Ma, Victor S. Sheng†, Qingming Huang†, Zhe Liu†.

Frequency Domain Unlocks New Perspectives for Abdominal Medical Image Segmentation
arXiv Preprint
Kai Han, Siqi Ma, Chengxuan Qian, Jun Chen†, Chongwen Lyu, Victor S. Sheng†, Zhe Liu†.

LiMT: A Multi-task Liver Image Benchmark Dataset
Journal of Biomedical and Health Informatics (JBHI)
Z Liu†, K Han, S Ma, J Chen†, …, C Qian, C Lyu, …, V S. Sheng†.
- Dataset and benchmarking work
3D Vision

DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo
IEEE Transactions on Circuits and Systems for Video Technology
Zhenlong Yuan, Dapeng Zhang, Zehao Li, Chengxuan Qian, Jianing Chen, Yinda Chen, Kehua Chen, Tianlu Mao, Zhaoxin Li, Hao Jiang and Zhaoqi Wang

HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
Jianing Chen, Zehao Li, Yujun Cai, Hao Jiang, Chengxuan Qian, Juyuan Kang, Shuqin Gao, Honglong Zhao, Tianlu Mao, Yucheng Zhang.
Academic Services
- Journal Reviewer: IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Multimedia (TMM).
- Conference Reviewer: ICME 2025, ACL 2025, ICCV 2025, NeurIPS 2025, AAAI 2026, ICASSP 2026, CVPR 2026