I am a senior undergraduate student with a strong interest in Multimodal LLMs Post-Training, Spatial Intelligence and Agent Applications. I am fortunate to collaborate with Manling Li, Han Liu, Zhengzhong Tu, and Yue Zhao, and more broadly with Jiacheng Zhu, Yujun Cai, Yiwei Wang, and Shuo Li. Prior to that, I worked with Zhe Liu and Victor S. Sheng on research in Generalizable Medical Vision and Multimodal Machine Learning. I am deeply grateful to them for guiding me into the world of research.

I am actively seeking a Fall 2026 CS PhD position. I am always open to collaboration; feel free to drop me an email or contact me on WeChat (ID: qiancxdotcom).

Research Interests

  • Multimodal Foundation Models (VLM, VLA, Video LLMs, etc.) 🔥
  • Spatial Intelligence (Scene Understanding, Robotic Manipulation, Embodied Navigation, etc.) 🔥🔥
  • Tool-Augmented Agentic RL (Thinking with Images/Videos, Deep Research, etc.) 🔥🔥
  • Agent Application (Embodied Robotics, Autonomous Driving, Biomedicine) 🔥🔥

🔥 News

  • 2025.11: 🎉🎉 Our work LiMT, a unified multi-task liver image benchmark, has been accepted by the Journal of Biomedical and Health Informatics (JBHI)!
  • 2025.10: 🎉🎉 Our work DVP-MVS++, a multi-view stereo method that integrates depth-normal-edge priors and visibility guidance for robust 3D reconstruction, has been accepted by IEEE Transactions on Circuits and Systems for Video Technology!
  • 2024.10: 🎉🎉 My first-author work on medical segmentation under sparse and noisy annotations has been accepted by BIBM 2025!
  • 2025.10: 🎉🎉 We propose Video-STAR, a powerful Tool-Augmented Agentic RL approach for Thinking with Videos. On open-vocabulary action recognition benchmarks like K-400 and HMDB-51, our 3B VLM achieves a nearly 40% accuracy improvement over base models! 🔥
  • 2025.09: 🎉🎉 Our work HAIF-GS, an efficient dynamic 3D reconstruction framework combining sparse anchors, self-supervised guidance, and hierarchical propagation to improve reconstruction quality and temporal consistency, has been accepted by NeurIPS 2025!
  • 2025.09: 🎉🎉 We propose AutoDrive-R², Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving. We're also honored that our work was featured by AutoDrive Heart (自动驾驶之心)!
  • 2025.08: 🎉🎉 Our work Re-Align has been accepted by EMNLP 2025 Main Conference!
  • 2025.07: 🎉🎉 Our work on Generalizable Medical Vision has been accepted by IEEE Transactions on Medical Imaging.
  • 2025.05: 🎉🎉 Our work CLIMD has been Early Accepted by MICCAI 2025 (Top 9%).
  • 2025.03: 🎉🎉 Excited to propose my first-author work DecAlign, a novel cross-modal decoupling and alignment framework for multimodal representation learning, which is now available on arXiv!
  • 2024.11: 🎉🎉 Excited to propose my first-author work DynCIM, a novel dynamic multimodal curriculum learning framework for addressing cross-modal competition and imbalances, which is now available on arXiv!
  • 2024.10: 🎉🎉 We propose FASS, a novel frequency domain-enhanced approach for medical image segmentation in low-contrast environments.
  • 2024.08: 🎉🎉 Excited to propose my first-author work ALC, a novel adaptive label correction framework for medical image segmentation with noisy labels, which is now available on arXiv!

📝 Publications

Multimodal LLMs Post-Training

Preprint

Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools

Preprint

Zhenlong Yuan, Xiangyan Qu, Chengxuan Qian† (corresponding author), Rui Chen, Jing Tang, Lei Sun, Xiangxiang Chu, Dapeng Zhu, Yiwei Wang, Yujun Cai, Shuo Li.

Preprint

AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving

Preprint

Featured by AutoDrive Heart (自动驾驶之心)

Zhenlong Yuan, Jing Tang, Jinguo Luo, Rui Chen, Lei Sun, Chengxuan Qian, Yujun Cai, Dapeng Zhang, Shuo Li.

Preprint

fMRI-LM: Towards a Universal Foundation Model for Language-Aligned fMRI Understanding

Preprint

Yuxiang Wei, Yanteng Zhang, Xi Xiao, Chengxuan Qian, Tianyang Wang, Vince D. Calhoun†.

Multimodal Foundation Models

Preprint

DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning

arXiv Preprint

Chengxuan Qian, Kai Han, Jingchao Wang, Zhenlong Yuan, Chongwen Lyu, Jun Chen†, Zhe Liu†.

MICCAI 2025

CLIMD: A Curriculum Learning Framework for Imbalanced Multimodal Diagnosis

MICCAI 2025 Early Accept (Top 9% Paper)

Kai Han, Chongwen Lyu, Chengxuan Qian, Siqi Ma, Jun Chen†, Zhe Liu†.

Preprint

Contrastive Intra- and Inter-modal Clustering for Multimodal Semantic Discovery

Under Review

Zhengzhong Zhu, Pei Zhou, Chengxuan Qian, Ruohong Yang, Yixuan Ye, Jiangping Zhu

Medical Image Analysis

BIBM 2025

Adaptive Label Correction Framework for Robust Medical Image Segmentation with Noisy Labels

BIBM 2025

Chengxuan Qian, Kai Han, Siqi Ma, Chongwen Lyu, Zhenlong Yuan, Jun Chen†, Zhe Liu†.

IEEE TMI 2025

Region Uncertainty Estimation for Medical Image Segmentation with Noisy Labels

IEEE Transactions on Medical Imaging (Published July 2025)

Kai Han, Shuhui Wang, Jun Chen†, Chengxuan Qian, Chongwen Lyu, Siqi Ma, Victor S. Sheng†, Qingming Huang†, Zhe Liu†.

TCSVT 2024

Frequency Domain Unlocks New Perspectives for Abdominal Medical Image Segmentation

arXiv Preprint

Kai Han, Siqi Ma, Chengxuan Qian, Jun Chen†, Chongwen Lyu, Victor S. Sheng†, Zhe Liu†.

JBHI 2025

LiMT: A Multi-task Liver Image Benchmark Dataset

Journal of Biomedical and Health Informatics (JBHI)

Z Liu†, K Han, S Ma, J Chen†, …, C Qian, C Lyu, …, V S. Sheng†.

  • Dataset and Benchmarking work

3D Vision

TCSVT 2025

DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo

IEEE Transactions on Circuits and Systems for Video Technology

Zhenlong Yuan, Dapeng Zhang, Zehao Li, Chengxuan Qian, Jianing Chen, Yinda Chen, Kehua Chen, Tianlu Mao, Zhaoxin Li, Hao Jiang, and Zhaoqi Wang.

NeurIPS 2025

HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene

NeurIPS 2025

Jianing Chen, Zehao Li, Yujun Cai, Hao Jiang, Chengxuan Qian, Juyuan Kang, Shuqin Gao, Honglong Zhao, Tianlu Mao, Yucheng Zhang.

🎖 Academic Services

  • Journal Reviewer: IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Multimedia (TMM).
  • Conference Reviewer: ICME 2025, ACL 2025, ICCV 2025, NeurIPS 2025, AAAI 2026, ICASSP 2026, CVPR 2026.