I am a senior undergraduate student with a strong interest in Multimodal LLMs, Tool-Using Agents, and Spatial Intelligence. My research vision is to develop super-intelligent yet lightweight Multimodal LLMs that perceive, plan, reason, and act by autonomously integrating multi-sensory signals, external tools, and knowledge, giving rise to superhuman yet controllable intelligence in downstream tasks such as Video Understanding, Embodied Robotics, Autonomous Driving, and Medical Diagnosis.

I'm applying for PhD programs for Fall 2026 and research internships for Summer 2026. I am always open to collaboration, so feel free to reach out! Email: open.qiancx[at]gmail.com | WeChat: qiancxdotcom

Research Interests and Highlights

  • Multi-Sensory Perception, Integration and Reasoning: How can machines extract learnable neural-symbolic concepts from the complex physical world to enable grounded understanding, integration, interaction, and decision-making across multi-sensory signals, ultimately leading to superhuman yet interpretable intelligence?

  • World Modeling, Long-Term Video Understanding and Spatial Intelligence: Toward multimodal superintelligence, how can we guide foundation models to deeply understand the underlying mechanisms of the complex physical world, internalize world dynamics within their parameter space, and reason about complex object properties and interactions in dynamic 3D environments?

  • In-the-wild Environment-Interactive Foundation Agents: How can we teach foundation models to see, plan, and act in open-world settings, while autonomously interacting with external environments such as tools, knowledge bases, and simulators, thereby continuously extending their capability boundaries in real-world applications?

🔥 News

  • 2025.11: 🎉🎉 Our work LiMT, a unified multi-task liver image benchmark, has been accepted by the IEEE Journal of Biomedical and Health Informatics (JBHI)!
  • 2025.10: 🎉🎉 Our work DVP-MVS++, a multi-view stereo method that integrates depth-normal-edge priors and visibility guidance for robust 3D Reconstruction, has been accepted by IEEE Transactions on Circuits and Systems for Video Technology!
  • 2025.10: 🎉🎉 My first-author work on Medical Segmentation under sparse and noisy annotations has been accepted by BIBM AIBH 2025!
  • 2025.10: 🎉🎉 We propose Video-STAR, a powerful Tool-Augmented Agentic RL approach for Thinking with Videos. On open-vocabulary action recognition benchmarks such as K-400 and HMDB-51, our 3B VLM achieves nearly 40% accuracy improvement over base models! 🔥
  • 2025.09: 🎉🎉 Our work HAIF-GS, an efficient dynamic 3D reconstruction framework combining sparse anchors, self-supervised guidance, and hierarchical propagation to improve reconstruction quality and temporal consistency, has been accepted by NeurIPS 2025!
  • 2025.09: 🎉🎉 We propose AutoDrive-R², which incentivizes reasoning and self-reflection capacity for VLA models in Autonomous Driving. We're also honored that our work was featured by AutoDrive Heart (自动驾驶之心)!
  • 2025.08: 🎉🎉 Our work Re-Align has been accepted by the EMNLP 2025 Main Conference!
  • 2025.07: 🎉🎉 Our work on Generalizable Medical Vision has been accepted by IEEE Transactions on Medical Imaging.
  • 2025.05: 🎉🎉 Our work CLIMD has been Early Accepted by MICCAI 2025 (Top 9%).
  • 2025.03: 🎉🎉 Excited to propose my first-author work DecAlign, a novel cross-modal decoupling and alignment framework for multimodal representation learning.
  • 2024.11: 🎉🎉 Excited to propose my first-author work DynCIM, a novel dynamic multimodal curriculum learning framework for addressing cross-modal competition and imbalance, which is now available on arXiv!
  • 2024.10: 🎉🎉 We propose FASS, a novel frequency-domain-enhanced approach for Medical Image Segmentation in low-contrast environments.

📝 Publications

Multimodal LLMs and Agentic AI

Preprint

ProgressLM: Towards Progress Reasoning in Vision-Language Models

Spatial Intelligence Embodied Robotics Data-Centric Multimodal Reasoning Open-World Applications


Jianshu Zhang*, Chengxuan Qian*, Haosen Sun, Haoran Lu, Dingcheng Wang, Letian Xue, Han Liu

Preprint

AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving

Multimodal Reasoning Autonomous Driving Open-World Applications

Featured by AutoDrive Heart (自动驾驶之心)

Zhenlong Yuan*, Chengxuan Qian*, Jing Tang, Jinguo Luo, Rui Chen, Lei Sun, Xiangxiang Chu, Yujun Cai, Dapeng Zhang, Shuo Li.

Preprint

Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools

Think with Videos Tool-Using Agent Multi-turn Agentic RL

Zhenlong Yuan, Xiangyan Qu, Chengxuan Qian†, Rui Chen, Jing Tang, Lei Sun, Xiangxiang Chu, Dapeng Zhang, Yiwei Wang, Yujun Cai, Shuo Li.

Preprint

System Prompt Auditing for User-centric Large Language Model Systems

Human-centric AI LLM Safety AI Agents

Xiangning Lin*, Shenzhe Zhu*, Chengxuan Qian, Tianwei Wang, Haoqian Zhang, Ziheng Zhang, Zhenlong Yuan, Dingcheng Wang, Juncheng Wu, Yuan Si, Jiaxin Liu, Baolong Bi, Shu Yang, Robert Mahari, Tobin South, Dazza Greenwood, Andreas Haupt, Samuele Marro, Alex Pentland, Jiaxin Pei

EMNLP 2025

Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization

EMNLP 2025 Main Conference

Multimodal Alignment Hallucination Mitigation Multimodal RAG DPO

Shuo Xing, Yuping Wang, Peiran Li, Ruizheng Bai, Yueqi Wang, Chengxuan Qian, Huaxiu Yao, Zhengzhong Tu†.

Preprint

fMRI-LM: Towards a Universal Foundation Model for Language-Aligned fMRI Understanding

Medical LLMs Data-Centric Foundation Model

Yuxiang Wei, Yanteng Zhang, Xi Xiao, Chengxuan Qian, Tianyang Wang, Vince D. Calhoun†.

Multimodal Foundation Models

Preprint

DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning

Multimodal Alignment Foundation Model Interpretability

Chengxuan Qian, Shuo Xing, Shawn Li, Yue Zhao, Zhengzhong Tu†.

Preprint

DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning

Multimodal Competition Modality Imbalances Curriculum Learning

Chengxuan Qian, Kai Han, Jingchao Wang, Zhenlong Yuan, Chongwen Lyu, Jun Chen†, Zhe Liu†.

MICCAI 2025

CLIMD: A Curriculum Learning Framework for Imbalanced Multimodal Diagnosis

Modality Imbalances Medical AI Curriculum Learning

MICCAI 2025 Early Accept (Top 9% Paper)

Kai Han, Chongwen Lyu, Chengxuan Qian, Siqi Ma, Jun Chen†, Zhe Liu†.

Preprint

Contrastive Intra- and Inter-modal Clustering for Multimodal Semantic Discovery

Multimodal Learning Semantic Discovery Interpretability

Zhengzhong Zhu, Pei Zhou, Chengxuan Qian, Ruohong Yang, Yixuan Ye, Jiangping Zhu

Medical Image Analysis

BIBM 2025

Adaptive Label Correction Framework for Robust Medical Image Segmentation with Noisy Labels

Medical Segmentation Noisy Labels Sparse Annotation

BIBM AIBH 2025

Chengxuan Qian, Kai Han, Siqi Ma, Chongwen Lyu, Zhenlong Yuan, Jun Chen†, Zhe Liu†.

IEEE TMI 2025

Region Uncertainty Estimation for Medical Image Segmentation with Noisy Labels

Medical Segmentation Noisy Labels Sparse Annotation Uncertainty Estimation

IEEE Transactions on Medical Imaging, 2025

Kai Han, Shuhui Wang, Jun Chen†, Chengxuan Qian, Chongwen Lyu, Siqi Ma, Victor S. Sheng†, Qingming Huang†, Zhe Liu†.

Preprint

Frequency Domain Unlocks New Perspectives for Abdominal Medical Image Segmentation

Medical Segmentation Low-contrast Environment Robustness

Kai Han, Siqi Ma, Chengxuan Qian, Jun Chen†, Chongwen Lyu, Victor S. Sheng†, Zhe Liu†.

JBHI 2025

LiMT: A Multi-task Liver Image Benchmark Dataset

Medical AI Benchmarks Multi-task Unified Learning

IEEE Journal of Biomedical and Health Informatics (JBHI), 2025

Zhe Liu†, Kai Han, Siqi Ma, Yan Zhu, Jun Chen, Chongwen Lyu, Xinyi Qiu, Chengxuan Qian, Yuqing Song, Yi Liu, Liyuan Tian, Yang Ji, Yuefeng Li

3D Vision

TCSVT 2025

DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo

3D Reconstruction Geometric Understanding Multi-View Stereo

IEEE Transactions on Circuits and Systems for Video Technology

Zhenlong Yuan, Dapeng Zhang, Zehao Li, Chengxuan Qian, Jianing Chen, Yinda Chen, Kehua Chen, Tianlu Mao, Zhaoxin Li, Hao Jiang and Zhaoqi Wang

NeurIPS 2025

HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene

3D Reconstruction Gaussian Splatting Dynamic Scene

NeurIPS 2025

Jianing Chen, Zehao Li, Yujun Cai, Hao Jiang, Chengxuan Qian, Juyuan Kang, Shuqin Gao, Honglong Zhao, Tianlu Mao, Yucheng Zhang.

🌟 Misc

I'm grateful for the mentorship of Prof. Zhengzhong Tu (TAMU), Prof. Yue Zhao (USC), Prof. Jiaxin Pei (Stanford HAI), and Profs. Han Liu and Manling Li (Northwestern University). Outside of research, I enjoy photography 📹, swimming 🏊, biking 🚴, billiards 🎱, and table tennis 🏓. I strive to stay energetic every day and maintain a strong passion for both academic research and life.

🎖 Academic Services

  • Journal Reviewer: IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Multimedia (TMM).
  • Conference Reviewer: ICME 2025-2026, AAAI 2026, ICASSP 2026, CVPR 2026.
  • Workshop Reviewer: ACL 2025 SRW, NeurIPS 2025 Imageomics, NeurIPS 2025 Efficient Reasoning.