I am a rising senior undergraduate student with a strong interest in Multimodal LLM Post-Training and Tool-Augmented Agentic Systems. I am fortunate to collaborate with Manling Li, Zhengzhong Tu, Yue Zhao, and Jiacheng Zhu on research focused on Multimodal Reasoning, Alignment, RAG, and Agentic Systems. Prior to that, I worked with Zhe Liu and Victor S. Sheng on research in Robust Medical Vision and Multimodal Machine Learning. I am deeply grateful to them for guiding me into the world of research.
I am actively seeking a CS PhD position starting in Fall 2026. I would be excited to collaborate with like-minded researchers on a broad range of topics, including LLMs, VLMs, Agents, and Embodied AI. Please feel free to reach out if our interests align. My WeChat is qiancxdotcom.
Research Interests
My long-term vision is to develop efficient, robust, and generalizable machine learning systems capable of perceiving, understanding, and interacting with the world through multimodal information. I am particularly interested in advancing LLMs combined with vision, audio, action, and other modalities toward Agentic and Embodied AI systems that can reason, plan, and act in complex environments, enabling intelligent agents to interact with humans and make decisions across both physical and web-based settings. Specifically, my research so far has focused on the following topics:
- Generalizable Medical Vision
- 3D Scene Reconstruction
- Multimodal Foundation Models🔥
- Visual Reasoning and Alignment🔥🔥
- Multimodal RAG and Search Agents🔥🔥
- Tool-Augmented Vision-Language Agentic Systems🔥🔥🔥
🔥 News
- 2025.08: 🎉🎉 Our work Re-Align has been accepted to the EMNLP 2025 Main Conference!
- 2025.08: 🎉🎉 I will serve as a Program Committee member for AAAI 2026!
- 2025.07: 🎉🎉 Our work on Generalizable Medical Vision has been accepted by IEEE Transactions on Medical Imaging.
- 2025.06: 🎉🎉 We propose DVP-MVS++, a multi-view stereo method that integrates depth-normal-edge priors and visibility guidance for robust 3D reconstruction, which is now available on arXiv!
- 2025.06: 🎉🎉 We propose HAIF-GS, an efficient dynamic 3D reconstruction framework combining sparse anchors, self-supervised guidance, and hierarchical propagation to improve reconstruction quality and temporal consistency, which is now available on arXiv!
- 2025.05: 🎉🎉 Our work CLIMD has been Early Accepted by MICCAI 2025 (Top 9%).
- 2025.03: 🎉🎉 Excited to propose my first-author work DecAlign, a novel cross-modal decoupling and alignment framework for multimodal representation learning, which is now available on arXiv!
- 2025.02: 🎉🎉 Excited to propose Re-Align, a novel RAG-enhanced DPO framework to mitigate hallucinations in Vision Language Models, which is now available on arXiv!
- 2024.11: 🎉🎉 Excited to propose my first-author work DynCIM, a novel dynamic multimodal curriculum learning framework for addressing cross-modal competition and imbalances, which is now available on arXiv!
- 2024.10: 🎉🎉 Our work LiMT is now under Major Revision at Medical Image Analysis.
- 2024.08: 🎉🎉 Excited to propose my first-author work ALC, a novel adaptive label correction framework for medical image segmentation with noisy labels, which is now available on arXiv!
📝 Publications
Multimodal Foundation Models

DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
Chengxuan Qian, Shuo Xing, Shawn Li, Yue Zhao, Zhengzhong Tu†.

DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning
Chengxuan Qian, Kai Han, Jingchao Wang, Zhenlong Yuan, Chongwen Lyu, Jun Chen†, Zhe Liu†.

CLIMD: A Curriculum Learning Framework for Imbalanced Multimodal Diagnosis
MICCAI 2025 Early Accept (Top 9% Paper)
Kai Han, Chongwen Lyu, Chengxuan Qian, Siqi Ma, Jun Chen†, Zhe Liu†.

Contrastive Intra- and Inter-modal Clustering for Multimodal Semantic Discovery
Under Review
Zhengzhong Zhu, Pei Zhou, Chengxuan Qian, Ruohong Yang, Yixuan Ye, Jiangping Zhu
Multimodal LLM Post-Training

AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
Preprint
Zhenlong Yuan, Jing Tang, Jinguo Luo, Rui Chen, Lei Sun, Chengxuan Qian, Yujun Cai, Dapeng Zhang, Shuo Li.

Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Shuo Xing, Yuping Wang, Peiran Li, Ruizheng Bai, Yueqi Wang, Chengxuan Qian, Huaxiu Yao, Zhengzhong Tu†.
Medical Image Analysis

Adaptive Label Correction Framework for Robust Medical Image Segmentation with Noisy Labels
Chengxuan Qian, Kai Han, Siqi Ma, Chongwen Lyu, Zhenlong Yuan, Jun Chen†, Zhe Liu†.

Region Uncertainty Estimation for Medical Image Segmentation with Noisy Labels
IEEE Transactions on Medical Imaging (CCF B, IF: 9.8)
Kai Han, Shuhui Wang, Jun Chen†, Chengxuan Qian, Chongwen Lyu, Siqi Ma, Victor S. Sheng†, Qingming Huang†, Zhe Liu†.

Frequency Domain Unlocks New Perspectives for Medical Image Segmentation
IEEE Transactions on Circuits and Systems for Video Technology (CCF B, IF: 8.3) (Under Review)
Kai Han, Siqi Ma, Chengxuan Qian, Jun Chen†, Chongwen Lyu, Victor S. Sheng†, Zhe Liu†.

Curriculum for Region-guided Automatic Radiology Report Generation
IEEE Transactions on Circuits and Systems for Video Technology (CCF B, IF: 8.3) (Under Review)
Chongwen Lyu, Chengxuan Qian, Kai Han, Jun Chen†, Victor S. Sheng†, Zhe Liu†.

LiMT: A Multi-task Liver Image Benchmark Dataset
Medical Image Analysis (IF: 10.7) (Major Revision)
Z Liu†, K Han, S Ma, J Chen†, …, C Qian, C Lyu, …, V S. Sheng†.
- Dataset and Benchmarking work
- A multi-task medical image benchmark dataset for segmentation, classification, and detection of liver lesions, encompassing CT liver scans annotated for four common liver diseases.
- Collaborated with researchers from Jiangsu University, Texas Tech University, and clinicians from the Affiliated Hospital of Jiangsu University.
3D Vision

DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo
IEEE Transactions on Circuits and Systems for Video Technology (CCF B, IF: 8.3) (Under Review)
Zhenlong Yuan, Dapeng Zhang, Zehao Li, Chengxuan Qian, Jianing Chen, Yinda Chen, Kehua Chen, Tianlu Mao, Zhaoxin Li, Hao Jiang and Zhaoqi Wang

HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
Jianing Chen, Zehao Li, Yujun Cai, Hao Jiang, Chengxuan Qian, Juyuan Kang, Shuqin Gao, Honglong Zhao, Tianlu Mao, Yucheng Zhang.
Academic Services
- Reviewer for IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Multimedia (TMM), IEEE International Conference on Multimedia & Expo (ICME 2025), ICCV 2025, and AAAI 2026.