I am a rising senior undergraduate student with a strong interest in Multimodal LLMs Post-Training and Tool-Augmented Agentic Systems. I am fortunate to collaborate with Manling Li, Zhengzhong Tu, Yue Zhao, and Jiacheng Zhu on research focused on Multimodal Reasoning, Alignment, RAG, and Agentic Systems. Prior to that, I worked with Zhe Liu and Victor S. Sheng on research in Robust Medical Vision and Multimodal Machine Learning. I am deeply grateful to them for guiding me into the world of research.

I am actively seeking a CS PhD position starting in Fall 2026. I would be excited to collaborate with like-minded researchers on a broad range of topics, including LLMs, VLMs, Agents, and Embodied AI. Please feel free to reach out if our interests align; my WeChat is qiancxdotcom.

Research Interests

My long-term vision is to develop efficient, robust, and generalizable machine learning systems capable of perceiving, understanding, and interacting with the world through multimodal information. I am particularly interested in advancing LLMs combined with vision, audio, action, and other modalities toward Agentic and Embodied AI systems that can reason, plan, and act in complex environments, enabling intelligent agents to interact with humans and make decisions across both physical and web-based settings. Specifically, my research so far has focused on the following topics:

  • Generalizable Medical Vision
  • 3D Scene Reconstruction
  • Multimodal Foundation Models 🔥
  • Visual Reasoning and Alignment 🔥🔥
  • Multimodal RAG and Search Agents 🔥🔥
  • Tool-Augmented Vision-Language Agentic Systems 🔥🔥🔥

🔥 News

  • 2025.08: 🎉🎉 Our work Re-Align has been accepted to the EMNLP 2025 Main Conference!
  • 2025.08: 🎉🎉 I will serve as a Program Committee member for AAAI 2026!
  • 2025.07: 🎉🎉 Our work on Generalizable Medical Vision has been accepted by IEEE Transactions on Medical Imaging.
  • 2025.06: 🎉🎉 We propose DVP-MVS++, a multi-view stereo method that integrates depth-normal-edge priors and visibility guidance for robust 3D reconstruction, which is now available on arXiv!
  • 2025.06: 🎉🎉 We propose HAIF-GS, an efficient dynamic 3D reconstruction framework combining sparse anchors, self-supervised guidance, and hierarchical propagation to improve reconstruction quality and temporal consistency, which is now available on arXiv!
  • 2025.05: 🎉🎉 Our work CLIMD has received an Early Accept at MICCAI 2025 (Top 9%).
  • 2025.03: 🎉🎉 Excited to propose my first-author work DecAlign, a novel cross-modal decoupling and alignment framework for multimodal representation learning, which is now available on arXiv!
  • 2025.02: 🎉🎉 Excited to propose Re-Align, a novel RAG-enhanced DPO framework to mitigate hallucinations in Vision Language Models, which is now available on arXiv!
  • 2024.11: 🎉🎉 Excited to propose my first-author work DynCIM, a novel dynamic multimodal curriculum learning framework for addressing cross-modal competition and imbalances, which is now available on arXiv!
  • 2024.10: 🎉🎉 Our work is now under Major Revision at Medical Image Analysis.
  • 2024.08: 🎉🎉 Excited to propose my first-author work ALC, a novel adaptive label correction framework for medical image segmentation with noisy labels, which is now available on arXiv!

📝 Publications

Multimodal Foundation Models

Preprint

DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning

Arxiv Preprint

Chengxuan Qian, Kai Han, Jingchao Wang, Zhenlong Yuan, Chongwen Lyu, Jun Chen†, Zhe Liu†.

MICCAI 2025

CLIMD: A Curriculum Learning Framework for Imbalanced Multimodal Diagnosis

MICCAI 2025 Early Accept (Top 9% Paper)

Kai Han, Chongwen Lyu, Chengxuan Qian, Siqi Ma, Jun Chen†, Zhe Liu†.

Preprint

Contrastive Intra- and Inter-modal Clustering for Multimodal Semantic Discovery

Under Review

Zhengzhong Zhu, Pei Zhou, Chengxuan Qian, Ruohong Yang, Yixuan Ye, Jiangping Zhu

Multimodal LLMs Post-Training

Preprint

AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving

Preprint

Zhenlong Yuan, Jing Tang, Jinguo Luo, Rui Chen, Lei Sun, Chengxuan Qian, Yujun Cai, Dapeng Zhang, Shuo Li.

Preprint

Medical Image Analysis

Preprint

Adaptive Label Correction Framework for Robust Medical Image Segmentation with Noisy Labels

Arxiv Preprint

Chengxuan Qian, Kai Han, Siqi Ma, Chongwen Lyu, Zhenlong Yuan, Jun Chen†, Zhe Liu†.

IEEE TMI 2025

Region Uncertainty Estimation for Medical Image Segmentation with Noisy Labels

IEEE Transactions on Medical Imaging (CCF B, IF: 9.8)

Kai Han, Shuhui Wang, Jun Chen†, Chengxuan Qian, Chongwen Lyu, Siqi Ma, Victor S. Sheng†, Qingming Huang†, Zhe Liu†.

TCSVT 2024

Frequency Domain Unlocks New Perspectives for Medical Image Segmentation

IEEE Transactions on Circuits and Systems for Video Technology (CCF B, IF: 8.3) (Under Review)

Kai Han, Siqi Ma, Chengxuan Qian, Jun Chen†, Chongwen Lyu, Victor S. Sheng†, Zhe Liu†.

TCSVT 2024

Curriculum for Region-guided Automatic Radiology Report Generation

IEEE Transactions on Circuits and Systems for Video Technology (CCF B, IF: 8.3) (Under Review)

Chongwen Lyu, Chengxuan Qian, Kai Han, Jun Chen†, Victor S. Sheng†, Zhe Liu†.

MedIA 2024

LiMT: A Multi-task Liver Image Benchmark Dataset

Medical Image Analysis (IF: 10.7) (Major Revision)

Z Liu†, K Han, S Ma, J Chen†, …, C Qian, C Lyu, …, V S. Sheng†.

  • Dataset and Benchmarking work

  • A multi-task medical image benchmark dataset for Segmentation, Classification and Detection of liver lesions, encompassing CT liver scans annotated for four common liver diseases.

  • Collaborated with researchers from Jiangsu University, Texas Tech University, and clinicians from the Affiliated Hospital of Jiangsu University.

3D Vision

Preprint

DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo

IEEE Transactions on Circuits and Systems for Video Technology (CCF B, IF: 8.3) (Under Review)

Zhenlong Yuan, Dapeng Zhang, Zehao Li, Chengxuan Qian, Jianing Chen, Yinda Chen, Kehua Chen, Tianlu Mao, Zhaoxin Li, Hao Jiang and Zhaoqi Wang

Preprint

HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene

Arxiv Preprint

Jianing Chen, Zehao Li, Yujun Cai, Hao Jiang, Chengxuan Qian, Juyuan Kang, Shuqin Gao, Honglong Zhao, Tianlu Mao, Yucheng Zhang.

🎖 Academic Services

  • Reviewer for IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Multimedia (TMM), IEEE International Conference on Multimedia & Expo (ICME 2025), ICCV 2025, and AAAI 2026.