I am a rising senior undergraduate student with a strong interest in Multimodal Large Language Models (MLLMs), Agentic AI, Embodied AI, Spatial Intelligence, and Video Understanding. I am fortunate to collaborate with Zhengzhong Tu, Manling Li, Yue Zhao, and Jiacheng Zhu on research focused on Reasoning and Alignment in Vision-Language Models (VLMs). Prior to that, I worked with Zhe Liu and Victor S. Sheng on research in Robust Medical Vision and Multimodal Machine Learning. I am deeply grateful to them for guiding me into the world of research.
I am actively seeking a Ph.D. position in Computer Science for Fall 2026. I would be excited to collaborate with like-minded researchers on a broad range of topics, including Large Language Models (LLMs), Vision-Language Models (VLMs), Agentic AI, and Embodied AI. Please feel free to reach out if our interests align. My WeChat ID is qiancxdotcom.
Research Interests
My long-term vision is to develop efficient, robust, and generalizable machine learning systems capable of perceiving, understanding, and interacting with the world through multimodal information. I am particularly interested in advancing LLMs combined with vision, audio, action, and other modalities toward Agentic and Embodied AI systems that can reason, plan, and act in complex environments, enabling intelligent agents to interact with humans and make decisions across both physical and web-based settings. Specifically, my previous research has focused on the following topics:
- Generalizable Medical Image Segmentation with Sparse and Noisy Labeled Data
- Modality Competition and Imbalances for Multimodal Machine Learning
- Cross-modal Decoupling and Alignment for Multimodal Foundation Models
- Aligning Large Vision-language Models with Human Preference
- Reasoning and Alignment for Large Vision-language Models
- Reinforcement Learning-driven Open-World Embodied Agents
🔥 News
- 2025.06: 🎉🎉 We propose DVP-MVS++, a multi-view stereo method that integrates depth-normal-edge priors and visibility guidance for robust 3D Reconstruction, which is now available on ArXiv!
- 2025.06: 🎉🎉 We propose HAIF-GS, an efficient dynamic 3D reconstruction framework combining sparse anchors, self-supervised guidance, and hierarchical propagation to improve reconstruction quality and temporal consistency, which is now available on ArXiv!
- 2025.05: 🎉🎉 Our work CLIMD has been Early Accepted by MICCAI 2025 (Top 9%). The ArXiv version is coming soon.
- 2025.05: 🎉🎉 Our paper has received an Accept pending minor revision decision from IEEE Transactions on Medical Imaging (IF: 8.9).
- 2025.03: 🎉🎉 Excited to propose my first-author work DecAlign, a novel cross-modal decoupling and alignment framework for multimodal representation learning, which is now available on ArXiv!
- 2025.02: 🎉🎉 Excited to propose Re-Align, a novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models, which is now available on ArXiv!
- 2024.11: 🎉🎉 Excited to propose my first-author work DynCIM, a novel dynamic multimodal curriculum learning framework for addressing cross-modal competition and imbalances, which is now available on ArXiv!
- 2024.11: 🎉🎉 Our work is now under Major Revision at IEEE Transactions on Medical Imaging (IF: 8.9).
- 2024.10: 🎉🎉 Our work is now under Major Revision at Medical Image Analysis (IF: 10.9).
- 2024.08: 🎉🎉 Excited to propose my first-author work ALC, a novel adaptive label correction framework for medical image segmentation with noisy labels, which is now available on ArXiv!
📝 Publications

DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
Chengxuan Qian, Shuo Xing, Shawn Li, Yue Zhao, Zhengzhong Tu†.

DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning
Chengxuan Qian, Kai Han, Jingchao Wang, Zhenlong Yuan, Chongwen Lyu, Jun Chen†, Zhe Liu†.

Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Shuo Xing, Yuping Wang, Peiran Li, Ruizheng Bai, Yueqi Wang, Chengxuan Qian, Huaxiu Yao, Zhengzhong Tu†.

Adaptive Label Correction Framework for Robust Medical Image Segmentation with Noisy Labels
Chengxuan Qian, K Han, Siqi Ma, Chongwen Lyu, Zhenlong Yuan, Jun Chen†, Zhe Liu†.

CLIMD: A Curriculum Learning Framework for Imbalanced Multimodal Diagnosis
MICCAI 2025 Early Accept (Top 9% Paper)
Kai Han, Chongwen Lyu, Chengxuan Qian, Siqi Ma, Jun Chen†, Zhe Liu†.

Region Uncertainty Estimation for Medical Image Segmentation with Noisy Labels
IEEE Transactions on Medical Imaging (CCF B, IF: 8.9)(Accept pending minor revision)
Kai Han, Shuhui Wang, Jun Chen†, Chengxuan Qian, Chongwen Lyu, Siqi Ma, Victor S. Sheng†, Qingming Huang†, Zhe Liu†.

DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo
IEEE Transactions on Circuits and Systems for Video Technology (CCF B, IF: 8.3)(Under Review)
Zhenlong Yuan, Dapeng Zhang, Zehao Li, Chengxuan Qian, Jianing Chen, Yinda Chen, Kehua Chen, Tianlu Mao, Zhaoxin Li, Hao Jiang, Zhaoqi Wang.

HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
Jianing Chen, Zehao Li, Yujun Cai, Hao Jiang, Chengxuan Qian, Juyuan Kang, Shuqin Gao, Honglong Zhao, Tianlu Mao, Yucheng Zhang.

Frequency Domain Unlocks New Perspectives for Medical Image Segmentation
IEEE Transactions on Circuits and Systems for Video Technology (CCF B, IF: 8.3)(Under Review)
Kai Han, Siqi Ma, Chengxuan Qian, Jun Chen†, Chongwen Lyu, Victor S. Sheng†, Zhe Liu†.

Curriculum for Region-guided Automatic Radiology Report Generation
IEEE Transactions on Circuits and Systems for Video Technology (CCF B, IF: 8.3)(Under Review)
Chongwen Lyu, Chengxuan Qian, Kai Han, Jun Chen†, Victor S. Sheng†, Zhe Liu†.

LiMT: A Multi-task Liver Image Benchmark Dataset
Medical Image Analysis (IF: 10.7)(Major Revision)
Z Liu†, K Han, S Ma, J Chen†, …, C Qian, C Lyu, …, V S. Sheng†.
- Dataset and Benchmarking work
- A multi-task medical image benchmark dataset for segmentation, classification, and detection of liver lesions, encompassing CT liver scans annotated for four common liver diseases.
- Collaborated with researchers from Jiangsu University, Texas Tech University, and clinicians from the Affiliated Hospital of Jiangsu University.

Diffusion Contrastive Learning for Image Classification
Under Review
Xincheng Zhu, Yonghan Lu, Kai Han, Chongwen Lyu, Chengxuan Qian, J Chen†, Z Liu†.
Note: Details of some of the papers above cannot be shown because they are currently under double-blind review. † denotes an advisor.
🎖 Academic Services
- Reviewer for IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Multimedia (TMM), IEEE International Conference on Multimedia & Expo (ICME 2025), and ICCV 2025.
💬 Open-source Projects
- Re-Align, a novel Direct Preference Optimization (DPO)-based alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models. See the corresponding website and code for more details.
- DecAlign, a novel cross-modal decoupling and alignment framework for multimodal representation learning. See the corresponding website and code for more details (the code will be fully released soon!).