Hi, I am Chengxuan Qian (钱承炫), an incoming CS PhD student at University of California, Santa Barbara (UCSB), advised by Prof. Yao Qin.
My research focuses on identifying and eliminating barriers to foundation models reaching superintelligence. Foundation models grow from large-scale, static, and idealized settings, yet the real world is dynamic and partially observable. Can they actively explore, interact with tools and environments, and connect with memory? Can they simulate world dynamics, imagine future states, and continually evolve from experience? I aim to teach machines to think like humans and explore frontiers beyond human reach.
I will attend ACL 2026 from July 5-8 — see you in San Diego!
🏆 ACL 2026 Oral Presentation: I will present ProgressLM on Tuesday, July 7, Harbor H-I, Oral Session F, 09:00-10:30.
Research Interests
- Multimodal Intelligence: How can machines extract learnable neural-symbolic concepts from the complex physical world to enable grounded understanding, integration, interaction, and decision-making across multi-sensory signals, ultimately leading to superhuman yet interpretable intelligence?
- Generative World Modeling: How can we guide foundation models, on the path toward multimodal superintelligence, to deeply understand the underlying mechanisms of the complex physical world, internalize world dynamics within their parameter space, and reason about complex object properties and interactions in dynamic 3D environments?
- Real-World Adaptation, Robustness, and Generalization: How can we teach foundation models to see, plan, and act in open-world settings, while autonomously interacting with external environments such as tools, knowledge bases, and simulators, thereby continuously extending their capability boundaries in real-world applications?
- Human-AI Interaction: As foundation models soar in capability, how can we make human-AI interaction simpler and more efficient, empowering users with no prior expertise to reach professional-level outcomes? This remains a promising and enduring frontier.
🔥 News
- 2026.05: 🏆🏆 Recognized as a CVPR 2026 Outstanding Reviewer, top 5% of 17,491 reviewers.
- 2026.05: 🏆🏆 Our work ProgressLM has been selected as an ACL 2026 Oral Presentation (Top 3.3%) 🔥
- 2026.04: 🎉🎉 Our work ProgressLM on General Reward Model for Embodied Agents has been accepted to ACL 2026 Main Conference and ICLR 2026 Workshop on World Models!
- 2026.04: 🎉🎉 My first-author work DynCIM on cross-modal imbalance in multimodal foundation models has been accepted by CVPR 2026 Workshop on Cognitive Foundations for Multimodal Models!
- 2026.03: 🔥🔥 Joined University of California, Santa Barbara (UCSB) as a CS PhD student.
- 2026.02: 🎉🎉 We release What If Agents Could Imagine?, a study that breaks through the static perception barrier of VLMs via active generative world modeling.
- 2026.02: 🎉🎉 Our work fMRI-LM on Medical Foundation Models has been accepted by CVPR 2026!
- 2026.01: 🎉🎉 Three first/co-first author papers have been accepted by ICLR 2026:
  - DecAlign: Aligning Cross-Modal Semantics for Multimodal Foundation Models
  - AutoDrive-R²: Towards Physical-Grounded Multimodal Reasoning for Autonomous Driving
  - Video-STAR: Tool-Augmented Agentic RL for Thinking with Videos
- 2026.01: 🎉🎉 We propose ProgressLM, which further investigates whether VLMs can acquire human-like, generalizable mental understanding and simulation in embodied scenarios from a single example, and serves as an early step toward building general-purpose reward models. See More: [Website] [Paper] [Code] [Model] [Dataset]
- 2025.11: 🎉🎉 Our work LiMT, a unified multi-task liver image benchmark, has been accepted by Journal of Biomedical and Health Informatics (JBHI)!
- 2025.10: 🎉🎉 Our work DVP-MVS++, Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo, has been accepted by IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT)!
- 2025.10: 🎉🎉 My first-author work on Medical Segmentation under sparse and noisy annotations has been accepted by BIBM AIBH 2025!
- 2025.10: 🎉🎉 We propose Video-STAR, a powerful Tool-Augmented Agentic RL approach for Thinking with Videos. On open-vocabulary action recognition benchmarks like K-400 and HMDB-51, our 3B VLM achieves a nearly 40% accuracy improvement over base models! 🔥
- 2025.09: 🎉🎉 Our work HAIF-GS, Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene, has been accepted by NeurIPS 2025!
- 2025.09: 🎉🎉 We propose AutoDrive-R², Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving. We’re also honored that our work was featured by AutoDrive Heart (自动驾驶之心)!
- 2025.08: 🎉🎉 Our work Re-Align has been accepted by EMNLP 2025 Main Conference!
- 2025.07: 🎉🎉 Our work on Generalizable Medical Vision has been accepted by IEEE Transactions on Medical Imaging.
- 2025.05: 🎉🎉 Our work CLIMD has been Early Accepted by MICCAI 2025 (Top 9%).
- 2025.03: 🎉🎉 Excited to propose my first-author work DecAlign, a novel cross-modal decoupling and alignment framework for multimodal representation learning.
- 2024.11: 🎉🎉 Excited to propose my first-author work DynCIM, a novel dynamic multimodal curriculum learning framework for addressing cross-modal competition and imbalance, which is now available on arXiv!
- 2024.10: 🎉🎉 We propose FASS, a novel frequency domain-enhanced approach for Medical Image Segmentation in low-contrast environments.
📝 Selected Publications (For the full list, please see Google Scholar)

DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
Multimodal Alignment Foundation Model Interpretability
ICLR 2026
Chengxuan Qian, Shuo Xing, Shawn Li, Yue Zhao, Zhengzhong Tu†.

ProgressLM: Towards Progress Reasoning in Vision-Language Models
Spatial Intelligence Embodied Robotics Data-Centric Multimodal Reasoning Open-World Applications
[Website] [Paper] [Code] [Model] [Dataset]
🏆 ACL 2026 (Oral Presentation, Top 3.3%) and ICLR 2026 Workshop on World Models
Jianshu Zhang*, Chengxuan Qian*, Haosen Sun, Haoran Lu, Dingcheng Wang, Letian Xue, Han Liu.

AutoDrive-R²: Towards Physical-Grounded Multimodal Reasoning for Autonomous Driving
Multimodal Reasoning Autonomous Driving Open-World Applications
Featured by AutoDrive Heart (自动驾驶之心)
ICLR 2026
Zhenlong Yuan*, Chengxuan Qian*, Jing Tang, Jinguo Luo, Rui Chen, Lei Sun, Xiangxiang Chu, Yujun Cai, Dapeng Zhang, Shuo Li.

Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
Think with Videos Tool-Using Agent Multi-turn Agentic RL
[Paper] [Code] [3B Model] [7B Model] [Dataset]
ICLR 2026
Zhenlong Yuan*, Xiangyan Qu*, Chengxuan Qian*, Rui Chen, Jing Tang, Lei Sun, Xiangxiang Chu, Dapeng Zhang, Yiwei Wang, Yujun Cai, Shuo Li.
🌟 Misc
I’m grateful for the mentorship of Prof. Zhengzhong Tu (TAMU), Prof. Yue Zhao (USC), and Prof. Han Liu (Northwestern University) during my undergraduate years. Outside of research, I enjoy photography📹, swimming🏊, biking🚴, billiards🎱, and table tennis🏓. I strive to stay energetic every day and to bring the same passion to both academic research and life.
🎖 Academic Services
- Journal Reviewer: IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Multimedia (TMM), Pattern Recognition (PR).
- Conference Reviewer: ICME 2025-2026, AAAI 2026, ICASSP 2026, CVPR 2026, NeurIPS 2026.
- Workshop Reviewer: ACL 2025 SRW, NeurIPS 2025 Imageomics, NeurIPS 2025 Efficient Reasoning, ICLR 2026 Workshop on Lifelong Agents, ICLR 2026 Workshop on World Models.