Zun Wang

I'm a first-year CS Ph.D. student at UNC Chapel Hill, advised by Prof. Mohit Bansal. Previously, I was a Master of Machine Learning and Computer Vision student at the Australian National University, advised by Prof. Stephen Gould. During my master's studies, I also interned at OpenGVLab, Shanghai AI Laboratory, led by Prof. Yu Qiao. Before that, I received my bachelor's degree in applied mathematics from the University of Science and Technology of China.

Email / CV / Twitter / Google Scholar / GitHub

Research

My research goal is to build multimodal, generative, and embodied agents, with current interests in:

Multimodal Understanding and Generation
Scalable Learning for Embodied Agents
Multimodal Data Generation and Curation

News

(2025-05) New preprint! Check out EPiC for efficient video camera control learning.
(2025-05) Summer Intern at 🛒 Amazon!
(2025-01) Self-refining Data Flywheel for high-quality VLN data generation is accepted to ICLR 2025! 🤖 surpasses human performance on R2R-VLN for the first time!
(2024-11) New preprint DreamRunner✨ for storytelling video generation! My first PhD project at UNC MURGe-Lab🥳!
(2024-11) Our VLN survey paper is accepted to TMLR!
(2024-08) Started my Ph.D. in the MURGe Lab at UNC Chapel Hill. Hello UNC😆!
(2024-07) Two papers accepted to ECCV 2024! Congrats Gengze and InternVideo Team!
(2024-04) One paper accepted to TPAMI! Congrats Dong!
(2024-02) One paper accepted to CVPR 2024 as Highlight! Congrats Kunchang!
(2023-10) Attending ICCV 2023 @ Paris in person😆! Great pleasure to learn from so many researchers/scholars🥹!
(2023-07) One paper accepted to ICCV 2023 as Oral presentation!
(2023-07) I'm awarded a Postgraduate Medal for Academic Excellence from ANU! Photos here!
(2023-07) I graduated from the Master of Machine Learning and Computer Vision program with Commendation from ANU.
(2022-11) I'm awarded the Chancellor's Letter of Commendation from ANU.
(2022-09) We won 1st place in the REVERIE VLN Challenge at CSIG 2022!
(2022-06) We won 1st place in the RxR-Habitat VLN Competition at the Embodied AI Workshop, CVPR 2022!
(2022-03) We have a paper accepted to CVPR 2022!

Papers

EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance
Zun Wang, Jaemin Cho, Jialu Li, Han Lin, Jaehong Yoon, Yue Zhang, Mohit Bansal
preprint, 2025
paper / code / project page

DreamRunner: Fine-grained Storytelling Video Generation with Retrieval-augmented Motion Adaptation
Zun Wang, Jialu Li, Han Lin, Jaehong Yoon, Mohit Bansal
preprint
paper / code / project page

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, Limin Wang
ICLR, 2025
paper / code

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
Gengze Zhou, Yicong Hong, Zun Wang, Chongyang Zhao, Mohit Bansal, Qi Wu
preprint
paper / code

Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
Yue Zhang*, Ziqiao Ma*, Jialu Li*, Yanyuan Qiao*, Zun Wang*, Joyce Chai, Qi Wu, Mohit Bansal, Parisa Kordjamshidi
TMLR, 2024
paper

MVBench: A Comprehensive Multi-Modal Video Understanding Benchmark
Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Luo, Limin Wang, Yu Qiao
CVPR, 2024, Highlight (3%)
paper / code

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Yi Wang*, Kunchang Li*, Xinhao Li*, Jiashuo Yu*, Yinan He*, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang
ECCV, 2024
paper / code

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
Gengze Zhou, Yicong Hong, Zun Wang, Xin Eric Wang, Qi Wu
ECCV, 2024
paper / code

ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments
Dong An, Hanqing Wang, Wenguan Wang, Zun Wang, Yan Huang, Keji He, Liang Wang
TPAMI, 2024
paper / code

Scaling Data Generation in Vision-and-Language Navigation
Zun Wang*, Jialu Li*, Yicong Hong*, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao
ICCV, 2023, Oral presentation (1.9%)
paper / code / project page

InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang*, Kunchang Li*, Yizhuo Li*, Yinan He*, Bingkun Huang*, Zhiyu Zhao*, Hongjie Zhang*, Jilan Xu, Yi Liu, Zun Wang, Sen Xing, Guo Chen, Junting Pan, Yali Wang, Limin Wang, Yu Qiao
Technical Report, 2022
paper / code

Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
Yicong Hong*, Zun Wang*, Qi Wu, Stephen Gould
CVPR, 2022
paper / code

1st Place Solutions for RxR-Habitat Vision-and-Language Navigation Competition (CVPR 2022)
Dong An*, Zun Wang*, Yangguang Li, Yi Wang, Yicong Hong, Yan Huang, Liang Wang, Jing Shao
Technical Report, 2022
paper

Competitions

	REVERIE Challenge @ CSIG 2022: Our team BPT (Zun Wang, Yi Wang, Yinan He, Yu Qiao) is the winner of both channels (out of 50+ teams). report / certificate (channel 1) / certificate (channel 2) / leaderboard
	RxR-Habitat Competition @ CVPR 2022: Our team Joyboy (Dong An, Zun Wang, Yangguang Li, Yi Wang, Yicong Hong, Yan Huang, Liang Wang, Jing Shao) is the winner of the competition. Our solution improves SoTA performance from 37% to 55%. report / certificate / leaderboard

Collaborators

I work closely with my friend Dr. Yicong Hong, with whom I have many in-depth discussions. I also share many thoughts on VLN and collaborate with my friend Dong An.
