The first CVPR workshop on

3D Vision Language Models (VLMs) for Robotics Manipulation: Opportunities and Challenges

June 11, 2025, Nashville, TN. Location: TBA

Introduction

The intersection of 3D vision and language models (VLMs) in robotics presents a new frontier, blending spatial understanding with contextual reasoning. This workshop seeks to explore the opportunities and challenges posed by integrating these technologies to enhance robot perception, decision-making, and interaction with the real world. As robots evolve to operate in increasingly complex environments, bridging the gap between 3D spatial reasoning and language understanding becomes critical. Key questions at the heart of this workshop include:

We are bringing together a diverse group of leading experts in the field to present their latest research findings and future perspectives, with an emphasis on scalable, generalizable, and adaptable 3D VLM frameworks for robotics. Our goal is for this workshop to inspire and drive future innovations in 3D foundation models, specifically those relating to the robotics manipulation domain.

Call for Papers

Coming soon!

Paper topics

A non-exhaustive list of relevant topics:

Workshop Schedule (Tentative)

Start Time (PDT)   End Time (PDT)   Event
9:00 AM            9:10 AM          Opening remarks
9:10 AM            9:45 AM          Hao Su: Talk Title (TBD)
9:45 AM            10:20 AM         Chelsea Finn: Pretraining and Posttraining Robotic Foundation Models
10:20 AM           10:55 AM         Angel Chang: Building vision-language maps for embodied AI
10:55 AM           11:10 AM         Coffee Break
11:10 AM           11:45 AM         Yunzhu Li: Foundation Models for Structured Scene Modeling in Robotic Manipulation
11:45 AM           12:20 PM         Katerina Fragkiadaki: Talk Title (TBD)
12:20 PM           1:30 PM          Lunch
1:30 PM            2:00 PM          Poster Session
2:00 PM            2:35 PM          Ranjay Krishna: Talk Title (TBD)
2:35 PM            3:10 PM          Chuang Gan: Talk Title (TBD)
3:10 PM            3:25 PM          Coffee Break
3:25 PM            4:00 PM          Justin Johnson: Talk Title (TBD)
4:00 PM            4:45 PM          Spotlight Paper Talks (5 min talk each / 2 min Q&A)
4:45 PM            5:00 PM          Ending Remarks and Paper Awards
For inquiries, contact us at: robo-3dvlm@googlegroups.com