The intersection of 3D vision and language models (VLMs) in robotics presents a new frontier, blending spatial understanding with contextual reasoning. This workshop seeks to explore the opportunities and challenges posed by integrating these technologies to enhance robot perception, decision-making, and interaction with the real world. As robots evolve to operate in increasingly complex environments, bridging the gap between 3D spatial reasoning and language understanding becomes critical. Key questions at the heart of this workshop include:
Coming soon!
A non-exhaustive list of relevant topics:
| Start Time (PDT) | End Time (PDT) | Event |
|---|---|---|
| 9:00 AM | 9:10 AM | Opening remarks |
| 9:10 AM | 9:45 AM | Hao Su: Talk Title (TBD) |
| 9:45 AM | 10:20 AM | Chelsea Finn: Pretraining and Post-training Robotic Foundation Models |
| 10:20 AM | 10:55 AM | Angel Chang: Building vision-language maps for embodied AI |
| 10:55 AM | 11:10 AM | Coffee Break |
| 11:10 AM | 11:45 AM | Yunzhu Li: Foundation Models for Structured Scene Modeling in Robotic Manipulation |
| 11:45 AM | 12:20 PM | Katerina Fragkiadaki: Talk Title (TBD) |
| 12:20 PM | 1:30 PM | Lunch |
| 1:30 PM | 2:00 PM | Poster Session |
| 2:00 PM | 2:35 PM | Ranjay Krishna: Talk Title (TBD) |
| 2:35 PM | 3:10 PM | Chuang Gan: Talk Title (TBD) |
| 3:10 PM | 3:25 PM | Coffee Break |
| 3:25 PM | 4:00 PM | Justin Johnson: Talk Title (TBD) |
| 4:00 PM | 4:45 PM | Spotlight Paper Talks (5 min talk + 2 min Q&A each) |
| 4:45 PM | 5:00 PM | Closing Remarks and Paper Awards |