Opening Remarks - Sofien Bouaziz (Meta)
Philip Rosedale (Linden Lab)
(Remote)
Talk Title: Computer Vision for Avatar Communication
Bio: Philip Rosedale is the Founder of Linden Lab, parent company of Second Life, an open-ended, Internet-connected virtual world and pioneering metaverse. Following Second Life, he worked on several projects related to distributed work and computing. Excited by innovations in these areas and the proliferation of new VR-enabling devices, he re-entered the virtual worlds space in 2013, co-founding High Fidelity, a company devoted to exploring the future of next-generation shared virtual reality. Philip rejoined Second Life in 2022, as Strategic Advisor, focused on helping to shape and build a better metaverse.
Prior to Linden Lab, Philip created an innovative Internet video conferencing product ("FreeVue"), which was later acquired by RealNetworks, where he went on to become Vice President and CTO.
Michael Kass (NVIDIA)
(Remote)
Talk Title: Computer Vision and the Metaverse
Abstract: Computer vision technology has advanced tremendously in recent years, but it is of limited use by itself. Its full value is only realized when it becomes well integrated into larger user solutions in the real or virtual world, involving training, inference, performance evaluation, or other aspects. Custom integrations are always possible, but if we can integrate computer vision into standardized infrastructure, it will become vastly simpler to deploy.
Our best example of a universal platform for integrating varied information technologies is the world-wide web. Unfortunately, the current web has its feet solidly planted in the 2D world. A variety of efforts have been made to add 3D capabilities to the web, but these have failed to address the central issue: 3D is fundamentally harder than 2D. So to create the solution we really want, we should start with the needs and requirements of 3D, and then integrate 2D into a proper 3D web rather than the other way around. Here, we explore what that proper 3D web should look like and how it can act as the foundation of the metaverse. The full vision remains to be realized, but we will show some of the concrete steps that NVIDIA has already taken in this direction with the Omniverse platform.
Bio: Michael Kass is a senior distinguished engineer at NVIDIA and the architect of NVIDIA Omniverse, NVIDIA's platform for collaborative 3D content creation and digital twins. In 2005, Kass received a Scientific and Technical Academy Award for “pioneering work in physically-based computer-generated techniques used to simulate realistic cloth in motion pictures.” In 2009, he received the ACM Computer Graphics Achievement Award for "his extensive and significant contributions to computer graphics, ranging from image processing to animation to modeling and in particular for his introduction of optimization techniques as a fundamental tool in graphics." And in 2017, the ACM honored him as an ACM Fellow “for contributions to computer vision and computer graphics, particularly optimization and simulation.” Kass has been granted over 30 U.S. patents, and was honored in 2018 as Inventor of the Year by the NY Intellectual Property Law Association. Before switching to computer graphics, he had an extensive career in computer vision. His Helmholtz Prize-winning computer vision paper “Snakes: Active contour models” is one of the most cited papers in computer science, with over 25k citations. Kass holds a B.A. from Princeton, an M.S. from M.I.T., and a Ph.D. from Stanford.
AM Break
Erica Kaitz (Amelia Virtual Care)
(Remote)
Talk Title: Virtual Reality and the Treatment of PTSD and Behavioral Health Conditions
Abstract: In this presentation, I will review the value of VR in treating PTSD and in helping patients process and work through behavioral health conditions.
Bio: Erica Kaitz, LCSW, is the VP of Behavioral Health at Amelia Virtual Care.
Ronald Mallet (Meta)
(In person)
Title: Biophysical Digital Characters in the Metaverse
Abstract: To be successful, the Metaverse will need to provide its users a full sense of immersion and the ability to naturally interact with the world and others around them, in ways they feel comfortable with. It will also present a unique opportunity for users to control their appearance and express themselves in distinctive and creative ways, beyond what’s possible in the real world. In this talk, we’ll explore ideas and concepts to enable such capabilities at scale, and discuss what kind of technological breakthroughs and intuitive content creation tools will be needed.
Bio: Ronald Mallet is the director and founder of Meta Reality Labs Research in Sausalito & Zurich, working on the future of digital humans and characters for AR/VR applications. The team's research includes biomechanical motion analysis, data-driven biophysical simulations, machine perception, and photorealistic rendering, spanning sub-disciplines in computer vision, computer graphics, and machine learning. Prior to joining Meta, he was a lead researcher at Industrial Light & Magic, a Lucasfilm division, working on cutting-edge technologies to deliver visual effects and digital characters for high-end feature films, including Avatar, Star Wars, Harry Potter, Pirates of the Caribbean, and many others. He received an Academy Award for Technical Achievement for his groundbreaking work on markerless full-body on-set motion capture. Prior to ILM, he held various research and engineering positions, including leading the award-winning MatchMover software project for 3D camera tracking.
Panel #1: Philip Rosedale (Linden Lab), Michael Kass (NVIDIA), Erica Kaitz (Amelia Virtual Care), Ronald Mallet (Meta)
Moderators: Andrew Rabinovich (Headroom Inc.), Serge Belongie (University of Copenhagen)
Lunch Break
Poster Session - Organizer: Harald Haraldsson (Cornell Tech)
For in-person attendees and authors: Poster hall, boards 8b-27b (see signs at venue).
For virtual attendees and authors: Virtual poster session on Gatherly (see link on CVPR virtual site).
Spotlight videos can be watched asynchronously here. To interact with authors, please join the poster session.
See all submissions here.
Sergey Tulyakov (Snap Inc.)
(In person)
Title: Object Digitization, Manipulation and Rendering for Immersive Experiences
Abstract: The digitization, manipulation, and rendering of objects require a great deal of skill and time, substantially limiting the available immersive experiences. In our work, we build tools to simplify this process and make it intuitive. To move the needle here, two key questions need to be considered: 1) how to digitize and insert objects into existing scenes, and 2) how to interactively manipulate objects within volumetric environments. We’d like to solve both in 3D. To answer the first question, we show an efficient method for neural object capture and rendering from online images. Given images of the same object in different environments and lighting conditions, our method estimates material properties, making it seamless to insert neural objects into scenes. To answer the second question, we present a new volumetric representation, dubbed “playable environments,” which allows one to play with objects inside scenes using intuitive controls as well as camera and style changes, making scene manipulation akin to playing a game.
Bio: Sergey Tulyakov is a Principal Research Scientist heading the Creative Vision team at Snap Inc. His work focuses on creating methods for manipulating the world via computer vision and machine learning. This includes human and object understanding, photorealistic manipulation and animation, video synthesis, prediction, and retargeting. He pioneered the unsupervised image animation domain with MonkeyNet and the First Order Motion Model, which sparked a number of startups in the domain. His work on Interactive Video Stylization received the Best in Show Award at SIGGRAPH Real-Time Live! 2020. He has published 30+ top conference papers, journal articles, and patents, resulting in multiple innovative products, including Snapchat Pet Tracking, OurBaby, Real-time Neural Lenses (gender swap, baby face, aging lens, face animation), and many others. Before joining Snap Inc., Sergey was at Carnegie Mellon University, Microsoft, and NVIDIA. He holds a PhD from the University of Trento, Italy.
Lourdes Agapito (University College London)
(Remote)
Title: Learning 3D Representations of the World from Images and Video
Bio: Lourdes Agapito holds the position of Professor of 3D Vision at the Department of Computer Science, University College London (UCL). Her research in computer vision has consistently focused on the inference of 3D information from single images or videos acquired from a single moving camera. She received her BSc, MSc, and PhD degrees from the Universidad Complutense de Madrid (Spain). In 1997 she joined the Robotics Research Group at the University of Oxford as an EU Marie Curie Postdoctoral Fellow. In 2001 she was appointed Lecturer at the Department of Computer Science at Queen Mary University of London. From 2008 to 2014 she held an ERC Starting Grant funded by the European Research Council to focus on theoretical and practical aspects of deformable 3D reconstruction from monocular sequences. In 2013 she joined the Department of Computer Science at University College London and was promoted to full professor in 2015. She now heads the Vision and Imaging Science Group, is a founding member of the AI centre, and is co-director of the Centre for Doctoral Training in Foundational AI. Lourdes serves regularly as Area Chair for the top computer vision conferences (CVPR, ICCV, ECCV), was Program Chair for CVPR 2016, and will serve again for ICCV 2023. She was a keynote speaker at ICRA 2017 and ICLR 2021. In 2017 she co-founded Synthesia, the London-based synthetic media startup responsible for the AI technology behind the Malaria No More video campaign that saw David Beckham speak 9 different languages to call on world leaders to take action to defeat malaria.
PM Break
Angela Dai (TU Munich)
(In person)
Title: Towards Commodity 3D Content Creation
Abstract: With high-quality imaging, and even depth imaging, now available as commodity sensors comes the potential to democratize 3D content creation. State-of-the-art reconstruction from commodity RGB and RGB-D sensors has achieved impressive tracking, but the resulting reconstructions remain far from usable in practical applications such as mixed reality or content creation, since they do not match the high quality of artist-modeled 3D graphics content: models remain incomplete, unsegmented, and poorly textured. In this talk, we will address these challenges: I will present a self-supervised approach to learn effective geometric priors from limited real-world 3D data, then discuss object-level understanding from a single image, followed by realistic 3D texturing from real-world image observations. Together, these bring us a step closer to commodity 3D content creation.
Bio: Angela Dai is an Assistant Professor at the Technical University of Munich, where she leads the 3D AI group. Prof. Dai's research focuses on understanding how the 3D world around us can be modeled and semantically understood. Previously, she received her PhD in computer science from Stanford in 2018 and her BSE in computer science from Princeton in 2013. Her research has been recognized through a Eurographics Young Researcher Award, a ZDB Junior Research Group Award, an ACM SIGGRAPH Outstanding Doctoral Dissertation Honorable Mention, as well as a Stanford Graduate Fellowship.
Omer Shapira (NVIDIA)
(Remote)
Title: Hyperscale Spatial Computing: Implications of Inverting the Bandwidth Funnel
Abstract: Recent advances in compute pipelines have enabled leaps in body-centered technology such as Ray-Traced Virtual Reality. Simultaneously, network bottlenecks have decreased to the point that streaming pixels directly from datacenters to HMDs is a reality. This talk explores the potential of body-centered computing at datacenter scale: what applications, experiences, and new science it enables.
Bio: Omer Shapira is an Engineer and Artist leading the Omniverse Extended Reality group at NVIDIA. Omer’s work and research focus on Virtual Reality, Human-Robot Interaction, Synthetic Data for Autonomous Systems, Haptics, and Collaborative Hyperscale Computing. Omer's work has been published and displayed at SIGGRAPH, IEEE Robosoft, CVPR, The Barbican, Tribeca Film Festival, Sundance Film Festival, Eyebeam, and others. Before working at NVIDIA, Omer was Director of Virtual Reality at Fake Love (A New York Times Company), Software Engineer at Framestore, and Director, Editor, and Talent at Channel 10 (Israel). Omer studied Mathematics and Linguistics at Tel Aviv University and HCI at New York University.
Panel #2: Sergey Tulyakov (Snap Inc.), Lourdes Agapito (University College London), Angela Dai (TU Munich), Omer Shapira (NVIDIA)
Moderators: Fernando De la Torre (CMU), Natalia Neverova (Meta)
Concluding Remarks