
Opening Remarks

Rachel McDonnell (Trinity College Dublin)
Talk: Should we tread softly across the uncanny valley?
Abstract: In recent years, virtual humans have been growing in popularity across many different domains. Besides their traditional use in gaming and VFX, we are now seeing a huge increase in their use in newer applications such as AR/VR, video conferencing, social media influencers and YouTubers, virtual assistants, and therapy and learning. In the next five years, we will become much more accustomed to conversing with virtual humans for a range of tasks. However, research on the perception of virtual humans is not advancing as quickly, and we lack knowledge of how humans really perceive these types of interactions: do they register as real human encounters or as interactions with a robot? Achieving photorealism will surely make this distinction more difficult in the future. In this talk, I will discuss some of our recent research on the perception of virtual and augmented humans, focusing on the effect of photorealism.
Bio: Rachel McDonnell is an Associate Professor of Creative Technologies at Trinity College Dublin and a principal investigator with ADAPT, Trinity’s Centre for AI-driven Digital Content Technology. She combines research in cutting-edge computer graphics with the study of how virtual characters are perceived, both to deepen our understanding of how virtual humans are perceived and to provide new algorithms and guidelines that help industry developers focus their efforts. She has published over 100 papers in conferences and journals in her field, including many top-tier publications at venues such as SIGGRAPH, Eurographics, TOCHI, and IEEE TVCG. She has served as an Associate Editor of journals such as ACM Transactions on Applied Perception and Computer Graphics Forum, and as a regular member of many international program committees (including ACM SIGGRAPH and Eurographics). She was recently elected a Fellow of Trinity College Dublin.

Matthias Grundmann (Google)
Talk: On-device ML solutions for Mobile and Web
Abstract: In this talk, I will present several on-device Machine Learning (ML) solutions for mobile and web that power a wide range of impactful Google products. On-device ML has major benefits, enabling low-latency, offline, and privacy-preserving approaches. However, to ship these solutions in production, we need to overcome substantial technical challenges to deliver on-device ML in real time and with low latency. Once these challenges are solved, our solutions power applications like background replacement and light adjustment in Google Meet, AR effects in YouTube and Duo, gesture control of devices, and viewfinder tracking for Google Lens and Translate.
In this talk, I will cover some of the core recipes behind Google’s on-device ML solutions, from model design, through enabling ML infrastructure (MediaPipe), to on-device ML inference acceleration. In particular, we will cover video segmentation, face meshes and iris tracking, hand tracking for gesture control, and body tracking to power 3D avatars. The covered solutions are also available to the research and developer community via MediaPipe, an open-source, cross-platform framework for building customizable ML pipelines for mobile, web, desktop, and Python.
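As a concrete illustration of the MediaPipe Python bindings mentioned above, the sketch below runs the hand-tracking solution on live webcam frames and draws the detected landmarks. It is a minimal example, not part of the talk materials, and assumes the mediapipe and opencv-python packages are installed and a webcam is available at index 0.

    # Minimal sketch: real-time hand tracking with MediaPipe in Python.
    # Assumes: pip install mediapipe opencv-python, and a webcam at index 0.
    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands
    mp_drawing = mp.solutions.drawing_utils

    cap = cv2.VideoCapture(0)
    with mp_hands.Hands(max_num_hands=2,
                        min_detection_confidence=0.5,
                        min_tracking_confidence=0.5) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV captures BGR frames.
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks:
                for landmarks in results.multi_hand_landmarks:
                    mp_drawing.draw_landmarks(frame, landmarks,
                                              mp_hands.HAND_CONNECTIONS)
            cv2.imshow('MediaPipe Hands', frame)
            if cv2.waitKey(1) & 0xFF == 27:  # press Esc to quit
                break
    cap.release()
    cv2.destroyAllWindows()

The same pipeline pattern (configure a solution, feed it RGB frames, read back landmark results) applies to the face mesh, iris, and pose solutions discussed in the talk.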
Bio: Matthias Grundmann is a Director of Research at Google working in the areas of Machine Learning, Computer Vision, and Computational Video. He leads a vertical team of ~40 Applied ML and Software Engineers focused on Machine Learning solutions for Live ML (low-latency, on-device, and real-time). His team develops high-quality, cross-platform ML solutions (MediaPipe) driven by GPU/CPU-accelerated ML inference (TFLite GPU and XNNPack) for mobile and web. Among the wide portfolio of technologies his team develops are solutions for hand and body tracking, high-fidelity facial geometry and iris estimation, video segmentation for Google Meet and YouTube, 2D object and calibration-free 6-DoF camera tracking, 3D object detection, Motion Photos, and Live Photo stabilization.
Matthias received his Ph.D. from the Georgia Institute of Technology in 2013 for his work on Computational Video, with a focus on video stabilization and rolling-shutter removal for YouTube. His work on rolling-shutter removal won the Best Paper Award at ICCP 2012. He was the recipient of the 2011 Google Ph.D. Fellowship in Computer Vision.

Short Break

Kaan Akşit (University College London)
Talk: Towards Unifying Display Experiences with Computer-Generated Holography
Abstract: From smartphones to desktop computers, display technologies play a crucial role in shaping how we exchange visual information. The most significant challenges in display technologies are giving most users access to a more extensive set of comfortable visual experiences and generating the authentic three-dimensional visual experiences inherent to the human visual system. This talk's overarching aim is to lay out a new research ground for addressing these issues by inventing and co-designing proof-of-concept hardware and software for the displays of the future.
A common consensus among academia and industry is that a genuine holographic display representing light fields is the immersive display of the future. Hence, computer-generated holography will be the central focus of this talk.
Bio: Kaan Akşit is an Associate Professor at University College London. Kaan received his Ph.D. in electrical engineering from Koç University, Turkey, in 2014, his M.Sc. in electrical power engineering from RWTH Aachen University, Germany, in 2010, and his B.S. in electrical engineering from Istanbul Technical University, Turkey, in 2007. Kaan researches the intersection of light and computation, including computational approaches to imaging, fabrication, and displays. His research is best known in the optics and graphics communities for its contributions to display technologies for virtual reality, augmented reality, and three-dimensional displays with and without glasses. He worked as a research intern at Philips Research, the Netherlands, and Disney Research, Switzerland, in 2009 and 2013, respectively, and was a scientist at NVIDIA, USA, between 2014 and 2020. He is the recipient of Emerging Technologies Best in Show awards at SIGGRAPH 2019 and SIGGRAPH 2018, the DCEXPO special prize at SIGGRAPH 2017, and was among the best papers at IEEE VR 2021, IEEE VR 2019, ISMAR 2018, and IEEE VR 2017.

Christophe Peroz (Sony)
Talk: What can AR/MR display do and not do in 2021?
Bio: Christophe Peroz has recently moved from the Bay Area to Tokyo to join the Sony Group R&D Center and work on the development of xR displays. Christophe has led R&D projects in industry (Magic Leap, aBeam Tech, Saint-Gobain) and academia (Berkeley Lab, CNRS) and has been involved in the development of several products from early-stage concept to commercialization. Since 2015, he has focused on the development of xR displays to enable the next technological revolution. Christophe is a co-author of 100+ publications and patents and received his Ph.D. in Applied Physics from Grenoble Alpes University, France. He serves on several scientific committees and is a co-chair of the SPIE AR/VR/MR conference.
Hiroshi Mukawa (Sony)
Talk: What can AR/MR display do and not do in 2021?
Bio: Hiroshi Mukawa is currently responsible for AR/MR HMD technology development at the Sony Group R&D Center as a corporate distinguished engineer, and heads the AR display module business at Sony Semiconductor Solutions Corporation as a general manager. In 2004, he started his research on optical see-through HMDs and has been leading all the AR eyewear product development and commercialization, including the world’s first waveguide-based optical see-through closed-caption glasses in 2012. He has over 150 patent families related to optics and mechanics in the fields of AR/MR HMDs and optical disc storage. He serves as an executive committee member of the SPIE AR/VR/MR conference. He received M.S. degrees in electrical engineering and physical engineering from Stanford University and Kyoto University, respectively.

Panel Discussion #1
Rachel McDonnell (Trinity College Dublin)
Matthias Grundmann (Google)
Kaan Akşit (University College London)
Christophe Peroz (Sony)
Hiroshi Mukawa (Sony)

Lunch Break

Spotlight Videos
Playback of Spotlight Videos.
Moderator: Harald Haraldsson (Cornell Tech)
Submit questions to the authors via Zoom Q&A.
See the videos on YouTube here.

Virtual Poster Session (Discord)

Sujoy Ganguly (Unity)
Talk: Customizable Computer Vision Expands Data Access Without Compromising Privacy
Abstract: In recent years, computer vision has made huge strides, helped by large-scale labeled datasets. However, these datasets come with no guarantees of, or analysis of, diversity. Additionally, privacy concerns may limit the ability to collect more data. These problems are particularly acute in human-centric computer vision for AR/VR applications. An emerging alternative to real-world data that alleviates some of these issues is synthetic data. However, creating synthetic data generators is incredibly challenging, which prevents researchers from exploring their usefulness. To promote research into the use of synthetic data, we are releasing a set of data generators for computer vision. We found that pre-training a network on synthetic data and fine-tuning on real-world target data yields models that outperform models trained on the real data alone. Furthermore, we find remarkable gains when limited real-world data is available. Join us to learn how these freely available data generators should enable a wide range of research into the emerging field of simulation-to-real transfer learning for computer vision.
Bio: Sujoy Ganguly is the Head of Applied Machine Learning Research at Unity Technologies. Sujoy earned his Ph.D. in Applied Mathematics and Theoretical Physics from the University of Cambridge, studying collective dynamics and transport phenomena in biological systems. After his Ph.D., Sujoy was a postdoctoral fellow at Yale University working in computational neuroscience. He has many years of industrial experience bringing the next generation of AI-driven technologies to market while publishing papers at major AI conferences. At Unity Technologies, he leads efforts to use simulated data to train AI that performs real-world tasks.

Short Break

Kristen Grauman (UT Austin)
Talk: First-Person Video for Interaction Learning
Bio: Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Scientist at Facebook AI Research (FAIR). Her research in computer vision and machine learning focuses on video, visual recognition, and action for perception and embodied AI. Before joining UT Austin in 2007, she received her Ph.D. at MIT. She is an IEEE Fellow, AAAI Fellow, Sloan Fellow, and Microsoft Research New Faculty Fellow, and a recipient of the NSF CAREER and ONR Young Investigator awards, the PAMI Young Researcher Award in 2013, the 2013 Computers and Thought Award from the International Joint Conference on Artificial Intelligence (IJCAI), and the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2013. She was inducted into the UT Academy of Distinguished Teachers in 2017. She and her collaborators have been recognized with several Best Paper awards in computer vision, including a 2011 Marr Prize and a 2017 Helmholtz Prize (test-of-time award). She currently serves as an Associate Editor-in-Chief of the Transactions on Pattern Analysis and Machine Intelligence (PAMI) and as an Editorial Board member of the International Journal of Computer Vision (IJCV). She previously served as a Program Chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015 and of Neural Information Processing Systems (NeurIPS) 2018, and will serve as a Program Chair of the IEEE International Conference on Computer Vision (ICCV) 2023.

Evan Nisselson (LDV Capital)
Talk: Investment Trends & Opportunities from Content Tools to Digital Beings and the Metaverse?
Abstract: We are witnessing a massive shift in content creation. By 2027, visual tech tools will automate the technical skills required today for content creation and monetization. They will power the rise of the metaverse.
We look forward to hearing your insights, learning about your startups, and reading your research papers on how businesses are addressing these challenges and opportunities. https://www.ldv.co/
Bio: Evan is a General Partner at LDV Capital, which invests in people building businesses powered by visual technologies that analyze visual data, spanning computer vision, machine learning, and artificial intelligence. LDV invests at the earliest stages of a company, typically with a prototype or some initial customer validation. Example visual technology verticals include photonics, autonomous vehicles, mapping, robotics, food/agriculture, augmented reality, logistics, manufacturing, search, security, entertainment, healthcare, and much more. The unique LDV Capital platform includes the annual LDV Vision Summit, the LDV Community, annual LDV Insights reports, and an extensive expert network.
Evan has been a serial entrepreneur, professional photographer, and digital media expert since the early 1990s. His international expertise ranges from building four visual technology businesses to assisting technology startups with raising capital, business development, marketing, and product development. He is a frequent speaker, moderator, and master of ceremonies at technology conferences.

Panel Discussion #2
Sujoy Ganguly (Unity)
Kristen Grauman (UT Austin)
Evan Nisselson (LDV Capital)

Final Remarks















