Virtual reality lets you experience amazing things—from exploring new worlds, to painting with trails of stars, to defending your fleet to save the world. But headsets can get in the way. If you're watching someone else use VR, it's hard to tell what's going on and what they're seeing. And if you're in VR with someone else, there's no easy way to see their facial expressions without an avatar representation.
Daydream Labs and Google Research teamed up to start exploring how to solve these problems. Using a combination of machine learning, 3D computer vision, and advanced rendering techniques, we're now able to "remove" headsets and show a person's identity, focus, and full face in mixed reality. Mixed reality is a way to convey what's happening inside and outside a virtual place in a two-dimensional format. With this new technology, we're able to present a more complete picture of the person in VR.
Using a calibrated VR setup including a headset (like the HTC Vive), a green screen, and a video camera, combined with accurate tracking and segmentation, you can see the “real world” and the interactive virtual elements together. We used it to show you what Tilt Brush can do and took Conan O’Brien on a virtual trip to outer space from our YouTube Space in New York. Unfortunately, in mixed reality, faces are obstructed by headsets.
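The green-screen segmentation and compositing step described above can be sketched in a few lines. This is a minimal illustration, not the production pipeline: the channel-dominance threshold, function names, and the simple hard-mask compositing are all assumptions for the sake of example.

```python
import numpy as np

def green_screen_mask(frame, dominance=40):
    """Return a boolean foreground mask for an RGB frame of shape (H, W, 3).

    A pixel is treated as green-screen background when its green channel
    exceeds both red and blue by `dominance` (0-255 scale). The threshold
    is illustrative; a real system would use a tuned chroma key.
    """
    r = frame[..., 0].astype(np.int16)
    g = frame[..., 1].astype(np.int16)
    b = frame[..., 2].astype(np.int16)
    background = (g - r > dominance) & (g - b > dominance)
    return ~background  # True where the person (foreground) remains

def composite(camera_frame, virtual_frame, mask):
    """Overlay the segmented camera foreground onto the rendered VR scene."""
    out = virtual_frame.copy()
    out[mask] = camera_frame[mask]
    return out
```

With the mask in hand, each camera frame's foreground pixels simply replace the corresponding pixels of the rendered virtual scene, producing the familiar mixed-reality view.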
Artist Steve Teeple in Tilt Brush, shown in traditional mixed reality on the left and with headset removal on the right, which reveals the face and eyes for a more engaging experience.
The first step to removing the VR headset is to construct a dynamic 3D model of the person’s face, capturing facial variations as they blink or look in different directions. This model allows us to mimic where the person is looking, even though it's hidden under the headset.
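A common way to represent such a dynamic face model is a linear blendshape basis: a mean shape plus weighted deformation offsets for expressions like blinking or looking left. The post doesn't specify the representation used, so the sketch below is a hypothetical toy version; the class name, array shapes, and the idea that weights are driven per-frame are all assumptions.

```python
import numpy as np

class BlendshapeFace:
    """Toy linear blendshape model: vertices = mean + sum_i w_i * delta_i.

    In a real capture pipeline, `mean_shape` and `deltas` (blink,
    look-left, look-right, ...) would be reconstructed from a short
    pre-capture session of the person; here they are plain arrays.
    """

    def __init__(self, mean_shape, deltas):
        self.mean = np.asarray(mean_shape, dtype=np.float64)   # (V, 3)
        self.deltas = np.asarray(deltas, dtype=np.float64)     # (K, V, 3)

    def synthesize(self, weights):
        """Return the (V, 3) vertex positions for one set of K weights."""
        w = np.asarray(weights, dtype=np.float64).reshape(-1, 1, 1)
        return self.mean + (w * self.deltas).sum(axis=0)
```

Driving the weights from tracked signals (for example, a blink weight from eye openness) lets the model reproduce where the person is looking even while the eyes themselves are covered.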
Next, we use an HTC Vive, modified by SMI to include eye-tracking, to capture the person’s eye-gaze from inside the headset. From there, we create the illusion of the person’s face by aligning and blending the 3D face model with a camera’s video stream. A translucent "scuba mask" look helps avoid an "uncanny valley" effect.
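The translucent "scuba mask" effect amounts to alpha-blending the rendered face over the headset region of the video frame rather than replacing it outright. A minimal sketch, with an illustrative alpha value and a precomputed boolean headset mask (both assumptions, not details from the original system):

```python
import numpy as np

def blend_face(video_frame, rendered_face, headset_mask, alpha=0.6):
    """Alpha-blend a rendered face over the headset region of a video frame.

    Keeping alpha below 1 leaves the headset faintly visible, which gives
    the translucent "scuba mask" look and softens uncanny-valley artifacts.
    The alpha value here is illustrative only.
    """
    out = video_frame.astype(np.float32).copy()
    m = headset_mask  # boolean (H, W) mask covering the headset
    out[m] = alpha * rendered_face.astype(np.float32)[m] + (1.0 - alpha) * out[m]
    return out.astype(np.uint8)
```

Outside the mask, the original video pixels pass through untouched, so only the occluded part of the face is synthesized.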
Finally, we composite the person into the virtual world, which requires calibrating between the Vive tracking system and the external camera. We're able to automate this calibration and make it highly accurate, so movement looks natural. The end result is a complete view of both the virtual world and the person in it, including their entire face and where they're looking.
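At its core, this kind of calibration means recovering the rigid transform (rotation and translation) that maps points from the tracking system's coordinate frame into the camera's frame, given corresponding 3D points observed in both, for example positions of a tracked controller. The post doesn't describe the method used, so the sketch below uses the standard Kabsch least-squares solution as a stand-in:

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q (Kabsch).

    P, Q: (N, 3) arrays of corresponding 3D points, e.g. a tracked
    controller's positions in the Vive frame (P) and the same positions
    measured in the external camera's frame (Q). Returns R (3x3) and
    t (3,) such that Q ~= P @ R.T + t.
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```

Once R and t are known, any point tracked by the Vive can be projected into the camera image, keeping the composited person and virtual objects aligned as either one moves.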
Our initial work, focused on mixed reality, is just one potential application of this technology. Seeing beyond VR headsets could help enhance communication and social interaction in VR. Imagine being able to VR video conference and see the expressions and nonverbal cues of the people you are talking to, or seeing your friend's reactions as you play your favorite game together.
It's just the beginning for this technology, and we'll share more moving forward. But if you're game to go deeper, we've described the technical details on the Google Research blog. This is an ongoing collaboration between Google Research, Daydream Labs, and the YouTube team. We're making mixed reality capabilities available in select YouTube Spaces and are exploring how to bring this technology to select creators in the future.