While Facebook and Mark Zuckerberg have received plenty of flak for deceptive data-collection practices, the company's technology remains genuinely impressive. Yaser Sheikh, Director of Research at Facebook Reality Labs, recently published his work on creating lifelike avatars using a 3D technology called Codec Avatars.
Facebook's motivation for building the technology was to overcome the challenges of physical distance — between people, and between people and opportunity. “Using 3D capture technology and AI systems, Codec Avatars could let people in the future create lifelike virtual avatars of themselves quickly and easily, helping social connections in virtual reality become as natural and common as those in the real world,” according to Facebook.
“Most of us, myself included, don’t live in the places where we grew up,” Sheikh stated. “I’ve spent my life moving from city to city, and each time, I’ve left relationships that are important to me.”
The technology itself requires 180 HD cameras that capture the movements of your face. The FRL team has made significant progress in the two years since Facebook CTO Mike Schroepfer debuted its work on lifelike avatars. “We’ve completed two capture facilities, one for the face and one for the body,” says Sheikh. “Each one is designed to reconstruct body structure and to measure body motion at an unprecedented level of detail. Reaching these milestones has enabled the team to take captured data and build an automated pipeline to create photorealistic avatars.” With recent breakthroughs in machine learning, these ultra-realistic avatars can be animated in real time.
Facebook describes Codec Avatars as measuring human expression through two primary functions: an encoder and a decoder. The encoder uses a system of cameras and microphones on the headset to capture what the subject is doing and where he or she is doing it. Once captured, the encoder takes the information and assembles a unique code, a numeric representation of the state of a person’s body and environment that is ready to send wherever it needs to go. The decoder then translates this code into audio and visual signals the recipient sees as a picture-perfect representation of the sender’s likeness and expression.
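The encode-transmit-decode round trip described above can be sketched in very simplified form. The real system drives neural networks with live sensor data; the stub below only illustrates the general pattern of packing sensor readings into a compact numeric code and reconstructing them on the other end — the function names, the flat-vector code, and the sensor fields are illustrative assumptions, not Facebook's actual design.

```python
import numpy as np

def encode(sensor_frame: dict) -> np.ndarray:
    """Pack headset sensor readings into one numeric code (a flat vector)."""
    values = []
    for key in sorted(sensor_frame):          # fixed order so both ends agree
        values.extend(np.ravel(sensor_frame[key]))
    return np.asarray(values, dtype=np.float32)

def decode(code: np.ndarray, layout: dict) -> dict:
    """Rebuild per-sensor readings from the transmitted code."""
    frame, offset = {}, 0
    for key in sorted(layout):
        shape = layout[key]
        size = int(np.prod(shape))
        frame[key] = code[offset:offset + size].reshape(shape)
        offset += size
    return frame

# Hypothetical sensor readings for one captured frame.
frame = {"gaze": np.array([0.1, -0.2]), "jaw": np.array([[0.0, 0.5]])}
layout = {k: v.shape for k, v in frame.items()}

code = encode(frame)              # compact representation, ready to send
restored = decode(code, layout)   # receiver reconstructs the frame
assert all(np.allclose(frame[k], restored[k]) for k in frame)
```

The point of the pattern is that only the small code crosses the network, not the raw camera feeds — the heavy reconstruction work happens on each end.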
“Codec Avatars need to capture your three-dimensional profile, including all the subtleties of how you move and the unique qualities that make you instantly recognizable to friends and family. And, for billions of people to use Codec Avatars every day, making them has to be easy and without fuss. FRL approached the challenge by creating a pair of world-class capture studios — one for capturing faces and another for capturing full bodies. There are hundreds of high-resolution cameras across both studios, with each camera capturing data at a rate of 1 GB per second.”
An average 10-megapixel smartphone camera uses millions of light sensors to produce vivid pictures. Using captured data and fancy software, a smartphone can automatically adjust ambient light, field of view, and other factors to provide a great photo. Building Codec Avatars is also a combination of physical data and sophisticated software, but there’s a lot more involved than what’s in an Instagram post.
“To put this into perspective, a laptop with 512 GB disk space will survive for three seconds of recording before running out of space,” says Yu. “And our captures last around 15 minutes. The large number of cameras really pushes the limits of our capture hardware, but pushing these limits lets us collect the best possible data to create one of the most photorealistic avatars in existence.” One of the Facebook studios has 1,700 microphones, enabling the reconstruction of sound fields in 3D for truly immersive audio — an essential component of immersive environments.
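The figures Yu quotes can be sanity-checked with a little arithmetic. The per-camera rate, the disk size, and the recording times come from the article; the camera count below is inferred from them, not an official figure.

```python
# Back-of-the-envelope check of the capture numbers quoted above.
per_camera_gb_per_s = 1      # "1 GB per second" per camera
laptop_disk_gb = 512         # "512 GB disk space"
seconds_before_full = 3      # "three seconds of recording"

# Aggregate rate implied by filling 512 GB in 3 seconds:
aggregate_gb_per_s = laptop_disk_gb / seconds_before_full   # ~171 GB/s
implied_cameras = aggregate_gb_per_s / per_camera_gb_per_s  # ~171 cameras

# Data volume for one 15-minute capture session at that rate:
session_gb = aggregate_gb_per_s * 15 * 60                   # ~153,600 GB

print(round(aggregate_gb_per_s), "GB/s aggregate")
print(round(implied_cameras), "cameras implied")
print(round(session_gb / 1024, 1), "TB per session")
```

So the quoted numbers imply on the order of 170 simultaneous cameras and roughly 150 TB of raw data per 15-minute session — consistent with the article's "hundreds of high-resolution cameras."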
“Codec Avatars isn’t the only approach to realistic avatars that FRL is pursuing. A different team at FRL Sausalito is exploring physics-based avatars that can interact with any virtual environment. This work combines fundamental research in areas like biomechanics, neuroscience, motion analysis, and physically driven simulations. This technique still relies on live data capture, just like Codec Avatars, but instead of the live sensor data driving a neural network, it drives a physics-based model inspired by human anatomy (more to come on that approach later this year).”