Making your clips look like an episode of The Simpsons or a Van Gogh painting may seem gimmicky, but the artificial intelligence required to do this would usually need to run on massive servers. Google squeezed a neural network into its Google Translate app last year. Now, Facebook has developed a deep learning system called Caffe2Go that is condensed enough to run directly in mobile apps on iOS and Android. The style transfer technique will be the first opportunity for users to try it out.
New Scientist spoke to Facebook’s chief technology officer, Mike Schroepfer, about the company’s work in AI and how it will affect the way we communicate, from the existing Facebook newsfeed to the future of virtual reality and increasing global connectivity.
How do you make a neural network that’s efficient enough to run on a mobile device?
If you think of this neural net as a sequence of steps, where you’re processing information at each step and feeding it to the next one, then one of the goals from the algorithmic standpoint is to reduce that to the smallest number of steps yet get the same results. So, basically, the algorithmic challenge is building smaller models that produce very similar results.
And then the second part is lots of optimisation specific to working on mobile devices. Even if you have one of these small neural net models, if you take it and naively implement it on a mobile phone, it just won’t work. So we had a really interesting pairing of the scientists, who were trying to figure out how to do model compression, combined with people who are really good at chip-level optimisation, who were trying lots of different techniques to optimise each of the parts to make it run very quickly on the phone.
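To make the "sequence of steps" picture concrete, here is a minimal sketch in plain Python. It is not Caffe2Go or Facebook's code: the layer sizes, function names and random weights are all hypothetical, chosen only to show how a network with fewer, narrower steps carries far fewer parameters. Training the small model to reproduce the large one's results, which is the hard part Schroepfer describes, is left out.

```python
import random


def make_layer(n_in, n_out, rng):
    """One 'step': a dense layer stored as (weights, biases)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def make_network(sizes, seed=0):
    """Build one layer per consecutive pair of sizes (hypothetical, untrained)."""
    rng = random.Random(seed)
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(network, x):
    """Process the input at each step and feed the result to the next one."""
    for weights, biases in network:
        # Weighted sum per output unit, followed by a ReLU non-linearity.
        x = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


def count_parameters(network):
    """Total number of weights and biases across all steps."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in network)


# Illustrative sizes only: a wider four-step model and a compact two-step one.
large = make_network([64, 256, 256, 256, 10])
small = make_network([64, 32, 10])

for name, net in (("large", large), ("small", small)):
    print(name, "steps:", len(net), "parameters:", count_parameters(net))
```

Fewer steps and fewer parameters mean less arithmetic and less memory per video frame, which is why shrinking the model matters before any phone-specific tuning begins.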
Changing videos to make them more artistic is fun, but what else could we use it for?
One of the reasons we focused on this, although it seems like just a fun, slightly silly application, is that when you’re creating something, any delay in seeing the result could turn something that would otherwise be fun into something arduous. That time delay is the difference between fun, creative spontaneity and not doing it, basically.
But there are other things. We have demos running where you can combine this application with object detection, so if you want to apply different effects to the foreground and background of the video, you could do that.
What else is Facebook training neural net technology to do?
It’s doing all sorts of different things. We’re using it for translations. We’re using it to automatically generate captions for the billions of images uploaded every day, so if you have a visual disability and want to have a photo effectively read to you, you can have that happen. We’re using it to help improve newsfeed ranking: of the thousands of possible stories you can see, you’re going to read only 10 or 20 or 30, and we’re going to show you the best possible ones. We use it for spam detection, so if people are trying to share things on Facebook that don’t belong, we can detect it and eliminate it.
You’ve previously talked about the role of virtual reality in future social interactions. How is Facebook’s AI going to help?
AI is a key technology to make VR work. Figuring out where your head and hands are in the real world and mapping them into the VR world is a computer vision and AI problem. Without that, the system just doesn’t work. You couldn’t easily have done this 10 or 20 years ago the way you can today.
Think about the further problem of how we bring realistic avatars into the VR world. If someone’s laughing while I’m in VR with them, we can detect that and make sure the avatar looks like it’s laughing. And as the person is speaking, we’re actually analysing the phonemes and animating the mouth of their avatar so it looks realistic, like the individual is speaking rather than just having the avatar sitting there not moving its mouth. You’re not going to feel a sense of presence with that person if their avatar is just stony-faced all the time.
In the long run, think about all these systems out there that are building intelligent agents, whether they are messenger bots or things you can speak to in the home. VR will be a natural environment for that too because you could have something that could help you navigate the mass of the virtual world. You could say, “Hey, take me to Mars,” or “Take me to see my friend Joe,” and the virtual agent could help you navigate rather than clicking menus or moving buttons around. It would be a natural place for a virtual assistant, but that’s probably in the more distant future.
What would it take to develop that?
I think speech recognition is a generally well-solved problem in artificial intelligence and is working really well, but a harder challenge in AI that people are also making progress on is natural language understanding: disambiguating what people are saying. When I say, “Take me to Mars,” what does that mean? Is this a specific game? Is it a trailer for The Martian? What am I referring to? That is a challenging problem in AI.
When these systems work and they give you exactly what you want, it’s awesome and magical. But when they give you the wrong answer, it’s really frustrating. So you want to build systems that work more often than not, otherwise people won’t use them. That’s one of the problems with AI: building systems that understand language in the way humans do.
What’s your vision for when we’ve all got neural nets in our pockets?
The one resource that people can’t get back is time. The days pass and the time goes, and you can’t get it back. I think where AI can really help us is by focusing our time on the things we care about. I could spend the time learning three more languages so I can communicate with family members, or if I have a system that can automatically translate, I can spend that time with those family members instead, or I can spend that time creating music or pursuing hobbies or doing work, whatever it may be.
That’s my hope: that people waste no time on things that are unimportant because we have systems watching out for us and making sure we’re focused on the things we most care about.