What does it mean when a software company obsessively focused on innovating the way we use our mobile devices to see and communicate with the world adds virtual voice agents? Possibly e-commerce magic, with a powerful layer of augmented reality.
That's what may be in the offing with the recent acquisition of Israel-based Voca.ai by Snap, the company behind Snapchat.
The deal was first reported by Israeli tech news site Calcalist, which claims the price was $70 million. However, neither party involved in the deal has confirmed the figure.
By now, most of us are familiar with virtual voice agents that pop up when we're making a customer service call to a bank or retail store. They can be frustrating, especially when you're upset about a transaction, or in a rush. What Voca's approach to virtual voice agents promises is what it calls a "speech-to-intent" algorithm that can more efficiently detect what a human caller is attempting to accomplish.
This is all delivered (based on samples we've listened to) via a very natural-sounding virtual voice agent. The result is so impressive that credit card giant American Express invested in the startup via its Amex Ventures fund late last year. Among the use cases listed on Voca's website are banking, insurance, legal services, and telecommunications.
So what could all this dry, business-centric customer service have to do with the messaging and AR fun happening over at Snap? What's really behind the acquisition? Snap isn't saying much publicly, but the more intimately familiar you are with Snapchat, the more the potential synergy of adding Voca Agents to Snap becomes obvious.
Beyond the fun Snap Lens modules that allow you to use augmented reality to transform your face using Snapchat, the app also lets you point your smartphone's camera at nearly any product to reveal its price and an e-commerce link, monitor a song to detect its artist and title information, scan plants and animals to detect their designations, and even solve complex math problems. Quietly, Snap has built one of the most powerful world understanding and all-in-one e-commerce tools ever, even though many still incorrectly continue to view Snapchat as limited to messaging. It's clear that Snap is building a mobile world interface, largely powered by machine learning and augmented reality.
Now, what if you took all of that machine learning and e-commerce interactivity and added a clever virtual voice agent to the mix? You might suddenly find that the Snapchat app feels "alive" as it guides you through its features, whether the task is shopping for the right product or helping you with your homework or research. More ambitiously, in the future, it's not difficult to imagine Voca's voice agents being paired with 3D-modeled AR virtual agents similar to what I experienced last year (limited to 2D on the desktop) with Soul Machines' convincing customer service avatars.
The Soul Machines virtual assistant avatar. Image by Adario Strange/Next Reality
And maybe, if we're lucky, the feature could one day be offered as a part of the Lens Studio software suite, possibly allowing anyone to create an AR experience with added virtual voice assistant interactivity. Now that you can add music to your Snapchat messages, it only makes sense to eventually add a virtual voice presence for more interactive shares. That's the dream, but for now, if Snap can just add this powerful virtual agent to its e-commerce camera-enabled offering, it could be a game-changer.
Oh, and one more thing…what if Snap adds the voice agent technology to Spectacles? We've already seen Amazon's Echo Frames, which don't offer AR functionality but provide a baby step toward full-fledged smartglasses by adding the virtual assistant powers of Alexa to your face, enabled with a tap on the side of the glasses. Nevertheless, most agree that Echo Frames aren't quite premium in terms of build and design, something Snap excels at with the latest models of Spectacles. Putting virtual audio assistance into a pair of Snap Spectacles (think the movie Her) while not sacrificing any style points is something a number of us would sign up for yesterday.
Additionally, imagine if some of the machine learning-powered world understanding inherent in the Snapchat app could one day be embedded in the Spectacles camera, allowing you to glance at a doggo, tap the side of your Spectacles, and then have the embedded Voca virtual agent whisper the name of the mysterious pooch into your ear. Yes, we're getting a little ahead of ourselves in the "imagine if" department, but Snap moves pretty fast, so we shouldn't be surprised at how rapidly Voca's algorithm gets deployed via the company's flagship products.