Over the last decade, online multiplayer games have turned into more personal experiences. Call of Duty: Black Ops 4, Fortnite, Anthem and more are offering new ways for players to enhance their characters. From outfits to weapon skins and even emotes, players can adorn their avatars in a way that reflects their taste and personal style.
Missing from these games, however, is the ability to change your voice. Modulate, a software company co-founded by Mike Pappas and Carter Huffman, aims to fill that gap with “voice skins,” a technology that lets you change your voice on the fly.
Using deep neural networks and machine learning, Modulate lets you customize your voice. You can choose to sound like the opposite gender or a celebrity, or even create a custom voice of your own. Your emotion and cadence stay the same, while Modulate gives you full control over how your voice sounds.
CTO Carter Huffman said he became interested in voice skin technology around 2015 after trying photo editing apps such as Prisma, which can drastically restyle existing photos to look like famous works of art.
Huffman realized that this kind of technology could also find a home in audio. His experiments took about a year to produce results, and he eventually found that adversarial networks made the process easier.
“This is something that people have wanted for 100 years,” CEO Pappas said. “It has shown up in sci-fi, in games, and in stories all over the place – as something that, obviously, we should be developing.”
Modulate works by having one neural network listen to a user’s voice and try to produce the target voice, while a second, adversarial neural network examines the result and judges whether it succeeded. To make the Barack Obama voice skin sound like him, for example, the adversarial network was given clips of his speeches so it could learn what his voice sounds like.
The process is iterative: the adversarial network pinpoints the parts of the voice skin’s audio that don’t sound right. If the pitch is wrong, for instance, the error is flagged and corrected, so the voice skin network learns to avoid that mistake on later tries.
“Eventually, it outputs speech that the adversary cannot tell the difference between the voice skin’s output and real Barack Obama. And if the adversary is really good, then we also cannot tell the difference,” Huffman added.
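The generator-versus-adversary loop described above is the idea behind generative adversarial networks (GANs). The sketch below is a toy illustration of that loop, not Modulate’s model: it shrinks a voice down to one made-up, normalized “pitch” number, assumes the user’s pitch follows N(0, 1) and the target speaker’s follows N(2, 1), lets the “voice skin” learn only an offset `b`, and uses a logistic-regression discriminator as the adversary. The function name, distributions, and hyperparameters are all hypothetical.

```python
import math
import random


def sigmoid(t: float) -> float:
    # Numerically safe logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_voice_skin(steps: int = 2000, batch: int = 64,
                         lr: float = 0.05, seed: int = 0) -> float:
    """Learn a pitch offset b so that user_pitch + b fools a discriminator.

    Toy assumptions (hypothetical): user pitch ~ N(0, 1), target ~ N(2, 1).
    """
    rng = random.Random(seed)
    b = 0.0          # generator ("voice skin") parameter: a pitch shift
    w, c = 0.0, 0.0  # adversary: D(x) = sigmoid(w * x + c) = P(x is real)
    for _ in range(steps):
        real = [rng.gauss(2.0, 1.0) for _ in range(batch)]  # target speaker clips
        user = [rng.gauss(0.0, 1.0) for _ in range(batch)]  # user's own voice
        fake = [u + b for u in user]                        # voice skin output

        # Adversary step: gradient ascent on E[log D(real)] + E[log(1 - D(fake))].
        gw = gc = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            gw += (1.0 - d) * x
            gc += 1.0 - d
        for x in fake:
            d = sigmoid(w * x + c)
            gw -= d * x
            gc -= d
        w += lr * gw / batch
        c += lr * gc / batch

        # Generator step: gradient ascent on E[log D(fake)] (non-saturating loss).
        # d/db log D(u + b) = (1 - D) * w, so b moves toward what D calls "real".
        gb = sum((1.0 - sigmoid(w * (u + b) + c)) * w for u in user) / batch
        b += lr * gb
    return b
```

With these settings, `train_toy_voice_skin()` should return an offset close to 2.0. Once the shifted pitch matches the target distribution, the adversary can no longer separate real from generated samples, so `w` settles near zero and the push on `b` fades, which loosely mirrors Huffman’s description of an adversary that “cannot tell the difference.”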
The goal, however, is not to let you impersonate other people. Modulate embeds a digital watermark that software can detect, flagging anyone who is using a voice skin. The plan is for the technology to be built directly into other programs rather than used on its own. This is meant to rule out voice fraud over phone calls and to keep you from impersonating well-known voice actors to make a reel for your own work.
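The article doesn’t say how Modulate’s watermark works. As a hedged illustration of the general idea of a machine-detectable audio watermark, the sketch below uses a classic spread-spectrum-style approach: add a key-seeded, low-amplitude pseudo-random ±1 sequence to the samples, then detect it by correlating against the same sequence. The function names, the `key`, and the `strength` value are hypothetical, and a real watermark would also have to survive compression, resampling, and background noise, which this toy makes no attempt at.

```python
import math
import random


def pn_sequence(key: int, n: int) -> list[float]:
    # Pseudo-random +1/-1 "chip" sequence; the secret key seeds the generator,
    # so only someone holding the key can regenerate the same sequence.
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]


def embed_watermark(samples: list[float], key: int,
                    strength: float = 0.02) -> list[float]:
    # Add the chip sequence at low amplitude on top of the audio samples.
    chips = pn_sequence(key, len(samples))
    return [s + strength * p for s, p in zip(samples, chips)]


def detect_watermark(samples: list[float], key: int,
                     strength: float = 0.02) -> bool:
    # Correlate with the key's chips: unmarked audio averages out near 0,
    # marked audio averages near `strength`, so threshold halfway between.
    chips = pn_sequence(key, len(samples))
    score = sum(s * p for s, p in zip(samples, chips)) / len(samples)
    return score > strength / 2


def sine_tone(seconds: float = 1.0, freq: float = 220.0,
              rate: int = 48000) -> list[float]:
    # Synthetic stand-in for a voice recording: a plain sine tone.
    n = int(seconds * rate)
    return [0.5 * math.sin(2 * math.pi * freq * t / rate) for t in range(n)]
```

For example, `detect_watermark(embed_watermark(sine_tone(), key=1234), key=1234)` should return `True`, while checking the unmarked tone, or the marked tone with a different key, should return `False`. Longer clips make the correlation test more reliable, because the host audio’s contribution to the score shrinks roughly with the square root of the number of samples.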
If Modulate were used in a large game like Fortnite, it would likely be built natively into the application. Pappas also noted that some companies develop the voice chat systems for multiple games, and Modulate could work with them to bring the technology to several games at once. The technology would let games flag which players are using Modulate, but the company will leave it to each game’s developers to decide whether to use that feature.
Pappas and Huffman want Modulate to help players better express themselves in their favorite games. If the skin you’re wearing is menacing or makes your avatar look intimidating, Modulate could make your voice match. Likewise, people who are self-conscious about their own voices could use the technology to communicate with others more comfortably.
In the immediate future, Modulate plans to continue its pilot program, which integrates the technology into existing chat platforms and games and tests it there. As the company grows, it aims to add features such as accent changes. Pappas believes the technology could have applications beyond video game chat. Since it alters the tone of a voice rather than the words themselves, it should also carry over easily to other languages.
“We’re starting in the gaming space, but we really see this as a fundamentally required technology in order for you to use voice chat, and everyone’s going to use voice chat,” he said.
Huffman noted that with virtual reality technology becoming more lifelike, voice skins could make the experience even more immersive. As of now, your options are limited.
“It’s the Ready Player One dream, right?” Huffman said. “You’re inhabiting this character, and then you speak, and it’s just your voice, or maybe a Darth Vader voice. But you can’t convincingly be the rest of that character that you want to be.”
Modulate would certainly give game developers more options for in-game goodies. Alongside the latest costume, games could offer voice skins as rewards for high-level play. The possibilities are nearly endless, and we’ll likely see the first fruits of the team’s labor later this year. The plan is for Modulate to be integrated into existing games by the end of 2019, and possibly within the next six months.
If you’d like to hear Modulate’s technology in real time, you can try a demonstration on the company’s website. Multiple sliders let you fine-tune your recording, and the results are both impressive and hilarious. As more users try out Modulate and its neural networks learn more kinds of sounds (such as laughter), they will go on to produce a more polished and believable result.