Last week, Radiolab, the ever-enlightening podcast and public radio show from WNYC, rebroadcast an episode from August last year called "Breaking News." The title is a double entendre: "breaking news," in this case, refers as much to destroying the news as it does to breaking news stories, and the technology that is enabling this destruction is being developed right here in Seattle.
In 2016, Adobe (creator of the blockbuster photo editing software Photoshop and the audio editing software Audition) developed a software program that enables users to edit audio using text. At the time, Ars Technica's Sebastian Anthony broke down how it works: "The tech, dubbed VoCo (voice conversion), presents the user with a text box. Initially the text box shows the spoken content of the audio clip. You can then move the words around, delete fragments, or type in entirely new words. When you type in a new word, there's a small pause while the word is constructed—then you can press play and listen to the new clip."
Adobe demonstrated the effects at a conference in 2016, hosted by Jordan Peele:
In theory, this could be used by, say, film editors who, instead of requiring an actor to come into a studio to record audio, could just take an extended audio clip (between 20 and 40 minutes), and, using this software, extrapolate the actor's voice and replace the words that are actually coming out of their mouth with whatever the script or scene calls for. For film and commercial use, this software could save huge amounts of money: Instead of paying an actor to come in and do voiceovers, you just feed their vocals into the software, type out your script, and voilà.
But for everyone else, this tech could be a serious nightmare.
The whole Radiolab episode is worth a listen (it's embedded below), but imagine, if you will, a future in which fake news isn't just emerging from text: It's coming from audio and video as well. That technology isn't quite there yet, but it is coming. You can see how it works here, on a website Radiolab developed to showcase what's possible. They used existing video of Barack Obama, but replaced the words, in Obama's own voice, with text they'd written themselves.
As you can see from the video, this tech isn't yet perfect. But it is getting better, and it's already improved since Radiolab's Simon Adler first reported this story last year.
"As we predicted," Adler told me, "the technology is improving at just a drastic rate and it's already becoming far more ubiquitous. There's an app called FakeApp, which is essentially a face-swapping app that will take the face of one person and put it onto someone else. It's become particularly popular in two arenas: One is pornography, so, face-swapping celebrities onto performers' bodies. The other place, and I don't know why this meme is going around, is Nicholas Cage. People are having a fun time swapping his face onto characters and movies that he didn't appear in."
And it works. If you've ever wondered what Nicholas Cage looks like as Spock (or The Rock), here's your chance:
As Radiolab notes in the story, the people developing this tech are frequently myopic about its potential repercussions. Adler interviewed Ira Kemelmacher-Shlizerman, a Facebook researcher and professor at UW's School of Computer Science who works on this sort of tech, about the potential downsides. She said, essentially, that it's her job to build the tech, and other people's job to consider the implications.
"I think that if people know this technology exists, they will become more skeptical," she said. "I don't know. But if people know fake news exists, if they know fake text exists, fake videos exist, fake photos exist, then everyone is more skeptical in what they read and see."
But we already know that people aren't skeptical of what they read: Just this week, fake news stories that spread online included a principal at a West Virginia school implementing a Halal menu in the school cafeteria, the mysterious death of a CDC doctor who warned that flu shots were causing a pandemic, and Buzz Aldrin, who apparently revealed the existence of aliens. None were true, but that hardly matters at this point in time. People believe what they read; they will certainly believe audio they hear and video they see. So what's going to happen when someone pieces together a video of Donald Trump declaring war on North Korea and it actually looks and sounds just like him?
Ira Kemelmacher-Shlizerman did not immediately respond to a request for comment (I'll update this post if she does), but her attitude reminds me of nearly everyone I've interviewed in tech. When, for example, I talked to virtual reality developers about the potential downsides of VR, more than one reminded me that people were wary of television when it first came out, too.
And then, when I pointed out that television has, indeed, led to great harm (do you think Donald Trump would be in the White House if not for reality TV?), they all shrugged it off. When Adler asked Kemelmacher-Shlizerman if she is afraid of the technology that she herself is developing, she said, after a long pause: "I'm a technologist. I'm a computer scientist, so, not really... I'm not worried too much."
Perhaps the rest of us should be. For his part, Adler is wary, but optimistic. He told me that he thinks this technology could easily be weaponized. But, he added, he's also hopeful we'll adapt. "I think we're going to figure this stuff out," he said. "I think there will be a painful period that we as media consumers are going to go through, but I think we will make it out the other end okay. But it could be rough for a while."