New Tools Allow Voice Patterns To Be Cloned To Produce Realistic But Fake Sounds Of Anyone Saying Anything
from the shopped-images-are-so-yesterday dept
Fake images, often produced using sophisticated software like Photoshop or the GIMP, were around long before so-called "fake news" became an issue. They are part and parcel of the Internet's fast-moving creative culture, and a trap for anyone who passes on striking images without checking their provenance or plausibility. Until now, this kind of artful manipulation has been limited to the visual sphere. But a new generation of tools will soon allow entire voice patterns to be cloned from relatively small samples, with such increasing fidelity that the fakes can be hard to spot. For example, in November last year, the Verge wrote about Adobe's Project VoCo:
"When recording voiceovers, dialog, and narration, people would often like to change or insert a word or a few words due to either a mistake they made or simply because they would like to change part of the narrative," reads an official Adobe statement. "We have developed a technology called Project VoCo in which you can simply type in the word or words that you would like to change or insert into the voiceover. The algorithm does the rest and makes it sound like the original speaker said those words."
Since then, things have moved on apace. Last week, the Economist wrote about the French company CandyVoice:
Utter 160 or so French or English phrases into a phone app developed by CandyVoice, a new Parisian company, and the app's software will reassemble tiny slices of those sounds to enunciate, in a plausible simulacrum of your own dulcet tones, whatever typed words it is subsequently fed. In effect, the app has cloned your voice.
The Montreal company Lyrebird has a page full of fascinating demos of its own voice cloning technology, which requires even less in the way of samples:
Lyrebird will offer an API to copy the voice of anyone. It will need as little as one minute of audio recording of a speaker to compute a unique key defining her/his voice. This key will then allow to generate anything from its corresponding voice. The API will be robust enough to learn from noisy recordings. The following sample illustrates this feature, the samples are not cherry-picked.
Please note that those are artificial voices and they do not convey the opinions of Donald Trump, Barack Obama and Hillary Clinton.
As Techdirt readers will have spotted, this technical development raises big ethical questions, articulated here by Lyrebird:
Voice recordings are currently considered as strong pieces of evidence in our societies and in particular in jurisdictions of many countries. Our technology questions the validity of such evidence as it allows to easily manipulate audio recordings. This could potentially have dangerous consequences such as misleading diplomats, fraud and more generally any other problem caused by stealing the identity of someone else.
The Economist quantifies the problem. According to its article, voice-biometrics software similar to the kind deployed by many banks to block unauthorized access to accounts was fooled 80% of the time in tests using the new technology. Humans didn't do much better, only spotting that a voice had been cloned 50% of the time. And remember, these figures are for today's technologies. As algorithms improve, and Moore's Law kicks in, it's not unreasonable to think that it will become almost impossible to tell by ear whether the voice you hear is the real thing, or a version generated using the latest cloning technology.
Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+
Filed Under: cloning voice, voice, voice patterns
Reader Comments
In that episode he had to ingest a robot that had copied Hardeen's voice so he could sound like him.
Makes me wonder if Star Wars tech isn't as science fantasy as people thought.
Re:
Reminds me of the Star Trek TOS episode A Taste of Armageddon.
While Kirk is held on the planet below, the Enterprise receives a message in Kirk's voice. They don't believe it, with good reason. Their first theory, which they exclude for technical reasons, is that it was not done by a "voice synthesizer".
https://www.youtube.com/watch?v=ohmajJTcpNk
Response to: Anonymous Coward on May 2nd, 2017 @ 6:40pm
Re: Response to: Anonymous Coward on May 2nd, 2017 @ 6:40pm
Re: Response to: Anonymous Coward on May 2nd, 2017 @ 6:40pm
No, I did not have sex with that woman / man / goat / etc.
He was resisting arrest. I did not use excessive force. That is not him screaming. I did not beat him mercilessly.
And really Josh, you shouldn't call your mother that! We have you on her voicemail.
The GIMP?
Re: The GIMP?
Re: The GIMP?
First sentence on gimp.org:
(Yes, the rest of the page refers to it with no definite article. But there is one right there upfront at the top of the page where the term is defined.)
Re:
I, on the other hand, welcome this new technology!
"Honey, I never said THAT! You aren't remembering our conversation correctly. Here, let me play the recording I made of it."
Re:
Just TRY to get a court to accept your claim that you never said anything of the sort.
Re: Re:
Just a few of the linked articles entered into evidence and maybe an expert witness with an audio "recording" of the prosecutor admitting to having had sexual relations with a goat (or something equally implausible) should convince a jury that the tape is unreliable evidence on its own (in addition to giving them a good laugh).
As Techdirt readers will have spotted, this technical development raises big ethical questions
There are probably some ethical questions to ask here, but that statement doesn't talk about any of them. It, at most, introduces a question of the efficacy of police/criminal justice best practices. That's not ethical, it's procedural. The only vaguely ethical conundrum it alludes to is whether or not we should execute anyone who displays talent in any scientific or engineering field to prevent technology-driven change.
Maybe someday...
The very best TTS these days is very, very good, but it still requires carefully collecting phonemes and a lot of work to make it sound realistic. Even then, it's generally distinguishable from a real voice within a sentence or two.
Color me skeptical, but I think the nightmare scenario of creating forensically-realistic fake audio from just a few minutes of voice sample is a long way away. The old-fashioned way of splicing together words and phrases is still better.
Re: Maybe someday...
1. So what if it's a long way away, should the implications of the tech be ignored until someone perfects it?
2. These things tend to improve exponentially, so it could be a lot sooner than you think.
"Even then, it's generally distinguishable from a real voice within a sentence or two."
3. Given the tendency for political debate to be driven by soundbites and for people to jump to conclusions based on a couple of seconds of video, that might be all that's needed.
Re: Maybe someday...
The reason that you can -- currently -- readily detect that the output isn't real is that you're a human being who's evolved an extraordinary auditory sense over millennia. Of all our senses, it's arguably the most highly developed -- which is why, for example, we can detect a musical note that's only a tiny fraction off or recognize each other with a sample size of one word. In other words, our ability to detect ersatz speech is much better than our ability to detect ersatz pictures.
But this technology, or one like it, will eventually confound that too. Whether it takes a year or twenty, it's coming. So just as "pictures don't lie" is now obsolete, we'll have to change our standards for evidence to cope.
Re: Re: Maybe someday...
Conversely, if you want to protect a recording from alteration, play some songs at low level in the background, as that will make changing the recording hugely more difficult, both in separating your words from the background and in syncing up the replacement background.
Re: Maybe someday...
Re:
I was convinced it was her.
"and I cannot lie"
Clearly not Hillary.
In The Future ...
... everyone will sound like Doctor Bot from Space Station 76.
This could make future voice recordings invalid as evidence
Re: This could make future voice recordings invalid as evidence
Re: Re: This could make future voice recordings invalid as evidence
Re: This could make future voice recordings invalid as evidence
Re: Re: This could make future voice recordings invalid as evidence
Nah, not China, not anymore. Maybe North Korea. In China, and even countries like Iran, despite the government's best efforts the public can still get access to the open internet.
That's where Orwell was wrong: he lived in an era where the government could control the public's access to mass communications media, and he assumed that would still be the case in the future. It's not, except in nations with crippling poverty like NK.
China still does just fine with its disinformation campaigns, of course. And anyone, even Alex Jones, can make outlandish claims and convince some people that they're true. But China doesn't have the propaganda stranglehold on its public that it used to, and the way I see it, improvements to technology will benefit the public's ability to see through bullshit more than the governments' ability to create it.
(Whether or not people actually see through the bullshit is, to my mind, a separate issue. There are plenty of people who will believe what they want to believe regardless of evidence; more realistic fakes will color that issue but I don't think they'll fundamentally change it.)
Let's hear Trump say...
Re: Let's hear Trump say...
Open source version
I am the system administrator.
Re: I am the system administrator.
Re:
In fact, how would news reporting work, given that nobody trusts first-hand accounts any more, even when accurate audio and video evidence is gathered?
Re: Re:
The same way it works today, no?
Questions came to light after the defendant submitted hospital records showing he was in an ICU in a coma when these alleged recordings were made.
But I can't help thinking this would work great in games:
Being able to generate dynamic NPC dialog without having to record hundreds of hours.
Calling the player by their actual chosen full name.
Imagine something like DA:O only fully voiced this time. Yes, even your character's lines.
Re:
And the modding scene would take off. New quests would only need text typed into a database if the modder is happy with the existing library of in-game voices.
Re: Re:
Re: Re: Re:
Western gamers don't generally buy games based on voice actor casting. The studio's name matters more than who's voicing.
For example Lara Croft was voiced by at least 5 voice actors. And Cole MacGrath was voiced by 2 actors.
Anyway the VA market is a lot bigger in East Asia (S. Korea, Japan and maybe China) than it is here.
Dawn of a new age
Obligatory Edward Tufte Tweet demonstrating that the more things change the more they stay the same: Edward Tufte on photoshop
Voice Rights
"New tools allow voice patterns to be cloned to produce 'totally legitimate confessions of crimes', say Police"
Actually kind of excited for this technology
Ahhh....Classic leverage
scary possibilities
We need to teach everyone to be skeptical, inquisitive, and knowledgeable of the many ways people can be manipulated.
Real people don't talk like that.
If Lyrebird want authenticity they need to try harder to get rid of those qualities.
could this be the savior of Reality TV?
The next logical innovation for "reality" shows may well be the ability to "photoshop" these synthesized words into people's mouths so the camera won't be forced to cut away whenever they "speak" spliced words.
... but on the other hand, wouldn't it be so much easier to just give these "reality" actors an actual script instead of creating dialog in the editing room?
Fake voice, ID Theft and Wall Street/401K
Remember, the scum on Wall Street use your recorded voice (on the phone) and the personal identity information (routinely exposed by Wall Street's own butt kissing firms) to "establish" your identity. You/We are truly screwed.
The next time Al-Qaeda or ISIS or the Mob or the Drug Lords attack Wall Street (and Washington), I think I'm not likely to care too much. Wall Street/Washington is looting us so viciously, that their enemies attacking them is very low on my list of concerns!
But then again one of these companies is Adobe and they pretty much never turn down any idea they think will make money, regardless of ethics or fairness.
We're not supposed to say "yes" to robo-callers...
This could really pump up FedEx stock, since oral contracts just became completely unreliable.
The problem isn't so much matching the pitch and such of a specific voice; it's writing the software to properly pronounce words. People have been putting up videos on YouTube with artificial voices for years. Many of them are very good and sound almost perfect, but then they mispronounce a word and you realize that it's a machine.
Does this mean...
I can actually get my voice-activated digital assistant to sound like GlaDOS?
You already can.
Benefits for communications
Hats off to science fiction author David Drake and his Hammer’s Slammers series, where hovercraft tank commanders use this approach to hold voice conversations via radio waves bounced off of the ionized trails left by the small meteors that constantly burn up in the atmosphere; a very robust but low data rate communications channel.
Or Samuel L Jackson, or or or
Are you not entertained?