The cat is out of the bag. Despite many years of warnings before this and similar technology became widely available, nobody was really prepared for it, and everyone is acting solely in their own best interests (or what they think their best interests are). I think the biggest failure is that, despite warning signs long in advance, every single country failed to enact legislation that could meaningfully protect people, their identities and their work while still leaving enough room for research and the beneficial use of generative AI (or at least for finding beneficial use cases).
In a way, this is the flip side of providing such easy access to cutting-edge tech like machine learning to everyone. I don’t want technology itself to become the target of censorship, but where it’s being used in a way that harms people, like the examples in the article and many more, there should be mechanisms, legal and otherwise, that let victims fight back effectively.
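This problem cannot be solved by detection tools, because those same tools can be used to make AI-generated content more realistic: a generator can simply be trained against a detector until the detector stops flagging its output (adversarial training).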
Welp… we’re boned, I guess.
The only way to limit the damage is the tedious, old-fashioned way: an honest debate and thorough public education, followed by laws and regulations backed up by international treaties. However, this takes a long time. The tech is evolving very quickly, too quickly; self-regulation isn’t working; and there are lots of bad actors who need to be contained, from pervy individuals to certain nation-states (the likes of Russia, Iran and China have used generative AI to manipulate public opinion).
I’d honestly go one step further and say that the problem cannot be fully solved, period.
There are only a handful of uses for voice cloning: commercial (voice acting), malicious (impersonation), accessibility (TTS readers), and entertainment (porn, non-commercial voice acting, etc.).
Of all of these, only commercial use can really be regulated away, since corporations tend to be risk-averse. Accessibility use is mostly not an issue, since it usually doesn’t matter whose voice is being used as long as it’s clear and understandable. Then there’s entertainment. This one is both the most visible and arguably the least likely to disappear. Long story short, convincing-enough voice cloning is easy: there are cutting-edge projects for it on GitHub, written by a single person, trained on a single PC, and capable of running locally on average hardware. People are going to keep using it, just as they used Photoshop to swap faces and manual audio editing software to mimic voices in the past. We’re probably better off accepting that this usage is here to stay.
And lastly, malicious usage: in courts, in scam calls, in defamation campaigns, etc. Malicious actors have a strong incentive to develop and improve these technologies. We should absolutely try to find a way to limit such usage, but this will be an eternal cat-and-mouse game. Our best bet is to reduce how much we trust voice recordings as a society and, for legal matters, to develop some kind of cryptographic signature that confirms whether a recording was made on a certified device. Such signatures are bound to be tampered with, especially in high-profile cases, but they should hopefully limit the damage somewhat.