Opinion

Microsoft's new voice mimicking AI VALL-E presents both opportunities and risks

Are the capabilities of VALL-E to spoof a voice also an infringement of the right of publicity?

Dov Greenbaum | 09:04, 22.01.23

Microsoft recently announced that it has developed a new artificial intelligence that can simulate anyone’s voice after listening to just three seconds of audio recording. VALL-E is a neural codec language model. According to their paper, the AI tokenizes speech and employs its algorithms to use those tokens to generate waveforms that sound like the speaker, even preserving the speaker’s timbre and emotional tone.

Fortunately, Microsoft’s responsible AI corporate principles have resulted in the company withholding the AI’s code. Clearly there is the possibility of unethical uses for this technology. Potential nefarious uses range from bypassing voice biometric locks, to creating realistic-sounding deep fakes, to causing general havoc and distress.

Consider a low-tech voice spoof: In the United Kingdom, a hospital caring for Kate Middleton was famously duped into thinking that the Queen and then Prince Charles had called to talk to the Duchess, by two on-air Australian radio personalities. The nurse who took the call committed suicide soon after. Notably, beyond being shunned socially and professionally, the two radio hosts never faced criminal or civil charges.

In addition to these aforementioned concerns, there might also be a problem of widespread infringement of a person’s right of publicity, a form of intellectual property.

In 2004, the Israeli Supreme Court in Aloniel v McDonald recognized a right of publicity beyond privacy laws. These rights provide some form of ownership and control of one’s own image, name and voice. Subsequently in 2016, this right was further expanded in a lawsuit against two Israeli companies, Beverly Hills Fashion and Ha-Mashbir. The companies were purportedly using the artist Salvador Dali’s name for commercial purposes. (In re Fundacio Gala Salvador Dali v. V.S Marketing). Under this ruling, the right to one’s voice and other characteristics was expanded and deemed to be a transferable right, lasting like other IP rights for years after death.

The Israeli line of cases protects one’s own voice and likeness. But what about the capabilities of VALL-E to spoof that voice? Is that also an infringement of the right of publicity?

There are two principal U.S. cases in this area. In a 1992 ruling, the singer Tom Waits (known for his distinct gravelly voice, described as “like how you'd sound if you drank a quart of bourbon, smoked a pack of cigarettes and swallowed a pack of razor blades.... Late at night. After not sleeping for three days”) successfully sued the snack company Frito-Lay for over $2.5 million for using a Tom Waits impersonator in a Doritos commercial.

In an earlier 1988 ruling, the Ninth Circuit court similarly found that a commercial that used an actor with a voice that sounded like Bette Midler’s infringed on Midler’s rights under California law. As per the ruling: “when a distinctive voice of a professional singer is widely known and is deliberately imitated in order to sell a product, the sellers have appropriated what is not theirs and have committed a tort in California.”

To wit, under California law, “any person who knowingly uses another's name, voice, ... for purposes of advertising or selling ... without such person's prior consent ... shall be liable for any damages sustained by the person or persons injured as a result thereof.”

In both cases, while the courts were interested in protecting consumers from deceptive practices and false advertising, they also found that when voice is “a sufficient indicia of a celebrity's identity, the right of publicity protects against its imitation for commercial purposes without the celebrity's consent.” According to this line of decisions, the nonconsensual use of Microsoft’s AI to mimic a person’s voice, especially a celebrity voice for commercial purposes, could be an infringement of rights of personality.

Similarly, in France, the Right of One’s Image extends to one’s voice, even for unknown, anonymous people, and seemingly without any commercial consideration.

And yet, despite these and other jurisdictions that provide some rights over one’s voice, there is no shortage of comedians who successfully mimic famous personalities’ voices, even building their careers based on these mimicry skills, all seemingly without legal consequences.

Consider, for example, the comics on Eretz Nehederet or Saturday Night Live who clearly profit from such voice spoofing. If these shows can make their livelihood off of another’s voice, perhaps VALL-E can also legally spoof other people’s voices for fun and even for profit?

Or perhaps not. It seems that there should be a distinction between the narrow aims of a comic’s impressions and the use of my voice and yours for all other purposes. Perhaps a comparison can be drawn to copyright law’s distinction between the fair use defenses for parody and for satire.

Parody and satire are closely related forms of comedy and both can be used for important messaging. And yet, under laws such as section 19 of the 2007 Israeli Copyright Law, fair use is much more likely to qualify as a defense to putative copyright infringement for parodies of works than for satire employing the same copyrighted works. Similar laws have been codified from judicial rulings in Canada and the United States.

This distinction between parody and satire comes down to the fact that parody employs the protected work to comment on the work itself. Parodists are unlikely to get permission from their target, so the law needs to provide greater protection to achieve the desirable speech; the means and the ends are closely related. In contrast, satire employs the protected work to make broad commentary, not necessarily regarding the work itself. As such the law will often deem the infringement as an unnecessary and indefensible means, despite the potentially laudable end.

In drawing a comparison: when VALL-E is employed to spoof a voice with the goal of creating speech specifically regarding that individual, for example to make an AI version of Eretz Nehederet, then this could be found to be a fair use and protected speech, at least under right of publicity laws. Why should the AI be any more liable than a human impressionist?

In contrast, when VALL-E is used without consent to mimic a person’s voice for a purpose unrelated to the voice itself, for example, where any other voice would have been equally useful for the goals of that speech, that use could be deemed an infringement of the right of publicity.

In the ongoing battle over the differences between humans and AI in content creation, AI is currently losing its case to be considered just as good as a human. Perhaps a successful fair use defense of an AI-created parody voice spoof will begin to turn the tide.

Prof. Dov Greenbaum is the director of the Zvi Meitar Institute for Legal Implications of Emerging Technologies at the Harry Radzyner Law School, at Reichman University.
