Wait For It… The OpenAI Voice Cloning Tool

OpenAI has announced the preview of its (two years in the making) ‘Voice Engine’ voice cloning tool, although there’s no firm release date yet.

What Can It Do?

OpenAI says Voice Engine uses “text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.” The company adds that this “small model” can create “emotive and realistic voices” from that one sample.

Two Years On

Voice Engine was first developed in late 2022, almost two years ago. Since then, it has been used to power the preset voices available in the text-to-speech API as well as ChatGPT Voice and Read Aloud. ChatGPT Voice is the feature that lets users speak to ChatGPT and hear its responses spoken aloud. OpenAI’s text-to-speech (TTS) API is the service that converts text into natural-sounding speech, i.e. it uses AI models to produce speech that closely mimics human voices.

Being Cautious

Although the voice cloning tool has been powering other aspects of OpenAI’s voice command and text-to-speech features for almost two years, the announcement of Voice Engine itself has been delivered with more than a hint of caution. For example, OpenAI’s announcement says it is sharing just “preliminary insights and results from a small-scale preview.” OpenAI also admits it is deliberately taking a “cautious and informed approach to a broader release”, which it says is because of the “potential for synthetic voice misuse”, e.g. deepfakes, where convincing fake audio recordings are used for fraud, impersonation, or spreading misinformation.

OpenAI says that it recognises that generating speech that resembles people’s voices “has serious risks, which are especially top of mind in an election year” and is “engaging with U.S. and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build.”

Also, testing partners for Voice Engine have had to agree to usage policies that prohibit the impersonation of another individual or organisation without consent or legal right. OpenAI is also asking partners to get explicit and informed consent from the original speaker and to disclose to their audience that the voices they’re hearing are AI-generated.

To monitor and enforce these policies and requirements, OpenAI says it has implemented a set of safety measures, which include “watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of how it’s being used.”

What Now?

Although OpenAI wants the world to know that it has developed a powerful AI voice cloning tool, it is tempering any disappointment about not releasing it yet by highlighting a few positive uses for Voice Engine. For example, in its recent announcement, OpenAI listed how Voice Engine could be used to:

Provide reading assistance to non-readers and children
Translate content like videos and podcasts (for creators and businesses)
Support people who are non-verbal (therapeutic applications)

OpenAI also highlights how Voice Engine could prove extremely useful for patients recovering their voice or for those people suffering from sudden or degenerative speech conditions, and for improving essential service delivery in remote settings, thereby reaching global communities.

What Does This Mean For Your Business?

With this being a very important election year for at least 64 countries (including the US, UK and India), each of the large AI companies is very reluctant to be named as the one that allowed misuse of its AI products and/or didn’t take the right precautions to prevent misuse. For example, just as Google has put restrictions on what its Gemini AI model will answer about elections for fear of it being misused, OpenAI has decided that now, without the right protections in place, is not the right time to release its two-years-in-the-making voice cloning tool.

OpenAI, therefore, is happy to let the world and OpenAI’s competitors know that it has an advanced AI ‘Voice Engine’ in the pipeline, but it isn’t prepared to take the risk of the tool and the company’s name being tarnished by misuse within the global arena of elections. It’s likely that we’ll see much more of this caution being exercised by AI companies releasing new features and products, particularly this year.

For businesses and organisations, plus those in the health/therapy sectors hoping to make use of the powerful, value-adding capabilities of Voice Engine, it’s a case of waiting a bit longer. The danger, however, in the fast-moving field of AI is that while time passes (as testing and safety policies are being put in place), a competitor may release a new or updated powerful voice cloning tool in the meantime, thereby stealing some of Voice Engine’s thunder.

Even when Voice Engine is regarded as safe to release, this won’t guarantee that bad actors won’t attempt to misuse it, so it will be interesting to see whether it’s as well protected as OpenAI says it will be and what users are able to produce with it. Ultimately, OpenAI will want to get this tool out there, being used by as many people as possible as soon as possible, once this period of caution has passed.