From the time Steve Jobs unveiled the Macintosh, which introduced itself using text-to-speech in 1984, to Google showcasing its voice assistant booking an appointment over a phone call at Google I/O 2018, voice technology, and speech synthesis in particular, has evolved to a degree of realism that leads many tech giants to believe AI voices are the future of voice-over audio.
Thanks to machine learning, computer voices (a.k.a. text-to-speech voices), once perceived as robotic, monotonous and lifeless, have been transformed into natural-sounding, realistic voices. They are not only capable of mimicking human speech but can also generate full-length, near-perfect audio narrations.
With this unprecedented level of quality, computer-generated voices are now being used in mainstream applications such as creating audiobooks, narrating blog posts or news, and creating voice-overs for videos and e-learning, thereby reducing the need to hire professional voice actors.
With the tremendous amount of R&D happening around open source projects, and the competition among technology companies to create even more realistic voices, AI voices are only going to get better, cheaper, and more accessible.
But can AI voices completely replace voice actors?
Let’s look at some of the key factors that come into play when creating voice-over audio, especially when hiring voice-over actors, and compare them with AI voices.
Cost of creating Voice Over audio
Hiring voice actors is expensive. An average voice actor charges $50-$100 for every 100 words.
Some voice actors may even charge thousands of dollars depending on what you are hiring them for, such as a commercial or an audiobook. The price varies, and there is an additional cost for securing commercial or distribution rights to the audio.
AI voices, on the other hand, come at a fraction of the cost: $16 for converting roughly 142K words. The standard voices are even cheaper, at $4 for the same number of words. Plus, you have the rights to own, distribute and commercialize the audio as you wish.
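To make the difference concrete, the figures above can be turned into a rough calculator. This is only a sketch: it takes the quoted rates at face value and assumes that AI voice pricing scales linearly with word count, which the article does not state. The function names and the "premium" label for the $16 tier are made up for illustration.

```python
# Rates quoted in the article; treat them as rough, illustrative figures.
ACTOR_RATE_PER_100_WORDS = (50.0, 100.0)  # low and high estimate, in USD
AI_PREMIUM_PRICE = 16.0    # USD for roughly 142K words ("premium" is our label)
AI_STANDARD_PRICE = 4.0    # USD for the same number of words
AI_WORDS_COVERED = 142_000


def actor_cost(words):
    """Return the (low, high) cost in USD of hiring a voice actor."""
    low, high = ACTOR_RATE_PER_100_WORDS
    return (words / 100 * low, words / 100 * high)


def ai_cost(words, price=AI_PREMIUM_PRICE):
    """Return the AI voice cost in USD, assuming linear scaling (an assumption)."""
    return words / AI_WORDS_COVERED * price


if __name__ == "__main__":
    words = 1_500  # hypothetical script length, e.g. a short blog post
    low, high = actor_cost(words)
    print(f"Voice actor: ${low:,.2f} to ${high:,.2f}")
    print(f"AI voice:    ${ai_cost(words):,.2f}")
    print(f"AI standard: ${ai_cost(words, AI_STANDARD_PRICE):,.2f}")
```

At the quoted rates, voicing the same 142,000 words with an actor would come to between $71,000 and $142,000, before any licensing fees.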
Today, the goal of text-to-speech is to make voices that sound as good as a human, if not better. The technology is getting closer to that goal, but it is simply not there yet.
There are still nuances in synthetic speech that make it sound “automated”. AI voices may therefore not be suitable for creating voice-overs for commercials and radio, but they are perfectly suitable for many other applications that don’t demand such high expressiveness.
The quality of a synthetic voice is measured by these parameters:
1. Intelligibility – how clearly each word in the sentence is pronounced
2. Naturalness – how natural the timing, pronunciation and rendering of emotion sound
3. Comprehensibility – how well the message is understood
Today’s AI voices do well on all three parameters.
The Conversational and Newscaster profiles created by Amazon push the quality of AI voices even further.
It depends on the use case, however. An AI voice is not right for every application.
For example, you can’t have a TV commercial voiced by an AI voice, but a voice over for a Facebook ad for your small business? Of course!
Turnaround time for creating the audio
Typically, voice-over artists take days to finish a recording, depending on the amount of text to be voiced. Time is often lost up front in finding the right voice-over artist for the job, and once the recording is sent over, there is usually something that needs to be edited or changed.
What was supposed to be a 2-day job can quickly turn into a 5-7 day time sink. The bottom line is that it takes time for a human to record, edit and perfect the audio.
AI voices, on the other hand, can generate the audio almost instantaneously. Editing is also quick and easy, as it is just a matter of changing the text and generating the audio again.
Voice-over artists are a feasible option for smaller projects such as YouTube videos or podcasts, where all it takes is to identify the right artist and get the job done, which can take anywhere from 24 to 48 hours. But when a project is big and requires hours of recording, relying on voice-over artists adds dependencies and raises the risk of missing deadlines, which can lead to losses.
That is not the case for AI voices. They are hosted in the cloud, which means they can take on large volumes of text and convert it in parallel. They offer a far more scalable solution than humans.
AI voices offer the flexibility to choose from 50 different languages and create audio in any of them. One can easily translate the original content into multiple languages and use AI voices in each language to create the audio, saving the huge amounts of time and money it would take to find multiple actors.
Voice technology has come a long way and is now at a point where AI voices can generate human-like audio, making them useful in mainstream applications that would otherwise have required hiring a voice-over artist. AI voices offer a great opportunity to save time and cost while creating high-quality audio content.
AI voices are still far from delivering the expressiveness and experience created by a real voice actor, and they may not be suitable for voicing commercials or radio content. But it is just a matter of time before these voices catch up and sound exactly like professional voice actors, if not better.