In the beginning
Speech-generating devices have been around since the late 1970s. For decades, these devices were expensive and were used mostly for communication rather than for language development. As a result, they were used mainly by adults and teenagers who had already acquired language. Most of the early voices were American and male, which didn’t represent the diversity of Augmentative and Alternative Communication (AAC) users. Female voices, as well as other accents and languages, came along only later.
The arrival of low-cost AAC solutions such as Proloquo2Go in 2009 helped lower the barriers to AAC use. This technology was now within reach of people for whom it might previously have been considered a risky investment: for example, a child who still needed to develop language skills before they could start to communicate.
Even though this technology was now available to a wider audience, it still did not meet the needs of AAC users under the age of 12, who make up nearly half of all AAC users. With no genuine children’s voices available, these users made do with adult voices or with artificially modified voices whose raised pitch made them sound as if they had inhaled helium. As a result, most young AAC users had to speak in a voice they could not identify with, and which seemed unnatural or implausible to their communication partners.
Taking on the challenge
Based on requests from users and our knowledge of the AAC world, we decided to give our young users the best possible AAC experience by offering genuine children’s voices. Since none existed, we teamed up with Acapela Group, one of the leading Text to Speech companies, to take on the challenge of creating the first genuine children’s voices for Text to Speech.
How does it work?
Text to Speech voices are based on real recordings of a voice talent reading a long script in the studio. Recording every word in a language would take far too long, so the script is designed to contain as many sounds and sound combinations as possible. Designing it takes a great deal of research, because the required sounds differ from language to language. The Text to Speech software then synthesizes speech by combining these recorded sounds into words. Even after the recordings are made, the voices still need a lot of processing and testing to sound as natural as possible. All told, it took Acapela Group and AssistiveWare about a year to develop the first two children’s voices.
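The goal of a script that covers as many sound combinations as possible can be pictured as a small coverage problem. The sketch below is a hypothetical illustration and not Acapela Group's actual process. It makes three assumptions. First, each candidate sentence already has a phoneme transcription; the toy ARPAbet-style transcriptions here are made up. Second, "sound combinations" are treated as diphones, meaning pairs of adjacent sounds. Third, it uses a simple greedy selection, a common textbook approach to speech-corpus design, to pick the sentences that add the most new diphones.

```python
from __future__ import annotations

# Hypothetical sketch: choose recording-script sentences that cover as many
# sound combinations (here: diphones, i.e. pairs of adjacent phonemes) as possible.
# This is an illustrative greedy method, not any vendor's actual pipeline.

SILENCE = "_"  # marks the pause before and after each sentence


def diphones(phonemes: list[str]) -> set[tuple[str, str]]:
    """Return the set of adjacent phoneme pairs in one transcription."""
    padded = [SILENCE] + phonemes + [SILENCE]
    return set(zip(padded, padded[1:]))


def select_script(
    candidates: dict[str, list[str]], max_sentences: int
) -> tuple[list[str], set[tuple[str, str]]]:
    """Greedily pick sentences that add the most not-yet-covered diphones."""
    covered: set[tuple[str, str]] = set()
    script: list[str] = []
    remaining = dict(candidates)  # copy so the caller's dict is untouched
    while remaining and len(script) < max_sentences:
        # Sentence whose transcription contributes the most new diphones
        # (ties go to the earliest candidate in insertion order).
        best = max(remaining, key=lambda s: len(diphones(remaining[s]) - covered))
        gain = diphones(remaining[best]) - covered
        if not gain:
            break  # no remaining sentence adds new coverage
        script.append(best)
        covered |= gain
        del remaining[best]
    return script, covered


# Toy, made-up ARPAbet-style transcriptions for illustration only.
candidates = {
    "hi mom": ["HH", "AY", "M", "AA", "M"],
    "hi": ["HH", "AY"],
    "my mom": ["M", "AY", "M", "AA", "M"],
}

script, covered = select_script(candidates, max_sentences=2)
print(script)        # ['hi mom', 'my mom']
print(len(covered))  # 8
```

With a budget of two sentences, "hi mom" is chosen first. "my mom" comes second because it adds two new pairs, while "hi" would add only one. A real script would draw on a language-specific phoneme inventory and far more sentences, which is one reason the research has to be redone for each language.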