In the beginning
Speech-generating devices have been around since the late 1970s. For decades, these devices were expensive and were used mainly for communication rather than for language development. As a result, they were mostly used by adults and teenagers who had already acquired language. At first, most of the available voices were American and male, which didn’t represent the diversity of Augmentative and Alternative Communication (AAC) users. Female voices, as well as other accents and languages, only came along later.
The development of low-cost AAC solutions like Proloquo2Go in 2009 lowered the barriers to AAC use. This innovative technology was now available to people for whom it might previously have been considered a risky investment, for example a child who still needed to develop language skills before they could start to communicate.
Even though this technology was now available to a wider audience, one group was still not served: the nearly half of AAC users who are under the age of 12. With no genuine children’s voices yet available, young users made do with adult voices or artificially modified voices with raised tones that sounded as if the speaker had inhaled helium. This meant that most young AAC users had to speak in a voice they could not identify with and which seemed unnatural or implausible to their communication partners.
Taking on the challenge
Based on requests from users, and our knowledge of the AAC world, we decided to start offering our young users the best AAC experience possible with genuine children’s voices. Since there were none available, we teamed up with Acapela Group, one of the leading Text to Speech companies, to take on the challenge of creating the first genuine children’s voices for Text to Speech.
How does it work?
Text to Speech voices are based on real recordings of a voice talent reading from a long script in the studio. Recording all of the words in a language would take way too long, so the script is designed to contain as many sounds and sound combinations as possible. This takes a great deal of research, because the necessary sounds differ per language. The Text to Speech software then synthesizes speech by combining these recorded sounds into words. Once the recordings are made, the voices still require a lot of processing and testing to sound as natural as possible. All told, it took Acapela Group and AssistiveWare about a year to develop the first two children’s voices.
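To give a rough feel for how a recording script can be designed to cover many sound combinations with few sentences, here is a minimal sketch in Python. It is only an illustration, not the method Acapela Group uses: it treats adjacent letter pairs as a crude stand-in for real sound combinations (an actual system would work with phonemes specific to each language), and the function names `sound_pairs` and `pick_script` are hypothetical.

```python
def sound_pairs(sentence):
    """Return the set of adjacent letter pairs in a sentence.

    Assumption: letter pairs are a crude stand-in for real sound
    combinations; a real system would use language-specific phonemes.
    """
    pairs = set()
    for word in sentence.lower().split():
        letters = [c for c in word if c.isalpha()]
        pairs.update(a + b for a, b in zip(letters, letters[1:]))
    return pairs


def pick_script(candidates, max_sentences):
    """Greedily pick sentences that each add the most uncovered pairs."""
    covered = set()
    script = []
    remaining = list(candidates)
    while remaining and len(script) < max_sentences:
        # Choose the sentence contributing the most pairs not yet covered.
        best = max(remaining, key=lambda s: len(sound_pairs(s) - covered))
        gain = sound_pairs(best) - covered
        if not gain:
            break  # No remaining sentence adds anything new.
        script.append(best)
        covered |= gain
        remaining.remove(best)
    return script, covered


if __name__ == "__main__":
    candidates = ["the cat sat", "the cat", "a dog ran"]
    script, covered = pick_script(candidates, max_sentences=3)
    print(script)
    print(len(covered), "distinct pairs covered")
```

In this toy example, "the cat" is never selected because every pair it contains is already covered by "the cat sat". The same idea, applied to real sounds, is why a carefully chosen script can be far shorter than a recording of every word in the language.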
The real key to each of these voices is the talent. Recording children is especially difficult because they need to sound young, yet also be able to read well and have the discipline to spend days in the studio recording. The videos below show you how we recorded the British and American children’s voices.
Innovating for the future
We released the first British children’s voices in 2012, and the American children’s voices followed soon after. Because we recognized the demand for these voices and the impact they can have on our users’ lives, we collaborated with Acapela Group on children’s voices for even more languages: Australian English and German in 2013, bilingual American Spanish and English in 2014, French and Swedish in 2016, Italian in 2017, and Dutch in 2018. Offering voices for more languages, regional accents and age groups is an important project for AssistiveWare, as we recognize the diversity and individuality of our users.
Beyond the growing number of languages offered, synthesis technology has continued to improve. We also report any awkward-sounding words or intonation to Acapela Group, who fine-tune the voices based on these suggestions.
The impact of these voices on our users has been tremendous. They’re available for free in all of our AAC apps: Proloquo2Go, Pictello, and Proloquo4Text. But since we recognized the value that these voices might have for users of other apps, we agreed with Acapela Group that they could license the voices for other uses. As a result, more than 20 other companies now use the children’s voices we helped to create.