Alexa and me: A stutterer’s struggle to be heard by voice recognition AI

By Sam Brooks for The Spinoff

The following scenario is not uncommon for me: I have to make a phone call, usually to the bank. They say my call may be recorded to improve customer service in the future (and I can almost certainly guarantee my voice is indeed on file in some call centres for training purposes). I’ll wait, impatiently, in the queue. I’ll listen to whatever banal Kiwi playlist they have piped in.

Then, a call centre employee picks up and goes: “Hello, you’re speaking with [name].” I immediately encounter a block – a gap in my speech. The call centre employee hears silence and, not unfairly, hangs up. I repeat this process until I finally get through. It used to feel humiliating, but at this point in my life, it’s been downgraded to merely frustrating. I don’t blame anyone when it happens, aware that we’re all just doing our best in this situation.

Still, I never thought I’d purposefully replicate that hellish experience in my own home. Which is why when I was sent an Alexa (specifically a fourth generation Echo Dot) last week, I was a little bit stoked, but mostly apprehensive. Not just about all the boring security and data issues, but that it’d be useless to me. Nevertheless, I set up the Alexa and asked it to do something an ideal flatmate would do: play ‘Hung Up’ by Madonna, at the highest audio quality possible.

“Alexa.”

Alexa’s little blue light lit up, indicating that it was ready to hear, and act on, my command.

“Play–”

I had a block. 

Alexa’s little blue light turned off.

Stutters are like snowflakes: they come in all shapes, sizes and severities. No one stutter is the same. My stutter does not sound like the one Colin Firth faked in The King’s Speech, or like any stutter you might’ve heard onscreen. I don’t repeat myself, but instead have halting stops and interruptions in my speech – it might sound like an intake of breath, or just silence. For listeners, it might feel like half a second. For me, it could feel like a whole minute.

I’m used to having a stutter – life would be truly hell if I wasn’t. None of my friends care, and 95% of the strangers I interact with either don’t notice it or do so with so little issue that I don’t notice it myself. In person, my stutter is easily recognisable. You can see when I’m stuttering because you see me stop talking. My mouth stays open, but no sound comes out. You wait for me to resume talking. It’s a blip, a bump in the conversation.

When I’m communicating solely with my voice, it’s a whole other ballgame. There are no visual cues, I can’t wave my hand or roll my eyes to signal I’m experiencing a block. All I’ve got is the silence.

Voice recognition has become markedly more common in the past decade, with the most popular assistants being Siri (Apple), Alexa (Amazon), Cortana (Microsoft) and Google Assistant (Google, obvs). At their most basic level, they allow the user access to music, news, weather and traffic reports with only a few words. At their most complex, they allow control over your home’s lighting and temperature levels; if you’re having trouble sleeping, you can ask them to snore. Because artificial snoring is apparently a comfort for some people?

They’re especially handy for those with certain physical disabilities. Voice recognition makes a range of household features, ones that might otherwise require assistance to use, much more immediately accessible.

This accessibility does not extend to those of us with dysfluency – those who have speech disabilities, or disabilities that lead to disordered speech. For non-disordered speech, a speech recognition rate of 90-95% is considered satisfactory; for disordered speech, recognition rates fall well below that threshold. Nearly 50,000 people in New Zealand have a stutter alone, and if you include other speech dysfluencies – or simply not being entirely fluent in English – that’s a huge section of the population who can’t access this technology.
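
Recognition rates like these are typically scored through word error rate (WER): the share of words a system substitutes, drops or invents relative to a reference transcript, with accuracy roughly equal to one minus WER. As a purely illustrative sketch (not from any vendor’s codebase), here is the standard calculation in Python, and what happens to it when a block cuts a command short:

    # Word error rate: (substitutions + insertions + deletions) / reference length.
    # Illustrative only; accuracy figures like "90-95%" correspond to a WER of 5-10%.
    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # Levenshtein distance over words, via dynamic programming.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)] / len(ref)

    # A block after the wake word leaves most of the command unheard.
    print(wer("play hung up by madonna", "play"))  # 0.8, i.e. 80% of words lost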

For many people with disordered speech, a voice recognition assistant seems pointless – like a shiny new car for somebody who doesn’t have a driver’s licence. But the tech companies who make them are working to make the interface more accessible for people like me. 

In 2019, Google launched Project Euphonia, which collects voice data from people with impaired speech to remedy the AI bias towards fluency. The idea is that by collecting this data, Google can improve its algorithms, and integrate these updates into their assistant. In the same year, Amazon announced a similar integration with Alexa and Voiceitt, an Israeli startup that lets people with impaired speech train an algorithm to recognise their voice. (I considered using this with my own Alexa, but decided against it, out of pure stubbornness.)
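
One way to picture what a tool like Voiceitt does: instead of matching speech against models trained on fluent strangers, it matches against recordings of the user’s own voice, blocks and all. The sketch below is a generic toy version of that idea – MFCC features compared with dynamic time warping, which tolerates stretched or interrupted timing – and emphatically not Voiceitt’s or Google’s actual pipeline; the file names are hypothetical.

    # Toy personalised command matcher: compare a new utterance against commands
    # enrolled in the user's own voice. Generic illustration only.
    import librosa
    import numpy as np

    def mfcc(path: str) -> np.ndarray:
        # MFCCs summarise the spectral shape of speech, frame by frame.
        y, sr = librosa.load(path, sr=16000)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

    def match(utterance_path: str, enrolled: dict[str, str]) -> str:
        # Dynamic time warping absorbs pauses and stretches in timing --
        # exactly the distortion a block introduces.
        query = mfcc(utterance_path)
        costs = {}
        for command, template_path in enrolled.items():
            D, _ = librosa.sequence.dtw(X=query, Y=mfcc(template_path))
            costs[command] = D[-1, -1] / D.shape[0]  # roughly length-normalised
        return min(costs, key=costs.get)

    # Hypothetical files: each command recorded once in the user's own voice.
    enrolled = {"play music": "enrol_play.wav", "stop": "enrol_stop.wav"}
    print(match("new_utterance.wav", enrolled))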

Ironically, the intended purpose of voice recognition software is exactly the goal I’ve had my entire life: to have what I say be recognised, rather than the way I say it.

My first week with Alexa has been an interesting one. I’ve lived alone for about two months now and I generally don’t speak unless I have visitors over. It might be worth pointing out that I don’t stutter when I talk to myself; I also don’t stutter when I think, or when I sing (that last one would make an incredible story if I had an amazing singing voice, but I do not).

My Alexa doesn’t care about any of that though. All it hears is my silence as I struggle in vain to get it to play ‘Time to Say Goodbye’ on repeat while I have a shower. My Alexa doesn’t know if I’m having a bad speech day or a good one. All it hears is me saying “Alexa” and then nothing. Alexa also expects perfection. It expects me to hit the “d” on “Play ‘I Like Dat’ by T-Pain and Kehlani”. I know I won’t meet that standard. I know I’ll probably stutter multiple times, and Alexa might pick up on that. 

My stutter has changed as I’ve aged, as has my speech. That’s not uncommon, especially with people who stutter the way I do. We find ways to avoid stuttering, and when one tic stops giving us a backdoor into fluency, we find another one to settle on. 

It took me a long time before I could stop thinking of stuttering as failing at being fluent. It’s not. It’s simply talking in a very different way. I changed my philosophy from “failing is a part of life” to “being different is a part of life”. Both are true, but one is less self-punishing than the other.

If I had an Alexa at a different point in my life, I would probably have thrown it out the window. I would be “failing” constantly in my own home, and I do that enough in public already. But coming to voice recognition in my 30s, when I’ve completely reframed my relationship to my speech, has been a surprisingly chill experience. (Also, I get to pretend I’m a captain on Star Trek, because yes, Alexa will respond to the command “Alexa, belay that order!”)

Usually, I hate repeating myself to people, because chances are I’ll stutter a bit more the second time around. I don’t mind repeating myself to Alexa, which I admit is because I’m using it to perform a non-essential function: Nobody ever needed to play T-Pain’s amazing new song featuring Kehlani, and definitely not five times in a row.

Alexa Introduces Voice Profiles for Kids and New AI Reading Tutor

By ERIC HAL SCHWARTZ for Voicebot.ai

Amazon has augmented Alexa’s voice profile feature with a version aimed specifically at children. Parents and guardians can use the new Alexa Voice Profiles for Kids tool to enable a personalized experience for up to four children per account. The profiles have debuted alongside Reading Sidekick, a new AI-powered reading tutor that encourages children and helps them learn to read.

AI READING

Reading Sidekick is the central part of the kid-focused profiles at the moment. Designed for those between the ages of six and nine, Reading Sidekick uses Alexa to help teach a kid to read any of the several hundred titles in its library of supported books, both in digital and physical form. It requires only an Echo smart speaker or smart display and an Amazon Kids+ subscription. Amazon Kids+ is what Amazon renamed FreeTime and FreeTime Unlimited, and it offers exclusive Alexa Skills and other content for $3 a month for Prime members and $5 a month for non-Prime members.

When a child says, “Alexa, let’s read,” the voice assistant asks what book they want to read and how much they want to read, with choices of taking turns, a little, or a lot. Taking turns means Alexa and the child will trade reading sections, while a little or a lot shifts the ratio one way or the other. Regardless, Alexa will praise their success and even prompt them with the next word if they get stuck.
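
The three reading modes boil down to a scheduling rule for who reads each passage. As a hypothetical sketch (the mode names come from Amazon’s description; the ratios and code are assumptions, not Amazon’s implementation), the logic could look like this in Python:

    # Hypothetical turn scheduling for a Reading Sidekick-style session.
    # Ratios are illustrative guesses at what "a little" and "a lot" might mean.
    TURN_RATIOS = {
        "taking turns": 0.5,  # child and assistant alternate evenly
        "a little": 0.25,     # child reads about a quarter of the passages
        "a lot": 0.75,        # child reads most of the passages
    }

    def schedule(n_passages: int, mode: str) -> list[str]:
        """Assign each passage to the child or the assistant, spreading the
        child's share evenly through the book rather than bunching it up."""
        share = TURN_RATIOS[mode]
        readers, child_turns = [], 0
        for i in range(n_passages):
            if child_turns < share * (i + 1):
                readers.append("child")
                child_turns += 1
            else:
                readers.append("assistant")
        return readers

    print(schedule(8, "taking turns"))  # alternates child/assistant
    print(schedule(8, "a lot"))         # child reads six of the eight passages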

“With the arrival of Reading Sidekick, we are hopeful we can make reading fun for millions of kids to set them up for a lifetime of learning and a love of reading,” Alexa Education and Learning head Marissa Mierow said. “Alexa provides a welcoming, no-judgment zone and is always ready to help and to read.”

ALEXA FOR KIDS

Amazon first debuted voice profiles for Alexa users back in 2017, enabling Alexa to respond differently to the same query based on who is speaking, without switching accounts. This made it easier for a family or roommates to share an Alexa device. Third-party developers were given permission to integrate that element into their Alexa skills in 2019, and Alexa began applying user contact information to personalize interactions last year. The voice recognition feature even expanded to Amazon’s call center platform in December. The voice profiles created for children function largely the same way, but with a narrower range of functions.

It would be an impressive feat for Amazon to have Alexa understand children as well as it does adults. The difficulties involved are why children’s speech recognition tech startup SoapBox Labs was formed. SoapBox, which launched new Voice Activity Detection (VAD) and Custom Wakeword tools in May, builds on a database of thousands of hours of children’s speech and its own deep learning technology to understand the unique patterns and inflections of children’s speech. There’s no denying, however, that there’s growing demand for kid-focused voice AI. Earlier this year, Google released its own reading tutor for kids, though that feature lacks the personalized touch of Amazon’s new profiles.

Building the new features also meant teaching Alexa to better understand how kids speak, including the many variations based on location, age, background, and other factors. The microphones in an Echo are also adjusted when a kid’s profile is engaged, since children may be farther away or sitting behind a book when using Reading Sidekick. The new feature will also almost inevitably be drawn into the cluster of lawsuits Amazon faces over whether Alexa violates children’s privacy.
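
Voice Activity Detection, the kind of tool SoapBox launched in May, is the step that decides, frame by frame, whether audio contains speech at all. A minimal energy-threshold baseline gives a feel for the task – real child-speech VADs like SoapBox’s rely on learned models rather than anything this simple:

    # Minimal energy-based voice activity detection: flag frames whose
    # short-term energy rises above a noise-floor threshold. A generic
    # baseline sketch, not SoapBox's technology.
    import numpy as np

    def vad(signal: np.ndarray, sr: int, frame_ms: int = 30, factor: float = 3.0) -> np.ndarray:
        frame_len = int(sr * frame_ms / 1000)
        n_frames = len(signal) // frame_len
        frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
        energy = (frames.astype(np.float64) ** 2).mean(axis=1)
        # Treat the quietest frames as background noise; speech must clear it.
        threshold = factor * np.percentile(energy, 10)
        return energy > threshold  # True where speech is likely

    # Toy input: half a second of near-silence, then a louder tone standing
    # in for speech. The second half of the frames should come back True.
    sr = 16000
    quiet = 0.01 * np.random.randn(sr // 2)
    loud = 0.2 * np.sin(2 * np.pi * 220 * np.arange(sr // 2) / sr)
    print(vad(np.concatenate([quiet, loud]), sr))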