Alexa and me: A stutterer’s struggle to be heard by voice recognition AI

By Sam Brooks for The Spinoff

The following scenario is not uncommon for me: I have to make a phone call, usually to the bank. They say my call may be recorded to improve customer service in the future (and I can almost certainly guarantee my voice is indeed on file in some call centres for training purposes). I’ll wait, impatiently, in the queue. I’ll listen to whatever banal Kiwi playlist they have piped in.

Then, a call centre employee picks up and goes: “Hello, you’re speaking with [name].” I immediately encounter a block – a gap in my speech. The call centre employee hears silence and, not unfairly, hangs up. I repeat this process until I finally get through. It used to feel humiliating, but at this point in my life, it’s been downgraded to merely frustrating. I don’t blame anyone when it happens, aware that we’re all just doing our best in this situation.

Still, I never thought I’d purposefully replicate that hellish experience in my own home. Which is why when I was sent an Alexa (specifically a fourth generation Echo Dot) last week, I was a little bit stoked, but mostly apprehensive. Not just about all the boring security and data issues, but that it’d be useless to me. Nevertheless, I set up the Alexa and asked it to do something an ideal flatmate would do: play ‘Hung Up’ by Madonna, at the highest audio quality possible.

“Alexa.”

Alexa’s little blue light lit up, indicating that it was ready to hear, and act on, my command.

“Play–”

I had a block. 

Alexa’s little blue light turned off.

Stutters are like snowflakes: they come in all shapes, sizes and severities. No two stutters are the same. My stutter does not sound like the one Colin Firth faked in The King’s Speech, or like any stutter you might’ve heard onscreen. I don’t repeat myself, but instead have halting stops and interruptions in my speech – it might sound like an intake of breath, or just silence. For listeners, it might feel like half a second. For me, it could feel like a whole minute.

I’m used to having a stutter – life would be truly hell if I wasn’t. None of my friends care, and 95% of the strangers I interact with either don’t notice it or do so with so little issue that I don’t notice it myself. In person, my stutter is easily recognisable. You can see when I’m stuttering because you see me stop talking. My mouth stays open, but no sound comes out. You wait for me to resume talking. It’s a blip, a bump in the conversation.

When I’m communicating solely with my voice, it’s a whole other ballgame. There are no visual cues, I can’t wave my hand or roll my eyes to signal I’m experiencing a block. All I’ve got is the silence.

Voice recognition has become markedly more common in the past decade, with the most popular assistants being Siri (Apple), Alexa (Amazon), Cortana (Microsoft) and Google Assistant (Google, obvs). At their most basic level, they allow the user access to music, news, weather and traffic reports with only a few words. At their most complex, they allow control over your home’s lighting and temperature levels; if you’re having trouble sleeping, you can even ask them to snore. Because artificial snoring is apparently a comfort for some people?

They’re especially handy for those with certain physical disabilities. Voice recognition makes a range of household features, ones that might otherwise require assistance to use, much more immediately accessible.

This accessibility does not extend to those of us with dysfluency – those who have speech disabilities, or disabilities that lead to disordered speech. For non-disordered speech, a recognition rate of 90-95% is considered satisfactory. With disordered speech, the rate drops well below that. Nearly 50,000 people in New Zealand have a stutter alone, and if you include other speech dysfluencies – or simply not being entirely fluent in English – that’s a huge section of the population who can’t access this technology.

For many people with disordered speech, a voice recognition assistant seems pointless – like a shiny new car for somebody who doesn’t have a driver’s licence. But the tech companies who make them are working to make the interface more accessible for people like me. 

In 2019, Google launched Project Euphonia, which collects voice data from people with impaired speech to remedy the AI bias towards fluency. The idea is that by collecting this data, Google can improve its algorithms and integrate those improvements into its assistant. In the same year, Amazon announced a similar effort, integrating Alexa with Voiceitt, an Israeli startup that lets people with impaired speech train an algorithm to recognise their voice. (I considered using this with my own Alexa, but decided against it, out of pure stubbornness.)

Ironically, the intended purpose of voice recognition software is the exact one I’ve had my entire life: to have what I say be recognised, rather than the way I say it.

My first week with Alexa has been an interesting one. I’ve lived alone for about two months now and I generally don’t speak unless I have visitors over. It might be worth pointing out that I don’t stutter when I talk to myself; I also don’t stutter when I think, or when I sing (that last one would make an incredible story if I had an amazing singing voice, but I do not.)

My Alexa doesn’t care about any of that though. All it hears is my silence as I struggle in vain to get it to play ‘Time to Say Goodbye’ on repeat while I have a shower. My Alexa doesn’t know if I’m having a bad speech day or a good one. All it hears is me saying “Alexa” and then nothing. Alexa also expects perfection. It expects me to hit the “d” on “Play ‘I Like Dat’ by T-Pain and Kehlani”. I know I won’t meet that standard. I know I’ll probably stutter multiple times, and Alexa might pick up on that. 

My stutter has changed as I’ve aged, as has my speech. That’s not uncommon, especially with people who stutter the way I do. We find ways to avoid stuttering, and when one tic stops giving us a backdoor into fluency, we find another one to settle on. 

It took me a long time before I could stop thinking of stuttering as failing at being fluent. It’s not. It’s simply talking in a very different way. I changed my philosophy from “failing is a part of life” to “being different is a part of life”. Both are true, but one is less self-punishing than the other.

If I had an Alexa at a different point in my life, I would probably have thrown it out the window. I would be “failing” constantly in my own home, and I do that enough in public already. But coming to voice recognition in my 30s, when I’ve completely reframed my relationship to my speech, has been a surprisingly chill experience. (Also, I get to pretend I’m a captain on Star Trek, because yes, Alexa will respond to the command “Alexa, belay that order!”)

Usually, I hate repeating myself to people, because chances are I’ll stutter a bit more the second time around. I don’t mind repeating myself to Alexa, which I admit is because I’m using it to perform a non-essential function: Nobody ever needed to play T-Pain’s amazing new song featuring Kehlani, and definitely not five times in a row.
