Posted on

The Race to Save Indigenous Languages, Using Automatic Speech Recognition

By Tanner Stening for News@Northeastern

Michael Running Wolf still has that old TI-89 graphing calculator he used in high school that helped propel his interest in technology. 

“Back then, my teachers saw I was really interested in it,” says Running Wolf, clinical instructor of computer science at Northeastern University. “Actually a couple of them printed out hundreds of pages of instructions for me on how to code” the device so that it could play games. 

What Running Wolf, who grew up in a remote Cheyenne village in Birney, Montana, didn’t realize at the time, poring over the stack of printouts at home by the light of kerosene lamps, was that he was actually teaching himself basic programming.

“I thought I was just learning how to put computer games on my calculator,” Running Wolf says with a laugh. 

But it hadn’t been his first encounter with technology. Growing up in the windy plains near the Northern Cheyenne Indian Reservation, Running Wolf says that although his family—which is part Cheyenne, part Lakota—didn’t have daily access to running water or electricity, sometimes, when the winds died down, the power would flicker on, and he’d plug in his Atari console and play games with his sisters. 

These early experiences would spur forward a lifelong interest in computers, artificial intelligence, and software engineering that Running Wolf is now harnessing to help reawaken endangered indigenous languages in North and South America, some of which are so critically at risk of extinction that their tallies of living native speakers have dwindled into the single digits. 

Running Wolf’s goal is to develop methods for documenting and maintaining these early languages through automatic speech recognition software, helping to keep them “alive” and well-documented. It would be a process, he says, that tribal and indigenous communities could use to supplement their own language reclamation efforts, which have intensified in recent years amid the threats facing languages. 

“The grandiose plan, the far-off dream, is we can create technology to not only preserve, but reclaim languages,” says Running Wolf, who teaches computer science at Northeastern’s Vancouver campus. “Preservation isn’t what we want. That’s like taking something and embalming it and putting it in a museum. Languages are living things.”

The better thing to say is that they’ve “gone to sleep,” Running Wolf says. 

And the threats to indigenous languages are real. Of the roughly 6,700 languages spoken in the world, about 40 percent are in danger of atrophying out of existence forever, according to the UNESCO Atlas of the World’s Languages in Danger. The loss of these languages also represents the loss of whole systems of knowledge unique to a culture, and the ability to transmit that knowledge across generations.

While the situation appears dire—and is, in many cases—Running Wolf says nearly every Native American tribe is engaged in language reclamation efforts. In New England, one notable tribe doing so is the Mashpee Wampanoag Tribe, whose native tongue is now being taught in public schools on Cape Cod, Massachusetts. 

But the problem, he says, is that in the ever-evolving field of computational linguistics, little research has been devoted to Native American languages. This is partially due to a lack of linguistic data, but it is also because many native languages are “polysynthetic,” meaning they contain words that comprise many morphemes, which are the smallest units of meaning in language, Running Wolf says. 

Polysynthetic languages often have very long words—words that can mean an entire sentence, or denote a sentence’s worth of meaning. 

Further complicating the effort is the fact that many Native American languages don’t have an orthography, or an alphabet, he says. In terms of what languages need to keep them afloat, Running Wolf maintains that orthographies are not vital. Many indigenous languages have survived through a strong oral tradition in lieu of a robust written one.

But for scholars looking to build databases and transcription methods, like Running Wolf, written texts are important to filling in the gaps. What’s holding researchers back from building automatic speech recognition for indigenous languages is precisely that there is a lack of audio and textual data available to them.

Using hundreds of hours of audio from various tribes, Running Wolf has managed to produce some rudimentary results. So far, the automatic speech recognition software he and his team have developed can recognize single, simple words from some of the indigenous languages they have data for. 

“Right now, we’re building a corpus of audio and texts to start showing early results,” Running Wolf says. 

Importantly, he says, “I think we have an approach that’s scientifically sound.”

Eventually, Running Wolf says he hopes to create a way for tribes to provide their youth with tools to learn these ancient languages by way of technological immersion—through things like augmented or virtual reality, he says. 

Some of these technologies are already under development by Running Wolf and his team, made up of a linguist, a data scientist, a machine learning engineer, and his wife, who used to be a program manager, among others. All of the ongoing research and development is being done in consultation with numerous tribal communities, Running Wolf says.

“It’s all coming from the people,” he says. “They want to work with us, and we’re doing the best to respect their knowledge systems.”

Posted on

Physician burnout in healthcare: Quo vadis?

By Ifran Khan for Fast Company

Burnout was included as an occupational phenomenon in the International Classification of Diseases (ICD-11) by the World Health Organization in 2019.

Today, burnout is prevalent in the forms of emotional exhaustion, personal and professional disengagement, and a low sense of accomplishment. While cases of physician fatigue continue to rise, some healthcare companies are looking to technology as a driver of efficiency. Could technology pave the way to better working conditions in healthcare?

While advanced technologies like AI cannot solve the issue on their own, data-driven decision-making could alleviate some operational challenges. Based on my experience in the industry, here are some tools and strategies healthcare companies can put into practice to try and reduce physician burnout.


Clinical decision support (CDS) tools help sift through copious amounts of digital data to catch potential medical problems and alert providers about risky medication interactions. To help reduce fatigue, CDS systems can be used to integrate decision-making aids and channel accurate information on a single platform. For example, they can be used to get the correct information (evidence-based guidance) to the correct people (the care team and patient) through the correct channels (electronic health record and patient portal) in the correct intervention (order sets, flow sheets or dashboards) at the correct points (for workflow-based decision making).

When integrated with electronic health records (EHRs) to merge with existing data sets, CDS systems can automate data collection on vital life signs and alerts to aid physicians in improving patient care and outcomes.


Companies can use AI-enabled speech recognition solutions to reduce “click fatigue” by interpreting and converting human voice into text. When used by physicians to efficiently translate speech to text, these intelligent assistants can reduce effort and error in documentation workflows.

With the help of speech recognition through AI and machine learning, real-time automated medical transcription software can help alleviate physician workload, ultimately addressing burnout. Data collected from dictation technology can be seamlessly added to patient digital files and built into CDS systems. Acting as a virtual onsite scribe, this ambient technology can capture every word in the physician-patient encounter without taking the physician’s attention off their patient.


Resource-poor technologies sometimes used in telehealth often lack the bandwidth to transmit physiological data and medical images — and their constant usage can lead to physician distress.

In radiology, advanced imaging through computer-aided ultrasounds can reduce the need for human intervention. Offering a quantitative assessment through deep analytics and machine learning, AI recognizes complex patterns in data imaging, aiding the physician with the diagnosis.


Upgrading the digitized medical record system, automating the documentation process, and augmenting medical transcription are the foremost benefits of natural language processing (NLP)-enabled software. These tools can reduce administrative burdens on physicians by analyzing and extracting unstructured clinical data to document relevant points in a structured manner. That helps avoid under-coding and streamlines the way medical coders extract diagnostic and clinical data, enhancing value-based care.


Advanced medical technologies can significantly reduce physician fatigue, but they must be tailored to the implementation environment. That reduces physician-technology friction and makes the adaptation of technology more human-centered.

The nature of a physician’s job may always put them at risk of burnout, but optimal use and consistent management of technology can make a positive impact. In healthcare, seeking technological solutions that reduce the burden of repetitive work—and then mapping the associated benefits and studying the effects on staff well-being and clinician resilience—provides deep insights.

Posted on

Are residency policies creating physician shortages? 5 recent studies to know

By Patsy Newitt for Becker’s Hospital Review

California has the most active specialty physicians in the U.S., according to 2021 data published by the Kaiser Family Foundation. 

Here are five things to know from recently published studies:

1. Artificial intelligence technology may deter one-sixth of medical students from pursuing careers in radiology because of negative opinions of AI in the medical community, according to a study published in Clinical Imaging Oct. 2.

2. Medical students identifying as sexual minorities are underrepresented in undergraduate medical training and among certain specialties following graduation, according to a study published Sept. 30 in JAMA Network Open.

3. Bottlenecks in the physician training and education pipeline are limiting entry for residency and playing a vital role in U.S. physician shortages and care access issues, according to a Sept. 20 report from nonpartisan think tank Niskanen Center. 

4. California has the most active specialty physicians in the U.S., according to 2021 data published by the Kaiser Family Foundation Sept. 22.

5. At least 93 percent of providers qualified for a positive payment adjustment from 2017 through 2019 under the Merit-based Incentive Payment System, according to a new report from the Government Accountability Office.

Posted on

Three Ways AI Is Improving Assistive Technology

By Wendy Gonzalez for Forbes

Artificial intelligence (AI) and machine learning (ML) are some of the buzziest terms in tech, and for good reason. These innovations have the potential to tackle some of humanity’s biggest obstacles across industries, from medicine to education and sustainability. One sector, in particular, is set to see massive advancement through these new technologies: assistive technology. 

Assistive technology is defined as any product that improves the lives of individuals who otherwise may not be able to complete tasks without specialized equipment, such as wheelchairs and dictation services. Globally, more than 1 billion people depend on assistive technology. When implemented effectively, assistive technology can improve accessibility and quality of life for all, regardless of ability. 

Here are three ways AI is currently improving assistive technology and its use-cases, which might give your company some new ideas for product innovation: 

Ensuring Education For All

Accessibility remains a challenging aspect of education. For children with learning disabilities or sensory impairments, dictation technology, more commonly known as speech-to-text or voice recognition, can help them to write and revise without pen or paper. In fact, 75 out of 149 participants with severe reading disabilities reported increased motivation in their schoolwork after a year of incorporating assistive technology.

This technology works best when powered by high-quality AI. Natural Language Processing (NLP) and machine learning algorithms have the capability to improve the accuracy of speech recognition and word predictability, which can minimize dictation errors while facilitating effective communication from student to teacher or among collaborating schoolmates. 

That said, according to a 2001 study, only 35% of elementary schools, arguably the most significant portion of education a child receives, provide any assistive technology. This statistic could change due to the social impact of AI programs. These include Microsoft’s AI for Accessibility initiative, which invests in innovations that support people with neuro-diversities and disabilities. Its projects include educational AI applications that provide students with visual impairments the text-to-speech, speech recognition and object recognition tools they need to succeed in the classroom.

Better Outcomes For Medical Technology

With a rapidly aging population estimated to top approximately 2 billion over the age of 60 by 2050, our plans to care for our loved ones could rely heavily on AI and ML in the future. Doctors and entrepreneurs are already paving the way; in the past decade alone, medical AI investments topped $8.5 billion in venture capital funding for the top 50 startups. 

Robot-assisted surgery is just one AI-powered innovation. In 2018, robot-assisted procedures accounted for 15.1% of all general surgeries, and this percentage is expected to rise as surgeons implement additional AI-driven surgical applications in operating rooms. When compared to traditional open surgery, robot-assisted surgeries tend to leave smaller incisions, which reduces overall pain and scarring, thereby leading to quicker recovery times.

AI-powered wearable medical devices, such as female fertility-cycle trackers, are another popular choice. Demand for products including diabetic-tracking sweat meters and respiratory patients’ oximeters has created a market that’s looking at a 23% CAGR by 2023. 

What’s more, data taken from medical devices could contribute to more than $7 billion in savings per year for the U.S. healthcare market. This data improves doctors’ understanding of preventative care and better informs post-recovery methods when patients leave after hospital procedures.

Unlocking Possibilities In Transportation And Navigation

Accessible mobility is another challenge that assistive technology can help tackle. Through AI-powered running apps and suitcases that can navigate through entire airports, assistive technology is changing how we move and travel. One example is Project Guideline, a Google project helping individuals who are visually impaired navigate their way through roads and paths with an app that combines computer vision and a machine-learning algorithm to guide the runner along a pre-designed path. 

Future runners and walkers may one day navigate roads and sidewalks unaccompanied by guide dogs or sighted guides, gaining autonomy and confidence while accomplishing everyday tasks and activities without hindrance. For instance, developed and spearheaded by Chieko Asakawa, a Carnegie Mellon Professor who is blind, CaBot is a navigation robot that uses sensor information to help avoid airport obstacles, alert someone to nearby stores and assist with required actions like standing in line at airport security checkpoints. 

The Enhancement Of Assistive Technology

These are just some of the ways that AI assistive technology can transform the way individuals and society move and live. To ensure assistive technologies are actively benefiting individuals with disabilities, companies must also maintain accurate and diverse data sets with annotation that is provided by well-trained and experienced AI teams. These ever-updating data sets need to be continually tested before, during and after implementation.

AI possesses the potential to power missions for the greater good of society. Ethical AI can transform the ways assistive technologies improve the lives of millions in need. What other types of AI-powered assistive technology have you come across and how could your company make moves to enter this industry effectively? 

Posted on

VA to move Nuance’s voice-enabled clinical assistant to the cloud: 5 details

By Katie Adams for Becker’s Hospital Review

The Department of Veterans Affairs is migrating to the cloud platform for Nuance’s automated clinical note-taking system, the health system said Sept. 8.

Five details:

  1. The VA will use the Nuance Dragon Medical One speech recognition cloud platform and Nuance’s mobile microphone app, allowing physicians to use their voices to document patient visits more efficiently. The system is intended to allow physicians to spend more time with patients and less time on administrative work.
  2. The VA deployed Nuance Dragon Medical products systemwide in 2014. It is now upgrading to the system’s cloud offering so its physicians can utilize the added capabilities and mobile flexibility.
  3. To ensure Nuance’s products adhere to the government’s latest guidance on data security and privacy, the Federal Risk and Authorization Management Program approved the VA’s decision to adopt the technologies.
  4. “The combination of our cloud-based platforms, secure application framework and deep experience working with the VA health system made it possible for us to demonstrate our compliance with FedRAMP to meet the needs of the U.S. government. We are proving that meeting security requirements and delivering the outcomes and workflows that matter to clinicians don’t have to be mutually exclusive,” Diana Nole, Nuance’s executive vice president and general manager of healthcare, said in a news release.
  5. Nuance Dragon Medical One is used by more than 550,000 physicians.

Posted on

Voice AI Technology Is More Advanced Than You Might Think

By Annie Brown for Forbes

Systems that can handle repetitive tasks have supported global economies for generations. But systems that can handle conversations and interactions? Those have felt impossible, due to the complexity of human speech. Any of us who regularly use Alexa or Siri can attest to the deficiencies of machine learning in handling human messages. The average person has yet to interact with the next generation of voice AI tools, but what this technology is capable of has the potential to change the world as we know it.

The following is a discussion of three innovative technologies that are accelerating the pace of progress in this sector.

Conversational AI for Ordering

Experts in voice AI have prioritized technology that can alleviate menial tasks, freeing humans up to engage in high-impact, creative endeavors. Developers identified drive-through ordering early on as an area in which conversational AI could make an impact, and one company appears to have cracked the code.

Creating a conversational AI system that can handle drive-through restaurant ordering may sound simple: load in the menu, use chat-based AI, and you’ve done it. The actual solutions aren’t quite so easy. In fact, creating a system that works in an outdoor environment—handling car noises, traffic, other speakers—and one that has sophisticated enough speech recognition to decipher multiple accents, genders, and ages, presents immense challenges.

The co-founders of Hi Auto, Roy Baharav and Eyal Shapira, both have a background in AI systems for audio: Baharav in complex AI systems at Google and Shapira in NLP and chat interfacing.

Baharav describes the difficulties of making a system like this work: “Speech handling in general, for humans, is hard. You talk to your phone and it understands you – that is a completely different problem from understanding speech in an outdoor environment. In a drive-through, people are using unique speech patterns. People are indecisive – they’re changing their minds a lot.”

That latter issue illustrates what they call multi-turn conversation, or the back-and-forth we humans do so effortlessly. After years of practice, model training, and refinement, Hi Auto has now installed its conversational AI systems in drive-throughs around the country, and is seeing a 90% level of accuracy.

Shapira forecasts, “Three years from now, we will probably see as many as 40,000 restaurant locations using conversational AI. It’s going to become a mainstream solution.” 

“AI can address two of the critical problems in quick-serve restaurants,” comments Joe Jensen, a Vice President at Intel Corporation, “Order accuracy which goes straight to consumer satisfaction and then order accuracy also hits on staff costs in reducing that extra time staff spends.” 

Conversation Cloud for Intelligent Machines

A second groundbreaking innovation in the world of conversational AI is using a technique that turns human language into an input.

The CEO of Whitehead AI, Diwank Tomer, illustrates the historical challenges faced by conversational AI: “It turns out that, when we’re talking or writing or conveying anything in human language, we depend on background information a lot. It’s not just general facts about the world but things like how I’m feeling or how well defined something is.

“These are obvious and transparent to us but very difficult for AI to do. That’s why jokes are so difficult for AI to understand. It’s typically something ridiculous or impossible, framed in a way that seems otherwise. For humans, it’s obvious. For AI, not so much. AI only interprets things literally.”

So, how does a system incapable of interpreting nuance, emotion, or making inferences adequately communicate with humans? The same way a non-native speaker initially understands a new language: using context.

Context-aware AI involves building models that can use extra information beyond the identity of the speaker or other facts. Chatbots are one area that is inherently lacking and could benefit from this technology. For instance, if a chatbot could glean contextual information from a user’s profile, previous interactions, and other data points, that could be used to frame highly intelligent responses.

Tomer describes it this way, “We are building an infrastructure for manipulating natural language. Something new that we’ve built is chit chat API – when you say something and it can’t be understood, Alexa will respond with, ‘I’m sorry, I can’t understand that.’ It’s possible now to actually pick up or reply with witty answers.”

Tomer approaches the future of these technologies with high hopes: “Understanding conversation is powerful. Imagine having conversations with any computer: if you’re stuck in an elevator, you could scream and it would call for help. Our senses are extended through technology.”

Data Process Automation

Audio is just one form of unstructured data. When collected, assessed, and interpreted, the output of patterns and trends can be used to make strategic decisions or provide valuable feedback.

super.AI was founded by Brad Cordova. The company uses AI to automate the processing of unstructured data. Data Process Automation, or DPA, can be used to automate repetitive tasks that deal with unstructured data, including audio and video files. 

For example, in a large education company, children use a website to read sentences aloud. super.AI used a process automation application to see how many errors a child made. This automation process has a higher accuracy and faster response time than when done by humans, enabling better feedback for enhanced learning.
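One common way to count reading errors of this kind is a word-level edit distance between the target sentence and a transcript of what the child said. The sketch below is illustrative only: the article does not describe super.AI's actual method, the function names are hypothetical, and it assumes a speech recognizer has already produced a text transcript.

```python
def word_errors(reference: str, spoken: str) -> int:
    """Count word-level substitutions, insertions, and deletions
    (Levenshtein distance computed over words instead of characters)."""
    ref = reference.lower().split()
    hyp = spoken.lower().split()
    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j transcript words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: a reference word was skipped
                curr[j - 1] + 1,     # insertion: an extra word was spoken
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, spoken: str) -> float:
    """Errors divided by the number of words in the reference sentence."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference sentence must not be empty")
    return word_errors(reference, spoken) / ref_len


if __name__ == "__main__":
    target = "the cat sat on the mat"
    heard = "the cat sit on mat"  # hypothetical transcript of a child's reading
    print(word_errors(target, heard), word_error_rate(target, heard))
```

The same ratio, usually called word error rate, is a standard way to score speech recognition output, so one function can serve both the recognizer and the reading exercise.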

Another example has to do with personal information (PI), which is a key point of concern in today’s privacy-conscious world, especially when it comes to AI. super.AI has a system of audio redaction whereby it can remove PI from audio, including names, addresses, and Social Security numbers. It can also remove copyrighted material from segments of audio or video, ensuring GDPR or CCPA compliance.
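As a rough illustration of the redaction idea, the sketch below masks strings shaped like U.S. Social Security numbers in a text transcript. It is an assumption-heavy simplification: it works on text rather than audio, it covers only one kind of PI, and the pattern, placeholder, and function name are hypothetical rather than anything super.AI has described.

```python
import re

# Matches the common written form NNN-NN-NNNN. Real systems would need far
# broader coverage (names, addresses, spoken digits, other formats).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssn(transcript: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every SSN-shaped substring with a placeholder."""
    return SSN_PATTERN.sub(placeholder, transcript)


if __name__ == "__main__":
    print(redact_ssn("My number is 123-45-6789."))
```

In an audio pipeline, the matched positions would presumably be mapped back to timestamps so the corresponding segment could be silenced, but that step depends on the recognizer and is not shown here.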

It’s clear that the supportive qualities of super.AI are valuable, but when it comes to the people who currently do everything from quality assurance on website product listings to note taking at a meeting, the question is this: are we going too far to replace humans?

Cordova would say no: “Humans and machines are orthogonal. If you see the best chess players: they aren’t human or machine, they’re humans and machines working together. We know intuitively as humans what we’re put on this earth for. You feel good when you talk with people, feel empathy, and do creative tasks.

“There are a lot of tasks where you don’t feel great: tasks that humans shouldn’t be doing. We want humans to be more human. It’s not about taking humans’ jobs, it’s about allowing humans to operate where we’re best and machines aren’t.”

Voice AI is charting unprecedented territory and growing at a pace that will inevitably transform markets. The adoption rates for this kind of tech may change most industries as we currently know them. The more AI is integrated, the more humans can benefit from it. As Cordova succinctly states, “AI is the next, and maybe the last technology we will develop as humans.” The capacity of AI to take on new roles in our society has the power to let humans be more human. And that is the best of all possible outcomes.

Posted on

Role of Artificial Intelligence and Machine Learning in Speech Recognition

By The Signal

If you have ever wondered how your smartphone can comprehend instructions like “Call Mom,” “Send a Message to Boss,” “Play the Latest Songs,” “Switch ON the AC,” then you are not alone. But how is this done? The simple answer is Speech Recognition. Speech Recognition has gone through the roof in the past four to five years and is making our lives more comfortable every day. 

Speech Recognition was first introduced by IBM in 1962 when it unveiled the first machine capable of converting human voice to text. Today, powered by the latest technologies like Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning, speech recognition is touching new milestones. 

This latest technological advancement is being used across the globe by top companies to make their users’ experience efficient and smooth. Technologies like Amazon’s Alexa, Apple’s Siri, Google Assistant, Google Speech, Google Dictate, Facebook’s Oculus VR, and Microsoft’s Cortana are all examples of Speech Recognition. 

The expanding usage of speech-to-text technologies has also opened many new job domains, and students are wonderfully exploiting them. Many students are now joining courses like PGP in AI and Machine Learning after completing their graduation to improve their prospects. The high salary package of around INR 15 lakh for freshers is the 2nd biggest reason attracting students towards this, the biggest reason being the fantastic job role. 

Speech Recognition was a very niche domain before the advent of AI and ML, which have now completely transformed it. Before we look at how AI and ML changed it, let’s understand what these terms mean. 

Artificial Intelligence 

Artificial Intelligence is the technology by which machines become capable of demonstrating intelligence like humans or animals. Initially, AI was only about memorizing data and producing results accordingly; however, now it is much more than that as machines perform various activities like Speech Recognition, Object Recognition, Translating Texts, and a lot more. 

Another recent addition to AI has been Deep Learning. With the help of Deep Learning, machines can process data and create patterns that help them make valuable decisions. This behavior of a machine through Deep Learning is similar to the behavior of a human brain. Deep Learning activities can be “Supervised,” “Semi-Supervised,” as well as “Unsupervised.” 

Machine Learning 

Machine Learning is a subdomain of AI which teaches machines to learn from past events and activities. Through ML, machines are trained to retain various data sets’ information and outputs and identify patterns in these decisions. It allows the machine to learn by itself without being explicitly programmed for each task. 

An example of Machine Learning is e-Commerce websites suggesting products to you. The code, once written, allows machines to evolve on their own, analyze user behavior, and recommend products according to users’ preferences and past purchases. This involves Zero Human Interference and makes use of approaches like Artificial Neural Networks (ANN). 

Speech Recognition 

Speech Recognition is simply the activity of comprehending a user’s voice and converting that into text. It is chiefly known by 3 names: 

  1. Automatic Speech Recognition (ASR) 
  2. Computer Speech Recognition (CSR) 
  3. Speech to Text (STT) 

Note: Speech Recognition and Voice Recognition are two different things. While the former comprehends a voice sample and converts it into a text sample, the sole purpose of the latter is to identify the voice and recognize to whom it belongs. Voice Recognition is often used for security and authenticity purposes. 

How Has AI and ML Affected the Future of Speech Recognition? 

The usage of Speech Recognition in our devices has grown considerably due to the developments in AI and ML technologies. Speech Recognition is now being used for tasks ranging from awakening your appliances and gadgets to monitoring your fitness, playing mood-booster songs, running queries on search engines, and even making phone calls. 

The global market for Speech Recognition, currently growing at a Compound Annual Growth Rate (CAGR) of 17.2%, is expected to breach the $25 billion mark by 2025. However, there were enormous challenges initially that have been tackled with the use of AI and ML now. 
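A CAGR figure such as the 17.2% cited above describes growth that compounds year over year. The short sketch below shows the standard arithmetic; the function names and the sample values are illustrative assumptions, not figures from the market report.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Annual growth rate that turns start_value into end_value over `years`."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: float) -> float:
    """Value after growing at `rate` per year, compounded annually."""
    return start_value * (1 + rate) ** years


if __name__ == "__main__":
    # Hypothetical check: a value of 100 that reaches 121 in two years
    # has grown at about 10% a year.
    print(round(cagr(100.0, 121.0, 2), 4))
    # Growth multiplier after five years at the article's 17.2% rate.
    print(round(project(1.0, 0.172, 5), 2))
```

At 17.2% a year, a market grows to roughly 2.2 times its starting size in five years, which is why modest-sounding annual rates produce large headline projections.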

When in its initial phase, some of the biggest challenges for Speech Recognition were Poor Voice Recording Devices, Huge Noise in the Voice Samples, Different Pitches in Speech of the Same User, etc. In addition to this, the changing dialects and grammatical factors like Homonyms were also a big challenge. 

With the help of AI programs capable of filtering sound, canceling noise, and identifying the meaning of words depending on the context, most of these challenges have been tackled. Today, Speech Recognition shows an efficiency of 95%, up from less than 20% around 30 years ago. The biggest remaining challenge for programmers is making machines capable of understanding emotions and feelings, and making satisfactory progress in this area. 

The increasing efficiency in Speech Recognition is becoming an essential driving factor in its success, and top tech giants are leveraging these benefits. More than 20% of users searched on Google through Voice in 2016 alone, and this number is expected to be far higher now. Businesses today are automating their services to make their operations efficient and are putting Speech Recognition facilities at the top of their to-do lists. 

Some of the key usages of Speech Recognition today are listed below. 

  • The most common use of Speech Recognition is to perform basic activities like Searching on Google, Setting Reminders, Scheduling Meetings, Playing Songs, Controlling Synced Devices, etc. 
  • Speech Recognition is now also being used in various financial transactions, with some banks and financial companies offering the feature of “Voice Transfer” to their users. 

Speech Recognition is no doubt one of the best innovations to come out of expanding technological development. However, there is one thing to note if you are planning to enter this sector. The domain is interdisciplinary, and the knowledge provided by a Speech Recognition course alone won’t be enough for you to survive in this field. 

Therefore, it is essential that you also sharpen your skills in allied concepts like Data Science, Data Analytics, Machine Learning, Artificial Intelligence, Neural Networks, DevOps, and Deep Learning. So what are you waiting for now? Hurry up and join an online course in Speech Recognition now!

Alexa and me: A stutterer’s struggle to be heard by voice recognition AI

By Sam Brooks for The Spinoff

The following scenario is not uncommon for me: I have to make a phone call, usually to the bank. They say my call may be recorded to improve customer service in the future (and I can almost certainly guarantee my voice is indeed on file in some call centres for training purposes). I’ll wait, impatiently, in the queue. I’ll listen to whatever banal Kiwi playlist they have piped in.

Then, a call centre employee picks up and goes: “Hello, you’re speaking with [name].” I immediately encounter a block – a gap in my speech. The call centre employee hears silence and, not unfairly, hangs up. I repeat this process until I finally get through. It used to feel humiliating, but at this point in my life, it’s been downgraded to merely frustrating. I don’t blame anyone when it happens, aware that we’re all just doing our best in this situation.

Still, I never thought I’d purposefully replicate that hellish experience in my own home. Which is why when I was sent an Alexa (specifically a fourth generation Echo Dot) last week, I was a little bit stoked, but mostly apprehensive. Not just about all the boring security and data issues, but that it’d be useless to me. Nevertheless, I set up the Alexa and asked it to do something an ideal flatmate would do: play ‘Hung Up’ by Madonna, at the highest audio quality possible.


Alexa’s little blue light lit up, indicating that it was ready to hear, and act on, my command.


I had a block. 

Alexa’s little blue light turned off.

Stutters are like snowflakes: they come in all shapes, sizes and severities. No one stutter is the same. My stutter does not sound like the one Colin Firth faked in The King’s Speech, or like any stutter you might’ve heard onscreen. I don’t repeat myself, but instead have halting stops and interruptions in my speech – it might sound like an intake of breath, or just silence. For listeners, it might feel like half a second. For me, it could feel like a whole minute.

I’m used to having a stutter – life would be truly hell if I wasn’t. None of my friends care, and 95% of the strangers I interact with either don’t notice it or do so with so little issue that I don’t notice it myself. In person, my stutter is easily recognisable. You can see when I’m stuttering because you see me stop talking. My mouth stays open, but no sound comes out. You wait for me to resume talking. It’s a blip, a bump in the conversation.

When I’m communicating solely with my voice, it’s a whole other ballgame. There are no visual cues, I can’t wave my hand or roll my eyes to signal I’m experiencing a block. All I’ve got is the silence.

Voice recognition has become markedly more common in the past decade, with the most popular assistants being Siri (Apple), Alexa (Amazon), Cortana (Microsoft) and Google Now (Google, obvs). At their most basic level, they allow the user access to music, news, weather and traffic reports with only a few words. At their most complex, they allow control over your home’s lighting and temperature levels; if you’re having trouble sleeping, you can ask them to snore. Because artificial snoring is apparently a comfort for some people?

They’re especially handy for those with certain physical disabilities. Voice recognition makes a range of household features, ones that might otherwise require assistance to use, much more immediately accessible.

This accessibility does not extend to those of us with dysfluency – those who have speech disabilities, or disabilities that lead to disordered speech. For non-disordered speech, a speech recognition rate of 90-95% is considered satisfactory. With disordered speech, the software will clearly recognise far less. Nearly 50,000 people in New Zealand have a stutter alone, and if you include other speech dysfluencies – or simply not being entirely fluent in English – that’s a huge section of the population who can’t access this technology.
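To make the 90-95% figure concrete: one common way to score a recogniser is word error rate (WER), the number of word substitutions, deletions and insertions needed to turn the transcript into what was actually said, divided by the number of words said. The sketch below is a minimal illustration in Python, not how Alexa or any vendor measures accuracy; the function names and the reading of "recognition rate" as 1 minus WER are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between what was said and what was heard,
    divided by the number of words actually said."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # a spoken word the recogniser missed
            insertion = curr[j - 1] + 1  # a word the recogniser made up
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def recognition_rate(reference: str, hypothesis: str) -> float:
    """Assumed reading of 'recognition rate': 1 - WER, floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))
```

In this toy scoring, a block that swallows the last four words of the six-word command "Alexa play Hung Up by Madonna" leaves a recognition rate of about 33%, well short of the 90-95% considered satisfactory.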

For many people with disordered speech, a voice recognition assistant seems pointless – like a shiny new car for somebody who doesn’t have a driver’s licence. But the tech companies who make them are working to make the interface more accessible for people like me. 

In 2019, Google launched Project Euphonia, which collects voice data from people with impaired speech to remedy the AI bias towards fluency. The idea is that by collecting this data, Google can improve its algorithms, and integrate these updates into their assistant. In the same year, Amazon announced a similar integration with Alexa and Voiceitt, an Israeli startup that lets people with impaired speech train an algorithm to recognise their voice. (I considered using this with my own Alexa, but decided against it, out of pure stubbornness.)

Ironically, the intended purpose of voice recognition software is the exact one I’ve had my entire life: To have what I say be recognised, rather than the way I say it.

My first week with Alexa has been an interesting one. I’ve lived alone for about two months now and I generally don’t speak unless I have visitors over. It might be worth pointing out that I don’t stutter when I talk to myself; I also don’t stutter when I think, or when I sing (that last one would make an incredible story if I had an amazing singing voice, but I do not.)

My Alexa doesn’t care about any of that though. All it hears is my silence as I struggle in vain to get it to play ‘Time to Say Goodbye’ on repeat while I have a shower. My Alexa doesn’t know if I’m having a bad speech day or a good one. All it hears is me saying “Alexa” and then nothing. Alexa also expects perfection. It expects me to hit the “d” on “Play ‘I Like Dat’ by T-Pain and Kehlani”. I know I won’t meet that standard. I know I’ll probably stutter multiple times, and Alexa might pick up on that. 

My stutter has changed as I’ve aged, as has my speech. That’s not uncommon, especially with people who stutter the way I do. We find ways to avoid stuttering, and when one tic stops giving us a backdoor into fluency, we find another one to settle on. 

It took me a long time before I could stop thinking of stuttering as failing at being fluent. It’s not. It’s simply talking in a very different way. I changed my philosophy from “failing is a part of life” to “being different is a part of life”. Both are true, but one is less self-punishing than the other.

If I had an Alexa at a different point in my life, I would probably have thrown it out the window. I would be “failing” constantly in my own home, and I do that enough in public already. But coming to voice recognition in my 30s, when I’ve completely reframed my relationship to my speech, has been a surprisingly chill experience. (Also, I get to pretend I’m a captain on Star Trek, because yes, Alexa will respond to the command “Alexa belay that order!”)

Usually, I hate repeating myself to people, because chances are I’ll stutter a bit more the second time around. I don’t mind repeating myself to Alexa, which I admit is because I’m using it to perform a non-essential function: Nobody ever needed to play T-Pain’s amazing new song featuring Kehlani, and definitely not five times in a row.

What Microsoft’s Acquisition of Nuance Could Mean For The Future of Workplace AI

By Zachary Comeau for My Tech Decisions

Microsoft’s recent announcement that it is acquiring healthcare artificial intelligence and voice recognition company Nuance could signal a new era of voice-enabled technologies in the enterprise.

Nuance’s speech recognition technology for medical dictation is currently used in 77% of U.S. hospitals, and Microsoft plans to integrate those technologies with its Microsoft Cloud for Healthcare offering that was introduced last year.

However, the purchase price of $19.7 billion indicates that Microsoft has plans to bring more voice recognition technology to other vertical markets aside from healthcare.

We sat down with Igor Jablokov, founder and CEO of augmented AI company Pryon and an early pioneer of automated cloud platforms for voice recognition that helped invent the technology that led to Amazon’s virtual assistant Alexa, to talk about Microsoft’s move and how intelligent voice technology could impact the workplace.

What do you make of Microsoft’s acquisition of Nuance?

So look, it’s going to be a popular thing to talk about moves in healthcare, especially as we’re still in the throes of this pandemic. And most of us, I’m sure, had a challenging 2020. So that’s a great way to frame the acquisition, given Nuance and some of the medical dictation and other types of projects that they inserted into the healthcare workflow. So, that makes sense. But would anybody actually pay that much for just something for healthcare? I would imagine Microsoft could have had as big an impact, if not larger, going directly for one of those EHR companies like Epic. So that’s why I’m like, “All right, healthcare, that’s good.” Is it going to be a roll-up where they will be going after Epic and places like that, where there’s already lots of stored content, and then vertically integrate the whole thing? That’s the next play that I would see. They’re gunning to own that workflow. Right. Okay. So that’s that piece. Now, on the other hand, I see it as a broader play in employee productivity, because whenever Microsoft really opens up their pocketbooks like they did here (this was, what, their second largest acquisition?), it’s typically to reinforce the place where they’re the strongest and where their, essentially, dairy cow is, and that’s employee productivity.

Microsoft has never been solely focused on healthcare. Their bread and butter is the enterprise. So how can the same technologies be applied to the enterprise?

You’re exactly right. Now why do we have special knowledge of the Nuance stuff? Well, the team that’s in this company, Pryon, actually developed many of the engines inside of Nuance. So many years ago, Nuance felt like their engines were weak, and that IBM’s were ahead of the curve, if you will. I believe around the 2008 downturn, they came in to acquire the majority of IBM SAS speech chats and the like, and related AI technologies. And my current chief technology officer was assigned to that unit project in terms of collaborating with them to integrate it into their work for half a decade. So, that’s the plot twist here. We have a good sense now. It is true that these engines were behind Siri and all these other experiences, but in reality, it wasn’t Nuance engines, it was IBM engines that were acquired through Nuance that ended up getting placed there, because of how highly accurate and more flexible these things were.

So let’s start with something like Microsoft Teams. To continue bolstering Teams with things like live transcriptions, to put a little AI system inside of Teams that has access to the enterprise’s knowledge as people are discussing things – it may not even be any new product, it could just be all the things that Microsoft is doing but they just needed more hands on deck, right, in terms of this being a massive acqui-hire, in terms of having more scientists and engineers working on applied AI. So I would say a third of it is they need more help with things that they’re already doing. A third of it is a healthcare play, but I would watch for other moves for their vertical integration there. And then the third is for new capability that we haven’t experienced yet on the employee productivity side of Microsoft.

Microsoft already has their version of Siri and Alexa: Cortana. What do you think about Cortana and how it can be improved?

They attempted for it to be their thing everywhere. They just pulled it off the shelves – or proverbial shelves – on mobile, so it no longer exists as a consumer tech. So the only place that it lives now is on Windows desktops, right? So that’s not a great entry point. Then they tried doing the mashup, where Cortana could be called via Alexa and vice versa. But when I talked to the unit folks at Amazon, I’m like, “Look, you’re not going to allow them to really do what they want to do, right? Because they’re not going to allow you to do what you want to do on those desktops.” So it almost ends up being this weird thing like calling into contact centers and being transferred to another contact center. That’s what it felt like. In this case, Alexa got the drop on them, which is strange and sorrowful in some ways.

Other AI assistants like Alexa are much further along than Cortana, but why aren’t we seeing much adoption in the enterprise?

There are multiple reasons for that. There’s the reason of accuracy. And accuracy isn’t just you say something, you get an answer. But where do you get it from? Well, it has to be tied into enterprise data sources, right? Because most enterprises are not like what we have at home, where we buy into the Apple ecosystem, the Amazon ecosystem, the Google ecosystem. They’re heterogeneous environments where they have bits and pieces from every vendor. The next piece is latency and getting quick results that are accurate at scale. And then the last thing is security, right? So there are certainly things that Alexa developers do not get access to. And that’s not going to fly in the enterprise space. One of the things that we hear from enterprises, in pilots and in production, is that what they’re starting to put into these APIs is starting to be their crown jewels, the most sensitive things that they’ve got. And if you actually read the terms and conditions from a lot of the big tech companies that are leveraging AI stuff, they’re very nebulous with where the information goes, right? Does it get transcribed or not? Are people eyeballing this stuff or not? And so most enterprises are like, “Hold on a second, you want us to put our secrets in? We make these microchips, and you want us to put secrets on M&A deals we’re about to do?” They’re uncomfortable about that. It’s just a different ball of wax. And that’s why I think it’s going to be purpose-built companies that are going to be developing enterprise APIs.

I think there will be a greater demand for bringing some of these virtual assistants we all know to the enterprise – especially since we’ve been at home for over a year and using them in our home.

Your intuition is spot on. It’s not even so much people coming from home into work environments – it’s a whole generation that has been reared with Alexa and Siri and these things. When you actually look at the majority of user experiences at work, using Concur or SAP or Dynamics or Salesforce or any of these types of systems, they’re gonna toss grenades at this stuff over time, especially as they elevate in authority through the natural motions of expanding their influence over their career. I think there’s going to be a new generation of enterprise software that’s going to be purpose-built for these folks that are going to be taking over business. That’s basically the chink in the armor for any of these traditional enterprise companies. If you look at Oracle, if you look at IBM, if you look at HP, if you look at Dell, if you look at any one of them, I don’t know where they go, at least on the software side. When a kid has grown up with Alexa, and there they are at 26 years old, they’re like, “No, I’m not gonna use that.” Why? Why can I just blurt something out and get an instant answer, but here I am running a region of Baskin Robbins, and I can’t say, “How many ice cream cones did we sell when it was 73 degrees out?” and get an instant answer one second later? So that’s what’s going to happen. I mean, certainly as a company, since our inception, we’ve been architected not for the current world, but for this future world. Already elements of this are in production, as we announced with Georgia Pacific in late January, and we’re working through it. And I have to say, one of the biggest compliments that I get, whether it’s showing this to big enterprises or government agencies and the like, is fundamentally they’re like, “Holy smokes, this doesn’t feel like anything else that we use.”

But behind the scenes, not only are we using top-flight UX folks to develop this, but we’re also working with behavioral scientists and the like, because we want people to want to use our software, not have to use our software. But most enterprise software gets chosen by the CIO, the CTO, the CISO, and things like that. And most of them are thinking about checking off boxes on functionality. And most enterprise developers cook their blue and white interface, get the fun feature function in there and call it a day. And I think they’re missing such opportunities by not finishing the work.

The Current State Of The Healthcare AI Revolution

By David Talby for Forbes

Artificial intelligence (AI) is poised to change the healthcare and life sciences industry in ways we couldn’t have imagined only years ago. We’re already seeing it in vaccine development, patient care and research in important fields. From telemedicine to strides in detecting new Covid-19 variants, we’re already living in the age of healthcare AI. But getting to these breakthrough developments starts smaller than that. 

The technologies, tools, triumphs and failures are the less-talked-about aspects of creating accurate, effective and responsible AI solutions, but understanding those parts of the equation is vital to success and progress. The new 2021 Healthcare AI Survey from Gradient Flow, sponsored by my company, aims to do just that: unearth these areas to provide a better overview of where we actually stand when it comes to AI in healthcare.

One of the most telling findings here is the shift in AI technologies that organizations are currently using or plan to implement in 2021. Respondents to the survey said they wanted to have natural language processing (NLP) (36%), data integration (45%), and business intelligence (BI) (33%) as the three most widely applied technologies in their businesses by the close of 2021. These aren’t just lofty goals; they’re backed by money. The 2020 NLP Industry Survey, published by the same group in Fall 2020, reported that more than half of technology leaders (the people overseeing AI investment) increased the budget allocated to NLP from 2019 to 2020.

Paired with data integration and BI, it’s clear that healthcare systems are getting more serious about the value of unlocking their data — structured and unstructured. NLP, BI and data integration solve some of the biggest problems the healthcare industry faces, from serving as connective tissue between siloed data sources (in electronic health records, free text, imaging and more) to safeguarding personally identifiable information (PII) and making sure it stays private. For highly regulated industries, such as healthcare and pharma, AI-powered technologies like the aforementioned will be critical to operations and safety. 

Another encouraging finding is the criteria most important to healthcare users when evaluating which AI technologies to explore further. The top three criteria for technical leaders when evaluating such technologies and tools were providing extreme accuracy (48%), ensuring no data is shared with their software providers and vendors whatsoever (44%) and having the ability to train and tune the models to match their own datasets and use cases. Privacy, trainability and accuracy are important for any AI solution, but especially when dealing with medical information that can impact the delivery of care. Access to data and ownership of specialized models are also a primary source of intellectual property that AI organizations build.

Accuracy, in particular, is a big topic of interest in clinical applications. Here’s an example of why this is so important: According to a report from the Journal of General Internal Medicine, “Collection of data on race, ethnicity, and language preference is required as part of the ‘meaningful use’ of electronic health records (EHRs). These data serve as a foundation for interventions to reduce health disparities.” The paper found important discrepancies between what was recorded in EHRs and what patients reported. For example, “30% of whites self-reported identification with at least one other racial or ethnic group than was reflected in the EHR, as did 37% of Hispanics, and 41% of African Americans.” This is a problem when you consider that patients from certain backgrounds and ethnicities may have a greater risk of developing certain comorbidities or may lack access to appropriate care. This isn’t necessarily an AI problem but a data problem, and data needs to be accurate in order for AI to work its magic.

This emphasis on accuracy also feeds into what technical leaders are looking for when evaluating software libraries or SaaS solutions to fuel their AI initiatives. Per the 2021 Healthcare AI Survey, healthcare-specific models and algorithms (42%) and a production-ready codebase (40%) topped the list when considering a solution. Healthcare-specific models are familiar with the nuances of medical data, from clinical jargon and language to billing codes and other data from nontext entities, such as x-rays. Additionally, production-grade products empower users from data scientists to clinicians to integrate AI technologies into their daily workflows with a reduced risk of problems or inaccuracies — after all, they’ve already been tested and proven and are being updated over time. 

As AI begins to trickle down to use by patients with the advent of chatbots, automated appointment scheduling, or obtaining access to their medical records, it’s important to be aware of both the value and challenges this technology can bring. A chatbot not being able to connect a person to the correct department may not seem like a big deal — unless the patient is experiencing an acute medical event that needs immediate care. The varying levels of severity in medical settings make it obvious why factors like accuracy, healthcare-specific models and production-ready code bases could be the difference not just between a successful AI deployment and a failed one but, in some cases, between life and death.

With the global AI in healthcare market size expected to grow from just under $5 billion in 2020 to $45.2 billion by 2026, the investments and recent use cases for this technology are proof that AI is here to stay. But with many of these cutting-edge technologies still in their infancy and many challenges ahead, the jury is still out on what the next few years hold for AI adoption, key players and clinical advances for the healthcare industry. Thankfully, with research at our fingertips, we’re a bit closer to getting there.
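For a sense of how steep that forecast is, the two market-size figures can be turned into an implied compound annual growth rate. This is a back-of-the-envelope sketch in Python, not a figure from the survey or the forecast itself: the function name is hypothetical, and it assumes a starting value of exactly $5 billion (the article says "just under") and six years of compounding from 2020 to 2026.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Constant yearly growth rate that takes start_value to end_value
    over the given number of years: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Assumed inputs: $5 billion in 2020 (rounded up from "just under"),
# $45.2 billion in 2026, both in billions of dollars.
rate = implied_cagr(5.0, 45.2, 2026 - 2020)
print(f"Implied growth rate: {rate:.1%} per year")
```

Under those assumptions the market would need to grow by roughly 44% every year; because the true 2020 figure is slightly below $5 billion, the real implied rate would be a little higher.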

Meanwhile, stay up to date on the latest case studies, innovations and lessons learned — and don’t wait too long to jump in and help build the future.