
How do humans understand speech?

By Aaron Wagner for Penn State News

UNIVERSITY PARK, Pa. — New funding from the National Science Foundation’s Build and Broaden Program will enable a team of researchers from Penn State and North Carolina Agricultural and Technical State University (NC A&T) to explore how speech recognition works while training a new generation of speech scientists at America’s largest historically Black university.

Research has shown that speech-recognition technology performs significantly worse at understanding speech by Black Americans than by white Americans. These systems can be biased, and that bias may be exacerbated by the fact that few Americans of color work in speech-science-related fields.

Understanding how humans understand speech

Navin Viswanathan, associate professor of communication sciences and disorders, will lead the research team at Penn State.

“In this research, we are pursuing a fundamental question,” Viswanathan explained. “How human listeners perceive speech so successfully despite considerable variation across different speakers, speaking rates, listening situations, etc., is not fully understood. Understanding this will provide insight into how human speech works on a fundamental level. On an immediate, practical level, it will enable researchers to improve speech-recognition technology.”

Joseph Stephens, professor of psychology, will lead the research team at NC A&T.

“There are conflicting theories of how speech perception works at a very basic level,” Stephens said. “One of the great strengths of this project is that it brings together investigators from different theoretical perspectives to resolve this conflict with careful experiments.”

According to the research team, speech-recognition technology is used in many aspects of people’s lives, but it is not as capable as a human listener at understanding speech, especially when the speech varies from the norms established in the software. Speech-recognition technology can be improved using the same mechanisms that humans use, once those mechanisms are understood.

Building and broadening the field of speech science

Increasing diversity in speech science is the other focus of the project. 

“When a field lacks diversity among researchers, it can limit the perspectives and approaches that are used, which can lead to technologies and solutions being limited, as well,” Stephens said. “We will help speech science to become more inclusive by increasing the capacity and involvement of students from groups that are underrepresented in the field.”

The National Science Foundation’s Build and Broaden Program focuses on supporting research, offering training opportunities, and creating greater research infrastructure at minority-serving institutions. New awards for the Build and Broaden Program, which total more than $12 million, support more than 20 minority-serving institutions in 12 states and Washington, D.C. Nearly half of this funding came from the American Rescue Plan Act of 2021. These funds aim to bolster institutions and researchers who were impacted particularly hard by the COVID-19 pandemic.

Build and Broaden is funding this project in part because it will strengthen research capacity in speech science at NC A&T. The project will provide research training for NC A&T students in speech science, foster collaborations between researchers at NC A&T and Penn State, and enhance opportunities for faculty development at NC A&T.

By providing training in speech science at NC A&T, the research team will mentor a more diverse group of future researchers. Increasing the diversity in this field will help to decrease bias in speech-recognition technology and throughout the field.

Viswanathan expressed excitement about developing a meaningful and far-reaching collaboration with NC A&T.

“This project directly creates opportunities for students and faculty from both institutions to work together on questions of common interest,” Viswanathan said. “More broadly, we hope that this will be the first step towards building stronger connections across the two research groups and promoting critical conversations about fundamental issues that underlie the underrepresentation of Black scholars in the field of speech science.”

Ji Min Lee, associate professor of communication sciences and disorders; Anne Olmstead, assistant professor of communication sciences and disorders; Matthew Carlson, associate professor of Spanish and linguistics; Paola “Guili” Dussias, professor of Spanish, linguistics and psychology; Elisabeth Karuza, assistant professor of psychology; and Janet van Hell, professor of psychology and linguistics, will contribute to this project at Penn State. Cassandra Germain, assistant professor of psychology; Deana McQuitty, associate professor of speech communication; and Joy Kennedy, associate professor of speech communication, will contribute to the project at North Carolina Agricultural and Technical State University.


The Race to Save Indigenous Languages, Using Automatic Speech Recognition

By Tanner Stening for News@Northeastern

Michael Running Wolf still has that old TI-89 graphing calculator he used in high school that helped propel his interest in technology. 

“Back then, my teachers saw I was really interested in it,” says Running Wolf, clinical instructor of computer science at Northeastern University. “Actually a couple of them printed out hundreds of pages of instructions for me on how to code” the device so that it could play games. 

What Running Wolf, who grew up in a remote Cheyenne village in Birney, Montana, didn’t realize at the time, poring over the stack of printouts at home by the light of kerosene lamps, was that he was actually teaching himself basic programming.

“I thought I was just learning how to put computer games on my calculator,” Running Wolf says with a laugh. 

But it hadn’t been his first encounter with technology. Growing up in the windy plains near the Northern Cheyenne Indian Reservation, Running Wolf says that although his family—which is part Cheyenne, part Lakota—didn’t have daily access to running water or electricity, sometimes, when the winds died down, the power would flicker on, and he’d plug in his Atari console and play games with his sisters. 

These early experiences would spur forward a lifelong interest in computers, artificial intelligence, and software engineering that Running Wolf is now harnessing to help reawaken endangered indigenous languages in North and South America, some of which are so critically at risk of extinction that their tallies of living native speakers have dwindled into the single digits. 

Running Wolf’s goal is to develop methods for documenting and maintaining these early languages through automatic speech recognition software, helping to keep them “alive” and well-documented. It would be a process, he says, that tribal and indigenous communities could use to supplement their own language reclamation efforts, which have intensified in recent years amid the threats facing languages. 

“The grandiose plan, the far-off dream, is we can create technology to not only preserve, but reclaim languages,” says Running Wolf, who teaches computer science at Northeastern’s Vancouver campus. “Preservation isn’t what we want. That’s like taking something and embalming it and putting it in a museum. Languages are living things.”

The better thing to say is that they’ve “gone to sleep,” Running Wolf says. 

And the threats to indigenous languages are real. Of the roughly 6,700 languages spoken in the world, about 40 percent are in danger of atrophying out of existence forever, according to the UNESCO Atlas of the World’s Languages in Danger. The loss of these languages also represents the loss of whole systems of knowledge unique to a culture, and of the ability to transmit that knowledge across generations.

While the situation appears dire—and is, in many cases—Running Wolf says nearly every Native American tribe is engaged in language reclamation efforts. In New England, one notable tribe doing so is the Mashpee Wampanoag Tribe, whose native tongue is now being taught in public schools on Cape Cod, Massachusetts. 

But the problem, he says, is that in the ever-evolving field of computational linguistics, little research has been devoted to Native American languages. This is partially due to a lack of linguistic data, but it is also because many native languages are “polysynthetic,” meaning they contain words that comprise many morphemes, which are the smallest units of meaning in language, Running Wolf says. 

Polysynthetic languages often have very long words—single words that can carry an entire sentence’s worth of meaning.

Further complicating the effort is the fact that many Native American languages don’t have an orthography, or an alphabet, he says. In terms of what languages need to keep them afloat, Running Wolf maintains that orthographies are not vital. Many indigenous languages have survived through a strong oral tradition in lieu of a robust written one.

But for scholars looking to build databases and transcription methods, like Running Wolf, written texts are important for filling in the gaps. What’s holding researchers back from building automatic speech recognition for indigenous languages is precisely this lack of audio and textual data.

Using hundreds of hours of audio from various tribes, Running Wolf has managed to produce some rudimentary results. So far, the automatic speech recognition software he and his team have developed can recognize single, simple words from some of the indigenous languages they have data for. 

“Right now, we’re building a corpus of audio and texts to start showing early results,” Running Wolf says. 

Importantly, he says, “I think we have an approach that’s scientifically sound.”
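
To give a sense of what such a pipeline looks like in code, here is a minimal sketch of automatic speech recognition inference using a pretrained English model from the open-source Hugging Face transformers library. It is not Running Wolf’s system; the model name and audio file are assumptions for illustration, and an indigenous language would first require collecting recordings and fine-tuning a model of its own.

```python
# Minimal ASR inference sketch with a pretrained English model (illustrative only;
# not the team's actual system). A low-resource language would need its own
# recordings and fine-tuning before anything like this could work.
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

MODEL_NAME = "facebook/wav2vec2-base-960h"  # English-only pretrained model
processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)

# "sample.wav" is a hypothetical recording; the model expects 16 kHz audio
speech, _ = librosa.load("sample.wav", sr=16000)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits           # frame-level character scores
predicted_ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding
print(processor.batch_decode(predicted_ids)[0])
```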

Eventually, Running Wolf says he hopes to create a way for tribes to provide their youth with tools to learn these ancient languages by way of technological immersion—through things like augmented or virtual reality, he says. 

Some of these technologies are already under development by Running Wolf and his team, made up of a linguist, a data scientist, a machine learning engineer, and his wife, who used to be a program manager, among others. All of the ongoing research and development is being done in consultation with numerous tribal communities, Running Wolf says.

“It’s all coming from the people,” he says. “They want to work with us, and we’re doing the best to respect their knowledge systems.”


Physician burnout in healthcare: Quo vadis?

By Irfan Khan for Fast Company

Burnout was included as an occupational phenomenon in the International Classification of Diseases (ICD-11) by the World Health Organization in 2019.

Today, burnout is prevalent in the forms of emotional exhaustion, personal and professional disengagement, and a low sense of accomplishment. While cases of physician fatigue continue to rise, some healthcare companies are looking to technology as a driver of efficiency. Could technology pave the way to better working conditions in healthcare?

While advanced technologies like AI cannot solve the issue on their own, data-driven decision-making could alleviate some operational challenges. Based on my experience in the industry, here are some tools and strategies healthcare companies can put into practice to try and reduce physician burnout.

CLINICAL DOCUMENTATION SUPPORT

Clinical decision support (CDS) tools help sift through copious amounts of digital data to catch potential medical problems and alert providers about risky medication interactions. To help reduce fatigue, CDS systems can be used to integrate decision-making aids and channel accurate information through a single platform. For example, they can be used to get the correct information (evidence-based guidance) to the correct people (the care team and patient) through the correct channels (electronic health record and patient portal) in the correct intervention format (order sets, flow sheets, or dashboards) at the correct points in the workflow (for workflow-based decision-making).

When integrated with electronic health records (EHRs) and merged with existing data sets, CDS systems can automate the collection of vital signs and alerts to aid physicians in improving patient care and outcomes.
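
To make the idea concrete, here is a toy, rule-based sketch of the kind of check a CDS system might run over vital-sign data pulled from an EHR. The field names, thresholds, and alert messages are invented for illustration and are not drawn from any real CDS product.

```python
# Toy rule-based CDS check over EHR-style vital signs. Field names, thresholds,
# and alert messages are illustrative assumptions, not from any real product.
from dataclasses import dataclass

@dataclass
class Vitals:
    patient_id: str
    heart_rate: int     # beats per minute
    systolic_bp: int    # mmHg
    spo2: float         # oxygen saturation, percent

RULES = [
    (lambda v: v.heart_rate > 120, "Tachycardia: heart rate above 120 bpm"),
    (lambda v: v.systolic_bp < 90, "Hypotension: systolic BP below 90 mmHg"),
    (lambda v: v.spo2 < 92.0,      "Low oxygen saturation (SpO2 below 92%)"),
]

def check_vitals(vitals):
    """Return the alert messages triggered by a single vitals record."""
    return [message for rule, message in RULES if rule(vitals)]

if __name__ == "__main__":
    record = Vitals(patient_id="demo-001", heart_rate=128, systolic_bp=86, spo2=95.5)
    for alert in check_vitals(record):
        print(f"[{record.patient_id}] {alert}")
```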

AUTOMATED DICTATION

Companies can use AI-enabled speech recognition solutions to reduce “click fatigue” by interpreting and converting human voice into text. When used by physicians to efficiently translate speech to text, these intelligent assistants can reduce effort and error in documentation workflows.

With the help of speech recognition through AI and machine learning, real-time automated medical transcription software can help alleviate physician workload, ultimately addressing burnout. Data collected from dictation technology can be seamlessly added to patient digital files and built into CDS systems. Acting as a virtual onsite scribe, this ambient technology can capture every word in the physician-patient encounter without taking the physician’s attention off their patient.
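
As a simple illustration of the speech-to-text step, the sketch below uses the open-source SpeechRecognition package as a stand-in (it is not the commercial dictation engines described here) to transcribe a recorded dictation and append it to a plain-text note; the file names are hypothetical.

```python
# Dictation sketch using the open-source SpeechRecognition package as a stand-in
# for a commercial medical dictation engine. File names are hypothetical.
import speech_recognition as sr

recognizer = sr.Recognizer()

with sr.AudioFile("physician_dictation.wav") as source:
    audio = recognizer.record(source)  # read the whole clip into memory

try:
    # Free web API used here for illustration; accuracy is general-purpose, not clinical.
    text = recognizer.recognize_google(audio)
except sr.UnknownValueError:
    text = ""

# Append the transcript to a plain-text encounter note.
with open("encounter_note.txt", "a", encoding="utf-8") as note:
    note.write(text + "\n")

print("Transcribed:", text)
```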

MACHINE LEARNING

Resource-poor technologies sometimes used in telehealth often lack the bandwidth to transmit physiological data and medical images — and their constant usage can lead to physician distress.

In radiology, advanced imaging through computer-aided ultrasounds can reduce the need for human intervention. Offering a quantitative assessment through deep analytics and machine learning, AI recognizes complex patterns in data imaging, aiding the physician with the diagnosis.

NATURAL LANGUAGE PROCESSING

Upgrading the digitized medical record system, automating the documentation process, and augmenting medical transcription are the foremost benefits of natural language processing (NLP)-enabled software. These tools can reduce administrative burdens on physicians by analyzing unstructured clinical data and extracting the relevant points into a structured format. This helps avoid under-coding and streamlines the way medical coders extract diagnostic and clinical data, enhancing value-based care.
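
A toy example helps show what extracting structure from free text can look like. The sketch below uses simple regular expressions on an invented note; real NLP-enabled documentation software relies on trained clinical language models rather than hand-written patterns.

```python
# Toy extraction of structured fields from an invented clinical note using regular
# expressions. Real clinical NLP uses trained language models; this only shows the
# idea of unstructured text becoming structured data.
import re

note = "Patient reports chest pain for 2 days. BP 142/91. Prescribed aspirin 81 mg daily."

structured = {"blood_pressure": None, "medications": [], "symptoms": []}

bp = re.search(r"\bBP\s+(\d{2,3})/(\d{2,3})\b", note)
if bp:
    structured["blood_pressure"] = {"systolic": int(bp.group(1)), "diastolic": int(bp.group(2))}

# Medication name followed by a dose, e.g. "aspirin 81 mg"
for name, dose, unit in re.findall(r"\b([A-Za-z]+)\s+(\d+)\s*(mg|mcg|g)\b", note):
    structured["medications"].append({"name": name.lower(), "dose": f"{dose} {unit}"})

for symptom in ("chest pain", "shortness of breath", "fever"):
    if symptom in note.lower():
        structured["symptoms"].append(symptom)

print(structured)
```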

MITIGATING BURNOUT WITH AI

Advanced medical technologies can significantly reduce physician fatigue, but they must be tailored to the implementation environment. That reduces physician-technology friction and makes the adaptation of technology more human-centered.

The nature of a physician’s job may always put them at risk of burnout, but optimal use and consistent management of technology can make a positive impact. In healthcare, seeking technological solutions that reduce the burden of repetitive work—and then mapping the associated benefits and studying the effects on staff well-being and clinician resilience—provides deep insights.


Three Ways AI Is Improving Assistive Technology

By Wendy Gonzalez for Forbes

Artificial intelligence (AI) and machine learning (ML) are some of the buzziest terms in tech, and for good reason. These innovations have the potential to tackle some of humanity’s biggest obstacles across industries, from medicine to education and sustainability. One sector, in particular, is set to see massive advancement through these new technologies: assistive technology.

Assistive technology is defined as any product that improves the lives of individuals who otherwise may not be able to complete tasks without specialized equipment, such as wheelchairs and dictation services. Globally, more than 1 billion people depend on assistive technology. When implemented effectively, assistive technology can improve accessibility and quality of life for all, regardless of ability. 

Here are three ways AI is currently improving assistive technology and its use cases, which might give your company some new ideas for product innovation:

Ensuring Education For All

Accessibility remains a challenging aspect of education. For children with learning disabilities or sensory impairments, dictation technology, more commonly known as speech-to-text or voice recognition, can help them to write and revise without pen or paper. In fact, 75 out of 149 participants with severe reading disabilities reported increased motivation in their schoolwork after a year of incorporating assistive technology.

This technology works best when powered by high-quality AI. Natural Language Processing (NLP) and machine learning algorithms have the capability to improve the accuracy of speech recognition and word predictability, which can minimize dictation errors while facilitating effective communication from student to teacher or among collaborating schoolmates. 
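
As a small illustration of the word-predictability component, the sketch below builds a toy bigram model that ranks likely next words from a handful of invented sentences; production dictation systems rely on far larger neural language models trained on real usage data.

```python
# Toy bigram model illustrating "word predictability": given the previous word,
# rank likely next words. The tiny training corpus is invented for illustration.
from collections import defaultdict, Counter

corpus = [
    "please open the science homework",
    "please open the reading assignment",
    "please save the reading assignment",
]

bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

def predict_next(prev_word, k=2):
    """Return the k most likely words to follow prev_word in the training data."""
    return [word for word, _ in bigrams[prev_word].most_common(k)]

print(predict_next("the"))      # ['reading', 'science']
print(predict_next("please"))   # ['open', 'save']
```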

That said, according to a 2001 study, only 35% of elementary schools — arguably the most significant portion of education a child receives — provide any assistive technology. This statistic could change due to the social impact of AI programs. These include Microsoft’s AI for Accessibility initiative, which invests in innovations that support people with neurodiversities and disabilities. Its projects include educational AI applications that provide students with visual impairments the text-to-speech, speech recognition and object recognition tools they need to succeed in the classroom.

Better Outcomes For Medical Technology

With a rapidly aging population estimated to top approximately 2 billion over the age of 60 by 2050, our plans to care for our loved ones could rely heavily on AI and ML in the future. Doctors and entrepreneurs are already paving the way; in the past decade alone, medical AI investments topped $8.5 billion in venture capital funding for the top 50 startups. 

Robot-assisted surgery is just one AI-powered innovation. In 2018, robot-assisted procedures accounted for 15.1% of all general surgeries, and this percentage is expected to rise as surgeons implement additional AI-driven surgical applications in operating rooms. Compared to traditional open surgery, robot-assisted procedures tend to involve smaller incisions, which reduces overall pain and scarring and leads to quicker recovery times.

AI-powered wearable medical devices, such as female fertility-cycle trackers, are another popular choice. Demand for products including diabetic-tracking sweat meters and oximeters for respiratory patients has created a market that’s looking at a 23% CAGR by 2023.

What’s more, data taken from medical devices could contribute to more than $7 billion in savings per year for the U.S. healthcare market. This data improves doctors’ understanding of preventative care and better informs post-recovery methods when patients leave after hospital procedures.

Unlocking Possibilities In Transportation And Navigation

Accessible mobility is another challenge that assistive technology can help tackle. Through AI-powered running apps and suitcases that can navigate through entire airports, assistive technology is changing how we move and travel. One example is Project Guideline, a Google project helping individuals who are visually impaired navigate roads and paths with an app that combines computer vision and a machine-learning algorithm to guide the runner along a pre-designed path.

Future runners and walkers may one day navigate roads and sidewalks unaccompanied by guide dogs or sighted guides, gaining autonomy and confidence while accomplishing everyday tasks and activities without hindrance. For instance, CaBot, a navigation robot developed and spearheaded by Chieko Asakawa, a Carnegie Mellon professor who is blind, uses sensor information to help users avoid airport obstacles, alert them to nearby stores and assist with required actions like standing in line at airport security checkpoints.

The Enhancement Of Assistive Technology

These are just some of the ways that AI assistive technology can transform the way individuals and society move and live. To ensure assistive technologies are actively benefiting individuals with disabilities, companies must also maintain accurate and diverse data sets with annotation that is provided by well-trained and experienced AI teams. These ever-updating data sets need to be continually tested before, during and after implementation.

AI possesses the potential to power missions for the greater good of society. Ethical AI can transform the ways assistive technologies improve the lives of millions in need. What other types of AI-powered assistive technology have you come across and how could your company make moves to enter this industry effectively? 


Talking it through: speech recognition takes the strain of digital transformation

By Nuance for Healthcare IT News

HITN: COVID-19 has further exposed employee stress and burnout as major challenges for healthcare. Tell us how we can stop digital transformation technologies from simply adding to them.

Wallace: By making sure that they are adopted for the right reasons – meeting clinicians’ needs without adding more stress or time pressures to already hectic workflows. For example, because COVID-19 was a new disease, clinicians had to document their findings in detail and quickly, without the process slowing them down – often while wearing PPE. I think speech recognition technology has been helpful in this respect, not just because of speed but also because it allows the clinician time to provide more quality clinical detail in the content of a note.

In a recent HIMSS/Nuance survey, 82% of doctors and 73% of nurses felt that clinical documentation contributed significantly to healthcare professional overload. It has been estimated that clinicians spend around 11 hours a week creating clinical documentation, and up to two thirds of that can be narrative.

HITN: How do you think speech recognition technology can be adapted into clinical tasks and workflow to help lower workload and stress levels?

Wallace: One solution is cloud-based AI-powered speech recognition: instead of either typing in the EPR or EHR or dictating a letter for transcription, clinicians can use their voice and see the text appear in real time on the screen. Using your voice is a more natural and efficient way to capture the complete patient story. It can also speed up navigation in the EPR system, helping to avoid multiple clicks and scrolling. The entire care team can benefit – not just in acute hospitals but across primary and community care and mental health services.

HITN: Can you give some examples where speech recognition has helped to reduce the pressure on clinicians?

Wallace: In hospitals where clinicians have created their outpatient letters using speech recognition, reductions in turnaround times from several weeks down to two or three days have been achieved across a wide range of clinical specialties. In some cases where no lab results are involved, patients can now leave the clinic with their completed outpatient letter.

In the Emergency Department setting, an independent study found that speech recognition was 40% faster than typing notes and has now become the preferred method for capturing ED records. The average time saving in documenting care is around 3.5 mins per patient – in this particular hospital, that is equivalent to 389 days a year, or two full-time ED doctors!

HITN: How do you see the future panning out for clinicians in the documentation space when it comes to automation and AI technologies?

Wallace: I think we are looking at what we call the Clinic Room of the Future, built around conversational intelligence. No more typing for the clinician, no more clicks, no more back turned to the patient hunched over a computer.

The desktop computer is replaced by a smart device with microphones and movement sensors. Voice biometrics allow the clinician to sign in to the EPR verbally and securely (My Voice is my Password), with a virtual assistant responding to voice commands. The technology recognises non-verbal cues – for example, when a patient points to her left knee but only actually states it is her knee. The conversation between the patient and the clinician is fully diarised, while in the background, Natural Language Processing (using Nuance’s Clinical Language Understanding engine) works to create a structured clinical note that summarises the consultation and codes the clinical terms, e.g. with SNOMED CT.

The result is a more professional and interactive clinician/patient consultation.

Healthcare IT News spoke to Dr Simon Wallace, CCIO of Nuance’s healthcare division, as part of the ‘Summer Conversations’ series.


VA to move Nuance’s voice-enabled clinical assistant to the cloud: 5 details

By Katie Adams for Becker’s Hospital Review

The Department of Veterans Affairs is migrating to the cloud platform for Nuance’s automated clinical note-taking system, the health system said Sept. 8.

Five details:

  1. The VA will use the Nuance Dragon Medical One speech recognition cloud platform and Nuance’s mobile microphone app, allowing physicians to use their voices to document patient visits more efficiently. The system is intended to allow physicians to spend more time with patients and less time on administrative work.
  2. The VA deployed Nuance Dragon Medical products systemwide in 2014. It is now upgrading to the system’s cloud offering so its physicians can utilize the added capabilities and mobile flexibility.
  3. The VA’s decision to adopt the technologies was approved through the Federal Risk and Authorization Management Program (FedRAMP), ensuring that Nuance’s products adhere to the government’s latest guidance on data security and privacy.
  4. “The combination of our cloud-based platforms, secure application framework and deep experience working with the VA health system made it possible for us to demonstrate our compliance with FedRAMP to meet the needs of the U.S. government. We are proving that meeting security requirements and delivering the outcomes and workflows that matter to clinicians don’t have to be mutually exclusive,” Diana Nole, Nuance’s executive vice president and general manager of healthcare, said in a news release.
  5. Nuance Dragon Medical One is used by more than 550,000 physicians.

There’s Nothing Nuanced About Microsoft’s Plans For Voice Recognition Technology

By Enrique Dans for Forbes

Several media outlets had already reported on Microsoft’s advanced talks over an eventual acquisition of Nuance Communications, a leader in the field of voice recognition with a long and troubled history of mergers and acquisitions. The deal, finally announced on Monday, had been estimated to be worth as much as $16 billion, which would already make it Microsoft’s second-largest acquisition after its $26.2 billion purchase of LinkedIn in June 2016; it ended up closing at $19.7 billion, a 23% premium over the company’s share price on Friday.

After countless mergers and acquisitions, Nuance Communications has ended up nearly monopolizing the market in speech recognition products. It started out as Kurzweil Computer Products, founded by Ray Kurzweil in 1974 to develop character recognition products, and was then acquired by Xerox, which renamed it ScanSoft and subsequently spun it off. ScanSoft was acquired by Visioneer in 1999, but the consolidated company retained the ScanSoft name. In 2001, ScanSoft acquired the Belgian company Lernout & Hauspie, which had previously acquired Dragon Systems, creators of the popular Dragon NaturallySpeaking, to try to compete with Nuance Communications, which had been publicly traded since 1995, in the speech recognition market. Dragon was the absolute leader in speech recognition technology accuracy through the use of Hidden Markov models as a probabilistic method for temporal pattern recognition. Finally, in September 2005, ScanSoft decided to acquire Nuance and take its name.
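
For readers unfamiliar with the technique, the sketch below shows Viterbi decoding over a toy hidden Markov model, the kind of probabilistic temporal pattern matching referred to above. The states, observations, and probabilities are invented for illustration; this is not Dragon’s actual implementation.

```python
# Viterbi decoding for a toy HMM: find the most likely hidden state sequence for a
# short observation sequence. States, observations, and probabilities are invented.
states = ("S1", "S2")                    # hidden phone-like states
observations = ("low", "high", "high")   # a tiny sequence of acoustic labels

start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3},
           "S2": {"S1": 0.4, "S2": 0.6}}
emit_p  = {"S1": {"low": 0.8, "high": 0.2},
           "S2": {"low": 0.3, "high": 0.7}}

def viterbi(obs):
    """Return the most likely state path and its probability."""
    # V[t][s] = (best probability of any path ending in state s at time t, backpointer)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p) for p in states
            )
            V[t][s] = (prob, prev)
    # Trace back the best path from the most probable final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path, prob = [last], V[-1][last][0]
    for t in range(len(obs) - 1, 0, -1):
        last = V[t][last][1]
        path.append(last)
    return list(reversed(path)), prob

print(viterbi(observations))   # -> (['S1', 'S2', 'S2'], ~0.0423)
```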

Since then, the company has grown rapidly through acquisitions, buying as many as 52 companies in the field of speech technologies, in all kinds of industries and markets, creating a conglomerate that has largely monopolized related commercial developments, licensing its technology to all kinds of companies: Apple’s Siri was originally based on Nuance technology — although it is unclear how dependent on the company it remains.

The Microsoft purchase reveals the company’s belief in voice as an interface. The pandemic has seen videoconferencing take off, triggering an explosion in the use of technologies to transcribe voice: Zoom, for example, incorporated automatic transcription in April last year using Otter.ai, so that at the end of each of my classes, I automatically receive not only the video of them, but also their full transcript (which works infinitely better when the class is online than when it takes place in face-to-face mode in a classroom).

Microsoft, which is in the midst of a process of strong growth through acquisitions, had previously collaborated with Nuance in the healthcare industry, and many analysts feel that the acquisition intends to deepen even further into this collaboration. However, Microsoft could also be planning to integrate transcription technology into many other products, such as Teams, or throughout its cloud, Azure, allowing companies to make their corporate environments fully indexable by creating written records of meetings that can be retrieved at a later date. 

Now, Microsoft will try to raise its voice — it has almost twenty billion reasons to do so — and use it to differentiate its products via voice interfaces. According to Microsoft, a pandemic that has pushed electronic and voice communications to the fore is now the stimulus for a future with more voice interfaces, so get ready to see more of that. No company plans a twenty billion dollar acquisition just to keep doing the same things they were doing before.


‘Siri Speech Study’ app to improve Apple speech recognition

By Katya Pivcevic for Biometric Update

Apple launched a new iOS app this month, ‘Siri Speech Study’, designed to collect speech data and improve the speech recognition capabilities of its virtual assistant Siri, reports TechCrunch.

A product of a research study, the app allows participants who have opted in to share voice requests and other feedback with Apple, though documentation does not elaborate on the study’s specific goals. Apple has meanwhile said that the app is being used for Siri improvements via a focus group-like study using human feedback. For example, if Siri misheard a question, users could explain what they were trying to ask, or if Siri on Apple’s HomePod misidentified the speaker in a multi-person household, the participant could note that as well.

Speech pattern recognition and voice-based computing have proliferated in recent years, yet virtual assistants can often still misunderstand certain types of speech.

The tech giant has been involved in several disputes in recent years over the collection and retention of users’ biometric data. In 2019, an Apple whistleblower shed light on Apple’s collection of confidential consumer voice recordings for manual grading and review, as originally reported by The Guardian. Other issues have arisen around consent for the collection of such data, such as under the Biometric Information Privacy Act (BIPA) of Illinois last year.

Siri Speech Study is currently available in the U.S., Canada, Germany, France, Hong Kong, and India, among others; however, it is only accessible via a direct link.


Leveraging AI-powered speech recognition tech to reduce NHS staff burnout

From Open Access Government

The last 18 months have pushed our National Health Service (NHS) to breaking point. Services that were already overstretched and underfunded have been subjected to unprecedented strain on their resources. This strain has now become a national emergency, risking the entire future of the health service, according to a recent government report.

From treating countless Covid-19 cases and supporting vaccination programmes, to providing essential treatment and care, UK healthcare professionals are at maximum capacity and, understandably, struggling to cope. In fact, a recent survey from Nuance revealed that this period has led to dramatic increases in stress and anxiety across primary (75%) and secondary (60%) care within the NHS. When excessively high levels of stress are experienced over a prolonged period, it can result in clinician burnout which, in turn, can lead to many feeling like they have no choice but to leave the medical profession altogether. In England, GP surgeries lost almost 300 full-time medical professionals in the three months prior to Christmas and, by 2023, a shortfall of 7,000 GPs is anticipated, according to recent reports. In addition, it is believed that up to a third of nurses are thinking about leaving their profession due to pandemic-related burnout.

These individuals enabled and maintained a new front line in the wake of the pandemic. They are also the people that we applauded every week and depended on during the most challenging days. However, the unwavering pressure and heavy workloads are causing significant damage to their own health. An urgent and effective solution is required if the NHS is to continue delivering its life-saving services and care.

The burden of administrative processes

Over the course of the pandemic, the way in which healthcare services are delivered has changed. One of the most significant changes has been a shift towards teleconsultations or virtual appointments. An RCGP investigation of GP appointments discovered that, prior to the pandemic, as much as 70% of consultations were face-to-face. This diminished to 23% during the first weeks of the crisis.

While some medical professionals and patients are in favour of this new format, for many, the swift switch to a virtual approach has generated an influx of workload, especially when it comes to documentation processes. In fact, Nuance’s research revealed that 67% of primary care respondents believe the pandemic has increased the overall amount of clinical administration. Although there are a few causal factors, such as heavy workloads and time pressure, the transition towards remote consultations appears to be a significant contributor. This is because the risk factor and diagnostic uncertainty of remote consultations are generally higher than for face-to-face appointments. Also, patients who are triaged by telephone often still need a follow-up face-to-face appointment, which leads to more double handling of patients than happened in the past.

Before the pandemic, clinicians were reportedly spending an average of 11 hours per week on clinical documentation. This figure is only likely to have increased during the pandemic’s peak, when hospitals were at their busiest and remote appointments were most needed. And, we’re not in the clear yet, as the vaccination programme continues to progress and teleconsultation is set to stay. Therefore, moving forward, we need to think about how we can best support our clinical professionals by easing their administrative burden.

AI-powered speech recognition: a step in the right direction

Modern technologies – such as speech recognition solutions – can be leveraged to help reduce some of the administrative pressures being placed on clinical professionals and enable them to work smarter and more effectively. These technologies are designed to recognise and record passages of speech, converting them into detailed clinical notes, regardless of how quickly they’re delivered. By reducing repetition and supporting standardisation across departments, they can also enhance the accuracy as well as the quality of patient records. For example, voice activated clinical note templates can provide a standardised structure to a document or letter, thus meeting the requirements set out by the PRSB (Professional Record Standards Body).

Using secure, cloud-based speech solutions, healthcare professionals are able to benefit from these tools no matter where they are based. The latest technologies provide users with the option to access their single voice profile from different devices and locations, even when signing in from home. This advancement could significantly reduce the administrative burden of virtual consultations, therefore helping to decrease burnout levels amongst NHS staff.

Calderdale and Huddersfield NHS Trust is one of many organisations already benefiting from this technology. The team there leveraged speech recognition as part of a wider objective to help all staff members and patients throughout the Covid-19 crisis. Serving a population of around 470,000 people and employing approximately 6,000 employees, the trust wanted to save time and enable doctors to improve safety, whilst minimising infection risk. By using this technology on mobile phones, clinicians could instantly update patient records without having to touch shared keyboards. Having experienced the benefits of this solution, the trust is considering leveraging speech recognition to support virtual consultations conducted over MS Teams, in order to enhance the quality of consultations, while alleviating some of the pressures placed upon employees.

This challenging period has only emphasised how vital the NHS is within the UK. However, the increased workloads and administrative duties brought on by the pandemic are causing higher levels of burnout than ever before. Something needs to change. Although technological advancements such as AI-powered speech recognition are now part of the solution, there is also a need for public bodies to determine why the administrative burden has continued to rise, and perhaps to reassess the importance of bureaucratic tasks and where it is essential for information to be recorded.


Role of Artificial Intelligence and Machine Learning in Speech Recognition

By The Signal

If you have ever wondered how your smartphone can comprehend instructions like “Call Mom,” “Send a Message to Boss,” “Play the Latest Songs,” or “Switch ON the AC,” then you are not alone. But how is this done? The one simple answer is Speech Recognition. Speech Recognition has gone through the roof in the last 4-5 years and is making our lives more comfortable every day.

IBM demonstrated speech recognition as early as 1962, when it unveiled a machine capable of recognizing a small vocabulary of spoken words. Today, powered by the latest technologies like Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning, speech recognition is touching new milestones.

This latest technological advancement is being used across the globe by top companies to make their user’s experience efficient and smooth. Technologies like Amazon’s Alexa, Apple’s Siri, Google Assistant, Google Speech, Google Dictate, Facebook’s Oculus VR, and Microsoft’s Cortana are all examples of Speech Recognition. 

The expanding usage of speech-to-text technologies has also opened many new job domains, and students are taking full advantage of them. Many students now join courses like a PGP in AI and Machine Learning after completing their graduation to improve their prospects. The high salary package of around INR 15 lakh for freshers is the second-biggest reason attracting students to the field, the biggest being the appealing job role itself.

Speech Recognition was a very niche domain before the advent of AI and ML, which have now completely transformed it. Before we look at how AI and ML made these changes, let’s understand what each of these terms means.

Artificial Intelligence 

Artificial Intelligence is the technology by which machines become capable of demonstrating intelligence like humans or animals. Initially, AI was only about memorizing data and producing results accordingly; however, it is now much more than that, as machines perform various activities like Speech Recognition, Object Recognition, Translating Texts, and a lot more.

Another recent addition to AI has been Deep Learning. With the help of Deep Learning, machines can process data and learn patterns that help them make valuable decisions, behavior loosely modeled on the human brain. Deep Learning can be supervised, semi-supervised, or unsupervised.

Machine Learning 

Machine Learning is a subdomain of AI in which machines learn from past events and activities. Through ML, machines are trained to retain the information and outputs of various data sets and to identify patterns in them. It allows the machine to learn by itself without being explicitly programmed for every task.

An example of Machine Learning is e-commerce websites suggesting products to you. The code, once written, allows the system to keep improving on its own, analyzing user behavior and recommending products according to each user’s preferences and past purchases. This involves minimal human interference and makes use of approaches like Artificial Neural Networks (ANNs).
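
To ground the e-commerce example above, here is a toy sketch of a “customers who bought X also bought Y” recommender built from simple co-purchase counts; the purchase history is invented, and real recommender systems use much richer models such as neural networks.

```python
# Toy "bought together" recommender: count item co-purchases and suggest the most
# frequent companions. The purchase history below is invented for illustration.
from collections import defaultdict, Counter
from itertools import combinations

purchase_history = [
    {"phone", "phone case", "charger"},
    {"phone", "charger"},
    {"laptop", "laptop bag"},
    {"phone", "phone case"},
]

co_purchases = defaultdict(Counter)
for basket in purchase_history:
    for a, b in combinations(sorted(basket), 2):
        co_purchases[a][b] += 1
        co_purchases[b][a] += 1

def recommend(item, k=2):
    """Items most often bought together with `item` in the history."""
    return [other for other, _ in co_purchases[item].most_common(k)]

print(recommend("phone"))   # e.g. ['charger', 'phone case']
```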

Speech Recognition 

Speech Recognition is simply the activity of comprehending a user’s voice and converting it into text. It is chiefly of three types:

  1. Automatic Speech Recognition (ASR) 
  2. Computer Speech Recognition (CSR) 
  3. Speech to Text (STT) 

Note: Speech Recognition and Voice Recognition are two different things. While the former comprehends a voice sample and converts it into a text sample, the sole purpose of the latter is to identify the voice and recognize to whom it belongs. Voice Recognition is often used for security and authenticity purposes. 

How Have AI and ML Affected the Future of Speech Recognition?

The usage of Speech Recognition in our devices has grown considerably due to the developments in AI and ML technologies. Speech Recognition is now being used for tasks ranging from awakening your appliances and gadgets to monitoring your fitness, playing mood-booster songs, running queries on search engines, and even making phone calls. 

The global market for Speech Recognition, currently growing at a compound annual growth rate (CAGR) of 17.2%, is expected to breach the $25 billion mark by 2025. However, there were enormous challenges initially that have now been tackled with the use of AI and ML.

In its initial phase, some of the biggest challenges for Speech Recognition were poor voice-recording devices, heavy noise in the voice samples, and different pitches in the speech of the same user. In addition, changing dialects and grammatical factors like homonyms were also a big challenge.

With the help of AI programs capable of filtering sound, canceling noise, and identifying the meaning of words depending on the context, most of these challenges have been tackled. Today, Speech Recognition shows an accuracy of around 95%, up from less than 20% some 30 years ago. The biggest challenge remaining for programmers is making machines capable of understanding emotions and feelings, and progress in this area is still ongoing.

The increasing accuracy of Speech Recognition is becoming an essential driving factor in its success, and top tech giants are leveraging these benefits. More than 20% of Google searches were already made by voice in 2016, and this number is expected to be far higher now. Businesses today are automating their services to make their operations efficient and are putting Speech Recognition features at the top of their to-do lists.

Some of the key usages of Speech Recognition today are listed below. 

  • The most common use of Speech Recognition is to perform basic activities like Searching on Google, Setting Reminders, Scheduling Meetings, Playing Songs, Controlling Synced Devices, etc. 
  • Speech Recognition is now also being used in various financial transactions, with some banks and financial companies offering the feature of “Voice Transfer” to their users. 

Speech Recognition is no doubt one of the best innovations to come out of these expanding technological developments. However, there is one thing to note if you are planning to enter this sector. The domain is intertwined with many others, and the knowledge provided by a Speech Recognition course alone won’t be enough for you to survive in this field.

Therefore, it is essential that you also sharpen your skills in allied concepts like Data Science, Data Analytics, Machine Learning, Artificial Intelligence, Neural Networks, DevOps, and Deep Learning. So what are you waiting for now? Hurry up and join an online course in Speech Recognition now!