Posted on

VA to move Nuance’s voice-enabled clinical assistant to the cloud: 5 details

By Katie Adams for Becker’s Hospital Review

The Department of Veterans Affairs is migrating Nuance's automated clinical note-taking system to the cloud, the health system said Sept. 8.

Five details:

  1. The VA will use the Nuance Dragon Medical One speech recognition cloud platform and Nuance's mobile microphone app, allowing physicians to use their voices to document patient visits more efficiently. The system is intended to allow physicians to spend more time with patients and less time on administrative work.
  2. The VA deployed Nuance Dragon Medical products systemwide in 2014. It is now upgrading to the system’s cloud offering so its physicians can utilize the added capabilities and mobile flexibility.
  3. To ensure Nuance’s products adhere to the government’s latest guidance on data security and privacy, the Federal Risk and Authorization Management Program approved the VA’s decision to adopt the technologies.
  4. “The combination of our cloud-based platforms, secure application framework and deep experience working with the VA health system made it possible for us to demonstrate our compliance with FedRAMP to meet the needs of the U.S. government. We are proving that meeting security requirements and delivering the outcomes and workflows that matter to clinicians don’t have to be mutually exclusive,” Diana Nole, Nuance’s executive vice president and general manager of healthcare, said in a news release.
  5. Nuance Dragon Medical One is used by more than 550,000 physicians.

4 physicians share what they wish they knew going into their career

By Ariana Portalatin for Becker’s Hospital Review

How to choose the right mentor, the importance of continuing education and the different stress factors of healthcare are among the top issues physicians wish they knew about at the beginning of their career.

Here, four physicians share what they wish they knew entering their careers:

Amit Mirchandani, MD. Texas Health Surgery Center Rockwall: The key is to seek out the right mentors for where you want to go. Knowing this early in your career is key. If it is to become a university professor, you will likely have many great choices along the way in your training. If you want to be a private practice physician or an entrepreneur, you’ll have to be proactive about finding your mentors as early as possible. It has been the secret sauce of my career thus far. Our happiness as physicians largely depends on knowing where we want to go with our career and finding the mentors to help us get there.

David Bumpass, MD. University of Arkansas for Medical Sciences (Little Rock): The first year or two of practice can bring a lot of new and somewhat unexpected stress factors — often a new city, new hospital, new staff and different norms than the hospital where a surgeon trained. Simply performing a good surgery is only a part of achieving a good outcome for a patient. Coordinating postoperative care and establishing good patient education and communication are crucial. Also, I did not anticipate the extent that complications can weigh on one’s mind when the “buck stops” with you, the surgeon.

Daniel Gittings, MD. Orthopedic Specialty Institute (Orange, Calif.): Being a great physician also means dedicating yourself to being a lifelong learner. Healthcare is constantly changing how we care for patients and the way we deliver care to patients. We cannot rest on our laurels from medical school, residency and fellowship as we owe it to our patients and our community to stay current with best practices and innovations. The COVID-19 pandemic is just one example of how physicians learned to adapt and change the way we administer healthcare via telemedicine and how we prioritize and triage resource intensive services such as surgery to patients during a crisis.

C. Ann Conn, MD. Advanced Pain Institute (Hammond, La.): I've always had the mindset of an early adopter, and when I started practicing, I did not understand the significant barriers to payment for new procedures. The atmosphere can make it difficult for patients to access the newest treatments. Because of this, I came to understand that advocacy is critical for our profession and our patients. The decision-makers in government are often unaware of both the issues we face as providers and the specific sufferings of our patients, and, of course, these problems are interconnected. Therefore, it is important that we speak out to improve the situation.


There’s Nothing Nuanced About Microsoft’s Plans For Voice Recognition Technology

By Enrique Dans for Forbes

Several media outlets had already reported on Microsoft's advanced talks over an eventual acquisition of Nuance Communications, a leader in the field of voice recognition with a long and troubled history of mergers and acquisitions. The deal, finally announced on Monday, had been estimated to be worth as much as $16 billion; it ended up closing at $19.7 billion, a 23% premium over the company's share price on Friday, making it Microsoft's second-largest acquisition after its $26.2 billion purchase of LinkedIn in June 2016.

After countless mergers and acquisitions, Nuance Communications has ended up nearly monopolizing the market in speech recognition products. It started out as Kurzweil Computer Products, founded by Ray Kurzweil in 1974 to develop character recognition products, and was then acquired by Xerox, which renamed it ScanSoft and subsequently spun it off. ScanSoft was acquired by Visioneer in 1999, but the consolidated company retained the ScanSoft name. In 2001, ScanSoft acquired the Belgian company Lernout & Hauspie, which had previously acquired Dragon Systems, creators of the popular Dragon NaturallySpeaking, to try to compete with Nuance Communications, which had been publicly traded since 1995, in the speech recognition market. Dragon was the absolute leader in speech recognition technology accuracy through the use of Hidden Markov models as a probabilistic method for temporal pattern recognition. Finally, in September 2005, ScanSoft decided to acquire Nuance and take its name.

Since then, the company has grown rapidly through acquisitions, buying as many as 52 companies in the field of speech technologies, in all kinds of industries and markets, creating a conglomerate that has largely monopolized related commercial developments, licensing its technology to all kinds of companies: Apple’s Siri was originally based on Nuance technology — although it is unclear how dependent on the company it remains.

The Microsoft purchase reveals the company's belief in voice as an interface. The pandemic has seen videoconferencing take off, triggering an explosion in the use of technologies to transcribe voice: Zoom, for example, incorporated automatic transcription in April last year, so that at the end of each of my classes I automatically receive not only the video but also a full transcript (which works far better when the class is online than when it takes place face-to-face in a classroom).

Microsoft, which is in the midst of a period of strong growth through acquisitions, had previously collaborated with Nuance in the healthcare industry, and many analysts believe the acquisition is intended to deepen that collaboration. However, Microsoft could also be planning to integrate transcription technology into many other products, such as Teams, or throughout its cloud, Azure, allowing companies to make their corporate environments fully indexable by creating written records of meetings that can be retrieved at a later date.

Now, Microsoft will try to raise its voice — it has almost twenty billion reasons to do so — and use it to differentiate its products via voice interfaces. According to Microsoft, a pandemic that has pushed electronic and voice communications to the fore is now the stimulus for a future with more voice interfaces, so get ready to see more of that. No company plans a twenty billion dollar acquisition just to keep doing the same things they were doing before.


‘Siri Speech Study’ app to improve Apple speech recognition

By Katya Pivcevic for Biometric Update

Apple launched a new iOS app this month, ‘Siri Speech Study’, designed to collect speech data and improve the speech recognition capabilities of its virtual assistant Siri, reports TechCrunch.

A product of a research study, the app allows participants who have opted in to share voice requests and other feedback with Apple, though documentation does not elaborate on the study's specific goals. Apple has meanwhile said that the app is being used to improve Siri via a focus group-like study using human feedback. For example, if Siri misheard a question, users could explain what they were trying to ask, or if Siri on Apple's HomePod misidentified the speaker in a multi-person household, the participant could note that as well.

Speech pattern recognition and voice-based computing have proliferated in recent years, yet virtual assistants can often still misunderstand certain types of speech.

The tech giant has been involved in several disputes in recent years over the collection and retention of users' biometric data. In 2019, an Apple whistleblower shed light on the company's collection of confidential consumer voice recordings for manual grading and review, as originally reported by The Guardian. Other issues have arisen around consent for the collection of such data, such as under Illinois' Biometric Information Privacy Act (BIPA) last year.

Siri Speech Study is currently available in the U.S., Canada, Germany, France, Hong Kong, and India, among other countries; however, it is only accessible via a direct link.


How physician pay in the US compares to other countries: 11 findings

By Alan Condon for Becker’s Hospital Review

Physicians in the U.S. on average earn far more than their counterparts in other countries and rank significantly higher in terms of net worth, according to Medscape’s “International Physician Compensation Report.”

The survey, released Aug. 20, includes responses from physicians in the U.S., the United Kingdom, France, Spain, Germany, Italy, Brazil and Mexico. Respondents were all full-time practicing physicians.

Eleven findings:

  1. On average, physicians in the U.S. earned the most ($316,000) per year, followed by Germany ($183,000) and the U.K. ($138,000). Physicians in Mexico earned the least at $12,000.
  2. In terms of net worth, U.S. physicians are significantly ahead of their counterparts in other countries. The average net worth of physicians in the U.S. is $1.7 million, according to the survey. Physicians in the U.K. ranked second with an average net worth of $657,000, and those in Mexico had an average net worth of $67,000.
  3. Fifty-nine percent of U.S. physicians surveyed said that they felt fairly compensated, the highest among countries surveyed. In Germany, 43 percent of physicians feel they are fairly compensated, compared to just 14 percent in Spain.
  4. On average, primary care physicians in the U.S. earn $242,000 annually, the highest of any country surveyed. Second was Germany ($200,000) and last was Mexico ($70,000).
  5. Specialists in the U.S. and Germany earn the most among the countries surveyed. On average, male specialists in the U.S. earn $376,000 per year, while female specialists earn $283,000, compared to $194,000 and $131,000 respectively in Germany.
  6. The U.S. has the lowest specialist pay disparity, with male specialists earning 33 percent more than women. The highest gender pay gap occurs in France, where male specialists earn 63 percent more than women, the survey found.
  7. A mortgage on one's primary home is the most common debt for physicians in the U.S. (64 percent), the U.K. (67 percent), Spain (49 percent), Germany (40 percent) and Italy (36 percent). At 52 percent, credit card debt is the leading debt among physicians in Mexico, according to the survey.
  8. Of the U.S. physicians surveyed, 39 percent said that they use telemedicine in their practice. Physicians in the U.K. topped the list with 68 percent reporting the use of telemedicine.
  9. Physicians everywhere voiced frustrations about paperwork and administrative burdens. In the U.S., 26 percent of physicians reported spending between one and nine hours a week on administrative tasks, and 19 percent reported dedicating more than 25 hours a week.
  10. If given the option, 78 percent of physicians in the U.S. said they would choose medicine again, third behind physicians in Germany and Mexico, who tied at 79 percent.
  11. Eighty-one percent of physicians in the U.S. said they would choose the same specialty, the highest rate of any country.

Voice AI Technology Is More Advanced Than You Might Think

By Annie Brown for Forbes

Systems that can handle repetitive tasks have supported global economies for generations. But systems that can handle conversations and interactions? Those have felt impossible, due to the complexity of human speech. Any of us who regularly use Alexa or Siri can attest to the deficiencies of machine learning in handling human messages. The average person has yet to interact with the next generation of voice AI tools, but what this technology is capable of has the potential to change the world as we know it.

The following is a discussion of three innovative technologies that are accelerating the pace of progress in this sector.

Conversational AI for Ordering

Experts in voice AI have prioritized technology that can alleviate menial tasks, freeing humans up to engage in high-impact, creative endeavors. Developers identified drive-through ordering early on as an area in which conversational AI could make an impact, and one company appears to have cracked the code.

Creating a conversational AI system that can handle drive-through restaurant ordering may sound simple: load in the menu, use chat-based AI, and you’ve done it. The actual solutions aren’t quite so easy. In fact, creating a system that works in an outdoor environment—handling car noises, traffic, other speakers—and one that has sophisticated enough speech recognition to decipher multiple accents, genders, and ages, presents immense challenges.

The co-founders of Hi Auto, Roy Baharav and Eyal Shapira, both have a background in AI systems for audio: Baharav in complex AI systems at Google and Shapira in NLP and chat interfacing.

Baharav describes the difficulties of making a system like this work: “Speech handling in general, for humans, is hard. You talk to your phone and it understands you – that is a completely different problem from understanding speech in an outdoor environment. In a drive-through, people are using unique speech patterns. People are indecisive – they’re changing their minds a lot.”

That latter issue illustrates what they call multi-turn conversation, or the back-and-forth we humans do so effortlessly. After years of practice, model training, and refinement, Hi Auto has now installed its conversational AI systems in drive-throughs around the country and is seeing a 90% level of accuracy.

Shapira forecasts, “Three years from now, we will probably see as many as 40,000 restaurant locations using conversational AI. It’s going to become a mainstream solution.” 

“AI can address two of the critical problems in quick-serve restaurants,” comments Joe Jensen, a Vice President at Intel Corporation, “Order accuracy which goes straight to consumer satisfaction and then order accuracy also hits on staff costs in reducing that extra time staff spends.” 

Conversation Cloud for Intelligent Machines

A second groundbreaking innovation in the world of conversational AI is using a technique that turns human language into an input.

The CEO of Whitehead AI, Diwank Tomer, illustrates the historical challenges faced by conversational AI: “It turns out that, when we’re talking or writing or conveying anything in human language, we depend on background information a lot. It’s not just general facts about the world but things like how I’m feeling or how well defined something is.

“These are obvious and transparent to us but very difficult for AI to do. That’s why jokes are so difficult for AI to understand. It’s typically something ridiculous or impossible, framed in a way that seems otherwise. For humans, it’s obvious. For AI, not so much. AI only interprets things literally.”

So, how does a system incapable of interpreting nuance, emotion, or making inferences adequately communicate with humans? The same way a non-native speaker initially understands a new language: using context.

Context-aware AI involves building models that can use extra information beyond the identity of the speaker or other basic facts. Chatbots are one area that is inherently lacking and could benefit from this technology. For instance, if a chatbot could glean contextual information from a user's profile, previous interactions, and other data points, it could use that information to frame highly intelligent responses.
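The idea of conditioning a reply on profile and conversation history can be sketched in a few lines. This is a deliberately simple illustration of the concept, not Whitehead AI's actual system; all names and data below are hypothetical:

```python
# Sketch of context-aware response framing: the bot merges facts from a
# user's profile and prior turns into its reply, instead of answering
# each message in isolation. All names and data here are invented.

def build_context(profile, history):
    """Collect background facts the bot can condition its reply on."""
    ctx = dict(profile)
    for turn in history:
        ctx.update(turn.get("facts", {}))
    return ctx

def reply(message, profile, history):
    ctx = build_context(profile, history)
    # A context-aware answer is possible only when the needed facts exist.
    if "order" in message.lower() and "last_order" in ctx:
        return (f"Hi {ctx['name']}, your last order "
                f"({ctx['last_order']}) shipped to {ctx['city']}.")
    return "Could you tell me a bit more?"

profile = {"name": "Dana", "city": "Austin"}
history = [{"facts": {"last_order": "#1042"}}]
print(reply("Where is my order?", profile, history))
```

A production system would feed that accumulated context into a language model rather than hand-written rules, but the framing step is the same.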

Tomer describes it this way, “We are building an infrastructure for manipulating natural language. Something new that we’ve built is chit chat API – when you say something and it can’t be understood, Alexa will respond with, ‘I’m sorry, I can’t understand that.’ It’s possible now to actually pick up or reply with witty answers.”

Tomer approaches the future of these technologies with high hopes: “Understanding conversation is powerful. Imagine having conversations with any computer: if you’re stuck in an elevator, you could scream and it would call for help. Our senses are extended through technology.”

Data Process Automation

Audio is just one form of unstructured data. When collected, assessed, and interpreted, the output of patterns and trends can be used to make strategic decisions or provide valuable feedback.

super.AI was founded by Brad Cordova. The company uses AI to automate the processing of unstructured data. Data Process Automation, or DPA, can be used to automate repetitive tasks that deal with unstructured data, including audio and video files. 

For example, in a large education company, children use a website to read sentences aloud. super.AI used a process automation application to see how many errors a child made. This automation process has a higher accuracy and faster response time than when done by humans, enabling better feedback for enhanced learning.

Another example has to do with personal information (PI), which is a key point of concern in today's privacy-conscious world, especially when it comes to AI. super.AI has a system of audio redaction whereby it can remove PI from audio, including names, addresses, and social security numbers. It can also remove copyrighted material from segments of audio or video, helping to ensure GDPR or CCPA compliance.
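The redaction idea is easiest to see on transcript text. The sketch below uses simple regular expressions as a stand-in; super.AI's actual system works on audio with trained models, and these patterns are simplified assumptions:

```python
# Illustrative PI redaction on a transcript using regular expressions.
# A real audio-redaction pipeline would locate PI in the waveform itself;
# these patterns are deliberately simplified.
import re

PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # 123-45-6789
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),  # 555-867-5309
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text):
    # Replace each match with a bracketed label so the transcript stays readable.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Call 555-867-5309, SSN 123-45-6789."))
```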

It’s clear that the supportive qualities of super.AI are valuable, but when it comes to the people who currently do everything from quality assurance on website product listings to note taking at a meeting, the question is this: are we going too far to replace humans?

Cordova would say no, “Humans and machines are orthogonal. If you see the best chess players: they aren’t human or machine, they’re humans and machines working together. We know intuitively as humans what we’re put on this earth for. You feel good when you talk with people, feel empathy, and do creative tasks.

“There are a lot of tasks where you don’t feel great: tasks that humans shouldn’t be doing. We want humans to be more human. It’s not about taking humans’ jobs, it’s about allowing humans to operate where we’re best and machines aren’t.”

Voice AI is charting unprecedented territory and growing at a pace that will inevitably transform markets. The adoption rates for this kind of tech may change most industries as we currently know them. The more AI is integrated, the more humans can benefit from it. As Cordova succinctly states, "AI is the next, and maybe the last technology we will develop as humans." The capacity of AI to take on new roles in our society has the power to let humans be more human. And that is the best of all possible outcomes.


Leveraging AI-powered speech recognition tech to reduce NHS staff burnout

From Open Access Government

The last 18 months have pushed our National Health Service (NHS) to breaking point. Services that were already overstretched and underfunded have been subjected to unprecedented strain on their resources. This strain has now become a national emergency, risking the entire future of the health service, according to a recent government report.

From treating countless Covid-19 cases and supporting vaccination programmes, to providing essential treatment and care, UK healthcare professionals are at maximum capacity and, understandably, struggling to cope. In fact, a recent survey from Nuance revealed that this period has led to dramatic increases in stress and anxiety across primary (75%) and secondary (60%) care within the NHS. When excessively high levels of stress are experienced over a prolonged period, the result can be clinician burnout which, in turn, can leave many feeling they have no choice but to leave the medical profession altogether. In England, GP surgeries lost almost 300 full-time medical professionals in the three months prior to Christmas and, by 2023, a shortfall of 7,000 GPs is anticipated, according to recent reports. In addition, it is believed that up to a third of nurses are thinking about leaving their profession due to pandemic-related burnout.

These individuals enabled and maintained a new front line in the wake of the pandemic. They are also the people that we applauded every week and depended on during the most challenging days. However, the unwavering pressure and heavy workloads are causing significant damage to their own health. An urgent and effective solution is required if the NHS is to continue delivering its life-saving services and care.

The burden of administrative processes

Over the course of the pandemic, the way in which healthcare services are delivered has changed. One of the most significant changes has been a shift towards teleconsultations, or virtual appointments. An RCGP investigation of GP appointments found that, prior to the pandemic, as much as 70% of consultations were face-to-face. This diminished to 23% during the first weeks of the crisis.

While some medical professionals and patients are in favour of this new format, for many the swift switch to a virtual approach has generated an influx of workload, especially when it comes to documentation processes. In fact, Nuance's research revealed that 67% of primary care respondents believe the pandemic has increased the overall amount of clinical administration. Although there are a few causal factors, such as heavy workloads and time pressure, the transition towards remote consultations appears to be a significant contributor. This is because the risk factor and diagnostic uncertainty of remote consultations are generally higher than for face-to-face appointments. Also, patients who are triaged by telephone often still need a follow-up face-to-face appointment, which is leading to more double handling of patients than happened in the past.

Before the pandemic, clinicians were reportedly spending an average of 11 hours per week on clinical documentation. This figure is only likely to have increased during the pandemic’s peak, when hospitals were at their busiest and remote appointments were most needed. And, we’re not in the clear yet, as the vaccination programme continues to progress and teleconsultation is set to stay. Therefore, moving forward, we need to think about how we can best support our clinical professionals by easing their administrative burden.

AI-powered speech recognition: a step in the right direction

Modern technologies – such as speech recognition solutions – can be leveraged to help reduce some of the administrative pressures being placed on clinical professionals and enable them to work smarter and more effectively. These technologies are designed to recognise and record passages of speech, converting them into detailed clinical notes, regardless of how quickly they’re delivered. By reducing repetition and supporting standardisation across departments, they can also enhance the accuracy as well as the quality of patient records. For example, voice activated clinical note templates can provide a standardised structure to a document or letter, thus meeting the requirements set out by the PRSB (Professional Record Standards Body).
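The template idea can be illustrated with a small sketch: a recognised trigger phrase expands into a standardised document skeleton that the clinician then dictates into. The template name, fields, and wording below are invented for illustration, not drawn from any vendor's product:

```python
# Hypothetical voice-activated note template: a trigger phrase expands
# into a standardised skeleton whose fields are filled from dictation.
# Template names and fields are invented for illustration.
TEMPLATES = {
    "insert discharge summary": (
        "DISCHARGE SUMMARY\n"
        "Diagnosis: {diagnosis}\n"
        "Medications: {medications}\n"
        "Follow-up: {follow_up}\n"
    ),
}

def expand(command, **fields):
    """Return the filled template for a recognised command, else None."""
    template = TEMPLATES.get(command.lower().strip())
    if template is None:
        return None
    return template.format(**fields)

note = expand("Insert discharge summary",
              diagnosis="Community-acquired pneumonia",
              medications="Amoxicillin 500 mg",
              follow_up="GP review in 7 days")
print(note)
```

Standardising the skeleton this way is what lets the output meet a body of record-keeping requirements regardless of who dictates the note.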

Using secure, cloud-based speech solutions, healthcare professionals are able to benefit from these tools no matter where they are based. The latest technologies provide users with the option to access their single voice profile from different devices and locations, even when signing in from home. This advancement could significantly reduce the administrative burden of virtual consultations, therefore helping to decrease burnout levels amongst NHS staff.

Calderdale and Huddersfield NHS Trust is one of many organisations already benefiting from this technology. The team there leveraged speech recognition as part of a wider objective to help all staff members and patients throughout the Covid-19 crisis. Serving a population of around 470,000 people and employing approximately 6,000 employees, the trust wanted to save time and enable doctors to improve safety, whilst minimising infection risk. By using this technology on mobile phones, clinicians could instantly update patient records without having to touch shared keyboards. Having experienced the benefits of this solution, the trust is considering leveraging speech recognition to support virtual consultations conducted over MS Teams, in order to enhance the quality of consultations, while alleviating some of the pressures placed upon employees.

This challenging period has only emphasised how vital the NHS is within the UK. However, the increased workloads and administrative duties brought on by the pandemic are causing higher levels of burnout than ever before. Something needs to change. Although technological advancements such as AI-powered speech recognition are now part of the solution, public bodies also need to determine why the administrative burden has continued to rise, and perhaps reassess the importance of bureaucratic tasks and where it is essential for information to be recorded.


Role of Artificial Intelligence and Machine Learning in Speech Recognition

By The Signal

If you have ever wondered how your smartphone can comprehend instructions like "Call Mom," "Send a Message to Boss," "Play the Latest Songs," or "Switch ON the AC," you are not alone. How is this done? The simple answer is Speech Recognition. Speech Recognition has taken off in the past four to five years and is making our lives more comfortable every day.

Speech Recognition was first introduced by IBM in 1962 when it unveiled the first machine capable of converting human voice to text. Today, powered by the latest technologies like Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning, speech recognition is touching new milestones. 

This latest technological advancement is being used across the globe by top companies to make their user’s experience efficient and smooth. Technologies like Amazon’s Alexa, Apple’s Siri, Google Assistant, Google Speech, Google Dictate, Facebook’s Oculus VR, and Microsoft’s Cortana are all examples of Speech Recognition. 

The expanding usage of speech-to-text technologies has also opened up many new job domains, and students are eagerly taking advantage of them. Many students now join courses like a PGP in AI and Machine Learning after graduating to improve their prospects. The high salary package of around INR 15 lakh for freshers is the second-biggest factor attracting students, the biggest being the fantastic job role itself.

Speech Recognition was a very niche domain before the advent of AI and ML, which has completely transformed it now. Before we understand how AI and ML made changes, let’s understand the nuances of what all these terminologies are. 

Artificial Intelligence 

Artificial Intelligence is the technology by which machines become capable of demonstrating intelligence like humans or animals. Initially, AI was only about memorizing data and producing results accordingly; however, now it is much more than that as machines perform various activities like Speech Recognition, Object Recognition, Translating Texts, and a lot more. 

Another latest addition to AI has been Deep Learning. With the help of Deep Learning, machines can process data and create patterns that help them make valuable decisions. This behavior of a machine through Deep Learning is similar to the behavior of a human brain. Deep Learning activities can be “Supervised,” “Semi-Supervised,” as well as “Unsupervised.” 

Machine Learning 

Machine Learning is a subdomain of AI that teaches machines to learn from past events and activities. Through ML, machines are trained to retain information and outputs from various data sets and to identify patterns in them. It allows the machine to improve by itself without task-specific programming code.

An example of Machine Learning is e-commerce websites suggesting products to you. The code, once written, allows machines to evolve on their own, analyzing user behavior and recommending products according to preferences and past purchases. This involves zero human interference and makes use of approaches like Artificial Neural Networks (ANNs).
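The recommendation idea described above can be sketched without any neural network at all: suggest products that frequently co-occur in orders with items the user has already bought. The catalogue and orders below are made-up illustrative data:

```python
# Minimal co-occurrence recommender: suggest items that often appear in
# the same orders as a user's past purchases. Data is invented.
from collections import Counter

orders = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "keyboard"},
    {"phone", "case"},
    {"laptop", "keyboard"},
]

def recommend(purchased, orders, k=2):
    scores = Counter()
    for order in orders:
        if purchased & order:            # order shares an item with history
            for item in order - purchased:
                scores[item] += 1        # count each co-occurring item
    return [item for item, _ in scores.most_common(k)]

print(recommend({"laptop"}, orders))
```

Real systems replace the raw counts with learned similarity scores, but the "people who bought X also bought Y" structure is the same.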

Speech Recognition 

Speech Recognition is simply the activity of comprehending a user’s voice and converting that into text. It is chiefly of 3 types: 

  1. Automatic Speech Recognition (ASR) 
  2. Computer Speech Recognition (CSR) 
  3. Speech to Text (STT) 

Note: Speech Recognition and Voice Recognition are two different things. While the former comprehends a voice sample and converts it into a text sample, the sole purpose of the latter is to identify the voice and recognize to whom it belongs. Voice Recognition is often used for security and authenticity purposes. 

How Have AI and ML Affected the Future of Speech Recognition? 

The usage of Speech Recognition in our devices has grown considerably due to the developments in AI and ML technologies. Speech Recognition is now being used for tasks ranging from awakening your appliances and gadgets to monitoring your fitness, playing mood-booster songs, running queries on search engines, and even making phone calls. 

The global market for Speech Recognition, currently growing at a Compound Annual Growth Rate (CAGR) of 17.2%, is expected to breach the $25 billion mark by 2025. However, there were enormous challenges initially that have now been tackled with the use of AI and ML.

When in its initial phase, some of the biggest challenges for Speech Recognition were Poor Voice Recording Devices, Huge Noise in the Voice Samples, Different Pitches in Speech of the Same User, etc. In addition to this, the changing dialects and grammatical factors like Homonyms were also a big challenge. 

With the help of AI programs capable of filtering sound, canceling noise, and identifying the meaning of words from context, most of these challenges have been tackled. Today, Speech Recognition shows an efficiency of 95%, compared with less than 20% around 30 years ago. The biggest remaining challenge for programmers is making machines capable of understanding emotions and feelings, and there has been satisfactory progress in this area.

The increasing accuracy of Speech Recognition has become an essential driver of its success, and top tech giants are leveraging these benefits. More than 20% of Google searches were already made by voice in 2016, and this number is expected to be far higher now. Businesses today are automating their services to make operations more efficient, and are putting the introduction of Speech Recognition facilities at the top of their to-do lists.

Some of the key usages of Speech Recognition today are listed below. 

  • The most common use of Speech Recognition is to perform basic activities like searching on Google, setting reminders, scheduling meetings, playing songs, controlling synced devices, etc. 
  • Speech Recognition is now also being used in various financial transactions, with some banks and financial companies offering the feature of “Voice Transfer” to their users. 

Speech Recognition is no doubt one of the best innovations to come out of expanding technological development. However, one thing should be noted if you are planning to enter this sector: the domain overlaps with many others, and the knowledge provided by a Speech Recognition course alone won't be enough for you to thrive in this field. 

Therefore, it is essential that you also sharpen your skills in allied concepts like Data Science, Data Analytics, Machine Learning, Artificial Intelligence, Neural Networks, DevOps, and Deep Learning. So what are you waiting for now? Hurry up and join an online course in Speech Recognition now!

Posted on

Alexa and me: A stutterer’s struggle to be heard by voice recognition AI

By Sam Brooks for The Spinoff

The following scenario is not uncommon for me: I have to make a phone call, usually to the bank. They say my call may be recorded to improve customer service in the future (and I can almost certainly guarantee my voice is indeed on file in some call centres for training purposes). I’ll wait, impatiently, in the queue. I’ll listen to whatever banal Kiwi playlist they have piped in.

Then, a call centre employee picks up and goes: “Hello, you’re speaking with [name].” I immediately encounter a block – a gap in my speech. The call centre employee hears silence and, not unfairly, hangs up. I repeat this process until I finally get through. It used to feel humiliating, but at this point in my life, it’s been downgraded to merely frustrating. I don’t blame anyone when it happens, aware that we’re all just doing our best in this situation.

Still, I never thought I’d purposefully replicate that hellish experience in my own home. Which is why when I was sent an Alexa (specifically a fourth generation Echo Dot) last week, I was a little bit stoked, but mostly apprehensive. Not just about all the boring security and data issues, but that it’d be useless to me. Nevertheless, I set up the Alexa and asked it to do something an ideal flatmate would do: play ‘Hung Up’ by Madonna, at the highest audio quality possible.


Alexa’s little blue light lit up, indicating that it was ready to hear, and act on, my command.


I had a block. 

Alexa’s little blue light turned off.

Stutters are like snowflakes: they come in all shapes, sizes and severities. No one stutter is the same. My stutter does not sound like the one Colin Firth faked in The King’s Speech, or like any stutter you might’ve heard onscreen. I don’t repeat myself, but instead have halting stops and interruptions in my speech – it might sound like an intake of breath, or just silence. For listeners, it might feel like half a second. For me, it could feel like a whole minute.

I’m used to having a stutter – life would be truly hell if I wasn’t. None of my friends care, and 95% of the strangers I interact with either don’t notice it or do so with so little issue that I don’t notice it myself. In person, my stutter is easily recognisable. You can see when I’m stuttering because you see me stop talking. My mouth stays open, but no sound comes out. You wait for me to resume talking. It’s a blip, a bump in the conversation.

When I’m communicating solely with my voice, it’s a whole other ballgame. There are no visual cues, I can’t wave my hand or roll my eyes to signal I’m experiencing a block. All I’ve got is the silence.

Voice recognition has become markedly more common in the past decade, with the most popular assistants being Siri (Apple), Alexa (Amazon), Cortana (Microsoft) and Google Now (Google, obvs). At their most basic level, they allow the user access to music, news, weather and traffic reports with only a few words. At their most complex, they allow control over your home’s lighting and temperature levels; if you’re having trouble sleeping, you can ask them to snore. Because artificial snoring is apparently a comfort for some people?

They’re especially handy for those with certain physical disabilities. Voice recognition makes a range of household features, ones that might otherwise require assistance to use, much more immediately accessible.

This accessibility does not extend to those of us with dysfluency – those who have speech disabilities, or disabilities that lead to disordered speech. For non-disordered speech, a speech recognition rate of 90-95% is considered satisfactory. With disordered speech, the software will clearly recognise far less. Nearly 50,000 people in New Zealand have a stutter alone, and if you include other speech dysfluencies – or simply not being entirely fluent in English – that’s a huge section of the population who can’t access this technology.

For many people with disordered speech, a voice recognition assistant seems pointless – like a shiny new car for somebody who doesn’t have a driver’s licence. But the tech companies who make them are working to make the interface more accessible for people like me. 

In 2019, Google launched Project Euphonia, which collects voice data from people with impaired speech to remedy the AI bias towards fluency. The idea is that by collecting this data, Google can improve its algorithms, and integrate these updates into their assistant. In the same year, Amazon announced a similar integration with Alexa and Voiceitt, an Israeli startup that lets people with impaired speech train an algorithm to recognise their voice. (I considered using this with my own Alexa, but decided against it, out of pure stubbornness.)

Ironically, the intended purpose of voice recognition software is the exact one I’ve had my entire life: To have what I say be recognised, rather than the way I say it.

My first week with Alexa has been an interesting one. I’ve lived alone for about two months now and I generally don’t speak unless I have visitors over. It might be worth pointing out that I don’t stutter when I talk to myself; I also don’t stutter when I think, or when I sing (that last one would make an incredible story if I had an amazing singing voice, but I do not.)

My Alexa doesn’t care about any of that though. All it hears is my silence as I struggle in vain to get it to play ‘Time to Say Goodbye’ on repeat while I have a shower. My Alexa doesn’t know if I’m having a bad speech day or a good one. All it hears is me saying “Alexa” and then nothing. Alexa also expects perfection. It expects me to hit the “d” on “Play ‘I Like Dat’ by T-Pain and Kehlani”. I know I won’t meet that standard. I know I’ll probably stutter multiple times, and Alexa might pick up on that. 

My stutter has changed as I’ve aged, as has my speech. That’s not uncommon, especially with people who stutter the way I do. We find ways to avoid stuttering, and when one tic stops giving us a backdoor into fluency, we find another one to settle on. 

It took me a long time before I could stop thinking of stuttering as failing at being fluent. It’s not. It’s simply talking in a very different way. I changed my philosophy from “failing is a part of life” to “being different is a part of life”. Both are true, but one is less self-punishing than the other.

If I had an Alexa at a different point in my life, I would probably have thrown it out the window. I would be “failing” constantly in my own home, and I do that enough in public already. But coming to voice recognition in my 30s, when I’ve completely reframed my relationship to my speech, has been a surprisingly chill experience. (Also, I get to pretend I’m a captain on Star Trek, because yes, Alexa will respond to the command “Alexa belay that order!”)

Usually, I hate repeating myself to people, because chances are I’ll stutter a bit more the second time around. I don’t mind repeating myself to Alexa, which I admit is because I’m using it to perform a non-essential function: Nobody ever needed to play T-Pain’s amazing new song featuring Kehlani, and definitely not five times in a row.

Posted on

Addressing 9 Common Pitfalls in Dictation and Speech Recognition

By Philips Speech Blog

Dictation can be learned quickly, and being aware of common pitfalls right from the get-go will save you time and spare you unnecessary frustration. We have collected 9 tips that will make dictation and speech recognition a breeze. Keep these in mind and your voice will become an indispensable tool in your daily work life.

1. Set your mind to ‘Dictation mode’

One of the most common misconceptions is that dictation and speech recognition work like a normal conversation – just with yourself. But there is more to it. Approaching a voice recording with the end result – a text document – in mind helps you get into the right mindset. You can pause a recording when you need to think, to avoid long gaps that will only record background noise. Think before you speak, then speak fluently in full sentences. As a next step, start saying punctuation out loud; the software will recognize those words as 'commands'. Practice speaking text out loud as you would type it, and you'll soon get the hang of it.  

2. Invest in professional recording devices

Especially if you would like to work with speech recognition, it is crucial to use quality recording devices to get satisfactory results. Professional dictation recorders, microphones and headsets are designed to achieve ideal voice recording quality. The industry-leading Philips SpeechMike Premium series, for example, is equipped with a decoupled microphone that allows for interference-free recordings. Using your smartphone to dictate is certainly an option for shorter messages, but you will quickly run into limitations. You will likely want to save your smartphone's battery for phone calls and other functions, and incoming calls can interrupt your recordings.

3. Eliminate background noise

Besides obvious distractions like conversations, radio, TV, etc., things like chewing gum or rustling a piece of paper can be very disruptive while recording. Eliminating any possible background noise will not only make for better speech recognition results, but will also please anyone who has to listen to and/or transcribe your recording.

4. Learn voice commands

Voice commands such as “period”, “new paragraph” or “comma” are very important to provide structure in your texts and will help format your document as you speak. As you learn more of those commands they can also be used for other formatting, for example making words or headlines bold. Voice commands save an enormous amount of time whether you create text with speech recognition and then correct it yourself or someone else types the recording for you.
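A rough sketch of how spoken commands might be turned into punctuation during transcription; the command names and matching logic below are illustrative only, not Philips' actual implementation:

```python
# Illustrative command set: spoken words mapped to the text they produce.
COMMANDS = {
    "period": ".",
    "comma": ",",
    "new paragraph": "\n\n",
}

def apply_commands(tokens: list[str]) -> str:
    """Join dictated tokens into text, replacing recognized commands."""
    text = ""
    i = 0
    while i < len(tokens):
        two = " ".join(tokens[i:i + 2])
        if two in COMMANDS:          # two-word commands like "new paragraph"
            text += COMMANDS[two]
            i += 2
        elif tokens[i] in COMMANDS:  # one-word commands like "period"
            text += COMMANDS[tokens[i]]
            i += 1
        else:                        # ordinary word: prepend a space if needed
            if text and not text.endswith("\n"):
                text += " "
            text += tokens[i]
            i += 1
    return text
```

For example, the dictated tokens `["hello", "comma", "world", "period"]` would come out as `hello, world.` – the commands disappear from the text and leave their punctuation behind.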

5. Know the preferred end result

Many people have become familiar with speech-to-text technology in their everyday personal lives. You may already have been dictating chat and text messages on your smartphone for a while. However, in a professional context a simple text message is rarely enough. So consider the following: What would you like to do with the recording? Do you need to create a formatted document? Is the text needed elsewhere, like in a CRM or document management system? Professional speech-to-text solutions such as Philips SpeechLive offer a variety of options for efficiently processing voice files and text documents.

6. Working as a team matters

You might find a lot of apps that offer some kind of speech-to-text functionality on your smartphone. But what happens next? In our professional lives the vast majority of us are required to collaborate with team members. This is particularly true when it comes to dictation and document creation. Having a solution that allows you to collaborate seamlessly with others brings a lot of efficiencies to the process, allows for workload distribution and proper handling of all files.

7. Try to just talk 

Gestures and facial expressions work well on stage, but they are impractical while dictating. If you gesticulate a lot, the distance between the microphone and your mouth changes constantly which affects the recording quality. For those who need their hands free to express themselves, we suggest using professional headsets, like the Philips SpeechOne wireless headset that lets you talk while you pace around.

8. You speak in a dialect or have a strong accent

Dialects are unique and make our languages colorful and lively. Accents are also common in an increasingly international working world and are even desired by many companies. When it comes to speech recognition, however, strong accents aren't the software's best friend. Try your best to pronounce words as clearly and accurately as possible. This is also a reason why saving the voice recording along with any speech-recognized text is so important: you can re-listen to what was said at any time.

9. Stick with it

A common mistake among dictation newbies is not giving themselves time to learn. To get the most out of your dictation and speech recognition solution, be patient with yourself; your work processes and approaches will improve significantly with time.