Posted on

AI outperforms humans in speech recognition

By Monika Landgraf, Karlsruhe Institute of Technology for Tech Xplore

Following a conversation and transcribing it precisely is one of the biggest challenges in artificial intelligence (AI) research. For the first time, researchers at Karlsruhe Institute of Technology (KIT) have succeeded in developing a computer system that outperforms humans in recognizing spontaneously spoken language with minimal latency. The work is reported on arXiv.org.

“When people talk to each other, there are stops, stutterings, hesitations, such as ‘er’ or ‘hmmm,’ laughs and coughs,” says Alex Waibel, Professor for Informatics at KIT. “Often, words are pronounced unclearly.” This makes it difficult even for people to make accurate notes of a conversation. “And so far, this has been even more difficult for AI.” KIT scientists and staff of KITES, a start-up company from KIT, have now programmed a computer system that executes this task better than humans and quicker than other systems.

Waibel previously developed an automatic live translator that directly translates university lectures from German or English into the languages spoken by foreign students. This “Lecture Translator” has been used in the lecture halls of KIT since 2012. “Recognition of spontaneous speech is the most important component of this system,” Waibel explains, “as errors and delays in recognition make the translation incomprehensible. On conversational speech, the human error rate amounts to about 5.5%. Our system now reaches 5.0%.” Apart from precision, however, the speed at which the system produces output is just as important, so that students can follow the lecture live. The researchers have now succeeded in reducing this latency to one second. This is the smallest reported latency reached by a speech recognition system of this quality to date, says Waibel.
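Both figures refer to the word error rate (WER) used on such benchmarks: the minimum number of word substitutions, insertions and deletions needed to turn the system’s transcript into the reference transcript, divided by the number of words in the reference. A minimal sketch of that calculation in Python (the example utterance is illustrative, not taken from the KIT system):

```python
# Minimal word error rate (WER) computation: edit distance over words,
# normalized by the length of the reference transcript.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One deletion ("hmm") and one substitution (tuesday -> thursday)
# over an 8-word reference gives a WER of 0.25, i.e. 25%.
print(word_error_rate("er well we could hmm meet on tuesday",
                      "er well we could meet on thursday"))
```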

Error rate and latency are measured using the standardized, internationally recognized “Switchboard” benchmark test. This benchmark (defined by the US NIST) is widely used by AI researchers worldwide in their competition to build a machine that comes close to humans in recognizing spontaneous speech under comparable conditions, or even outperforms them.

According to Waibel, fast, high-accuracy speech recognition is an essential step for further downstream processing. It enables dialog, translation, and other AI modules to provide better voice-based interaction with machines.

Posted on

7 Key Predictions for the Future of Voice Assistants and AI

By Hardik Shah for Unite.AI

We live in an exciting time for innovation, progress, technology, and commerce. The latest technologies, such as artificial intelligence and machine learning, are making a tremendous impact on every industry.

Voice assistants powered by AI have already transformed eCommerce. eCommerce giants such as Amazon continue to fuel this trend as they compete for market share. Voice interfacing is also advancing at an exceptional rate in the healthcare and banking industries to keep pace with consumers’ demands.

Reason to shift towards voice assistants and AI

The crux of the shift towards voice user interfaces is changing user demand. Millennial consumers in particular show increased awareness of, and a higher level of comfort with, the technology, in an ever-changing digital world where speed, convenience, and efficiency are continually being optimized.

The adoption of AI in users’ lives is fueling the shift towards voice applications. Various IoT devices such as smart appliances, thermostats, and speakers give voice assistants more utility in a connected user’s life. Smart speakers are the most visible home for voice assistants, but they are only the beginning.

According to one report, the voice recognition technology market reached close to 11 billion U.S. dollars in 2019 and is expected to grow by approximately 17% by 2025.

Voice apps built on this technology are now seen everywhere.

Here are 7 key predictions for the future of voice assistants and AI.

1. Streamlined Conversations

Both technology giants, Amazon and Google, have announced that their assistants will no longer require repeated “wake” words. Previously, both assistants depended on a wake word such as “OK Google” or “Alexa” to initiate each line of conversation. For instance, one had to ask, “Alexa! What’s the current weather?” and then say “Alexa” again before requesting that the voice assistant set the hallway thermostat to 25 degrees.

This new capability makes interactions more convenient and natural for users. Simply put, you can follow up with “current weather conditions?” without repeating the wake word.
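Behind the scenes, this follow-up behaviour amounts to keeping a short listening window open after a wake-word-initiated turn. A hypothetical Python sketch of that logic (the wake words and the eight-second window are assumptions for illustration, not either vendor’s actual implementation):

```python
from time import monotonic

WAKE_WORDS = ("alexa", "ok google")
FOLLOW_UP_WINDOW_SECONDS = 8.0  # assumed value; real assistants tune this

class Assistant:
    """Hypothetical 'continued conversation': once a wake word opens a
    session, follow-up utterances within the window need no wake word."""

    def __init__(self) -> None:
        self.session_open_until = 0.0

    def handle_utterance(self, text: str) -> None:
        now = monotonic()
        woke = text.lower().startswith(WAKE_WORDS)
        if woke or now < self.session_open_until:
            self.session_open_until = now + FOLLOW_UP_WINDOW_SECONDS
            print(f"Handling command: {text}")
        else:
            print(f"Ignored (no wake word and session closed): {text}")

assistant = Assistant()
assistant.handle_utterance("Alexa, what's the current weather?")
assistant.handle_utterance("Set the hallway thermostat to 25 degrees")  # follow-up
```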

Users now use voice assistants in particular locations, while multitasking, and either alone or amongst a group of people. Devices that can take these contextual factors into account make conversations more convenient and efficient, and they also show that the developers behind the technology are aiming to provide a more user-centric experience.

2. Voice Push Notifications

Push notifications send a reminder when a task’s due date is near and are a useful tool for engaging users within an application. With the advent of voice technology, this feature has taken on a new dimension. Voice push notifications are an effective way of increasing user engagement and retention, reminding users of the app, and delivering relevant messages.

Both Google Assistant and Alexa allow users to receive notifications from compatible third-party apps, in addition to notifications from core features such as calendar appointments.

3. Search Behaviors Will Change

Voice search, or voice-enabled search, allows users to use a voice command to search the internet, a website, or an app.

  • Voice-based shopping is expected to rise to $40 billion in 2022.
  • Consumers’ spending via voice assistants is expected to reach 18% by 2022.
  • According to Juniper Research, voice-based ad revenues will reach $19 billion by 2022.

These statistics make clear that voice searches on mobile devices are rising at an unprecedented rate. Brands have transformed from touchpoints into listening points. Organic searches are the primary way for brands to gain visibility.

Therefore, seeing the popularity and adoption of voice search, marketers and advertising agencies expect voice goliaths like Amazon and Google to update their platforms for paid messaging.

4. Personalized Experiences

Voice-enabled devices and digital assistants such as Google Home and Amazon’s Alexa enable customers to engage through the most natural form of communication. The data shows that online sales of voice-enabled devices grew 39% year-over-year.

Voice assistants provide more personalized experiences as they get better at distinguishing between voices. Google Home supports up to six user accounts and quickly identifies unique voices.

Customers may pose queries like “What restaurant has the best lunch in LA?” or “What restaurants are open for lunch now?” Voice assistants are smart enough to handle these details and can, for example, schedule appointments for individual users. They can even remember nicknames, locations, payment information, and other details.

5. Security Will Be a Focus

According to the report, about 41% of voice device users are concerned about trust and confidentiality. To make purchases more secure and convenient for customers, Amazon and Google have introduced security measures such as speaker verification and voice ID as part of the voice assistant experience.

Furthermore, when a user books an appointment, reserves a restaurant table, looks up local cinema times, schedules an Uber, or shares payment details and other sensitive information, keeping those interactions secure and convenient becomes even more critical. Speaker verification and voice ID are both paramount parts of the voice assistant experience.
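Conceptually, speaker verification compares a “voiceprint” embedding of the incoming utterance against one captured when the user enrolled, and only allows sensitive actions such as purchases when the two are close enough. A rough Python sketch under that assumption (the embeddings, similarity threshold and random vectors below are placeholders, not any vendor’s actual system):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_speaker(enrolled: np.ndarray, utterance: np.ndarray,
                   threshold: float = 0.75) -> bool:
    """Allow the sensitive action only if the new utterance's voiceprint
    is close enough to the enrolled one. The 0.75 threshold is illustrative."""
    return cosine_similarity(enrolled, utterance) >= threshold

# In a real system the embeddings would come from a speaker-encoder model;
# random vectors stand in here purely to show the comparison step.
rng = np.random.default_rng(0)
enrolled_voice = rng.normal(size=256)
same_speaker = enrolled_voice + rng.normal(scale=0.2, size=256)
impostor = rng.normal(size=256)

print(verify_speaker(enrolled_voice, same_speaker))  # almost certainly True
print(verify_speaker(enrolled_voice, impostor))      # almost certainly False
```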

6. Touch Interaction

At CES 2019, Google launched a new feature that integrates voice and a visual display into a seamless experience, demonstrated on a linked screen. The display can show details like local traffic information, events and weather. This functionality combines visual and voice capabilities, allowing users to interact more fully with the assistant.

7. Compatibility & Integration

Amazon is at the forefront when it comes to integrating voice technology with other products. Anyone familiar with Alexa knows that the voice assistant is already integrated into a vast array of products, such as Samsung’s Family Hub refrigerator. Google has also announced a product named Google Assistant Connect. The underlying idea is for manufacturers to create custom devices that serve specific functions and are integrated with the Assistant.

Speaking of the advancement of voice app development, the global voice eCommerce industry is expected to be worth $40 billion by 2022, according to the report. Beyond that, 2020 is expected to see greater interest in the development of voice-enabled devices, including an increase in mid-level devices that have some assistant functionality but are not full-blown smart speakers.

The takeaway

You have now gone through a few key predictions for the future of voice assistants and AI. However, several barriers need to be overcome before voice-enabled devices see mass adoption. Advances in technology, specifically in AI, NLP (natural language processing), and machine learning, pave the way for the future of voice app development.

The AI behind it has to get better at handling challenges like background noise and accents to build a robust speech recognition experience. Undeniably, consumers have become increasingly comfortable with, and reliant upon, using voice to talk to their phones, smart home devices, vehicles, and more. Voice technology has become a key interface to the digital world, and voice interface design and voice app development will be in ever greater demand.

Posted on

Top 10 Speech Recognition Companies to Watch in 2020

By Adalin Beatrice for Analytics Insight

The voice recognition market is estimated to reach US$31.82 billion by 2025

Technology is making inroads into every sector. New inventions, innovations and devices are making life easier for everyone. Voice recognition technology is one such development to watch in this era of growing innovation.

Voice recognition, also known as speech recognition, is a computer software program or hardware device with the ability to receive, interpret and understand voice and carry out commands. The technology makes it easy to create and control documents simply by speaking.

Voice and speech recognition features enable contactless control of various devices and equipment, deliver input for automatic translation and generate print-ready dictation. Speech recognition devices respond to voice commands. According to a report by Grand View Research, Inc, the global speech and voice recognition market size is estimated to reach US$31.82 billion by 2025, growing at a CAGR of 17.2% during the forecast period.

The growth of the overall market is primarily driven by factors such as rising acceptance of advanced technology coupled with increasing consumer demand for smart devices, a growing sense of personal data safety and security, and increasing usage of voice-enabled payments and shopping by retailers.

The demand for related devices like voice-activated systems, voice-enabled devices and voice-enabled virtual assistant systems is also expected to spike as speech-based technology penetrates diverse industries. The greatest adoption is observed in the banking and automobile sectors, where voice biometrics are embraced to authenticate users, counter fraudulent activities and enhance security. Growth in artificial intelligence (AI)-based systems is also expected to stimulate the market soon.

Analytics Insight presents the top 10 companies operating in the global speech and voice recognition market in 2020

Nuance Communications

Nuance Communications, founded in 2001, provides speech recognition and artificial intelligence products focusing on server and embedded speech recognition, telephone call steering systems, automated telephone directory services, and medical transcription software and systems.

The Massachusetts-based company offers Nuance Recognizer for contact centres, software that consistently delivers a great customer service experience while improving self-service systems’ containment rates, and Dragon NaturallySpeaking, which creates documents, spreadsheets and email simply by speaking. The company partners with 75% of Fortune 100 companies and thousands of healthcare organisations.

Google LLC

Google, now a subsidiary of parent company Alphabet, was founded in 1998. Google provides a variety of services ranging from search engines and cloud computing to online advertisement technologies and computer hardware and software. The California-headquartered company is a global pioneer in internet-based products and services and is currently stepping into the speech recognition market: it provides a speech-to-text service that accurately converts speech into text using an API powered by Google’s AI technology. Google has strong network coverage, with 70 offices in 50 countries across the globe.

Amazon.com, Inc

Amazon, headquartered in Washington state, was founded in 1994. The company operates through three core segments, namely North America, International and Amazon Web Services, covering the retail sale of consumer products and subscriptions. Amazon focuses on advanced technologies like artificial intelligence, cloud computing, consumer electronics, e-commerce and digital streaming. Amazon Transcribe makes it easy for developers to add speech-to-text capability to their applications.

Apple, Inc

Apple, Inc is a California-headquartered company that manufactures, markets and sells mobile phones, media devices and computers to consumers worldwide. Apple was founded in 1976. The company sells its products and services mostly through its direct sales force, online and retail stores, and third-party cellular network carriers, resellers and wholesalers. Apple’s speech recognition process involves capturing audio of the user’s voice and sending the data to Apple’s servers for processing.

IBM Corporation

IBM Corporation was founded in 1911. The New York-headquartered company operates through five key segments: cognitive solutions, technology services and cloud platforms, global business services, systems, and global financing. IBM also manufactures and sells software and hardware, and delivers numerous hosting and consulting services in areas ranging from mainframe computers to nanotechnology. IBM’s speech recognition enables systems and applications to understand and process human speech.

Microsoft Corporation

Microsoft Corporation, founded in 1975, is a pioneering technology company. The Redmond, Washington-headquartered company is known for software products that include Internet Explorer, the Microsoft Windows OS, the Microsoft Office suite and the Edge web browser. Microsoft’s speech recognition in Windows 10 lets the system recognize the user’s voice.

Agnitio

Agnitio was founded in 2004 as a spin-off from the Biometric Recognition Group-ATVS at the Technical University of Madrid. The Madrid, Spain-headquartered company is a biometrics technology company that uses unique biometric characteristics to verify an individual’s identity. Agnitio’s speech recognition program for Windows lets users control their computer by voice.

Verint VoiceVault

Verint Systems was founded in 2002. The New York headquartered analytics company sells software and hardware products for customer engagement management, security, surveillance and business intelligence. Verint VoiceVault voice biometrics is a standardized approach to mobile identity verification.

iFLYTEK

iFLYTEK, headquartered in Hefei, Anhui, China, is an advanced enterprise dedicated to the research and development of technologies such as intelligent speech and language technologies, speech information services, integration of e-governance systems, and development of software and chip products. The company was founded in 1999, and its market coverage spans North America, Europe, Asia-Pacific, Latin America, the Middle East and Africa. iFLYTEK’s speech recognition offering provides services such as speech synthesis, automatic speech recognition and speech expansion.

Baidu

Baidu, headquartered in Beijing, China, consists of two segments: Baidu Core and iQIYI. The company was founded in 2000 and has direct sales operations in Beijing, Dongguan, Guangzhou, Shanghai, Shenzhen and Suzhou. Baidu’s speech recognition services include a streaming multi-layer truncated attention model (SMLTA) for online automatic speech recognition (ASR).

Posted on

Four in Five Legal Firms Looking to Invest in Speech Recognition

By Lawyer Monthly

A full 82% of legal firms aim to invest in speech recognition technology going forward, according to a research report from Nuance Communications Inc.

Censuswide was commissioned to conduct a survey of 1,000 legal professionals and 20 IT decision-makers in the UK, which was carried out from 23 June to 25 June. Respondents were asked questions regarding their use of technology after the government recommended that offices close earlier this year, and whether they felt properly equipped to work remotely.

25% of legal professionals did not feel properly equipped for remote work when the government advice came down earlier this year. When asked in the Censuswide survey, 56% of respondents reported that they lacked adequate productivity tools to do their jobs as effectively from home as they could in the office.

However, 80% of respondents who used speech recognition technology for document creation in some form during this period said that they felt properly equipped.

It was also discovered that, in cases where they did not utilise voice recognition software tools, 67% of legal professionals reportedly spent between 2 and 4 hours a day typing. Only 19% made use of internal typists, and 5% used external transcription services on a regular basis.

82% of organisations surveyed said that they were looking to invest further in voice recognition technology going forward, and 62% of legal professionals who were not currently using them said that they would in future.

“The pandemic has accelerated a trend that was already underway, as many modern legal firms move to embrace new ways of working and make the most of digitalisation. In this time of economic uncertainty, legal professionals are under more pressure than ever to deliver high quality outputs – including documents – at speed, all whilst upholding the highest standards of data security,” said Ed McGuiggan, General Manager at Nuance Communications.

McGuiggan noted that speech recognition was likely to become an essential tool in order to cope with the legal profession’s new demands. “While it is undeniable that recent months have brought challenges for the legal sector, they have also presented an opportunity to further reform some outdated methods and attitudes,” he said.

Posted on

10 hospital innovation leaders share the No. 1 tech device they couldn’t live without at work

By Katie Adams for Becker’s Hospital Review

Hospital innovation executives know better than most people that smart applications of technology can save time and simplify processes — even to the point where users become reliant. 

Here, 10 digital innovation leaders from hospitals and health systems across the country share the tech device or software they reach for all day long at their jobs.

Editor’s note: Responses have been lightly edited for clarity and length.

Daniel Durand, MD, chief innovation officer at LifeBridge Health (Baltimore): It’s not a super new tech device, but automated speech recognition and dictation software. It is very important for my specialty and an increasing number of physicians and keeps getting better every year.

Omer Awan, chief data and digital officer at Atrium Health (Charlotte, N.C.): My iPhone.

Muthu Krishnan, PhD, chief digital transformation officer at IKS Health (Burr Ridge, Ill.): My work laptop. Our capability to connect from anywhere securely helps me keep my work (and meeting schedule) in sync with my colleagues, partners and clients.

Peter Fleischut, MD, senior vice president and chief transformation officer at NewYork-Presbyterian Hospital (New York City): My phone.

Nick Patel, MD, chief digital officer at Prisma Health (Columbia, S.C.): My tablet PC. It’s in my bag everywhere I go; I can do everything on it. I can access my EHR, my whole suite of Office 365 including Teams and Skype for business. I love it.

Aaron Martin, executive vice president and chief digital officer at Providence (Renton, Wash.): Probably my Macbook. 

John Brownstein, PhD, chief innovation officer at Boston Children’s Hospital: Zoom.

Tom Andriola, vice chancellor of IT and data at UC Irvine (Calif.): I would still say my laptop — sorry, I know that’s uninteresting.

Lisa Prasad, vice president and chief innovation officer at Henry Ford Health System (Detroit): My Mac.

Sara Vaezy, chief digital strategy and business development officer at Providence (Renton, Wash.): My iPhone. I can do 85 percent of what I need to do for my job on it.

Posted on

Language Processing Tools Improve Care Delivery for Providers

By Amy Burroughs for HealthTech

The human voice — an ordinary, familiar sound — is easy to take for granted. 

But advances in natural language processing, or NLP, a branch of artificial intelligence that allows computers to understand spoken or typed remarks, are prompting healthcare organizations to leverage that field. 

In areas such as voice-activated assistants and speech recognition platforms, NLP is creating better experiences by expanding patient access to information, cutting transcription costs and delays, and improving the quality of health records. Providers also report the tools can lower stress and allow more face time during appointments. 

That’s because speech offers unique distinction. “It’s more detailed and nuanced, and it’s the more natural way to convey what you’re thinking,” says Dr. Genevieve Melton-Meaux, a professor of surgery and health informatics at the University of Minnesota. 

The notion helped drive development of Livi, a smart assistant for patients at Aurora, Colo.-based UCHealth. The tool is integrated with UCHealth records to deliver custom support, providing test results and managing appointments and secure messaging with physicians, among other duties.

“The way people are using the technology — chatbots, virtual assistants, natural language processing — it’s all changing so fast,” says Nicole Caputo, senior director of experience and innovation at UCHealth, which also serves southern Wyoming and western Nebraska. 

Livi, referred to using the female pronoun by UCHealth teams, responds to typed commands on computers, smartphones and tablets; wider voice functionality is being developed, set to join the many voice-driven efforts that could complement as many as half of all user experiences across industries by 2024, according to a recent IDC report.

Right now, Livi can accommodate several basic spoken queries as an Amazon Alexa skill, providing resources about UCHealth (finding the closest urgent care clinic to a person’s location, for example) and location-specific tips for healthy living via Amazon’s Echo family of smart speakers. 

“Say you’ve just gotten your knee replaced and you’re looking for places to start hiking again,” Caputo says. “Livi can help you with that, along with helping you with your exercises to get there.” 

Livi already has answered 255,500 queries for more than 80,000 users, with the ultimate goal of reducing burdens on UCHealth’s help desk and call center.

NLP Allows for Real-Time Records

On the provider side, natural language processing is transforming care through tools such as Nuance’s Dragon Medical One — a cloud-based, AI-powered platform that delivers real-time transcription to a patient’s electronic health record — and Dragon Medical Practice Edition, speech recognition software designed to serve the same function.

Concord Hospital, a 238-bed facility in New Hampshire, deployed Dragon technology as part of a move to Cerner’s Millennium EHR system. Clinicians can now provide dictation from any workstation or smartphone, says Garvin Eastman, an application analyst for the hospital.

Today, 610 Concord staffers, including about 130 nurses, use NLP tools — an adoption rate of nearly 90 percent. The use of phone-based transcription services, meanwhile, has dropped by 91 percent, saving more than $1 million.

Eastman attributes the success to clear expectations set by the leadership team and a thoughtful deployment that involved a pilot program followed by phased rollouts.

A need for efficiency fueled a similar initiative at Minneapolis-based Allina Health. Before adopting Dragon transcription tools, “it could be hours before your colleagues can read a note, know what you’re thinking and take action,” says Dr. David Ingham, medical director for information services at Allina.

As of December, more than 1,550 Allina providers and therapists were using NLP technologies, which saved about $250,000 in transcription costs that month alone.

Simple yet effective commands ­expedite workflows even more by executing common functions. “I can say, ‘Go to the most recent labs,’ and the computer will navigate there for me,” Ingham says. “I can say, ‘Order a basic metabolic panel,’ and it’ll tee that up.”
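In essence, commands like these map a recognized phrase onto an action in the record system. A simplified, hypothetical sketch of that pattern (the command strings and handlers are illustrative; they are not Dragon’s or any EHR vendor’s actual API):

```python
# Hypothetical voice-command router: maps a recognized phrase to an action.
# The handlers below just print; a real integration would call the EHR's API.
def go_to_recent_labs() -> None:
    print("Navigating to the most recent lab results...")

def order_basic_metabolic_panel() -> None:
    print("Queuing order: basic metabolic panel (pending clinician sign-off).")

COMMANDS = {
    "go to the most recent labs": go_to_recent_labs,
    "order a basic metabolic panel": order_basic_metabolic_panel,
}

def handle_transcript(transcript: str) -> None:
    phrase = transcript.lower().strip().rstrip(".")
    action = COMMANDS.get(phrase)
    if action:
        action()
    else:
        print(f"No matching command for: {transcript!r}")

handle_transcript("Go to the most recent labs.")
handle_transcript("Order a basic metabolic panel")
```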

Adoption of NLP Can Require Some Adjustments

Despite their relative ease of use, voice-powered tools may require a pivot.

Concord Hospital uses a variety of virtual desktop infrastructure workstations, so implementation varied by location. “We really strove to get the nurses to do their work as similarly as possible so that we weren’t trying to come up with different workarounds,” Eastman says.

At Allina Health, a cloud-based NLP service runs inside the Citrix platform without needing extensive configurations, and it didn’t have to integrate with the Epic EHR solution — a major plus, Ingham says. Because users can access voice-driven functions inside Epic’s mobile apps, the experience is seamless.

The biggest challenges for UCHealth were organizing Livi’s back-end data and managing users’ expectations (retrieval of patient portal usernames and passwords is a common request that is currently under review). “You can research it, you can look and see how people are using other chatbots,” Caputo says, “but the best way to do it is to make sure you have your data set up, put it out there, see how people are asking questions and then pivot from there.”

Voice-Powered Tools Help in Breaking Down Barriers

Measurable gains are important when assessing speech-driven tools, but providers say some of the most important value is personal. 

“If we are spending our time typing, it’s less time to see patients, less time to be thinking about a case and working out problems,” Ingham says. “Collectively, that all is a factor when it comes to burnout.”

Ingham was skeptical at first, anticipating delays and errors, but he soon embraced the strengths and speed of NLP. “And I found my notes were a little more detailed, appropriately so,” he adds.

Thoughtful user input is crucial, notes Melton-Meaux, also the chief analytics officer at M Health Fairview University of Minnesota Medical Center.

“We have systems that can collect a lot of information — such as laboratory information and vital signs — and while that’s important, the richest and most interesting information is contained in the clinical notes,” she says.

Still, the efficiencies speak loudly. Concord nurses once shared notes via phone when moving a patient from the emergency department, Eastman says. Now, an ED nurse can dictate a report that is quickly ready and waiting.

Such gains extend beyond better patient care. A survey of Concord nurses revealed nearly 90 percent said the NLP platform improved job satisfaction.

Posted on

How voice technology can help banks manage risk

By Tom Rimmer for Financier Worldwide

It comes as no surprise that the pandemic has taken its toll on financial institutions (FIs). Banks are always on the lookout for ways to cut costs to improve their operational margins. The lockdown, though, has seen FIs having to rapidly invest in technology to align with remote working practices. In some cases, it was the first time these businesses had all staff working remotely, which only added to their compliance challenges. With changing consumer demands and many still staying off the high streets because of social distancing measures, banks will increasingly need to adopt more solutions to engage remotely with customers, whether that is via the phone or through video conferencing tools.

As customers transition to a more digital-first banking approach, banks are going to have to weigh up the need for human interaction and the convenience of online channels when connecting with them. Ultimately, people like to speak to other humans, especially when something is going wrong.

With fewer physical places for customers to interact with their banks in the current climate, call centres will undoubtedly find their call volumes increasing. The Banking & Payments Federation Ireland (BPFI), for example, said member banks had experienced a 400 percent increase in calls at the beginning of the crisis. The challenge then comes in how technology can aid banks in ensuring customer churn is kept low, issues are flagged immediately, and compliance needs are met.

Regulatory needs

Since the 2008 financial crisis, the number of regulations FIs face has drastically increased. With more regulations being added every year, the financial services firms that deal with personal or sensitive information encounter increasing barriers in being able to deliver their products or services. These institutions need to implement new systems and solutions to manage risk to their customers’ personal data.

This is a major task – not to mention that the ramifications of non-compliance with regulations can have a direct impact on revenues. In 2019 alone, the Financial Conduct Authority (FCA) issued over £38bn worth of fines in the UK for compliance, legal and governance-related issues.

However, fines are not the only concern. The impact on brand reputation and share prices can have a more detrimental effect overall. If an organisation faces a regulatory breach, it runs the risk of discouraging potential new customers, and losing its existing customers in the process. As brand reputation can be lost in an instant, protecting the brand is one of the most important challenges businesses face when presented with a compliance fine.

The power of voice technology

Due to the immense volume of contact centre calls, compliance has become a significant and growing challenge. FIs’ contact centres have strict regulations to follow, such as protecting credit card data (the Payment Card Industry Data Security Standard (PCI DSS)) and protecting customer data (the General Data Protection Regulation (GDPR)). The FCA, through the COBS 11.8 regulation, also states that banks need to record all customer interactions. These organisations not only need to follow these rules but also need to be able to prove their compliance in the event of an audit.

The problem is that, unlike with text, it is extremely challenging and time-consuming to extract useful information from audio recordings. With voice technology, however, FIs can easily locate and replay stored recordings automatically. They can then evaluate and categorise every customer interaction into groups relevant to specific compliance regulations, which can then be addressed appropriately.

Adding to that, because call recordings need to be easily accessible upon request, whether from a customer or an auditor, there also needs to be a note-taking and more in-depth record-keeping element. Sophisticated voice technology makes this easy, enabling capabilities such as indexing of conversations, searchability and timestamping of calls.
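A minimal sketch of what that indexing layer could look like once calls have been transcribed: each recording becomes a list of timestamped text segments that can be searched and flagged against keywords tied to specific regulations. The keyword lists and data layout below are illustrative assumptions, not any particular RegTech product:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    call_id: str
    start_seconds: float
    text: str

# Illustrative keyword lists tied to compliance concerns (not a real rule set).
COMPLIANCE_KEYWORDS = {
    "PCI DSS": ["card number", "cvv", "expiry date"],
    "GDPR": ["date of birth", "home address", "passport"],
}

def flag_segments(segments):
    """Return (regulation, segment) pairs wherever a keyword appears, so a
    compliance officer can jump straight to that point in the recording."""
    flags = []
    for seg in segments:
        lowered = seg.text.lower()
        for regulation, keywords in COMPLIANCE_KEYWORDS.items():
            if any(kw in lowered for kw in keywords):
                flags.append((regulation, seg))
    return flags

transcript = [
    Segment("call-001", 12.4, "Could you read me the card number on file?"),
    Segment("call-001", 47.9, "Thanks, and can you confirm your home address?"),
]

for regulation, seg in flag_segments(transcript):
    print(f"[{regulation}] {seg.call_id} @ {seg.start_seconds:.1f}s: {seg.text}")
```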

Voice technology and RegTech

With an increasing amount of regulations to adhere to, the burden to understand, manage and protect customers’ voice data is more important now than ever before. Regulatory technology (RegTech) is set to make up 34 percent of all regulatory spending by the end of 2020, according to KPMG.

A key component of RegTech is voice technology, which transforms unstructured voice data into text. This text can then be used to find insights and flag any compliance issues, which is essential to FIs’ ability to remain compliant. Using speech recognition technology for regulatory compliance is about delivering monitoring at scale while protecting the business and its customers.

The technology not only ensures that historical archives of voice data are transcribed for analysis, but also means any issues or problems that arise on a call can be resolved in near real time. The system automatically transcribes and analyses the customer’s words, can offer prompts and information, and can even suggest escalating the call to a senior staff member if needed. This capability significantly reduces risk for FIs.

Ultimately, voice technology has the potential to reduce fines, speed up investigations and protect the brand. It also saves time by enabling all voice data to be transcribed quickly and automatically, a process that before the advances in automatic speech recognition (ASR) was difficult and time consuming. This gives FIs a better understanding of their customers, which not only aids in mapping the customer journey, their interactions and changing sentiment, but also helps them comply with various regulations. This is essential for brand reputation, share price security and delivering better customer service amid changing regulations.

Posted on

Nuance and MITRE Team Up to Fight Cancer with AI, Speech Recognition and Data Interoperability

From Find Biometrics

In the fight against cancer, data is key. Accurate, robust patient data that is interoperable between use cases not only helps researchers in their efforts to understand the disease, but it also aids oncologists in providing safe and effective treatments. That’s why a recently announced strategic partnership between Nuance Communications and R&D organization MITRE stands to make a difference in the healthcare world.

The collaboration will see Nuance’s Dragon Medical One speech recognition platform working in tandem with MITRE’s mCODE – a set of data elements that, by establishing baseline standards for oncology-related health records, aims to enhance the information available in the war on cancer.

“Every interaction between a clinician and a cancer patient provides high-quality data that could lead to safer care, improved outcomes, and lower costs,” said MITRE’s Chief Medical and Technology Officer, Dr. Jay Schnitzer. “But first, we need data that is standardized and collected in a computable manner so it can be aggregated with data from many other patients and analyzed for best practices. And it must be collected in a streamlined way that doesn’t burden the clinicians. The Nuance offering will enhance this effort.”

Nuance’s Dragon Medical One solution is already playing an important role in patient care. The cloud-based speech recognition technology transcribes medical notes by dictation in accordance with industry standards, while also offering frictionless record retrieval via voice command. This process ensures accurate patient records while relieving administrative pressure on clinics and hospitals without burdening increasingly time-poor doctors. Incorporating mCODE will further improve the solution’s efficacy in oncological use cases.

“Collecting clinical data specific to oncology treatment has traditionally been a difficult task to overcome,” said Diana Nole, EVP and GM of Healthcare at Nuance. “Combining Nuance’s AI expertise with the mCODE data standard provides oncologists with the ability to easily collect and gain access to critical outcome data by simply using their voice to securely dictate notes and search within the EHR using Nuance Dragon Medical One.”

Nuance is an active player in the healthcare space, and this partnership with MITRE is the most recent example of its commitment to the market. In June, the company teamed up with Wolters Kluwer to bring new search features to Dragon Medical One. And in July the company expanded its partnership with Cerner Corporation to encompass its virtual assistant technology.

Posted on

Researchers claim masks muffle speech, but not enough to impede speech recognition

By Kyle Wiggers for Venture Beat

Health organizations including the U.S. Centers for Disease Control and Prevention, the World Health Organization, and the U.K. National Health Service advocate wearing masks to prevent the spread of infection. But masks attenuate speech, which has implications for the accuracy of speech recognition systems like Google Assistant, Alexa, and Siri. In an effort to quantify the degree to which mask materials impact acoustics, researchers at the University of Illinois conducted a study examining 12 different types of face coverings in total. They found that transparent masks had the worst acoustics compared with both medical and cloth masks, but that most masks had “little effect” on lapel microphones, suggesting existing systems might be able to recognize muffled speech without issue.

While it’s intuitive to assume mask-distorted speech would prove to be challenging for speech recognition, the evidence so far paints a mixed picture. Research published by the Educational Testing Service (ETS) concluded that while differences existed between recordings of mask wearers and those who didn’t wear masks during an English proficiency exam, the distortion didn’t lead to “significant” variations in automated exam scoring. But in a separate study, scientists at Duke Kunshan University, Lenovo, and Wuhan University found an AI system could be trained to detect whether someone’s wearing a mask from the sound of their muffled speech.

A Google spokesperson told VentureBeat there hasn’t been a measurable impact on the company’s speech recognition systems since the start of the pandemic, when mask-wearing became more common. Amazon also says it hasn’t observed a shift in speech recognition accuracy correlated with mask-wearing.

The University of Illinois researchers looked at the acoustic effects of a polypropylene surgical mask, N95 and KN95 respirators, six cloth masks made from different fabrics, two cloth masks with transparent windows, and a plastic shield. They took measurements within an “acoustically-treated” lab using a head-shaped loudspeaker and a human volunteer, both of which had microphones placed on and near the lapel, cheek, forehead, and mouth. (The head-shaped loudspeaker, which was made of plywood, used a two-inch driver with a pattern close to that of a human speaker.)

After taking measurements without face coverings to establish a baseline, the researchers set the loudspeaker on a turntable and rotated it to capture various angles of the tested masks. Then, for each mask, they had the volunteer speak in three 30-second increments at a constant volume.

The results show that most masks had “little effect” below a frequency of 1kHz but were muffled at higher frequencies in varying degrees. The surgical mask and KN95 respirator had peak attenuation of around 4dB, while the N95 attenuated at high frequencies by about 6dB. As for the cloth masks, material and weave proved to be the key variables — 100% cotton masks had the best acoustic performance, while masks made from tightly woven denim and bedsheets performed the worst. Transparent masks blocked between 8dB and 14dB at high frequencies, making them by far the worst of the bunch.
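The attenuation figures above are decibel ratios of the sound level measured with and without a mask in a given frequency band. A small worked example of that conversion (the amplitude values are made up for illustration, not taken from the paper):

```python
import math

def attenuation_db(baseline_amplitude: float, masked_amplitude: float) -> float:
    """How much quieter the masked measurement is, in dB, relative to the
    bare-face baseline at the same frequency."""
    return 20 * math.log10(baseline_amplitude / masked_amplitude)

# Made-up example: if a high-frequency band measures 0.63x the baseline
# amplitude with a mask on, the attenuation is roughly 4 dB, comparable to
# the surgical-mask figure reported in the study.
print(round(attenuation_db(1.0, 0.63), 1))  # ~4.0
```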

“For all masks tested, acoustic attenuation was strongest in the front. Sound transmission to the side of and behind the talker was less strongly affected by the masks, and the shield amplified sound behind the talker,” the researchers wrote in a paper describing their work. “These results suggest that masks may deflect sound energy to the sides rather than absorbing it. Therefore, it may be possible to use microphones placed to the side of the mask for sound reinforcement.”

The researchers recommend avoiding cotton-spandex masks for the clearest and crispest speech, but they note that recordings captured by the lapel mic showed “small” and “uniform” attenuation — the sort of attenuation that recognition systems can easily correct for. For instance, Amazon recently launched Whisper Mode for Alexa, which taps AI trained on a corpus of professional voice recordings to respond to whispered (i.e., low-decibel) speech by whispering back. An Amazon spokesperson didn’t say whether Whisper Mode is being used to improve masked speech performance, but they told VentureBeat that when Alexa speech recognition systems’ signal-to-noise ratios are lower due to customers wearing masks, engineering teams are able to address fluctuations in confidence through an active learning pipeline.

In any case, assuming the results of the University of Illinois study stand up to peer review, they bode well for smart speakers, smart displays, and other voice-powered smart devices. Next time you lift your phone to summon Siri, you shouldn’t have to ditch the mask.

Posted on

5 Ways Voice Technology Is Increasingly Making Itself Heard

By PYMNTS for PYMNTS.com

Voice technology is among the things that have gotten a big boost from the global pandemic as consumers shift to buying things online and avoiding touching items in public places for fear of contracting COVID-19.

“We’ve seen a huge increase in the use of voice in the home,” Tom Taylor, senior vice president of Amazon’s Alexa unit, told GeekWire in a recent interview. “People do build relationships with Alexa. We have something like 1 million marriage proposals and compliments of love to Alexa. I don’t think anybody has ever done that to a keyboard.”

Voice technology’s adoption was already on the rise in the pre-pandemic period, as PYMNTS’ and Visa’s How We Will Pay report demonstrated in late 2019. Our study found that about 3 in 10 consumers owned voice assistants, compared to just 1.4 out of 10 in 2017. The share of consumers who made purchases via their smart speakers also rose to 9.6 percent as of late 2018, vs. just 7.7 percent a year earlier.

And all of that was before the pandemic made voice assistants even more valuable as their use cases expanded to meet new consumer needs. They’re now showing up in hotel rooms, enabling grocery shopping and beginning to sneak into quick-service restaurants (QSRs) as an enhancement to all those touchscreens that have been going up in recent years. Here are five changes the industry is seeing as these devices come into greater use:

Voice-Tech And Education 

Education has been on a wild ride for the past several months, and voice assistants are becoming part of an increasingly integrated hybridized educational model of online and in-person learning, according to recent reports.

Google Assistant and Alexa-enabled devices had increasingly been making classroom appearances prior to the pandemic as supplemental educational tools for teachers. But now, their use has reportedly expanded to becoming tools for contacting students, sending out information and assignments and even sending parents custom shopping lists with the equipment and tools necessary for home-based learning.

Improving Devices’ eCommerce Abilities 

Amazon has for the past year been developing Alexa Conversations, an enhancement designed to fuse various Alexa skills together into a single distinct user experience.

Amazon Alexa VP Prem Natarajan recently told VentureBeat that this will make it possible to buy tickets, make dinner reservations and arrange Uber travel in a single interaction with Alexa instead of having to muddle through multiple ones.

He said smoothing out the voice-commerce experience so customers can make arrangements organically, instead of through a series of distinct conversations with their assistants, makes the process both more desirable and friction-free.

Beefed Up Display Screens 

Google Director of Product Barak Turovsky told VentureBeat that for voice technology to really make it to the next level, it will require a “multimodal” experience that combines input from text, photos or video via a screen.

He said voice alone has limits in surfacing the data that the user wants. Without a screen, there’s no infinite scroll or first page of Google search results, so responses are limited to perhaps three potential results at most, Turovsky said. That often means a voice assistant can miss what the user is looking for.

Published reports say such concerns have pushed both Amazon and Google to increase their smart-display design efforts, emphasizing artificial intelligence (AI) assistants that can both share visual content and respond with voice.

A Consolidating Field Of Competitors 

While Amazon and Google are expanding their offerings, the field of competitors could be growing smaller.

For example, Bixby — Samsung’s entrant into the great virtual voice assistant race — might be disappearing as part of a possible deal between the smartphone maker and Google.

The deal would see Bixby signing off, with Samsung promoting Google’s search site, Google Assistant and Google Play Store apps instead, according to Reuters. If that happens, Google will in one fell swoop get the maker of the world’s bestselling mobile devices into the Android ecosystem.

Samsung has maintained its independence and consistently made efforts to promote its own apps, but Bixby has always at best been a glitchy product that captured limited user interest. And now the pandemic has cooled sales, prompting Samsung to consider giving up on costly development projects like Bixby in search of more profitable prospects.

In a statement, Samsung said that while it is committed to its own services, it “closely works with Google and other partners to offer the best mobile experiences.”

A Big Alexa Upgrade 

Coming soon to a smartphone near you — a revamped Amazon Alexa that will include a customized home screen based on what each individual user does the most with the app.

The upgrade, scheduled for the end of August, aims to increase simplicity, as a frequent complaint about the current Alexa app is its random and superfluous prompts from the home screen, CNBC reported. Users also dislike friction-filled sifting through various menus to access some settings.

However, the planned update promises to put users’ most-used features at the easiest place to access. That means someone who primarily uses Spotify will have a different home screen from someone who uses an Audible book system or a shopping list.

Additionally, the button to open Alexa will now be located at the top of the screen as opposed to the bottom. The upgrade will also reportedly relocate reminders, routines, skills, settings and other features to easier-to-find places on the screen.

The Bottom Line 

Will the upgrades boost Alexa’s popularity and make it a stronger competitor to phones that already come with Apple Siri or the Google Assistant preinstalled? Only time will tell — and consumer preference will decide.

However, the race for voice-technology domination is clearly on, as consumers have opened their minds to the idea of shopping with a voice assistant. Providers are responding by rushing to bring them the rest of the way along on that journey.