How do humans understand speech?

By Aaron Wagner for Penn State News

UNIVERSITY PARK, Pa. — New funding from the National Science Foundation’s Build and Broaden Program will enable a team of researchers from Penn State and North Carolina Agricultural and Technical State University (NC A&T) to explore how speech recognition works while training a new generation of speech scientists at America’s largest historically Black university.

Research has shown that speech-recognition technology performs significantly worse at understanding speech by Black Americans than by white Americans. These systems can be biased, and that bias may be exacerbated by the fact that few Americans of color work in speech-science-related fields.

Understanding how humans understand speech

Navin Viswanathan, associate professor of communication sciences and disorders, will lead the research team at Penn State.

“In this research, we are pursuing a fundamental question,” Viswanathan explained. “How human listeners perceive speech so successfully despite considerable variation across different speakers, speaking rates, listening situations, etc., is not fully understood. Understanding this will provide insight into how human speech works on a fundamental level. On an immediate, practical level, it will enable researchers to improve speech-recognition technology.”

Joseph Stephens, professor of psychology, will lead the research team at NC A&T.

“There are conflicting theories of how speech perception works at a very basic level,” Stephens said. “One of the great strengths of this project is that it brings together investigators from different theoretical perspectives to resolve this conflict with careful experiments.”

According to the research team, speech-recognition technology is part of many aspects of people’s lives, but it is not as capable as a human listener at understanding speech, especially when the speech deviates from the norms built into the software. Once the mechanisms humans use are understood, speech-recognition technology can be improved using those same mechanisms.

Building and broadening the field of speech science

Increasing diversity in speech science is the other focus of the project. 

“When a field lacks diversity among researchers, it can limit the perspectives and approaches that are used, which can lead to technologies and solutions being limited, as well,” Stephens said. “We will help speech science to become more inclusive by increasing the capacity and involvement of students from groups that are underrepresented in the field.”

The National Science Foundation’s Build and Broaden Program focuses on supporting research, offering training opportunities, and creating greater research infrastructure at minority-serving institutions. New awards for the Build and Broaden Program, which total more than $12 million, support more than 20 minority-serving institutions in 12 states and Washington, D.C. Nearly half of this funding came from the American Rescue Plan Act of 2021. These funds aim to bolster institutions and researchers who were impacted particularly hard by the COVID-19 pandemic.

Build and Broaden is funding this project in part because it will strengthen research capacity in speech science at NC A&T. The project will provide research training for NC A&T students in speech science, foster collaborations between researchers at NC A&T and Penn State, and enhance opportunities for faculty development at NC A&T.

By providing training in speech science at NC A&T, the research team will mentor a more diverse group of future researchers. Increasing the diversity in this field will help to decrease bias in speech-recognition technology and throughout the field.

Viswanathan expressed excitement about developing a meaningful and far-reaching collaboration with NC A&T.

“This project directly creates opportunities for students and faculty from both institutions to work together on questions of common interest,” Viswanathan said. “More broadly, we hope that this will be the first step towards building stronger connections across the two research groups and promoting critical conversations about fundamental issues that underlie the underrepresentation of Black scholars in the field of speech science.”

Ji Min Lee, associate professor of communication sciences and disorders; Anne Olmstead, assistant professor of communication sciences and disorders; Matthew Carlson, associate professor of Spanish and linguistics; Paola “Guili” Dussias, professor of Spanish, linguistics and psychology; Elisabeth Karuza, assistant professor of psychology; and Janet van Hell, professor of psychology and linguistics, will contribute to this project at Penn State. Cassandra Germain, assistant professor of psychology; Deana McQuitty, associate professor of speech communication; and Joy Kennedy, associate professor of speech communication, will contribute to the project at North Carolina Agricultural and Technical State University.

The Race to Save Indigenous Languages, Using Automatic Speech Recognition

By Tanner Stening for News@Northeastern

Michael Running Wolf still has that old TI-89 graphing calculator he used in high school that helped propel his interest in technology. 

“Back then, my teachers saw I was really interested in it,” says Running Wolf, clinical instructor of computer science at Northeastern University. “Actually a couple of them printed out hundreds of pages of instructions for me on how to code” the device so that it could play games. 

What Running Wolf, who grew up in a remote Cheyenne village in Birney, Montana, didn’t realize at the time, poring over the stack of printouts at home by the light of kerosene lamps, was that he was actually teaching himself basic programming.

“I thought I was just learning how to put computer games on my calculator,” Running Wolf says with a laugh. 

But it hadn’t been his first encounter with technology. Growing up in the windy plains near the Northern Cheyenne Indian Reservation, Running Wolf says that although his family—which is part Cheyenne, part Lakota—didn’t have daily access to running water or electricity, sometimes, when the winds died down, the power would flicker on, and he’d plug in his Atari console and play games with his sisters. 

These early experiences would spur a lifelong interest in computers, artificial intelligence, and software engineering that Running Wolf is now harnessing to help reawaken endangered indigenous languages in North and South America, some of which are so critically at risk of extinction that their tallies of living native speakers have dwindled into the single digits.

Running Wolf’s goal is to develop methods for documenting and maintaining these early languages through automatic speech recognition software, helping to keep them “alive” and well-documented. It would be a process, he says, that tribal and indigenous communities could use to supplement their own language reclamation efforts, which have intensified in recent years amid the threats facing languages. 

“The grandiose plan, the far-off dream, is we can create technology to not only preserve, but reclaim languages,” says Running Wolf, who teaches computer science at Northeastern’s Vancouver campus. “Preservation isn’t what we want. That’s like taking something and embalming it and putting it in a museum. Languages are living things.”

The better thing to say is that they’ve “gone to sleep,” Running Wolf says. 

And the threats to indigenous languages are real. Of the roughly 6,700 languages spoken in the world, about 40 percent are in danger of atrophying out of existence forever, according to the UNESCO Atlas of the World’s Languages in Danger. The loss of these languages also represents the loss of whole systems of knowledge unique to a culture, and of the ability to transmit that knowledge across generations.

While the situation appears dire—and is, in many cases—Running Wolf says nearly every Native American tribe is engaged in language reclamation efforts. In New England, one notable tribe doing so is the Mashpee Wampanoag Tribe, whose native tongue is now being taught in public schools on Cape Cod, Massachusetts. 

But the problem, he says, is that in the ever-evolving field of computational linguistics, little research has been devoted to Native American languages. This is partially due to a lack of linguistic data, but it is also because many native languages are “polysynthetic,” meaning they contain words that comprise many morphemes, which are the smallest units of meaning in language, Running Wolf says. 

Polysynthetic languages often have very long words—words that can mean an entire sentence, or denote a sentence’s worth of meaning. 
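To illustrate why this matters for speech recognition, a word-level vocabulary has to memorize each long polysynthetic word whole, while a morpheme-level vocabulary can cover them compositionally. The sketch below uses invented morphemes, not a real Cheyenne or Lakota analysis; greedy longest-match segmentation is just one simple strategy:

```python
def segment(word, morpheme_inventory):
    """Greedy longest-match segmentation of a word into known morphemes."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest possible morpheme starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in morpheme_inventory:
                pieces.append(word[i:j])
                i = j
                break
        else:
            return None  # unsegmentable with this inventory

    return pieces

# Hypothetical morpheme inventory and word, purely for illustration.
inventory = {"wa", "ki", "tonka", "pi", "su"}
print(segment("wakitonkapi", inventory))  # ['wa', 'ki', 'tonka', 'pi']
```

A handful of morphemes can generate a combinatorially large set of surface words, which is exactly why word-level ASR vocabularies struggle with such languages.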

Further complicating the effort is the fact that many Native American languages don’t have an orthography, or an alphabet, he says. In terms of what languages need to keep them afloat, Running Wolf maintains that orthographies are not vital. Many indigenous languages have survived through a strong oral tradition in lieu of a robust written one.

But for scholars looking to build databases and transcription methods, like Running Wolf, written texts are important to filling in the gaps. What’s holding researchers back from building automatic speech recognition for indigenous languages is precisely that there is a lack of audio and textual data available to them.

Using hundreds of hours of audio from various tribes, Running Wolf has managed to produce some rudimentary results. So far, the automatic speech recognition software he and his team have developed can recognize single, simple words from some of the indigenous languages they have data for. 

“Right now, we’re building a corpus of audio and texts to start showing early results,” Running Wolf says. 

Importantly, he says, “I think we have an approach that’s scientifically sound.”

Eventually, Running Wolf says he hopes to create a way for tribes to provide their youth with tools to learn these ancient languages by way of technological immersion—through things like augmented or virtual reality, he says. 

Some of these technologies are already under development by Running Wolf and his team, made up of a linguist, a data scientist, a machine learning engineer, and his wife, who used to be a program manager, among others. All of the ongoing research and development is being done in consultation with numerous tribal communities, Running Wolf says.

“It’s all coming from the people,” he says. “They want to work with us, and we’re doing the best to respect their knowledge systems.”

Three Ways AI Is Improving Assistive Technology

Wendy Gonzalez for Forbes

Artificial intelligence (AI) and machine learning (ML) are some of the buzziest terms in tech, and for good reason. These innovations have the potential to tackle some of humanity’s biggest obstacles across industries, from medicine to education and sustainability. One sector in particular is set to see massive advancement through these new technologies: assistive technology.

Assistive technology is defined as any product that improves the lives of individuals who otherwise may not be able to complete tasks without specialized equipment, such as wheelchairs and dictation services. Globally, more than 1 billion people depend on assistive technology. When implemented effectively, assistive technology can improve accessibility and quality of life for all, regardless of ability. 

Here are three ways AI is currently improving assistive technology and its use-cases, which might give your company some new ideas for product innovation: 

Ensuring Education For All

Accessibility remains a challenging aspect of education. For children with learning disabilities or sensory impairments, dictation technology, more commonly known as speech-to-text or voice recognition, can help them write and revise without pen or paper. In one study, 75 of 149 participants with severe reading disabilities reported increased motivation in their schoolwork after a year of using assistive technology.

This technology works best when powered by high-quality AI. Natural Language Processing (NLP) and machine learning algorithms have the capability to improve the accuracy of speech recognition and word predictability, which can minimize dictation errors while facilitating effective communication from student to teacher or among collaborating schoolmates. 
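As a toy illustration of “word predictability” (not any product’s actual model), a simple bigram language model can rescore competing transcriptions so that the more statistically likely word sequence wins, for example distinguishing homophones like “write” and “right”:

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def score(sentence, counts):
    """Sum bigram counts over a candidate transcription."""
    words = sentence.lower().split()
    return sum(counts[p][n] for p, n in zip(words, words[1:]))

corpus = [
    "please write a letter",
    "write a letter to the teacher",
    "turn right at the corner",
]
bigrams = train_bigrams(corpus)

# Two acoustically identical candidates; the language model prefers one.
candidates = ["write a letter", "right a letter"]
best = max(candidates, key=lambda s: score(s, bigrams))
print(best)  # write a letter
```

Real dictation systems use far richer neural language models, but the principle, ranking hypotheses by how predictable the words are in context, is the same.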

That said, according to a 2001 study, only 35% of elementary schools (arguably the most formative stage of a child’s education) provided any assistive technology. This statistic could change thanks to socially driven AI programs such as Microsoft’s AI for Accessibility initiative, which invests in innovations that support people with disabilities and neurodiversities. Its projects include educational AI applications that give students with visual impairments the text-to-speech, speech recognition and object recognition tools they need to succeed in the classroom.

Better Outcomes For Medical Technology

With a rapidly aging global population expected to include some 2 billion people over the age of 60 by 2050, our plans to care for our loved ones could rely heavily on AI and ML in the future. Doctors and entrepreneurs are already paving the way; in the past decade alone, the top 50 medical AI startups attracted more than $8.5 billion in venture capital funding.

Robot-assisted surgery is just one AI-powered innovation. In 2018, robot-assisted procedures accounted for 15.1% of all general surgeries, and that share is expected to rise as surgeons implement additional AI-driven surgical applications in operating rooms. Compared to traditional open surgery, robot-assisted procedures involve smaller incisions, which reduces overall pain and scarring and leads to quicker recovery times.

AI-powered wearable medical devices, such as female fertility-cycle trackers, are another popular choice. Demand for products such as diabetic-tracking sweat meters and oximeters for respiratory patients has created a market projected to grow at a 23% CAGR by 2023.

What’s more, data taken from medical devices could contribute to more than $7 billion in savings per year for the U.S. healthcare market. This data improves doctors’ understanding of preventative care and better informs post-recovery methods when patients leave after hospital procedures.

Unlocking Possibilities In Transportation And Navigation

Accessible mobility is another challenge that assistive technology can help tackle. From AI-powered running apps to suitcases that can navigate entire airports, assistive technology is changing how we move and travel. One example is Project Guideline, a Google project that helps individuals who are visually impaired navigate roads and paths with an app combining computer vision and a machine-learning algorithm to guide a runner along a pre-designed path.

Future runners and walkers may one day navigate roads and sidewalks unaccompanied by guide dogs or sighted guides, gaining autonomy and confidence while accomplishing everyday tasks and activities without hindrance. For instance, CaBot, a navigation robot developed and spearheaded by Chieko Asakawa, a Carnegie Mellon professor who is blind, uses sensor information to help users avoid airport obstacles, alerts them to nearby stores and assists with required actions such as standing in line at airport security checkpoints.

The Enhancement Of Assistive Technology

These are just some of the ways that AI-powered assistive technology can transform the way individuals and society move and live. To ensure assistive technologies actively benefit individuals with disabilities, companies must also maintain accurate and diverse data sets, annotated by well-trained and experienced AI teams. These ever-evolving data sets need to be tested continually before, during and after implementation.

AI possesses the potential to power missions for the greater good of society. Ethical AI can transform the ways assistive technologies improve the lives of millions in need. What other types of AI-powered assistive technology have you come across and how could your company make moves to enter this industry effectively? 

VA to move Nuance’s voice-enabled clinical assistant to the cloud: 5 details

By Katie Adams for Becker’s Hospital Review

The Department of Veterans Affairs is migrating to the cloud platform for Nuance’s automated clinical note-taking system, the health system said Sept. 8.

Five details:

  1. The VA will use the Nuance Dragon Medical One speech recognition cloud platform and Nuance’s mobile microphone app, allowing physicians to use their voices to document patient visits more efficiently. The system is intended to allow physicians to spend more time with patients and less time on administrative work.
  2. The VA deployed Nuance Dragon Medical products systemwide in 2014. It is now upgrading to the system’s cloud offering so its physicians can utilize the added capabilities and mobile flexibility.
  3. The VA’s decision to adopt the technologies was approved through the Federal Risk and Authorization Management Program (FedRAMP), ensuring Nuance’s products adhere to the government’s latest guidance on data security and privacy.
  4. “The combination of our cloud-based platforms, secure application framework and deep experience working with the VA health system made it possible for us to demonstrate our compliance with FedRAMP to meet the needs of the U.S. government. We are proving that meeting security requirements and delivering the outcomes and workflows that matter to clinicians don’t have to be mutually exclusive,” Diana Nole, Nuance’s executive vice president and general manager of healthcare, said in a news release.
  5. Nuance Dragon Medical One is used by more than 550,000 physicians.

Voice AI Technology Is More Advanced Than You Might Think

By Annie Brown for Forbes

Systems that can handle repetitive tasks have supported global economies for generations. But systems that can handle conversations and interactions? Those have felt impossible, due to the complexity of human speech. Any of us who regularly use Alexa or Siri can attest to the deficiencies of machine learning in handling human messages. The average person has yet to interact with the next generation of voice AI tools, but what this technology is capable of has the potential to change the world as we know it.

The following is a discussion of three innovative technologies that are accelerating the pace of progress in this sector.

Conversational AI for Ordering

Experts in voice AI have prioritized technology that can alleviate menial tasks, freeing humans up to engage in high-impact, creative endeavors. Drive-through ordering was identified early by developers as an area in which conversational AI could make an impact, and one company appears to have cracked the code.

Creating a conversational AI system that can handle drive-through restaurant ordering may sound simple: load in the menu, use chat-based AI, and you’ve done it. The actual solutions aren’t quite so easy. In fact, creating a system that works in an outdoor environment—handling car noises, traffic, other speakers—and one that has sophisticated enough speech recognition to decipher multiple accents, genders, and ages, presents immense challenges.

The co-founders of Hi Auto, Roy Baharav and Eyal Shapira, both have a background in AI systems for audio: Baharav in complex AI systems at Google and Shapira in NLP and chat interfacing.

Baharav describes the difficulties of making a system like this work: “Speech handling in general, for humans, is hard. You talk to your phone and it understands you – that is a completely different problem from understanding speech in an outdoor environment. In a drive-through, people are using unique speech patterns. People are indecisive – they’re changing their minds a lot.”

That latter issue illustrates what they call multi-turn conversation, or the back-and-forth we humans do so effortlessly. After years of practice, model training, and refinement, Hi Auto has now installed its conversational AI systems in drive-throughs around the country and is seeing 90% accuracy.

Shapira forecasts, “Three years from now, we will probably see as many as 40,000 restaurant locations using conversational AI. It’s going to become a mainstream solution.” 

“AI can address two of the critical problems in quick-serve restaurants,” comments Joe Jensen, a Vice President at Intel Corporation, “Order accuracy which goes straight to consumer satisfaction and then order accuracy also hits on staff costs in reducing that extra time staff spends.” 

Conversation Cloud for Intelligent Machines

A second groundbreaking innovation in the world of conversational AI is a technique that turns human language itself into an input.

The CEO of Whitehead AI, Diwank Tomer, illustrates the historical challenges faced by conversational AI: “It turns out that, when we’re talking or writing or conveying anything in human language, we depend on background information a lot. It’s not just general facts about the world but things like how I’m feeling or how well defined something is.

“These are obvious and transparent to us but very difficult for AI to do. That’s why jokes are so difficult for AI to understand. It’s typically something ridiculous or impossible, framed in a way that seems otherwise. For humans, it’s obvious. For AI, not so much. AI only interprets things literally.”

So, how does a system incapable of interpreting nuance, emotion, or making inferences adequately communicate with humans? The same way a non-native speaker initially understands a new language: using context.

Context-aware AI means building models that can use extra information, such as the identity of the speaker or other background facts. Chatbots are one area that is inherently lacking and could benefit from this technology. For instance, if a chatbot could glean contextual information from a user’s profile, previous interactions, and other data points, it could use that context to frame highly intelligent responses.
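A minimal sketch of the idea, with invented fields and behavior rather than Whitehead AI’s actual API: facts pulled from a user’s profile and prior interactions are merged into the reply instead of answering from the utterance alone.

```python
def respond(utterance, context):
    """Frame a reply using contextual facts about the user, when available."""
    if "order" in utterance.lower() and "last_order" in context:
        # Context from previous interactions lets the bot be specific.
        return f"Did you mean your usual, the {context['last_order']}?"
    # Without context, the bot can only ask a generic follow-up.
    return "Sorry, could you say more about what you need?"

# Hypothetical user profile assembled from earlier sessions.
context = {"user": "Sam", "last_order": "medium oat-milk latte"}
print(respond("Can I get my order?", context))
```

The same utterance with an empty context falls back to the generic reply, which is precisely the difference between a context-aware system and a literal-minded one.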

Tomer describes it this way, “We are building an infrastructure for manipulating natural language. Something new that we’ve built is chit chat API – when you say something and it can’t be understood, Alexa will respond with, ‘I’m sorry, I can’t understand that.’ It’s possible now to actually pick up or reply with witty answers.”

Tomer approaches the future of these technologies with high hopes: “Understanding conversation is powerful. Imagine having conversations with any computer: if you’re stuck in an elevator, you could scream and it would call for help. Our senses are extended through technology.”

Data Process Automation

Audio is just one form of unstructured data. When collected, assessed, and interpreted, the output of patterns and trends can be used to make strategic decisions or provide valuable feedback.

super.AI was founded by Brad Cordova. The company uses AI to automate the processing of unstructured data. Data Process Automation, or DPA, can be used to automate repetitive tasks that deal with unstructured data, including audio and video files. 

For example, in a large education company, children use a website to read sentences aloud. super.AI used a process automation application to see how many errors a child made. This automation process has a higher accuracy and faster response time than when done by humans, enabling better feedback for enhanced learning.

Another example has to do with personal information (PI), which is a key point of concern in today’s privacy-conscious world, especially when it comes to AI. super.AI has a system of audio redaction whereby it can remove PI from audio, including names, addresses, and Social Security numbers. It can also remove copyrighted material from segments of audio or video, helping to ensure GDPR or CCPA compliance.
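A rough sketch of the text side of such redaction (not super.AI’s actual pipeline, which also operates on the audio itself): regular expressions flag sensitive spans in a transcript, and a production system would then cut the corresponding audio segments using word-level timestamps.

```python
import re

# Illustrative patterns only; real systems use many more, plus ML-based
# entity recognition for names and addresses, which regexes cannot catch.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(transcript):
    """Replace each sensitive span with a labelled placeholder."""
    for label, pattern in PATTERNS.items():
        transcript = pattern.sub(f"[{label}]", transcript)
    return transcript

print(redact("My SSN is 123-45-6789, call 555-867-5309."))
# My SSN is [SSN], call [PHONE].
```

Keeping the labels (rather than deleting the spans outright) preserves the sentence structure, which matters when the redacted transcripts are reused as training data.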

It’s clear that the supportive qualities of super.AI are valuable, but when it comes to the people who currently do everything from quality assurance on website product listings to note taking at a meeting, the question is this: are we going too far to replace humans?

Cordova would say no, “Humans and machines are orthogonal. If you see the best chess players: they aren’t human or machine, they’re humans and machines working together. We know intuitively as humans what we’re put on this earth for. You feel good when you talk with people, feel empathy, and do creative tasks.

“There are a lot of tasks where you don’t feel great: tasks that humans shouldn’t be doing. We want humans to be more human. It’s not about taking humans’ jobs, it’s about allowing humans to operate where we’re best and machines aren’t.”

Voice AI is charting unprecedented territory and growing at a pace that will inevitably transform markets. The adoption rates for this kind of tech may change most industries as we currently know them. The more AI is integrated, the more humans can benefit from it. As Cordova succinctly states, “AI is the next, and maybe the last technology we will develop as humans.” The capacity of AI to take on new roles in our society has the power to let humans be more human. And that is the best of all possible outcomes.

Speech recognition technology translates brain waves into sentences

By DRS. NORBERT HERZOG AND DAVID NIESEL for Galveston County The Daily News

I remember many years ago when speech recognition software was introduced. It was astounding. Spoken words appeared on your computer screen without the assistance of a keyboard. This was an amazing innovation at the time.

Recent studies have taken this a step further: translating brain waves into complete sentences. Scientists report the error rate is as low as 3 percent, which is much better than my speech-to-text software many years ago.

Although we still don’t know much about how the brain works, we’re making some significant progress at understanding some of its complex functions. For example, scientists have been able to map our memories to precise regions of the brain.

In animal models, scientists identified the brain cells where specific memories were stored and then altered them by manipulating those cells. This is amazing, and it may sound to some like the beginnings of making “The Manchurian Candidate” a reality.

In other work, we’re beginning to be able to harness brain waves or signals for practical use to help people who’ve become incapacitated. Recall the media stories on paralyzed patients who can use their thoughts to control a sophisticated mechanical arm to feed themselves or move objects. This ability to use the brain to interact with the outside world holds great promise for humans in restoring lost functions.

Some recent work has achieved yet another leap forward. Scientists have developed a way of decoding sentences by examining brainwaves, also called neural signals. This is a challenge that scientists have been working on for many years.

Previous studies explored translating brain waves into words via the component sounds, or phonemes, that make up words, an approach that suffered from a high error rate. For this work, the scientists instead borrowed an approach that has been used successfully for years in translation between languages: neural networks, the same highly accurate technology behind the language-translation apps on your smartphone.

For the study, the scientists used electrodes in subjects to read their brainwaves. The subjects read sentences while the electrodes recorded their brain waves into a computer. The scientists set the neural network in the computer to use the brainwaves as the first language, and they set the sentences the subjects read as the second language. Brilliant.

After this, the computer could translate brain waves just like another language. The accuracy of the translation was as good as what you could expect from professional language translators. In the study, the subjects read 50 sentences that had about 250 unique words.
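The setup can be caricatured in a few lines. The study used neural networks; the toy below replaces them with a nearest-neighbour lookup over simulated “brain wave” vectors, purely to show the translation framing of paired signals and sentences:

```python
import math
import random

random.seed(0)

# Simulated recordings: each sentence read aloud yields a feature vector.
sentences = ["the birds are singing", "please open the window"]
signals = {s: [random.gauss(0, 1) for _ in range(16)] for s in sentences}

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def decode(recording):
    """Translate a recording into the sentence with the nearest trained signal."""
    return min(signals, key=lambda s: distance(signals[s], recording))

# A noisy re-recording of a known sentence still decodes correctly.
noisy = [x + random.gauss(0, 0.1) for x in signals["please open the window"]]
print(decode(noisy))  # please open the window
```

The real system generalizes far beyond lookup, of course: a neural network learns a mapping that can handle signal variation the way a translation model handles paraphrase, which is exactly why the researchers framed brain waves as a “first language.”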

The technique will have to be expanded to include more words and phrases. One promising attribute was that the system was trainable, meaning accuracy improved after pretraining the software. The learning was also transferable from person to person, so a system could pre-learn brain waves in a way that works across different people.

As you can imagine, this would be a huge advance to help disabled people who have lost the ability to speak. Look for many more advances in this area soon.

‘Hey Cerner’: Company seeks health systems to help test new Voice Assist tech

By Mike Miliard for Healthcare IT News

As Cerner gears up to launch its new natural language processing technology, Voice Assist, it is asking for healthcare providers to sign on as new testing partners.

The company says Voice Assist will enable easier interaction with Cerner electronic health records, enabling clinicians – by simply saying “Hey Cerner” – to query and retrieve patient data from the EHR, place orders, set up reminders and more.

The goal is time savings, burden reduction and improved provider experience, as EHR clinical end-users can more easily document while navigating the patient record.

Cerner says Voice Assist – which is powered by Nuance speech recognition technology and should be available by 2021 – can respond to phrases such as “Remind me to call the patient in six months about their high cholesterol,” “Order Lipitor 40 mg oral tablet” and “What is the latest white blood cell count?”
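Commands like these are typically mapped to intents and slots before anything touches the EHR. The following is a hypothetical sketch of that parsing step; the patterns and slot names are invented for illustration, not Cerner’s or Nuance’s actual grammar:

```python
import re

# Each intent pairs a name with a pattern whose named groups become slots.
INTENTS = [
    ("set_reminder", re.compile(r"remind me to (?P<task>.+?) in (?P<when>.+)", re.I)),
    ("place_order",  re.compile(r"order (?P<drug>.+)", re.I)),
    ("query_result", re.compile(r"what is the latest (?P<test>.+?)\??$", re.I)),
]

def parse(utterance):
    """Return the first matching intent and its extracted slots."""
    for name, pattern in INTENTS:
        m = pattern.search(utterance)
        if m:
            return name, m.groupdict()
    return "unknown", {}

print(parse("Remind me to call the patient in six months"))
# ('set_reminder', {'task': 'call the patient', 'when': 'six months'})
```

Production systems replace the regexes with statistical intent classifiers, but the output shape, an intent name plus structured slots, is what downstream EHR actions consume.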

New Jersey-based St. Joseph’s Health and Indiana University Health are two Cerner clients who are already exploring early versions of the new tool.

Cerner’s rival Epic launched its own ambient voice technology – known as Hey Epic, offering a similar range of capabilities – earlier this year.

And, as we showed in this special report, health systems such as Beth Israel Deaconess and Northwell Health have also been finding new and innovative use cases for an array of other voice assistant tools: Amazon’s Alexa, Apple’s Siri, Google Home and Assistant, Microsoft Cortana, and others.

“St. Joseph’s Health is excited to pilot Cerner’s Voice Assist technology, which will enable our clinicians to complete several tasks in the EHR via voice commands,” said Lisa Green, director of clinical information systems at St. Joseph’s Health, in a statement.

“We envision that this technology will be conducive to more meaningful clinician-patient interaction, since the clinicians will spend less time manually documenting. We hope to see improved efficiency, [and] clinician and patient satisfaction throughout this trial period,” she added.

“At IU Health, we’re creating designated innovation centers where we trial the latest new technologies in real clinical workflows,” said Cliff J. Hohban, vice president of IS, applications & PMO at IU Health. “This allows us to move new tools into our system rapidly and iteratively. We’re excited to pilot Cerner’s Voice Assist, which will allow our clinicians to handle several tasks in the EHR with their voice.”

7 Key Predictions for the Future Of Voice Assistants and AI

By Hardik Shah for Unite.AI

We live in an exciting time, especially in innovation, progress, technology, and commerce. Some of the latest tech inventions, such as artificial intelligence and machine learning, are making a tremendous impact on every industry.

Voice assistants powered by AI have already transformed eCommerce. Giants such as Amazon continue to fuel this trend as they compete for market share. Voice interfacing is advancing at an exceptional rate in the healthcare and banking industries to keep pace with consumers’ demands.

Reasons for the shift towards voice assistants and AI

The crux of the shift towards voice user interfaces is changing user demands. Millennial consumers in particular show increased awareness of, and a higher level of comfort with, the technology in an ever-changing digital world where speed, convenience, and efficiency are continually being optimized.

The adoption of AI in users’ lives is fueling the shift towards voice applications. IoT devices such as smart appliances, thermostats, and speakers give voice assistants more utility in a connected user’s life. Smart speakers are the most visible example, but they are only the start.

According to one market report, the voice recognition technology market reached close to $11 billion in 2019 and is expected to grow by approximately 17% by 2025.

Voice applications built on this technology are now seen everywhere.

Here are 7 Key Predictions of Voice Assistants and AI

1. Streamlined Conversations

Both technology giants, Amazon and Google, have announced that their assistants will no longer require repeated “wake” words. Previously, both assistants depended on a wake word such as “OK Google” or “Alexa” to initiate each exchange. For instance, one had to ask, “Alexa, what’s the current weather?” and then say “Alexa” again before requesting that the assistant set the hallway thermostat to 25 degrees.

With this new facility, interactions are more convenient and natural. Simply put, you can issue a follow-up command without saying the wake word again.

Users now talk to voice assistants in particular locations, while multitasking, alone or in a group of people. Devices that can take these contextual factors into account make conversations more convenient and efficient, and they show that the developers behind the technology are aiming for a more user-centric experience.
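Under the hood, this kind of follow-up behavior can be modeled as a simple timed session: hearing the wake word opens a short listening window, and each accepted command refreshes it. The sketch below is illustrative only; the class name, window length, and injectable clock are assumptions, not Amazon’s or Google’s actual implementation.

```python
import time

class FollowUpSession:
    """Minimal sketch of 'follow-up mode': after the wake word is heard,
    the assistant keeps listening for a short window so the user can
    issue another command without repeating the wake word."""

    def __init__(self, window_seconds=5.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock        # injectable clock, handy for testing
        self.expires_at = 0.0     # session starts closed

    def on_wake_word(self):
        # A wake word always opens (or refreshes) the listening window.
        self.expires_at = self.clock() + self.window

    def accepts_command(self):
        # Commands are accepted only while the window is open.
        if self.clock() < self.expires_at:
            # Each accepted command refreshes the window, so a natural
            # back-and-forth conversation stays open.
            self.expires_at = self.clock() + self.window
            return True
        return False
```

With a five-second window, “What’s the current weather?” followed quickly by “Set the hallway thermostat to 25 degrees” would both be accepted after a single wake word, while a command arriving much later would require waking the assistant again.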

2. Voice Push Notifications

Push notifications remind users when a task’s due date is near and are a useful tool for engaging users within an application. With the advent of voice technology, this feature has taken on a new dimension: voice push notifications are an effective way to increase user engagement and retention, remind users of the app, and deliver relevant messages.

Both Google Assistant and Alexa allow users to receive notifications from compatible third-party apps, in addition to built-in notifications such as calendar appointments.

3. Search Behaviors Will Change

Voice search, or voice-enabled search, allows users to search a website or an app with a spoken command.

  • Voice-based shopping is expected to rise to $40 billion in 2022.
  • Consumers’ spending via voice assistants is expected to reach 18% of the market by 2022.
  • According to Juniper Research, voice-based ad revenues will reach $19 billion by 2022.

These statistics make it clear that voice searches on mobile devices are rising at an unprecedented rate. Brands are transforming from touchpoints into listening points, and organic search remains the primary way for brands to gain visibility.

Therefore, given the popularity and adaptability of voice search, marketers and advertising agencies expect voice goliaths like Amazon and Google to open their platforms to paid messaging.

4. Personalized Experiences

Voice-enabled devices and digital assistants such as Google Home and Amazon’s Alexa let customers engage through the most natural form of communication. The data shows that online sales of voice-enabled devices grew 39% year over year.

Voice assistants provide more personalized experiences as they improve at distinguishing between voices. Google Home, for example, supports up to six user accounts and quickly identifies unique voices.

Customers may ask questions like “What restaurant has the best lunch in LA?” or “What restaurants are open for lunch now?” Voice assistants are smart enough to handle details such as scheduling appointments for individual users, and they can remember nicknames, locations, payment information, and more.
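Personalization of this kind rests on speaker identification: each enrolled account stores a reference “voiceprint” embedding, and an incoming utterance is matched against the closest one. The toy sketch below uses hand-written vectors and cosine similarity; real systems derive embeddings from trained acoustic models, and the function names and threshold here are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify_speaker(utterance_embedding, enrolled, threshold=0.7):
    """Return the enrolled account whose voiceprint best matches the
    utterance, or None if no match is confident enough.
    `enrolled` maps account name -> reference embedding."""
    best_name, best_score = None, -1.0
    for name, ref in enrolled.items():
        score = cosine(utterance_embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

Matching the speaker first lets the assistant answer “my calendar” or “my shopping list” with the right account’s data, and fall back to a generic response when no enrolled voice matches confidently.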

5. Security Will Be a Focus

According to one report, about 41% of voice device users are concerned about trust and confidentiality. To make purchases more secure and convenient for customers, Amazon and Google have introduced security measures such as speaker verification and voice ID.

Furthermore, as users book appointments, reserve restaurant tables, check local cinema times, schedule Uber rides, and share payment details and other sensitive information through their assistants, security becomes even more critical. Speaker verification and voice ID are both paramount parts of the voice assistant experience.
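One way to picture how speaker verification can gate a sensitive action such as a purchase: compare the utterance’s embedding against the claimed account’s stored voiceprint, and only proceed if the two are close enough. This is a minimal sketch under stated assumptions; the distance threshold, function names, and response strings are hypothetical, and production systems rely on trained speaker-verification models rather than a raw distance check.

```python
import math

def verify_speaker(utterance_embedding, stored_voiceprint, max_distance=0.5):
    """Accept the speaker only if the utterance's embedding lies within
    max_distance of the voiceprint recorded at enrollment."""
    return math.dist(utterance_embedding, stored_voiceprint) <= max_distance

def authorize_purchase(utterance_embedding, stored_voiceprint, order):
    """Gate a purchase behind the verification check."""
    if not verify_speaker(utterance_embedding, stored_voiceprint):
        return "verification failed: purchase blocked"
    return f"order confirmed: {order}"
```

Unlike identification (which asks “which household member is this?”), verification is a yes/no decision about one claimed identity, so it typically uses a stricter threshold before money can move.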

6. Touch Interaction

At CES 2019, Google launched a new feature that integrates voice and a visual display into a seamless experience, described as a linked screen. The display can show details like local traffic, events, and weather information. This functionality combines visual and voice capabilities, letting users interact more fully with the assistant.

7. Compatibility & Integration

Amazon is at the forefront when it comes to integrating voice technology with other products. Anyone who knows Alexa will be familiar with the fact that the assistant is already integrated into a vast array of products, such as Samsung’s Family Hub refrigerator. Google has also announced Google Assistant Connect, the idea being that manufacturers can create custom devices that serve specific functions and integrate with the Assistant.

Speaking of the advancement of voice app development, the global voice eCommerce industry is expected to be worth $40 billion by 2022, according to the report. Growing interest is also expected in the development of voice-enabled devices, including mid-level devices that have some assistant functionality but are not full-blown smart speakers.

The takeaway

These are a few key predictions for the future of voice assistants and AI. However, several barriers need to be overcome before voice-enabled devices see mass adoption. Advances in technology, specifically in AI, natural language processing (NLP), and machine learning, are paving the way for voice app development.

The AI behind these devices must handle challenges like background noise and accents better in order to deliver a robust speech recognition experience. Undeniably, consumers have become increasingly comfortable with, and reliant upon, using voice to talk to their phones, smart home devices, vehicles, and more. Voice technology has become a key interface to the digital world, and voice interface design and voice app development will be in ever greater demand.

Posted on

5 Ways Voice Technology Is Increasingly Making Itself Heard


Voice technology is among the things that have gotten a big boost from the global pandemic as consumers shift to buying things online and avoiding touching items in public places for fear of contracting COVID-19.

“We’ve seen a huge increase in the use of voice in the home,” Tom Taylor, senior vice president of Amazon’s Alexa unit, told GeekWire in a recent interview. “People do build relationships with Alexa. We have something like 1 million marriage proposals and compliments of love to Alexa. I don’t think anybody has ever done that to a keyboard.”

Voice technology’s adoption was already on the rise in the pre-pandemic period, as PYMNTS’ and Visa’s How We Will Pay report demonstrated in late 2019. Our study found that about 3 in 10 consumers owned voice assistants, compared to just 1.4 out of 10 in 2017. Consumers who made purchases via their smart speakers also rose to 9.6 percent of all consumers as of late 2018, vs. just 7.7 percent a year earlier.

And all of that was before the pandemic made voice assistants even more valuable as their use cases expanded to meet new consumer needs. They’re now showing up in hotel rooms, enabling grocery shopping and beginning to sneak into quick-service restaurants (QSRs) as an enhancement to all those touchscreens that have been going up in recent years. Here are five changes the industry is seeing as these devices come into greater use:

Voice-Tech And Education 

Education has been on a wild ride for the past several months, and voice assistants are becoming part of an increasingly integrated hybridized educational model of online and in-person learning, according to recent reports.

Google Assistant and Alexa-enabled devices had increasingly been making classroom appearances prior to the pandemic as supplemental educational tools for teachers. But now, their use has reportedly expanded to becoming tools for contacting students, sending out information and assignments and even sending parents custom shopping lists with the equipment and tools necessary for home-based learning.

Improving Devices’ eCommerce Abilities 

Amazon has for the past year been developing Alexa Conversations, an enhancement designed to fuse various Alexa skills together into a single distinct user experience.

Amazon Alexa VP Prem Natarajan recently told VentureBeat that it will make it possible to buy tickets, make dinner reservations and arrange Uber travel in a single interaction with Alexa, instead of having to muddle through multiple ones.

He said smoothing out the voice-commerce experience, so customers can make arrangements organically instead of through a series of distinct conversations with their assistants, makes the process both more desirable and friction-free.

Beefed Up Display Screens 

Google Director of Product Barak Turovsky told VentureBeat that for voice technology to really make it to the next level will require a “multimodal” experience that combines input from text, photos or video via screen.

He said voice alone has limits in surfacing the data that the user wants. Without a screen, there’s no infinite scroll or first page of Google search results, so responses are limited to perhaps three potential results at most, Turovsky said. That often means a voice assistant can miss what the user is looking for.

Published reports say such concerns have pushed both Amazon and Google to increase their smart-display design efforts, emphasizing artificial intelligence (AI) assistants that can both share visual content and respond with voice.

A Consolidating Field Of Competitors 

While Amazon and Google are expanding their offerings, the field of competitors could be growing smaller.

For example, Bixby — Samsung’s entrant into the great virtual voice assistant race — might be disappearing as part of a possible deal between the smartphone maker and Google.

The deal would see Bixby signing off, with Samsung promoting Google’s search site, Google Assistant and Google Play Store apps instead, according to Reuters. If that happens, Google will in one fell swoop pull the maker of the world’s bestselling mobile devices deeper into its ecosystem.

Samsung has maintained its independence and consistently made efforts to promote its own apps, but Bixby has always at best been a glitchy product that captured limited user interest. And now the pandemic has cooled sales, prompting Samsung to consider giving up on costly development projects like Bixby in search of more profitable prospects.

In a statement, Samsung said that while it is committed to its own services, it “closely works with Google and other partners to offer the best mobile experiences.”

A Big Alexa Upgrade 

Coming soon to a smartphone near you: a revamped Amazon Alexa app that will include a customized home screen based on what each individual user does most with the app.

The upgrade, scheduled for the end of August, aims to increase simplicity, as a frequent complaint about the current Alexa app is its random and superfluous prompts from the home screen, CNBC reported. Users also dislike friction-filled sifting through various menus to access some settings.

However, the planned update promises to put each user’s most-used features in the easiest place to access. That means someone who primarily uses Spotify will have a different home screen from someone who listens to Audible audiobooks or manages shopping lists.

Additionally, the button to open Alexa will now be located at the top of the screen as opposed to the bottom. The upgrade will also reportedly relocate reminders, routines, skills, settings and other features to easier-to-find places on the screen.

The Bottom Line 

Will the upgrades boost Alexa’s popularity and make it a stronger competitor to phones that already come with Apple Siri or the Google Assistant preinstalled? Only time will tell — and consumer preference will decide.

However, the race for voice-technology domination is clearly on, as consumers have opened their minds to the idea of shopping with a voice assistant. Providers are responding by rushing to bring them the rest of the way along on that journey.