
How do humans understand speech?

By Aaron Wagner for Penn State News

UNIVERSITY PARK, Pa. — New funding from the National Science Foundation’s Build and Broaden Program will enable a team of researchers from Penn State and North Carolina Agricultural and Technical State University (NC A&T) to explore how speech recognition works while training a new generation of speech scientists at America’s largest historically Black university.

Research has shown that speech-recognition technology performs significantly worse at understanding speech by Black Americans than by white Americans. These systems can be biased, and that bias may be exacerbated by the fact that few Americans of color work in speech-science-related fields.

Understanding how humans understand speech

Navin Viswanathan, associate professor of communication sciences and disorders, will lead the research team at Penn State.

“In this research, we are pursuing a fundamental question,” Viswanathan explained. “How human listeners perceive speech so successfully despite considerable variation across different speakers, speaking rates, listening situations, etc., is not fully understood. Understanding this will provide insight into how human speech works on a fundamental level. On an immediate, practical level, it will enable researchers to improve speech-recognition technology.”

Joseph Stephens, professor of psychology, will lead the research team at NC A&T.

“There are conflicting theories of how speech perception works at a very basic level,” Stephens said. “One of the great strengths of this project is that it brings together investigators from different theoretical perspectives to resolve this conflict with careful experiments.”

According to the research team, speech-recognition technology is used in many aspects of people’s lives, but it is not as capable as a human listener at understanding speech, especially when the speech varies from the norms established in the software. Once the mechanisms that human listeners use are understood, speech-recognition technology can be improved to take advantage of them.

Building and broadening the field of speech science

Increasing diversity in speech science is the other focus of the project. 

“When a field lacks diversity among researchers, it can limit the perspectives and approaches that are used, which can lead to technologies and solutions being limited, as well,” Stephens said. “We will help speech science to become more inclusive by increasing the capacity and involvement of students from groups that are underrepresented in the field.”

The National Science Foundation’s Build and Broaden Program focuses on supporting research, offering training opportunities, and creating greater research infrastructure at minority-serving institutions. New awards for the Build and Broaden Program, which total more than $12 million, support more than 20 minority-serving institutions in 12 states and Washington, D.C. Nearly half of this funding came from the American Rescue Plan Act of 2021. These funds aim to bolster institutions and researchers who were impacted particularly hard by the COVID-19 pandemic.

Build and Broaden is funding this project in part because it will strengthen research capacity in speech science at NC A&T. The project will provide research training for NC A&T students in speech science, foster collaborations between researchers at NC A&T and Penn State, and enhance opportunities for faculty development at NC A&T.

By providing training in speech science at NC A&T, the research team will mentor a more diverse group of future researchers. Increasing the diversity in this field will help to decrease bias in speech-recognition technology and throughout the field.

Viswanathan expressed excitement about developing a meaningful and far-reaching collaboration with NC A&T.

“This project directly creates opportunities for students and faculty from both institutions to work together on questions of common interest,” Viswanathan said. “More broadly, we hope that this will be the first step towards building stronger connections across the two research groups and promoting critical conversations about fundamental issues that underlie the underrepresentation of Black scholars in the field of speech science.”

Ji Min Lee, associate professor of communication sciences and disorders; Anne Olmstead, assistant professor of communication sciences and disorders; Matthew Carlson, associate professor of Spanish and linguistics; Paola “Guili” Dussias, professor of Spanish, linguistics and psychology; Elisabeth Karuza, assistant professor of psychology; and Janet van Hell, professor of psychology and linguistics, will contribute to this project at Penn State. Cassandra Germain, assistant professor of psychology; Deana McQuitty, associate professor of speech communication; and Joy Kennedy, associate professor of speech communication, will contribute to the project at North Carolina Agricultural and Technical State University.


Three Ways AI Is Improving Assistive Technology

Wendy Gonzalez for Forbes

Artificial intelligence (AI) and machine learning (ML) are some of the buzziest terms in tech, and for good reason. These innovations have the potential to tackle some of humanity’s biggest obstacles across industries, from medicine to education and sustainability. One sector in particular is set to see massive advancement through these new technologies: assistive technology. 

Assistive technology is defined as any product that improves the lives of individuals who may not otherwise be able to complete tasks without specialized equipment, such as wheelchairs and dictation services. Globally, more than 1 billion people depend on assistive technology. When implemented effectively, assistive technology can improve accessibility and quality of life for all, regardless of ability. 

Here are three ways AI is currently improving assistive technology and its use cases, which might give your company some new ideas for product innovation: 

Ensuring Education For All

Accessibility remains a challenging aspect of education. For children with learning disabilities or sensory impairments, dictation technology, more commonly known as speech-to-text or voice recognition, can help them write and revise without pen or paper. In one study, 75 out of 149 participants with severe reading disabilities reported increased motivation in their schoolwork after a year of incorporating assistive technology.

This technology works best when powered by high-quality AI. Natural Language Processing (NLP) and machine learning algorithms have the capability to improve the accuracy of speech recognition and word predictability, which can minimize dictation errors while facilitating effective communication from student to teacher or among collaborating schoolmates. 
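The word-predictability idea can be sketched with even a toy language model. Below is a minimal, illustrative bigram predictor in Python; the class name and training sentences are invented for illustration, and production dictation systems use far larger statistical or neural models, but the principle of ranking next-word candidates by their context is the same.

```python
from collections import defaultdict

class BigramPredictor:
    """Toy next-word predictor based on bigram counts."""

    def __init__(self):
        # counts[prev][next] = how often `next` followed `prev` in training text
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sentences):
        for sentence in sentences:
            words = sentence.lower().split()
            for prev, nxt in zip(words, words[1:]):
                self.counts[prev][nxt] += 1

    def predict(self, prev_word):
        # Return the most frequent follower of prev_word, or None if unseen.
        candidates = self.counts.get(prev_word.lower())
        if not candidates:
            return None
        return max(candidates, key=candidates.get)

model = BigramPredictor()
model.train([
    "please turn in your homework",
    "please turn on the light",
    "turn in the assignment",
])
print(model.predict("turn"))  # prints "in", the most frequent follower
```

A dictation engine can use exactly this kind of signal to pick between acoustically similar candidates (e.g. "turn in" versus "turn inn").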

That said, according to a 2001 study, only 35% of elementary schools — arguably the most significant portion of a child’s education — provide any assistive technology. This statistic could change due to the social impact of AI programs. These include Microsoft’s AI for Accessibility initiative, which invests in innovations that support people with neurodiversities and disabilities. Its projects include educational AI applications that provide students with visual impairments the text-to-speech, speech recognition and object recognition tools they need to succeed in the classroom.  

Better Outcomes For Medical Technology

With a rapidly aging population estimated to top approximately 2 billion over the age of 60 by 2050, our plans to care for our loved ones could rely heavily on AI and ML in the future. Doctors and entrepreneurs are already paving the way; in the past decade alone, medical AI investments topped $8.5 billion in venture capital funding for the top 50 startups. 

Robot-assisted surgery is just one AI-powered innovation. In 2018, robot-assisted procedures accounted for 15.1% of all general surgeries, and this percentage is expected to rise as surgeons implement additional AI-driven surgical applications in operating rooms. Compared with traditional open surgery, robot-assisted procedures tend to involve smaller incisions, which reduces overall pain and scarring and leads to quicker recovery times.

AI-powered wearable medical devices, such as female fertility-cycle trackers, are another popular choice. Demand for products such as diabetic-tracking sweat meters and oximeters for respiratory patients has created a market projected to grow at a 23% CAGR by 2023. 

What’s more, data taken from medical devices could contribute to more than $7 billion in savings per year for the U.S. healthcare market. This data improves doctors’ understanding of preventative care and better informs post-recovery methods when patients leave after hospital procedures.

Unlocking Possibilities In Transportation And Navigation

Accessible mobility is another challenge that assistive technology can help tackle. From AI-powered running apps to suitcases that can navigate entire airports, assistive technology is changing how we move and travel. One example is Project Guideline, a Google project helping individuals who are visually impaired navigate roads and paths with an app that combines computer vision and a machine-learning algorithm to guide the runner along a pre-designed path. 

Future runners and walkers may one day navigate roads and sidewalks unaccompanied by guide dogs or sighted guides, gaining autonomy and confidence while accomplishing everyday tasks and activities without hindrance. For instance, CaBot, a navigation robot developed and spearheaded by Chieko Asakawa, a Carnegie Mellon professor who is blind, uses sensor information to help users avoid airport obstacles, alert them to nearby stores and assist with required actions like standing in line at airport security checkpoints. 

The Enhancement Of Assistive Technology

These are just some of the ways that AI assistive technology can transform the way individuals and society move and live. To ensure assistive technologies are actively benefiting individuals with disabilities, companies must also maintain accurate and diverse data sets with annotation that is provided by well-trained and experienced AI teams. These ever-updating data sets need to be continually tested before, during and after implementation.

AI possesses the potential to power missions for the greater good of society. Ethical AI can transform the ways assistive technologies improve the lives of millions in need. What other types of AI-powered assistive technology have you come across and how could your company make moves to enter this industry effectively? 


There’s Nothing Nuanced About Microsoft’s Plans For Voice Recognition Technology

By Enrique Dans for Forbes

Several media outlets had already reported on Microsoft’s advanced talks over an eventual acquisition of Nuance Communications, a leader in the field of voice recognition with a long and troubled history of mergers and acquisitions. The deal, finally announced on Monday, had been estimated to be worth as much as $16 billion, which would make it Microsoft’s second-largest acquisition after its $26.2 billion purchase of LinkedIn in June 2016; it ended up closing at $19.7 billion, a 23% premium over the company’s share price on Friday.

After countless mergers and acquisitions, Nuance Communications has ended up nearly monopolizing the market for speech recognition products. It started out as Kurzweil Computer Products, founded by Ray Kurzweil in 1974 to develop character recognition products, and was then acquired by Xerox, which renamed it ScanSoft and subsequently spun it off. ScanSoft was acquired by Visioneer in 1999, but the consolidated company retained the ScanSoft name. In 2001, ScanSoft acquired the Belgian company Lernout & Hauspie (which had previously acquired Dragon Systems, creators of the popular Dragon NaturallySpeaking) in order to compete in the speech recognition market with Nuance Communications, which had been publicly traded since 1995. Dragon was the absolute leader in speech recognition accuracy through its use of Hidden Markov models as a probabilistic method for temporal pattern recognition. Finally, in September 2005, ScanSoft decided to acquire Nuance and take its name.
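To make the Hidden Markov idea concrete, here is a toy Viterbi decoder in Python. The two “phoneme-like” states, the probability tables, and the coarse “lo”/“hi” acoustic observations are all invented for illustration; recognizers of the Dragon era chained thousands of such states, but the decoding principle — finding the most probable hidden sequence behind the observed audio — is the same.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # Probability of the best path ending in each state at time 0.
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            # Pick the best predecessor state for s at time t.
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]

# Two hypothetical phoneme-like states emitting coarse "lo"/"hi" energy frames.
states = ("S", "T")
start_p = {"S": 0.6, "T": 0.4}
trans_p = {"S": {"S": 0.7, "T": 0.3}, "T": {"S": 0.4, "T": 0.6}}
emit_p = {"S": {"lo": 0.8, "hi": 0.2}, "T": {"lo": 0.3, "hi": 0.7}}

print(viterbi(["lo", "lo", "hi"], states, start_p, trans_p, emit_p))
# ['S', 'S', 'T']
```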

Since then, the company has grown rapidly through acquisitions, buying as many as 52 companies in the field of speech technologies, in all kinds of industries and markets, creating a conglomerate that has largely monopolized related commercial developments, licensing its technology to all kinds of companies: Apple’s Siri was originally based on Nuance technology — although it is unclear how dependent on the company it remains.

The Microsoft purchase reveals the company’s belief in voice as an interface. The pandemic has seen videoconferencing take off, triggering an explosion in the use of technologies that transcribe voice: Zoom, for example, incorporated automatic transcription in April last year, so that at the end of each of my classes, I automatically receive not only the video but also the full transcript (which works infinitely better when the class is online than when it takes place face-to-face in a classroom).

Microsoft, which is in the midst of a process of strong growth through acquisitions, had previously collaborated with Nuance in the healthcare industry, and many analysts feel that the acquisition intends to deepen even further into this collaboration. However, Microsoft could also be planning to integrate transcription technology into many other products, such as Teams, or throughout its cloud, Azure, allowing companies to make their corporate environments fully indexable by creating written records of meetings that can be retrieved at a later date. 

Now, Microsoft will try to raise its voice — it has almost twenty billion reasons to do so — and use it to differentiate its products via voice interfaces. According to Microsoft, a pandemic that has pushed electronic and voice communications to the fore is now the stimulus for a future with more voice interfaces, so get ready to see more of that. No company plans a twenty billion dollar acquisition just to keep doing the same things they were doing before.


Voice AI Technology Is More Advanced Than You Might Think

By Annie Brown for Forbes

Systems that can handle repetitive tasks have supported global economies for generations. But systems that can handle conversations and interactions? Those have felt impossible, due to the complexity of human speech. Any of us who regularly use Alexa or Siri can attest to the deficiencies of machine learning in handling human messages. The average person has yet to interact with the next generation of voice AI tools, but what this technology is capable of has the potential to change the world as we know it.

The following is a discussion of three innovative technologies that are accelerating the pace of progress in this sector.

Conversational AI for Ordering

Experts in voice AI have prioritized technology that can alleviate menial tasks, freeing humans up to engage in high-impact, creative endeavors. Drive-through ordering was identified early by developers as an area in which conversational AI could make an impact, and one company appears to have cracked the code.

Creating a conversational AI system that can handle drive-through restaurant ordering may sound simple: load in the menu, use chat-based AI, and you’ve done it. The actual solutions aren’t quite so easy. In fact, creating a system that works in an outdoor environment—handling car noises, traffic, other speakers—and one that has sophisticated enough speech recognition to decipher multiple accents, genders, and ages, presents immense challenges.

The co-founders of Hi Auto, Roy Baharav and Eyal Shapira, both have a background in AI systems for audio: Baharav in complex AI systems at Google and Shapira in NLP and chat interfacing.

Baharav describes the difficulties of making a system like this work: “Speech handling in general, for humans, is hard. You talk to your phone and it understands you – that is a completely different problem from understanding speech in an outdoor environment. In a drive-through, people are using unique speech patterns. People are indecisive – they’re changing their minds a lot.”

That latter issue illustrates what they call multi-turn conversation, or the back-and-forth we humans do so effortlessly. After years of practice, model training, and refinement, Hi Auto has now installed its conversational AI systems in drive-throughs around the country and is seeing a 90% level of accuracy.

Shapira forecasts, “Three years from now, we will probably see as many as 40,000 restaurant locations using conversational AI. It’s going to become a mainstream solution.” 

“AI can address two of the critical problems in quick-serve restaurants,” comments Joe Jensen, a Vice President at Intel Corporation, “Order accuracy which goes straight to consumer satisfaction and then order accuracy also hits on staff costs in reducing that extra time staff spends.” 

Conversation Cloud for Intelligent Machines

A second groundbreaking innovation in the world of conversational AI is using a technique that turns human language into an input.

The CEO of Whitehead AI, Diwank Tomer, illustrates the historical challenges faced by conversational AI: “It turns out that, when we’re talking or writing or conveying anything in human language, we depend on background information a lot. It’s not just general facts about the world but things like how I’m feeling or how well defined something is.

“These are obvious and transparent to us but very difficult for AI to do. That’s why jokes are so difficult for AI to understand. It’s typically something ridiculous or impossible, framed in a way that seems otherwise. For humans, it’s obvious. For AI, not so much. AI only interprets things literally.”

So, how does a system incapable of interpreting nuance, emotion, or making inferences adequately communicate with humans? The same way a non-native speaker initially understands a new language: using context.

Context-aware AI involves building models that can use extra information beyond the identity of the speaker or other basic facts. Chatbots are one area that is inherently lacking in context and could benefit from this technology. For instance, if a chatbot could glean contextual information from a user’s profile, previous interactions, and other data points, it could use that context to frame highly intelligent responses.
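As a minimal sketch of that idea, the Python function below folds a hypothetical user-profile dictionary into its replies. The profile fields, function name, and responses are invented for illustration and are not Whitehead AI’s actual API; the point is only that the same message yields a better answer when context is available.

```python
def frame_response(message, context):
    """Choose a reply, preferring one that uses known user context."""
    if "weather" in message.lower():
        city = context.get("city")
        if city:
            # Context lets us answer directly instead of asking a follow-up.
            return f"Checking the weather in {city} for you."
        return "Which city would you like the weather for?"
    # Even the fallback can be personalized when a name is known.
    name = context.get("name")
    return f"Sorry {name}, I didn't catch that." if name else "Sorry, I didn't catch that."

profile = {"name": "Dana", "city": "Boston"}
print(frame_response("What's the weather like?", profile))
# Checking the weather in Boston for you.
```

With an empty profile, the same weather question instead triggers the clarifying follow-up, which is exactly the difference context makes.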

Tomer describes it this way, “We are building an infrastructure for manipulating natural language. Something new that we’ve built is chit chat API – when you say something and it can’t be understood, Alexa will respond with, ‘I’m sorry, I can’t understand that.’ It’s possible now to actually pick up or reply with witty answers.”

Tomer approaches the future of these technologies with high hopes: “Understanding conversation is powerful. Imagine having conversations with any computer: if you’re stuck in an elevator, you could scream and it would call for help. Our senses are extended through technology.”

Data Process Automation

Audio is just one form of unstructured data. When collected, assessed, and interpreted, the output of patterns and trends can be used to make strategic decisions or provide valuable feedback.

super.AI was founded by Brad Cordova. The company uses AI to automate the processing of unstructured data. Data Process Automation, or DPA, can be used to automate repetitive tasks that deal with unstructured data, including audio and video files. 

For example, at a large education company, children use a website to read sentences aloud. super.AI used a process automation application to count how many errors a child made. This automated process has higher accuracy and a faster response time than human review, enabling better feedback for enhanced learning.

Another example has to do with personal information (PI), which is a key point of concern in today’s privacy-conscious world, especially when it comes to AI. super.AI has a system of audio redaction whereby it can remove PI from audio, including names, addresses, and Social Security numbers. It can also remove copyrighted material from segments of audio or video, helping ensure GDPR or CCPA compliance.
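On the transcript side, the core of PI redaction can be sketched with pattern matching. The regular expressions below are simplified illustrations, not super.AI’s implementation: a real system also handles names, addresses, and spoken-digit formats, and maps matched spans back to audio timestamps so the corresponding audio can be removed.

```python
import re

# Illustrative patterns for two structured PI formats.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(transcript):
    """Replace each matched PI span with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        transcript = pattern.sub(f"[{label} REDACTED]", transcript)
    return transcript

print(redact("My SSN is 123-45-6789 and my phone is 555-867-5309."))
# My SSN is [SSN REDACTED] and my phone is [PHONE REDACTED].
```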

It’s clear that the supportive qualities of super.AI are valuable, but when it comes to the people who currently do everything from quality assurance on website product listings to note taking at a meeting, the question is this: are we going too far to replace humans?

Cordova would say no, “Humans and machines are orthogonal. If you see the best chess players: they aren’t human or machine, they’re humans and machines working together. We know intuitively as humans what we’re put on this earth for. You feel good when you talk with people, feel empathy, and do creative tasks.

“There are a lot of tasks where you don’t feel great: tasks that humans shouldn’t be doing. We want humans to be more human. It’s not about taking humans’ jobs, it’s about allowing humans to operate where we’re best and machines aren’t.”

Voice AI is charting unprecedented territory and growing at a pace that will inevitably transform markets. The adoption rates for this kind of tech may change most industries as we currently know them. The more AI is integrated, the more humans can benefit from it. As Cordova succinctly states, “AI is the next, and maybe the last technology we will develop as humans.” The capacity of AI to take on new roles in our society has the power to let humans be more human. And that is the best of all possible outcomes.


What Is The Difference Between Speech Recognition And Voice Recognition?

By Ratnesh Shinde for Tech Notification

This article explains the difference between two similar but distinct technologies: speech recognition and voice recognition.

Even though both voice recognition and speech recognition seem like they mean the same thing, they are two very distinct technologies.

Digital assistants such as Amazon’s Alexa, Microsoft’s Cortana, and Apple’s Siri have helped to make these words more widely known throughout the world. In addition to speech recognition, these assistants make use of voice recognition technology as well.

By 2024, the overall number of digital voice assistants in use will reach 8.4 billion units, which is more than the whole population of the world, according to Statista research.

However, there are still many individuals who have questions that need to be answered, so let’s take a deeper look at speech recognition and voice recognition.

What is Speech Recognition?

Speech recognition is intertwined with voice recognition: once a voice is recognized, the speech recognition software can then identify what is being said. What is the procedure? Using a variety of speech-pattern algorithms and language models, speech recognition transcribes or captions the words as they leave the speaker’s lips. High-quality audio is required for the program to transcribe the speech accurately.

The following are the requirements for high-accuracy speech recognition:

  • There is only one speaker.
  • There is no background noise.
  • It is advisable to use a high-quality microphone.
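Accuracy here is usually quantified as word error rate (WER): the word-level edit distance between a reference transcript and the recognizer’s output, divided by the number of reference words. A minimal Python sketch:

```python
def wer(reference, hypothesis):
    """Word error rate between a reference transcript and a hypothesis."""
    r, h = reference.lower().split(), hypothesis.lower().split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

print(wer("please transcribe this sentence", "please transcribe the sentence"))
# 0.25 (one substituted word out of four)
```

Background noise, multiple speakers, and poor microphones all show up directly as a higher WER, which is why the conditions listed above matter.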

When is it necessary to use speech recognition?

Speech recognition software can transcribe spoken words into text, which helps with taking notes.

Auto-generated subtitles, dictaphones, and text relays for deaf and hard-of-hearing individuals all help make media more accessible. These services can make it easier for individuals with disabilities to interact with the media and the rest of the world.

What is Voice recognition?

As we all know, speech and voice recognition are two distinct technologies, yet they are interconnected in many ways.

Once you train the program to identify a certain voice, it can recognize that voice almost anywhere. The user practices a variety of phrases, and the program then uses these phrases to identify the speaker, their delivery style, and their tone of voice, all of which are important factors in voice recognition. This is the method used by default by the vast majority of virtual assistants and voice-to-text apps.

The following are the limitations of voice recognition:

  • The range of tasks it can be asked to execute is limited.
  • If the statements are not properly understood, the virtual assistant might request that they be repeated.
  • If a few words are left out, the result might be quite different.
  • When any change in the tone or delivery of the voice is recognized, the accuracy of speech recognition suffers a significant decrease.

When is it necessary to use voice recognition?

Users may verify their identities by speaking aloud, using their voice as a password. This enhances security while saving money compared with other biometrics.

More effective and efficient operations – the ability to communicate accurately with technology via speech minimizes the need for error checking and enables tasks to be completed accurately at a faster pace.

Virtual assistants are made possible by the use of voice and speech recognition technology.

What is the significance of the term “smart technology,” and what are the obstacles to the widespread use of voice technology?

In the smart device business, virtual assistants have emerged as a critical component, since they have become fundamental to how customers engage with their gadgets. And as the industry progresses and its technology improves to a higher degree, businesses are increasingly looking for ways to make greater and better use of “Smart technology.”

However, there are still significant obstacles to the widespread use of speech technology throughout the world.

Accuracy is considered one of the most significant obstacles to the widespread adoption of speech technology.

Some people believe that difficulties detecting accents (the way you speak) or dialects will make speech technology adoption more difficult.

That covers the essentials of voice recognition and speech recognition.


Voice recognition and facial recognition are both working their way up to the top of the technological food chain.

Furthermore, these technologies have more potential than just being used as assistants. Audio-to-text software is assisting many industries that are not particularly technology-oriented, such as healthcare, education, finance, and government.

People are becoming increasingly enthusiastic about the prospect of integrating virtual assistants with their software to spur innovation. Are you looking forward to it as well? Resources like these may assist you in turning creative ideas into successful projects.


Alexa Introduces Voice Profiles for Kids and New AI Reading Tutor


Amazon has augmented Alexa’s voice profile feature with a version aimed specifically at children. Parents and guardians can use the new Alexa Voice Profiles for Kids tool to enable a personalized experience for up to four children per account. The profiles have debuted alongside Reading Sidekick, a new AI-powered tutor designed to encourage children and help them learn to read.


Reading Sidekick is the central part of the kid-focused profiles at the moment. Designed for children between the ages of six and nine, Reading Sidekick uses Alexa to help teach a kid to read any of the several hundred titles in its library of supported books, in both digital and physical form. It just requires an Echo smart speaker or smart display and an Amazon Kids+ subscription. Amazon Kids+ is what Amazon renamed FreeTime and FreeTime Unlimited; it offers exclusive Alexa Skills and other content for $3 a month for Prime members and $5 a month for non-Prime members. When a child says, “Alexa, let’s read,” the voice assistant asks what book they want to read and how much they want to read, with choices of taking turns, a little, or a lot. Taking turns means Alexa and the child will trade reading sections, while a little or a lot shifts the ratio one way or the other. Regardless, Alexa will praise their success and even prompt them with the next word if they get stuck.
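Amazon has not published how Reading Sidekick decides when to prompt, but the “supply the next word” behavior can be sketched as a comparison between what the child has read and the book text. The function below is purely illustrative and is not Amazon’s implementation.

```python
def next_word_prompt(book_text, spoken_so_far):
    """Return the next book word to whisper to the child, or None if done."""
    book_words = book_text.lower().split()
    spoken_words = spoken_so_far.lower().split()
    # Count how many words match from the beginning of the passage.
    matched = 0
    for b, s in zip(book_words, spoken_words):
        if b != s:
            break
        matched += 1
    if matched < len(book_words):
        return book_words[matched]  # the word the child is stuck on
    return None  # passage finished

print(next_word_prompt("the cat sat on the mat", "the cat sat"))  # prints "on"
```

A production system would of course work from speech-recognition output with confidence scores and timing (detecting a long pause as "stuck") rather than a clean text string.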

“With the arrival of Reading Sidekick, we are hopeful we can make reading fun for millions of kids to set them up for a lifetime of learning and a love of reading,” Alexa Education and Learning head Marissa Mierow said. “Alexa provides a welcoming, no-judgment zone and is always ready to help and to read.”


Amazon first debuted voice profiles for Alexa users back in 2017, enabling Alexa to respond differently to the same query based on who is speaking without switching accounts. This made it easier for a family or roommates to share an Alexa device. Third-party developers were given permission to integrate that element into their Alexa skill in 2019, and the voice assistant began applying user contact information to personalize interactions with Alexa last year. The voice recognition feature even expanded to Amazon’s call center platform in December. The voice profiles created for children function largely the same way but with a narrower range of functions.

It would be an impressive feat for Amazon to have Alexa understand children as well as it does adults. The difficulties involved are why children’s speech recognition tech startup SoapBox Labs was formed. SoapBox, which launched new Voice Activity Detection (VAD) and Custom Wakeword tools in May, builds on a database of thousands of hours of children’s speech and its own deep learning technology to understand the unique patterns and inflections of children’s speech. There’s no denying that there’s a growing demand for kid-focused voice AI. Earlier this year, Google released its own reading tutor for kids, but that feature doesn’t have the personalized touch of Amazon’s new profiles. The new profiles will almost inevitably be included in the cluster of lawsuits Amazon faces over whether Alexa violates children’s privacy. The new features also meant teaching Alexa to better understand how kids speak and the many variations based on location, age, background, and other factors. The microphones in an Echo are also adjusted when the kids’ profile is engaged, as children may be farther away or sitting behind a book when using Reading Sidekick.


Ditching Keyboards for Voice Technology

By Philips Speech Blog

Consumers and professionals are ditching keyboards for voice technology

We’re big believers in everyone using the power of their voice to get more done in less time and truly making every minute count.

The trend of ditching keyboards for voice has been on the rise for years. The growth in popularity of voice-activated assistants such as Siri, Alexa and Google shows that consumers are recognizing the efficiency of using their voice to access, capture and share information. 72% of people who use voice search devices say the devices have become part of their daily routines. It’s not just consumers adopting the technology, either. Lawyers, doctors, insurance agents and professionals across all industries save several hours a week when they implement voice technology into their workflows and document creation processes.

A large part of this growing adoption can be attributed to the improved speed and accuracy of speech recognition technology. In fact, an experiment at Stanford University found that entering text into a mobile device with your voice was three times faster than typing.

Harnessing the power of your voice – from your desktop or on-the-go

Working with your voice – from quick notes to extensive documents – allows you to collaborate efficiently and produce accurate documentation. This efficiency and accuracy are especially important in service industries, where accurate, timely documentation goes hand in hand with speedy service, and therefore happy clients.

Professional recording devices such as the Philips SpeechOne headset and the Philips SpeechMike Premium Air microphone free professionals from their desks while delivering all the capabilities needed to stay productive. The AirBridge Wireless Adaptor, about the size of a quarter, lets users move quickly around the office while staying connected to their dictation workflows.

Voice technology is a versatile tool that is ready to use wherever you are, whenever you need it. Philips SpeechLive is a browser-based dictation and transcription solution that quickly and easily converts your voice to text. Whether you are in the office, at home, on-the-go, or have a unique IT setup, the Philips SpeechLive cloud solution will help to increase flexibility, efficiency and productivity. Whether you prefer to record with your smartphone or a handheld recorder, SpeechLive gives you plenty of options.

No keyboard? No problem. Typing all day in order to be productive is a thing of the past.

Posted on

5 Ways Voice Technology Is Increasingly Making Itself Heard


Voice technology is among the things that have gotten a big boost from the global pandemic as consumers shift to buying things online and avoiding touching items in public places for fear of contracting COVID-19.

“We’ve seen a huge increase in the use of voice in the home,” Tom Taylor, senior vice president of Amazon’s Alexa unit, told GeekWire in a recent interview. “People do build relationships with Alexa. We have something like 1 million marriage proposals and compliments of love to Alexa. I don’t think anybody has ever done that to a keyboard.”

Voice technology’s adoption was already on the rise in the pre-pandemic period, as PYMNTS’ and Visa’s How We Will Pay report demonstrated in late 2019. Our study found that about 3 in 10 consumers owned voice assistants compared to just 1.4 out of 10 in 2017. Consumers who made purchases via their smart speakers also rose to 9.6 percent of all consumers as of late 2018 vs. just 7.7 percent a year earlier.

And all of that was before the pandemic made voice assistants even more valuable as their use cases expanded to meet new consumer needs. They’re now showing up in hotel rooms, enabling grocery shopping and beginning to sneak into quick-service restaurants (QSRs) as an enhancement to all those touchscreens that have been going up in recent years. Here are five changes the industry is seeing as these devices come into greater use:

Voice-Tech And Education 

Education has been on a wild ride for the past several months, and voice assistants are becoming part of an increasingly integrated hybridized educational model of online and in-person learning, according to recent reports.

Google Assistant and Alexa-enabled devices had increasingly been making classroom appearances prior to the pandemic as supplemental educational tools for teachers. But now, their use has reportedly expanded to becoming tools for contacting students, sending out information and assignments and even sending parents custom shopping lists with the equipment and tools necessary for home-based learning.

Improving Devices’ eCommerce Abilities 

Amazon has for the past year been developing Alexa Conversations, an enhancement designed to fuse various Alexa skills together into a single distinct user experience.

Amazon Alexa VP Prem Natarajan recently told VentureBeat that the feature will make it possible to buy tickets, make dinner reservations and arrange Uber travel in a single interaction with Alexa instead of having to muddle through multiple ones.

He said smoothing out the voice-commerce experience so customers can make arrangements organically instead of through a series of distinct conversations with their assistants makes the process both more desirable and friction free.

Beefed Up Display Screens 

Google Director of Product Barak Turovsky told VentureBeat that taking voice technology to the next level will require a “multimodal” experience that combines voice with input from text, photos or video via a screen.

He said voice alone has limits in surfacing the data that the user wants. Without a screen, there’s no infinite scroll or first page of Google search results, so responses are limited to perhaps three potential results at most, Turovsky said. That often means a voice assistant can miss what the user is looking for.

Published reports say such concerns have pushed both Amazon and Google to increase their smart-display design efforts, emphasizing artificial intelligence (AI) assistants that can both share visual content and respond with voice.

A Consolidating Field Of Competitors 

While Amazon and Google are expanding their offerings, the field of competitors could be growing smaller.

For example, Bixby — Samsung’s entrant into the great virtual voice assistant race — might be disappearing as part of a possible deal between the smartphone maker and Google.

The deal would see Bixby signing off, with Samsung promoting Google’s search site, Google Assistant and Google Play Store apps instead, according to Reuters. If that happens, Google will in one fell swoop get the maker of the world’s bestselling mobile devices into the Android ecosystem.

Samsung has maintained its independence and consistently made efforts to promote its own apps, but Bixby has always at best been a glitchy product that captured limited user interest. And now the pandemic has cooled sales, prompting Samsung to consider giving up on costly development projects like Bixby in search of more profitable prospects.

In a statement, Samsung said that while it is committed to its own services, it “closely works with Google and other partners to offer the best mobile experiences.”

A Big Alexa Upgrade 

Coming soon to a smartphone near you — a revamped Amazon Alexa that will include a customized home screen based on what each individual user does the most with the app.

The upgrade, scheduled for the end of August, aims to increase simplicity, as a frequent complaint about the current Alexa app is its random and superfluous prompts from the home screen, CNBC reported. Users also dislike friction-filled sifting through various menus to access some settings.

However, the planned update promises to put users’ most-used features in the easiest places to reach. That means someone who primarily uses Spotify will have a different home screen from someone who listens to Audible books or keeps a shopping list.

Additionally, the button to open Alexa will now be located at the top of the screen as opposed to the bottom. The upgrade will also reportedly relocate reminders, routines, skills, settings and other features to easier-to-find places on the screen.

The Bottom Line 

Will the upgrades boost Alexa’s popularity and make it a stronger competitor to phones that already come with Apple Siri or the Google Assistant preinstalled? Only time will tell — and consumer preference will decide.

However, the race for voice-technology domination is clearly on, as consumers have opened their minds to the idea of shopping with a voice assistant. Providers are responding by rushing to bring them the rest of the way along on that journey.

Posted on

AGEWISE: Can voice recognition systems help older adults?

For The Winston-Salem Journal

Q: I use a voice recognition system at my home and am wondering if setting one up for my Mom and Dad would be helpful. Can you provide some insight?


Answer: Voice-command technology, such as Alexa and Google Home devices, listens and responds to spoken directions. It requires only an internet connection and a place to plug in. As voice-command technology continues to grow in popularity, it is easy to see its potential as a great resource for our loved ones as they age. These devices can perform a variety of tasks to assist older adults.

Each device comes with step-by-step instructions for easy installation. Some devices have only a speaker, while others also include a video component. Voice recognition systems with video, such as the Echo Show, typically cost more than speaker-only systems like the Echo Dot. Prices range from about $50 to $200 with and without video. Keep in mind you may need to purchase additional products if you wish to connect your system to other devices such as televisions, computers, lights, temperature controls, faucets, security systems and audio streaming, to name a few.

Technology like this is especially helpful for seniors with physical limitations and those who live alone. A voice recognition system can be set up to remind a person to take medications, turn off the oven, start the dishwasher, and even prompt activities of daily living. Reminders can be programmed for doctor’s appointments and upcoming visits, and the system can even keep a grocery list up to date. If your parents have difficulty seeing texts and emails, voice-assisted technology can read these messages to them. Another important benefit is the ability to call for help should a loved one fall or otherwise be unable to reach a phone.

Voice recognition technology provides updates to the weather and can alert you to hurricanes, tornadoes and thunderstorms. Reminders can be set to pay bills and automatic payments can even be set up with these systems. Devices like Alexa can answer random questions such as how many ounces are in a cup, too.

Since loneliness is sometimes a concern for older adults, these devices can help people connect more easily without having to log in or enter a phone number. Speaking a command such as “Alexa, call Linda at home” can immediately connect your loved one to friends and family. Connecting visually is a great advantage, but each person needs the same kind of device to see the other.

There are a number of virtual assistants available from Apple, Microsoft, Amazon and Google, and some are geared specifically toward seniors. These include Phillo, which focuses on health needs and medications; Orbita, which aims to bring voice recognition to health care; and ElliQ, which communicates with seniors by suggesting activities and allowing family members to check in on their loved one. It can even read the person’s body language.

Many of these devices come with a free trial period so you can see what will be the best fit for your situation. Voice recognition technology is here to stay and could help seniors live independently in their homes longer. It can also provide peace of mind for caregivers like yourself. A recent article in the New York Times reviewed a number of devices and their benefits for seniors.

Posted on

Speech recognition vs. voice recognition: What’s the difference?

By Jon Arnold for Search Unified Communications

The topic of speech recognition vs. voice recognition is a great example of two technology terms that appear to be interchangeable at face value but, upon closer inspection, are distinctly different.

The words speech and voice can absolutely be used interchangeably without causing confusion, although it’s also true they have separate meanings. Speech is obviously a voice-based mode of communication, but there are other modes of voice expression that aren’t speech-based, such as laughter, inflections or nonverbal utterances.

Things become more nuanced when you add recognition to both speech and voice. Now, we enter the world of automatic speech recognition (ASR), which is where we tap into applications expressly tailored to extract specific forms of business value from the spoken word. I’ll briefly explain speech recognition vs. voice recognition to illustrate the differences between the two.

Speech recognition focuses on translating what’s said

Speech recognition is where ASR provides rich business value, both for collaboration and contact center applications. The key application here is speech to text, where the objective is to accurately translate spoken language into written form. In its most basic form, ASR’s role is to capture, literally, what was said as text.

More advanced forms of ASR — namely, those harnessing natural language understanding and machine learning — inject AI to support features that go beyond literal accuracy. The objective here is to mitigate the ambiguity that naturally occurs in speech to ascribe intent, where the context of the conversation helps clarify what is being said. Without this, even the most accurate speech-to-text applications can easily generate output that is laughably off the mark from what the speaker is actually talking about.
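As a toy illustration of how conversational context can resolve the ambiguity described above, the sketch below scores candidate transcripts containing homophones against words that tend to co-occur with each reading. The association table and example sentences are invented for illustration; real systems use trained neural language models, not hand-built word lists.

```python
# Toy sketch of context-based disambiguation, not a real ASR pipeline.
# The association table below is invented for illustration; production
# systems use trained language models instead of fixed word sets.

# Hypothetical word associations standing in for a language model:
# content words that tend to co-occur with each homophone.
associations = {
    "write": {"email", "reply", "draft"},
    "right": {"turn", "lane", "answer"},
}

# Two transcripts an acoustic model might find equally likely,
# because "write" and "right" sound identical.
candidates = [
    "please write a reply to that email",
    "please right a reply to that email",
]

def disambiguate(transcripts, homophones=("write", "right")):
    """Pick the transcript whose homophone best fits the surrounding words."""
    best, best_score = None, -1
    for transcript in transcripts:
        words = set(transcript.split())
        # Score = number of surrounding words that support the homophone used
        score = sum(
            len(associations[h] & words) for h in homophones if h in words
        )
        if score > best_score:
            best, best_score = transcript, score
    return best
```

Here "email" and "reply" support "write" but not "right", so the scorer breaks the tie that acoustics alone cannot.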

Voice recognition pinpoints who says what

In a narrow sense, speech recognition could also be referred to as voice recognition, and that description is perfectly acceptable so long as the underlying meaning is clearly understood. However, for those working in speech technology circles, there is a critical distinction between speech recognition and voice recognition. Whereas speech recognition pertains to the content of what is being said, voice recognition focuses on properly identifying speakers, as well as ensuring that whatever they say is accurately attributed. In terms of collaboration, this capability is invaluable for conferencing, especially when multiple people are speaking at the same time. Whether the use case is for captioning so remote attendees can follow who is saying what in real time or for transcripts to be reviewed later, accurate voice recognition is now a must-have for unified communications.

In addition to collaboration, voice recognition is playing a growing role in verifying the identity of a speaker. This is a critical consideration when determining who can join a conference call, whether they have permission to access computer programs or restricted files or are authorized to enter a facility or controlled spaces. In cases like these, voice recognition is not concerned with speech itself or the content of what is being said; rather, it’s about validating the speaker’s identity. To that end, it might be more accurate to think of voice recognition as being about speaker recognition, as this is an easier way to distinguish it from speech recognition.
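The identity-verification idea can be sketched in miniature. A common approach is to compare a stored “voiceprint” embedding, captured at enrollment, against an embedding extracted from a fresh utterance, and accept the speaker only when the two are sufficiently similar. In the sketch below, the 3-D vectors and the 0.8 threshold are invented for illustration; real systems derive high-dimensional embeddings from neural models trained on speech.

```python
import math

# Toy sketch of speaker verification, not a production system. Assumes
# voice embeddings have already been extracted by an upstream model;
# these 3-D vectors and the 0.8 threshold are invented for illustration.

def cosine_similarity(a, b):
    """Similarity of two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify_speaker(enrolled, sample, threshold=0.8):
    """Accept the sample only if it is close enough to the enrolled voiceprint."""
    return cosine_similarity(enrolled, sample) >= threshold

enrolled_voiceprint = [0.9, 0.1, 0.3]    # stored when the user enrolls
genuine_sample = [0.85, 0.15, 0.25]      # same speaker, new utterance
impostor_sample = [0.1, 0.9, 0.2]        # different speaker
```

Note that nothing here looks at *what* was said; the system compares only voice characteristics, which is exactly the speech-versus-speaker distinction the article draws.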