Posted on

Top 10 Speech Recognition Companies to Watch in 2020

By Adalin Beatrice for Analytics Insight

The voice recognition market is estimated to reach US$31.82 billion by 2025

Technology is reaching into every sector. New inventions, innovations and devices are making life easier for everyone, and voice recognition technology is one of the most notable developments of this era of innovation.

Voice recognition, also known as speech recognition, is a software program or hardware device with the ability to receive, interpret and understand voice input and carry out commands. The technology lets users create and control documents simply by speaking.

Voice and speech recognition features enable contactless control of devices and equipment, deliver input for automatic translation and generate print-ready dictation, with speech recognition devices responding to spoken commands. According to a report by Grand View Research, Inc., the global speech and voice recognition market is estimated to reach US$31.82 billion by 2025, growing at a CAGR of 17.2% during the forecast period.
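As a rough sanity check on that projection, a CAGR compounds the starting market size year over year. The base year and seven-year window below are assumptions for illustration only; the report itself defines the actual forecast period.

```python
# Back-of-the-envelope check of the market projection. The 7-year
# window (e.g. 2018-2025) is an assumption for illustration; the
# report itself defines the actual forecast period.
target_2025 = 31.82   # projected market size, US$ billion
cagr = 0.172          # 17.2% compound annual growth rate
years = 7             # assumed forecast window

# A 17.2% CAGR over 7 years implies this starting market size:
implied_base = target_2025 / (1 + cagr) ** years
print(round(implied_base, 2))  # roughly US$10.5 billion
```

Compounding a base of roughly US$10.5 billion at 17.2% per year for seven years does land near the US$31.82 billion figure, so the headline numbers are internally consistent under these assumptions.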

The growth of the overall market is primarily driven by factors such as rising acceptance of advanced technology coupled with increasing consumer demand for smart devices, growing concern for personal data safety and security, and increasing use of voice-enabled payments and shopping by retailers.

The demand for related products such as voice-activated systems, voice-enabled devices and voice-enabled virtual assistants is also expected to spike as speech-based technology spreads into diverse industries. Adoption is strongest in the banking and automobile sectors, which are embracing voice biometrics for user authentication to counter fraudulent activity and enhance security. The growth of Artificial Intelligence (AI)-based systems is expected to further drive the market.

Analytics Insight presents the top 10 companies operating in the global speech and voice recognition market in 2020.

Nuance Communications

Nuance Communications, founded in 2001, provides speech recognition and artificial intelligence products focused on server and embedded speech recognition, telephone call steering systems, automated telephone directory services, and medical transcription software and systems.

The Massachusetts-based company offers Nuance Recognizer for contact centres, software that consistently delivers a great customer service experience while improving self-service systems’ containment rates, and Dragon NaturallySpeaking, which creates documents, spreadsheets and email simply by speaking. The company partners with 75% of Fortune 100 companies and thousands of healthcare organisations.

Google LLC

Google, founded in 1998, is now a subsidiary of the holding company Alphabet. It provides a variety of services ranging from search engines, cloud computing and online advertising technologies to computer hardware and software. The California-headquartered company is a global pioneer in internet-based products and services and is now stepping into the speech recognition market with a Speech-to-Text service that accurately converts speech into text using an API powered by Google’s AI technology. Google has a strong global presence, with 70 offices in 50 countries.

Amazon.com, Inc

Amazon, headquartered in Seattle, Washington, was founded in 1994. The company operates through three core segments, namely North America, International and Amazon Web Services, engaging in the retail sale of consumer products and subscriptions. Amazon focuses on advanced technologies such as artificial intelligence, cloud computing, consumer electronics, e-commerce and digital streaming. Amazon Transcribe makes it easy for developers to add speech-to-text capability to their applications.

Apple, Inc

Apple, Inc is a California-headquartered company that designs, manufactures, markets and sells mobile phones, media devices and computers to consumers worldwide. Apple was founded in 1976. The company sells its products and services mostly through its direct sales force, online and retail stores, and third-party cellular network carriers, resellers and wholesalers. Apple’s speech recognition process involves capturing audio of the user’s voice and sending the data to Apple’s servers for processing.

IBM Corporation

IBM Corporation was founded in 1911. The New York-headquartered company operates through five key segments: Cognitive Solutions, Technology Services and Cloud Platforms, Global Business Services, Systems, and Global Financing. IBM also manufactures and sells software and hardware, and delivers numerous hosting and consulting services in areas ranging from mainframe computers to nanotechnology. IBM’s speech recognition enables systems and applications to understand and process human speech.

Microsoft Corporation

Microsoft Corporation, founded in 1975, is a pioneering technology company. The Redmond, Washington-headquartered company is known for software products that include the Microsoft Windows OS, the Microsoft Office suite, and the Internet Explorer and Edge web browsers. Windows Speech Recognition in Windows 10 lets the system recognize and respond to the user’s voice.


Agnitio

Agnitio was founded in 2004 as a spin-off from the Biometric Recognition Group-ATVS at the Technical University of Madrid. The Madrid, Spain-headquartered company is a biometrics technology company that uses unique biometric characteristics to verify an individual’s identity. Agnitio’s speech recognition program for Windows lets users control their computers by voice.

Verint VoiceVault

Verint Systems was founded in 2002. The New York-headquartered analytics company sells software and hardware products for customer engagement management, security, surveillance and business intelligence. Verint VoiceVault voice biometrics offers a standardized approach to mobile identity verification.


iFLYTEK

iFLYTEK, headquartered in Hefei, Anhui, China, is an advanced enterprise dedicated to the research and development of technologies such as intelligent speech and language technologies, speech information services, the integration of e-governance systems, and the development of software and chip products. The company was founded in 1999, and its market coverage spans North America, Europe, Asia-Pacific, Latin America, the Middle East and Africa. iFLYTEK provides services such as speech synthesis, automatic speech recognition and speech expansion.


Baidu

Baidu, headquartered in Beijing, China, consists of two segments: Baidu Core and iQIYI. The company was founded in 2000 and has direct sales operations in Beijing, Dongguan, Guangzhou, Shanghai, Shenzhen and Suzhou. Baidu’s speech recognition services include a streaming multi-layer truncated attention (SMLTA) model for online automatic speech recognition (ASR).


Nuance and MITRE Team Up to Fight Cancer with AI, Speech Recognition and Data Interoperability

From Find Biometrics

In the fight against cancer, data is key. Accurate, robust patient data that is interoperable between use cases not only helps researchers in their efforts to understand the disease, but it also aids oncologists in providing safe and effective treatments. That’s why a recently announced strategic partnership between Nuance Communications and R&D organization MITRE stands to make a difference in the healthcare world.

The collaboration will see Nuance’s Dragon Medical One speech recognition platform working in tandem with MITRE’s mCODE – a set of data elements that, by establishing baseline standards for oncology-related health records, aims to enhance the information available in the war on cancer.

“Every interaction between a clinician and a cancer patient provides high-quality data that could lead to safer care, improved outcomes, and lower costs,” said MITRE’s Chief Medical and Technology Officer, Dr. Jay Schnitzer. “But first, we need data that is standardized and collected in a computable manner so it can be aggregated with data from many other patients and analyzed for best practices. And it must be collected in a streamlined way that doesn’t burden the clinicians. The Nuance offering will enhance this effort.”

Nuance’s Dragon Medical One solution is already playing an important role in patient care. The cloud-based speech recognition technology transcribes medical notes by dictation in accordance with industry standards, while also offering frictionless record retrieval via voice command. This process ensures accurate patient records while relieving administrative pressure on clinics and hospitals without burdening increasingly time-poor doctors. Incorporating mCODE will further improve the solution’s efficacy in oncological use cases.

“Collecting clinical data specific to oncology treatment has traditionally been a difficult task to overcome,” said Diana Nole, EVP and GM of Healthcare at Nuance. “Combining Nuance’s AI expertise with the mCODE data standard provides oncologists with the ability to easily collect and gain access to critical outcome data by simply using their voice to securely dictate notes and search within the EHR using Nuance Dragon Medical One.”

Nuance is an active player in the healthcare space, and this partnership with MITRE is the most recent example of its commitment to the market. In June, the company teamed up with Wolters Kluwer to bring new search features to Dragon Medical One. And in July the company expanded its partnership with Cerner Corporation to encompass its virtual assistant technology.


Speech recognition vs. voice recognition: What’s the difference?

By Jon Arnold for Search Unified Communications

The topic of speech recognition vs. voice recognition is a great example of two technology terms that appear to be interchangeable at face value but, upon closer inspection, are distinctly different.

The words speech and voice can absolutely be used interchangeably without causing confusion, although it’s also true they have separate meanings. Speech is obviously a voice-based mode of communication, but there are other modes of voice expression that aren’t speech-based, such as laughter, inflections or nonverbal utterances.

Things become more nuanced when you add recognition to both speech and voice. Now, we enter the world of automatic speech recognition (ASR), which is where we tap into applications expressly tailored to extract specific forms of business value from the spoken word. I’ll briefly explain speech recognition vs. voice recognition to illustrate the differences between the two.

Speech recognition focuses on translating what’s said

Speech recognition is where ASR provides rich business value, both for collaboration and contact center applications. The key application here would be speech to text, where the objective is to accurately translate spoken language into written form — a common use case. In its most basic form, ASR’s role is to accurately capture — literally — what was said into text.

More advanced forms of ASR — namely, those harnessing natural language understanding and machine learning — inject AI to support features that go beyond literal accuracy. The objective here is to mitigate the ambiguity that naturally occurs in speech to ascribe intent, where the context of the conversation helps clarify what is being said. Without this, even the most accurate speech-to-text applications can easily generate output that is laughably off the mark from what the speaker is actually talking about.
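To make the role of context concrete, here is a minimal toy sketch, with made-up plausibility scores standing in for a real language model, of how a recognizer can choose between two acoustically identical candidate transcripts. Nothing here reflects any production ASR engine; the scores and phrases are invented for illustration.

```python
# Toy illustration of context-based rescoring in ASR. The bigram
# scores are hypothetical; real systems use statistical or neural
# language models trained on large corpora.

BIGRAM_SCORES = {
    ("turn", "right"): 0.9,    # "turn right" is a plausible phrase
    ("turn", "write"): 0.1,    # "turn write" is implausible
    ("please", "write"): 0.8,
    ("please", "right"): 0.2,
}

def context_score(words):
    """Sum plausibility scores over adjacent word pairs."""
    return sum(BIGRAM_SCORES.get(pair, 0.5)
               for pair in zip(words, words[1:]))

def pick_transcript(candidates):
    """Choose the candidate whose word sequence best fits the context."""
    return max(candidates, key=lambda c: context_score(c.split()))

print(pick_transcript(["turn right at the light",
                       "turn write at the light"]))
# -> "turn right at the light"
```

The two candidates sound identical, so acoustic accuracy alone cannot separate them; the surrounding words are what resolve "right" versus "write", which is exactly the gap that natural language understanding fills in advanced ASR.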

Voice recognition pinpoints who says what

In a narrow sense, speech recognition could also be referred to as voice recognition, and that description is perfectly acceptable so long as the underlying meaning is clearly understood. However, for those working in speech technology circles, there is a critical distinction between speech recognition vs. voice recognition. Whereas speech recognition pertains to the content of what is being said, voice recognition focuses on properly identifying speakers, as well as ensuring that whatever they say is accurately attributed. In terms of collaboration, this capability is invaluable for conferencing, especially when multiple people are speaking at the same time. Whether the use case is for captioning so remote attendees can follow who is saying what in real time or for transcripts to be reviewed later, accurate voice recognition is now a must-have for unified communications.

In addition to collaboration, voice recognition is playing a growing role in verifying the identity of a speaker. This is a critical consideration when determining who can join a conference call, whether they have permission to access computer programs or restricted files or are authorized to enter a facility or controlled spaces. In cases like these, voice recognition is not concerned with speech itself or the content of what is being said; rather, it’s about validating the speaker’s identity. To that end, it might be more accurate to think of voice recognition as being about speaker recognition, as this is an easier way to distinguish it from speech recognition.
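A rough sketch of how speaker recognition differs in practice: systems typically map a voice sample to a numeric "voiceprint" embedding and compare it against an enrolled embedding, accepting the speaker if the two are similar enough. The vectors, names and threshold below are invented for illustration; real systems use embeddings with hundreds of dimensions produced by neural networks.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two voiceprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify_speaker(enrolled, sample, threshold=0.85):
    """Accept the sample if it is close enough to the enrolled voiceprint."""
    return cosine_similarity(enrolled, sample) >= threshold

# Hypothetical 4-dimensional voiceprints for illustration only.
alice_enrolled = [0.9, 0.1, 0.4, 0.2]
alice_sample   = [0.88, 0.12, 0.41, 0.18]  # same speaker, slight variation
mallory_sample = [0.1, 0.9, 0.2, 0.7]      # different speaker

print(verify_speaker(alice_enrolled, alice_sample))    # True
print(verify_speaker(alice_enrolled, mallory_sample))  # False
```

Note that nothing in this comparison looks at *what* was said; it only measures whether the voice itself matches the enrolled identity, which is why "speaker recognition" is the clearer name for the capability.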


AI is capturing the legal industry’s attention

Mark Geremia for Nuance

The mystique around AI technology is driving a tendency for lawyers, especially those working in small practices, to believe that it is inaccessible, a luxury that only larger legal firms and departments can afford. The increased adoption and availability of AI solutions is proving that it is not only accessible but may be necessary.

Firms of all sizes are turning to AI-powered technologies to steer them toward innovative approaches that meet their clients’ needs and automate business processes. Legal tech spending hit $1 billion last year, with lawyers embracing new tools such as case management and eDiscovery software, the latter mandated in some states. AI-based solutions are now being used to automate processes like patent tracking and are extending into services like live video streaming to better connect with clients.

As with every innovation, the legal industry finds both pros and cons in adopting new technologies. As a recent Forbes article notes, legal professionals are weighing the benefits of deploying AI-based solutions in their practices against the human toll these may take. The fear that these solutions could eliminate positions, such as paralegals and legal researchers, and reduce the number of billable hours a lawyer can charge is substantial.

Regardless of this, law office productivity software, in particular, continues to be in great demand. For a profession that is highly document-based, tools like customized legal speech recognition offer many benefits when it comes to creating and managing legal documentation. The ability to easily dictate or transcribe audio files gives lawyers tremendous flexibility in ensuring comprehensive and accurate data is captured and distributed within critical practice and case management systems.

AI can seem intimidating, and as new tools and technologies emerge faster than ever, it can feel hard to keep up. Legal professionals realize that this trend will continue and that they need to embrace solutions that will empower them to be more productive and meet the evolving expectations of clients.


How Law Enforcement Benefits from Speech Recognition Tech

By Ed McGuiggan for State Tech

The value and importance of police reports cannot be overstated. From traffic and collision reports to those documenting theft, injuries and arrests, police reports are not only highly scrutinized by prosecutors, courts, media and insurance companies; they are also essential to ongoing investigations. Police officers spend a significant amount of the workday managing these reports and other such documentation.

It’s now possible to deploy voice-enabled technologies to provide officers with an alternative to the traditional, manual methods of creating reports. Officers simply speak to create detailed, accurate incident reports, using the power of their voices in place of typing. Reports created this way have been shown to take a third of the time that manual data entry would typically require and provide enormous safety benefits for officers out in the field.

Public Safety Officers Are Consumed with Incident Reporting 

One law enforcement survey conducted by Nuance uncovered just how extensive the documentation burden is: More than half of the surveyed public safety officials reported spending more than three hours of each shift on paperwork. 

In addition, more than 70 percent of survey respondents said they spend at least one hour in their patrol vehicles to complete a single incident report. Human memories can be faulty, unfortunately, and over that hour, it can be easy to forget to include details that may factor into a case or outcome. Add in multiple incidents and calls, and both memory recall and the ability to decipher hastily prepared handwritten notes can fade.

In other words, officers are dedicating too much of their days to documentation and administrative work — and it’s all time they would prefer to spend on more mission-critical, proactive policework that improves the safety and security of their communities.

Spending extra time capturing accurate, comprehensive and detailed information may mean officers are “heads down” in the field, a scenario that diminishes situational awareness and can have negative consequences for their own safety and that of the public. Consider even the seemingly routine task of entering data into a records management system; if officers lose focus on their surroundings, they can be more prone to an accident or ambush.

Voice-Powered Tools Give Officers Control of Their Time

Law enforcement professionals are ready for solutions to help them regain command of their time while having a positive impact on safety, community service and report quality. Voice-enabled technologies can be the answer, and can make incident reporting faster, safer and more efficient. 

Although they’re certainly not a new technology (the first speech recognition platforms were developed in the 1950s), they have reached an inflection point in recent years, culminating in a wide range of applications and devices for use at home and at work.

Today’s speech recognition solutions continue to push the boundaries of what’s possible. Deep learning technologies help advanced speech engines achieve high levels of accuracy, even accounting for speakers’ accents and environments with background noise. Specialized platforms purpose-built for healthcare, financial services and other industries have emerged, and the same is true for law enforcement. 

The voice-enabled process can also help departments reduce their dependence on outsourced transcription services — reducing the costs associated with this process while avoiding the typical turnaround times, helping ensure that reports are available in central systems in real time. Because there’s simply no room for inaccurate, incomplete or delayed reports, police departments that use speech recognition are in a better position to meet reporting deadlines and keep criminal proceedings on track.

Some speech-enabled platforms can be integrated with departmental computer-aided dispatch and records management systems. In this way, officers can use their voices to enter incident details into the system, conduct license plate lookups and otherwise navigate within and between forms while more quickly delivering critical information out in the field.

Good police work is often reflected in good police reports. By leveraging speech recognition rather than traditional keyboard entry, officers can create detailed incident reports up to three times more quickly without sacrificing any level of detail or specificity. They’ll spend less time tethered to computers — either at the station or on patrol — and more time keeping communities safe.