
Four in Five Legal Firms Looking to Invest in Speech Recognition

By Lawyer Monthly

A full 82% of legal firms aim to invest in speech recognition technology going forward, according to a research report from Nuance Communications Inc.

Censuswide was commissioned to conduct a survey of 1,000 legal professionals and 20 IT decision-makers in the UK, which was carried out from 23 June to 25 June. Respondents were asked questions regarding their use of technology after the government recommended that offices close earlier this year, and whether they felt properly equipped to work remotely.

25% of legal professionals did not feel properly equipped for remote work when the government advice was issued earlier this year, and 56% of respondents reported that they lacked adequate productivity tools to do their jobs as effectively from home as they could in the office.

However, 80% of respondents who used speech recognition technology for document creation in some form during this period said that they felt properly equipped.

It was also discovered that, in cases where they did not utilise voice recognition software tools, 67% of legal professionals reportedly spent between 2 and 4 hours a day typing. Only 19% made use of internal typists, and 5% used external transcription services on a regular basis.

82% of organisations surveyed said that they were looking to invest further in voice recognition technology going forward, and 62% of legal professionals who were not currently using it said that they would in future.

“The pandemic has accelerated a trend that was already underway, as many modern legal firms move to embrace new ways of working and make the most of digitalisation. In this time of economic uncertainty, legal professionals are under more pressure than ever to deliver high quality outputs – including documents – at speed, all whilst upholding the highest standards of data security,” said Ed McGuiggan, General Manager at Nuance Communications.

McGuiggan noted that speech recognition was likely to become an essential tool in order to cope with the legal profession’s new demands. “While it is undeniable that recent months have brought challenges for the legal sector, they have also presented an opportunity to further reform some outdated methods and attitudes,” he said.


10 hospital innovation leaders share the No. 1 tech device they couldn’t live without at work

Katie Adams for Becker’s Hospital Review

Hospital innovation executives know better than most people that smart applications of technology can save time and simplify processes — even to the point where users become reliant. 

Here, 10 digital innovation leaders from hospitals and health systems across the country share the tech device or software they reach for all day long at their jobs.

Editor’s note: Responses have been lightly edited for clarity and length.

Daniel Durand, MD, chief innovation officer at LifeBridge Health (Baltimore): It’s not a super new tech device, but automated speech recognition and dictation software. It is very important for my specialty and an increasing number of physicians and keeps getting better every year.

Omer Awan, chief data and digital officer at Atrium Health (Charlotte, N.C.): My iPhone.

Muthu Krishnan, PhD, chief digital transformation officer at IKS Health (Burr Ridge, Ill.): My work laptop. Our capability to connect from anywhere securely helps me keep my work (and meeting schedule) in sync with my colleagues, partners and clients.

Peter Fleischut, MD, senior vice president and chief transformation officer at NewYork-Presbyterian Hospital (New York City): My phone.

Nick Patel, MD, chief digital officer at Prisma Health (Columbia, S.C.): My tablet PC. It’s in my bag everywhere I go; I can do everything on it. I can access my EHR, my whole suite of Office 365 including Teams and Skype for business. I love it.

Aaron Martin, executive vice president and chief digital officer at Providence (Renton, Wash.): Probably my Macbook. 

John Brownstein, PhD, chief innovation officer at Boston Children’s Hospital: Zoom.

Tom Andriola, vice chancellor of IT and data at UC Irvine (Calif.): I would still say my laptop — sorry, I know that’s uninteresting.

Lisa Prasad, vice president and chief innovation officer at Henry Ford Health System (Detroit): My Mac.

Sara Vaezy, chief digital strategy and business development officer at Providence (Renton, Wash): My iPhone. I can do 85 percent of what I need to do for my job on it.


How voice technology can help banks manage risk

By Tom Rimmer for Financier Worldwide

It comes as no surprise that the pandemic has taken its toll on financial institutions (FIs). Banks are always on the lookout for ways to cut costs to improve their operational margins. The lockdown, though, has seen FIs having to rapidly invest in technology to align with remote working practices. In some cases, it was the first time these businesses had all staff working remotely, which only added to their compliance challenges. With changing consumer demands and many still staying off the high streets because of social distancing measures, banks will increasingly need to adopt more solutions to engage remotely with customers, whether that is via the phone or through video conferencing tools.

As customers transition to a more digital-first banking approach, banks will have to weigh the need for human interaction against the convenience of online channels when connecting with them. Ultimately, people like to speak to other humans, especially when something is going wrong.

With fewer physical places for customers to interact with their banks in the current climate, call centres will undoubtedly find their call volumes increasing. The Banking & Payments Federation Ireland (BPFI), for example, said member banks had experienced a 400 percent increase in calls at the beginning of the crisis. The challenge then comes in how technology can aid banks in ensuring customer churn is kept low, issues are flagged immediately, and compliance needs are met.

Regulatory needs

Since the 2008 financial crisis, the number of regulations FIs face has drastically increased. With more regulations being added every year, the financial services firms that deal with personal or sensitive information encounter increasing barriers in being able to deliver their products or services. These institutions need to implement new systems and solutions to manage risk to their customers’ personal data.

This is a major task – not to mention that the ramifications of non-compliance with regulations can have a direct impact on revenues. In 2019 alone, the Financial Conduct Authority (FCA) issued over £38bn worth of fines in the UK for compliance, legal and governance-related issues.

However, fines are not the only concern. The impact on brand reputation and share prices can have a more detrimental effect overall. If an organisation faces a regulatory breach, it runs the risk of discouraging potential new customers, and losing its existing customers in the process. As brand reputation can be lost in an instant, protecting the brand is one of the most important challenges businesses face when presented with a compliance fine.

The power of voice technology

Due to the immense volume of contact centre calls, compliance has become a significant and growing challenge. FIs’ contact centres have strict regulations to follow, such as protecting credit card data (the Payment Card Industry Data Security Standard (PCI DSS)) and protecting customer data (the General Data Protection Regulation (GDPR)). The FCA, through the COBS 11.8 regulation, also states that banks need to record all customer interactions. These organisations need not only to follow these rules but also to be able to prove their compliance in case of an audit.

The problem is that, unlike with text, extracting useful information from audio recordings is extremely challenging and time consuming. With the use of voice technology, however, FIs can easily locate and replay stored recordings automatically. They will then be able to evaluate and categorise every customer interaction into groups relevant to specific compliance regulations, which can then be addressed appropriately.

In addition, because call recordings need to be easily accessible upon request, whether from a customer or an auditor, there also needs to be a note-taking and more in-depth record-keeping element. Sophisticated voice technology makes this straightforward, enabling capabilities such as indexing of conversations, searchability and timestamping of calls.

Voice technology and RegTech

With an increasing number of regulations to adhere to, the need to understand, manage and protect customers’ voice data is greater now than ever before. Regulatory technology (RegTech) is set to make up 34 percent of all regulatory spending by the end of 2020, according to KPMG.

A key component of RegTech is voice technology, which transforms unstructured voice data into text. That text can then be used to find insights and flag any compliance issues, which is essential to FIs and their ability to remain compliant. Using speech recognition technology for regulatory compliance is about delivering monitoring at scale while protecting the business and its customers.

The technology not only ensures that historical archives of voice data are transcribed for analysis, but also allows any issues or problems that happen on a call to be resolved in near real-time. The system automatically transcribes and analyses the customer’s words, can offer prompts and information, and can even suggest escalating the call to a senior staff member if needed. This capability significantly reduces risk for FIs.
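To make that near-real-time flagging concrete, here is a minimal sketch of how a transcribed call might be scanned for compliance triggers. The rule names, phrase patterns and sample transcript are illustrative assumptions rather than any vendor's rule set; a production system would use far richer, regulator-specific models.

```python
import re

# Illustrative phrases that might trigger a compliance review; a real
# deployment would use regulator-specific rule sets, not this toy list.
FLAG_PATTERNS = {
    "card_data": re.compile(r"\b(?:\d[ -]?){13,16}\b"),  # possible card number read aloud (PCI DSS)
    "complaint": re.compile(r"\b(complain|mis-?sold|ombudsman)\b", re.I),
    "vulnerability": re.compile(r"\b(bereaved|redundancy|can't afford)\b", re.I),
}

def flag_transcript(call_id: str, transcript: str) -> list:
    """Return one record per rule hit so calls can be grouped by regulation."""
    hits = []
    for rule, pattern in FLAG_PATTERNS.items():
        for match in pattern.finditer(transcript):
            hits.append({"call_id": call_id, "rule": rule, "snippet": match.group(0)})
    return hits

# Transcripts would normally come from the ASR step described above.
sample = "I want to complain, I was mis-sold this card ending 4111 1111 1111 1111."
for hit in flag_transcript("call-0042", sample):
    print(hit)
```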

Ultimately, voice technology has the potential to reduce fines, speed up investigations and protect the brand. It also saves time by enabling all voice data to be transcribed quickly and automatically, a process that was difficult and time consuming before the advances in automatic speech recognition (ASR). This gives FIs a better understanding of their customers, which not only aids in mapping the customer journey, their interactions and changing sentiment, but also helps them comply with various regulations. This is essential for brand reputation, share price security and delivering better customer service amid changing regulations.


Researchers claim masks muffle speech, but not enough to impede speech recognition

By Kyle Wiggers for VentureBeat

Health organizations including the U.S. Centers for Disease Control and Prevention, the World Health Organization, and the U.K. National Health Service advocate wearing masks to prevent the spread of infection. But masks attenuate speech, which has implications for the accuracy of speech recognition systems like Google Assistant, Alexa, and Siri. In an effort to quantify the degree to which mask materials impact acoustics, researchers at the University of Illinois conducted a study examining 12 different types of face coverings in total. They found that transparent masks had the worst acoustics compared with both medical and cloth masks, but that most masks had “little effect” on lapel microphones, suggesting existing systems might be able to recognize muffled speech without issue.

While it’s intuitive to assume mask-distorted speech would prove to be challenging for speech recognition, the evidence so far paints a mixed picture. Research published by the Educational Testing Service (ETS) concluded that while differences existed between recordings of mask wearers and those who didn’t wear masks during an English proficiency exam, the distortion didn’t lead to “significant” variations in automated exam scoring. But in a separate study, scientists at Duke Kunshan University, Lenovo, and Wuhan University found an AI system could be trained to detect whether someone’s wearing a mask from the sound of their muffled speech.

A Google spokesperson told VentureBeat there hasn’t been a measurable impact on the company’s speech recognition systems since the start of the pandemic, when mask-wearing became more common. Amazon also says it hasn’t observed a shift in speech recognition accuracy correlated with mask-wearing.

The University of Illinois researchers looked at the acoustic effects of a polypropylene surgical mask, N95 and KN95 respirators, six cloth masks made from different fabrics, two cloth masks with transparent windows, and a plastic shield. They took measurements within an “acoustically-treated” lab using a head-shaped loudspeaker and a human volunteer, both of which had microphones placed on and near the lapel, cheek, forehead, and mouth. (The head-shaped loudspeaker, which was made of plywood, used a two-inch driver with a pattern close to that of a human speaker.)

After taking measurements without face coverings to establish a baseline, the researchers set the loudspeaker on a turntable and rotated it to capture various angles of the tested masks. Then, for each mask, they had the volunteer speak in three 30-second increments at a constant volume.

The results show that most masks had “little effect” below a frequency of 1kHz but were muffled at higher frequencies in varying degrees. The surgical mask and KN95 respirator had peak attenuation of around 4dB, while the N95 attenuated at high frequencies by about 6dB. As for the cloth masks, material and weave proved to be the key variables — 100% cotton masks had the best acoustic performance, while masks made from tightly woven denim and bedsheets performed the worst. Transparent masks blocked between 8dB and 14dB at high frequencies, making them by far the worst of the bunch.
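As a rough illustration of how per-band attenuation like the figures above could be estimated, the sketch below compares the spectra of paired recordings made with and without a mask. The file names and band edges are assumptions chosen for the example; this is not the Illinois team's analysis code.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

def band_levels(path, bands=((250, 1000), (1000, 4000), (4000, 8000))):
    """Average power (dB) in a few frequency bands for one recording."""
    rate, audio = wavfile.read(path)
    if audio.ndim > 1:                       # mix down to mono if needed
        audio = audio.mean(axis=1)
    freqs, psd = welch(audio.astype(float), fs=rate, nperseg=4096)
    return {b: 10 * np.log10(psd[(freqs >= b[0]) & (freqs < b[1])].mean())
            for b in bands}

# Hypothetical file names; attenuation = unmasked level minus masked level.
ref = band_levels("unmasked.wav")
masked = band_levels("masked.wav")
for band in ref:
    print(f"{band[0]}-{band[1]} Hz: {ref[band] - masked[band]:.1f} dB attenuation")
```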

“For all masks tested, acoustic attenuation was strongest in the front. Sound transmission to the side of and behind the talker was less strongly affected by the masks, and the shield amplified sound behind the talker,” the researchers wrote in a paper describing their work. “These results suggest that masks may deflect sound energy to the sides rather than absorbing it. Therefore, it may be possible to use microphones placed to the side of the mask for sound reinforcement.”

The researchers recommend avoiding cotton-spandex masks for the clearest and crispest speech, but they note that recordings captured by the lapel mic showed “small” and “uniform” attenuation — the sort of attenuation that recognition systems can easily correct for. For instance, Amazon recently launched Whisper Mode for Alexa, which taps AI trained on a corpus of professional voice recordings to respond to whispered (i.e., low-decibel) speech by whispering back. An Amazon spokesperson didn’t say whether Whisper Mode is being used to improve masked speech performance, but they told VentureBeat that when Alexa speech recognition systems’ signal-to-noise ratios are lower due to customers wearing masks, engineering teams are able to address fluctuations in confidence through an active learning pipeline.

In any case, assuming the results of the University of Illinois study stand up to peer review, they bode well for smart speakers, smart displays, and other voice-powered smart devices. Next time you lift your phone to summon Siri, you shouldn’t have to ditch the mask.


Why Voice Tech Will Be the Post-Crisis Standard — and Not Just for Ordering Pizza

Shafin Tejani for Entrepreneur

My kids, ages 8 and 5, are showing me the future. When I want to watch a movie or turn out the lights, I instinctively reach for the remote or flick a switch. My children find it far more natural to just ask Siri for Peppa Pig, or tell Alexa to darken the room. Tapping a keyboard or clicking a mouse? Lame and old-fashioned. Why not just talk to the machines around us like we talk to each other?

Of course, right now talking — rather than touching — also has serious safety upsides. Voice tech adoption has accelerated as the coronavirus pandemic makes everyone touchy about how sanitary it is to poke buttons and screens. But the reality is the 2020s were poised to be the decade of voice technology well before the crisis hit.  

Indeed, thanks to a convergence of technology, necessity and demographic shifts, voice is in the unique position to become not just increasingly popular but the dominant user interface going forward. Before long, we’ll all be conversing with our devices pretty much non-stop, and to do much more than just set timers and fetch weather reports. 

And much like the desktop software industry back in the day and smartphone apps after that, a multibillion-dollar business ecosystem is about to surge around voice tech — at least, for entrepreneurs and businesses ready to ride the voice wave.   

How voice tech went from talk to action

Getting to the point where we can casually ask our Apple Watches for nearby dinner recommendations is no small feat. It required the integration of decades of advancements in AI-driven natural-language-processing, speech recognition, computing horsepower, and wireless networking, to name just a few building blocks. 

And yet, we’re just starting to grasp the potential of these technologies. Voice is the ultimate user interface because it’s not really a UI, but part of what we are as humans and how we communicate. There’s almost no learning curve required like there is when people take typing classes. Voice-enabled machines learn to adapt to our natural behaviors rather than the other way around. My kids love joking with Siri — nobody clowns around with a keyboard.

The business model around voice tech is crystallizing, as well. Developing AI and related technologies is complex and costly, so mega-capitalized giants like Google, Apple and Amazon have built an insurmountable first-mover advantage and dug a moat behind them. But they’ve also created countless lucrative niches in their ecosystems for other companies. 

Just as the iPhone gave birth to a $6.3 trillion mobile app economy, platforms like Alexa and Google Assistant have already created opportunities for developers to create more than 100,000 Alexa “skills” and 4,000 Google Assistant apps, or actions. In the years ahead, that ecosystem will likely grow to rival traditional apps in number and value.

The coronavirus pandemic is further boosting the adoption of voice-enabled technology, with 36% of U.S. smart-speaker owners reporting they’ve increased their use of their devices for news and information. And hygienic concerns are bringing contactless technologies like voice-controlled elevators out of the realm of fiction (and sketch comedies) and into offices and public spaces, so people don’t have to touch the same buttons and keypads as countless strangers.

How voice can take us “back to the future” in terms of human interaction

Yet for all the advances we’ve achieved, we’re still in the Voice 1.0 era. We’re mostly just commanding our devices to execute simple tasks, like setting alarms or telling us sports scores. In reality, this is just the beginning of what’s possible.

Machine learning underpins voice technology, and the AI gets smarter as we feed it more data. The number of voice-enabled devices in use is soaring — sales of smart speakers increased by 70% between 2018 and 2019 — flooding computers with more data to learn from. And that doesn’t count the billions of smartphone users talking to Siri and Google Assistant. Machines are growing much smarter, much faster.

Amazon and Google may soon take machines’ conversational skills to a deeper level. Both companies have filed patents for technology to read emotions in people’s voices. Marketers might salivate over the prospect of advertising products that suit how customers are feeling at the moment (“You sound hangry — how about a takeout pizza?”), but the applications for emotionally attuned bots don’t have to be so crassly commercial.

Spike Jonze’s movie Her, for example, tells the story of a lonely writer who develops a passionate relationship with his computer operating system, Samantha, as Samantha learns to become more conscious, self-aware and emotionally intelligent.

Robotic companionship seemed far-fetched when the film came out in 2013, but when this year’s pandemic locked millions down into isolation, hundreds of thousands downloaded Replika, a chatbot phone app that provides friendship and human-like conversation. People can develop genuine attachment to conversant machines, as seniors do with Zora, a human-controlled robot caregiver.

Why the booming voice market is just beginning

Coming months and years will see not only improved tech, but an expansion of voice to nearly all areas of business and life. Ultimately, voice technology isn’t a single industry, after all. Rather, it’s a transformative technology that disrupts nearly every industry, like smartphones and the internet did before. The voice and speech recognition market is expected to grow at a 17.2% compound annualized rate to reach $26.8 billion by 2025. Meanwhile, AI — the technology that underpins voice, and in many respects parallels its true potential — is estimated to add $5.8 trillion in value annually.

But unlike other technological advances that have radically changed how we live, voice technologies promise to make machines and people alike behave more like humans. In terms of adoption rates, applications and market, the possibilities are enough to leave one, well, speechless.


Speech recognition vs. voice recognition: What’s the difference?

By Jon Arnold for Search Unified Communications

The topic of speech recognition vs. voice recognition is a great example of two technology terms that appear to be interchangeable at face value but, upon closer inspection, are distinctly different.

The words speech and voice can absolutely be used interchangeably without causing confusion, although it’s also true they have separate meanings. Speech is obviously a voice-based mode of communication, but there are other modes of voice expression that aren’t speech-based, such as laughter, inflections or nonverbal utterances.

Things become more nuanced when you add recognition to both speech and voice. Now, we enter the world of automatic speech recognition (ASR), which is where we tap into applications expressly tailored to extract specific forms of business value from the spoken word. I’ll briefly explain speech recognition vs. voice recognition to illustrate the differences between the two.

Speech recognition focuses on translating what’s said

Speech recognition is where ASR provides rich business value, both for collaboration and contact center applications. The key application here is speech to text, where the objective is to accurately translate spoken language into written form. In its most basic form, ASR’s role is to accurately capture — literally — what was said as text.

More advanced forms of ASR — namely, those harnessing natural language understanding and machine learning — inject AI to support features that go beyond literal accuracy. The objective here is to mitigate the ambiguity that naturally occurs in speech to ascribe intent, where the context of the conversation helps clarify what is being said. Without this, even the most accurate speech-to-text applications can easily generate output that is laughably off the mark from what the speaker is actually talking about.

Voice recognition pinpoints who says what

In a narrow sense, speech recognition could also be referred to as voice recognition, and that description is perfectly acceptable so long as the underlying meaning is clearly understood. However, for those working in speech technology circles, there is a critical distinction between speech recognition and voice recognition. Whereas speech recognition pertains to the content of what is being said, voice recognition focuses on properly identifying speakers, as well as ensuring that whatever they say is accurately attributed. In terms of collaboration, this capability is invaluable for conferencing, especially when multiple people are speaking at the same time. Whether the use case is for captioning so remote attendees can follow who is saying what in real time or for transcripts to be reviewed later, accurate voice recognition is now a must-have for unified communications.

In addition to collaboration, voice recognition is playing a growing role in verifying the identity of a speaker. This is a critical consideration when determining who can join a conference call, whether they have permission to access computer programs or restricted files or are authorized to enter a facility or controlled spaces. In cases like these, voice recognition is not concerned with speech itself or the content of what is being said; rather, it’s about validating the speaker’s identity. To that end, it might be more accurate to think of voice recognition as being about speaker recognition, as this is an easier way to distinguish it from speech recognition.
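One way to see the distinction in practice: speech recognition supplies the words, voice (speaker) recognition supplies the attribution, and captioning stitches the two together. In the sketch below, the segment times, text and speaker labels are assumed to have already been produced by an ASR engine and a speaker-recognition model respectively.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds into the call or meeting
    end: float
    text: str      # from speech recognition: WHAT was said
    speaker: str   # from voice/speaker recognition: WHO said it

def to_captions(segments):
    """Merge both outputs into attributed, time-ordered captions."""
    return "\n".join(f"[{s.start:06.1f}s] {s.speaker}: {s.text}"
                     for s in sorted(segments, key=lambda s: s.start))

print(to_captions([
    Segment(0.0, 3.2, "Let's review the quarterly numbers.", "Dana"),
    Segment(3.4, 5.0, "Revenue is up eight percent.", "Lee"),
]))
```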


Why Investing in EHR Solutions Featuring Speech Recognition is Beneficial For Providers

By MarketScale

The healthcare industry is constantly in search of new software and EHR solutions to improve patient care and minimize the amount of time clinicians spend on clerical tasks, promoting better outcomes.

According to the American Medical Association, about 44% of physicians experience burnout in their practice, and part of that is attributed to administrative duties and data entry. In order to help combat this burnout, we’re going to explore the wide variety of benefits speech recognition can provide.

Today, voice recognition is a critical technology within electronic health record (EHR) solutions. Voice recognition can save critical time and money to further help improve productivity.

The Benefits of Investing in Speech Recognition

EHR speech recognition allows physicians to capture notes at the time of care or in between appointments, which can maximize efficiency.

There are many benefits to investing in EHR speech recognition, including:

  • Reduced Overhead Costs
    The average physician can save anywhere between $30,000-$50,000 a year by eliminating transcription services and using EHR voice dictation.
  • Faster Turnaround Times
    Less time spent on patient documentation means more time spent face-to-face with patients.
  • Reduced Manual Labor and Data Entry
    EHR voice dictation provides quicker outputs and minimizes stress.
  • Improved Accuracy
    EHR speech recognition produces higher quality documentation and gets better with time.
  • Better Overall Communication
    The enhanced technology allows for better communication with referring physicians and insurance companies, which leads to higher reimbursement rates.
  • Increased Clinician Satisfaction
    The reduced workload has led to an overall positive experience among caregivers.

EHR solutions that are paired with voice recognition technology eliminate the need for dictation and transcription services altogether. Using artificial intelligence, voice recognition technology is programmed to receive command-based responses from physicians about patient symptoms, procedures and treatment plans.

The software is then able to process and capture documentation narratives, including medical terminology and medications, as well as detect various accents and dialects to deliver automated outputs into specific data fields.
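As a simplified sketch of how a dictated narrative might be routed into specific data fields, the example below splits a note on spoken section headers. The section names and the regex approach are illustrative assumptions; commercial EHR integrations rely on trained clinical language models rather than simple pattern matching.

```python
import re

# Hypothetical section headers a physician might dictate.
SECTIONS = ["chief complaint", "history", "medications", "assessment", "plan"]

def parse_dictation(text: str) -> dict:
    """Split a dictated narrative into EHR-style fields by spoken headers."""
    pattern = re.compile(rf"({'|'.join(SECTIONS)})[:,]?", re.I)
    fields, current = {}, None
    for token in pattern.split(text):
        key = token.strip().lower()
        if key in SECTIONS:
            current = key
        elif current and token.strip():
            fields[current] = token.strip()
    return fields

note = ("Chief complaint: shortness of breath. History: asthma since childhood. "
        "Medications: albuterol inhaler. Plan: chest x-ray and follow up in one week.")
print(parse_dictation(note))
```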

With conventional EHR solutions that don’t utilize voice recognition, physicians can spend, in some cases, up to 12 minutes navigating the system and manually entering data to process notes for one patient.

With voice dictation, that time gets significantly reduced to 90 seconds or less, because it’s easier to use and requires less error-management.

Case Studies Highlight Real-World Performance

A recent KLAS Performance Report gathered feedback from customers of three speech recognition providers.

Overall findings showed that customers were highly satisfied, noting “positive usability, engaged support, price, integration, and functionality.”


CMU researcher says voice recognition can spot COVID-19 cough

By Jim Nash for Biometric Update

At a time when alleged treatments for the novel coronavirus are multiplying like fungi in a Petri dish, it can be difficult to take at face value a report that voice recognition systems might be able to diagnose COVID-19. However, voices have been analyzed by biometrics algorithms in the past to diagnose injury and illness with some success.

The magazine Futurism is quoting a Carnegie Mellon University researcher as saying a team he was on has created a prototype biometrics application that can tell whether someone has COVID-19 after they speak into a smartphone.

The COVID-19 voice detector, reportedly live as of April 3, is not available now, as the site is “undergoing construction and [an] approval process.”

According to a report at Android Authority, users are prompted to cough several times, and recite several vowels and the alphabet. A score is displayed illustrating how likely you are to have COVID-19, as judged by the algorithm.
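The general shape of such a detector (extract acoustic features from a recording, then score them with a trained classifier) can be sketched as follows. This is not the Carnegie Mellon team's code: the feature choice, the placeholder training data and the file name are all assumptions made purely for illustration.

```python
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def cough_features(path: str) -> np.ndarray:
    """Summarise a recording as mean MFCCs, one common acoustic feature set."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

# A real screening model would be trained on labelled recordings; here the
# classifier is fit on random placeholder data so the pipeline runs end to end.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(40, 13)), rng.integers(0, 2, size=40)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

score = model.predict_proba(cough_features("cough_sample.wav").reshape(1, -1))[0, 1]
print(f"Illustrative risk score: {score:.2f}")
```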

In fairness to the Carnegie Mellon research team, people using the algorithm when it re-emerges should be sure of only one thing: they will be training an algorithm to better identify the dry, repetitive cough that, along with a fever of 100 degrees or above, is among the surest signs of illness.

The researcher said that the algorithm is “still highly experimental” and not approved by any U.S. government health agencies.

This is not a new avenue of thought.

The U.S. Army funded research into using voice recognition software to diagnose long-term battlefield illnesses and injuries including post-traumatic stress disorder and traumatic brain injury. Researchers found that their algorithm identified 18 telltale voice features indicating illness, and it was correct 89.1 percent of the time.


How Law Enforcement Benefits from Speech Recognition Tech

By Ed McGuiggan for StateTech

The value and importance of police reports cannot be overstated. From traffic and collision reports to those documenting theft, injuries and arrests, police reports are not only highly scrutinized by prosecutors, courts, media and insurance companies, they’re also essential to ongoing investigations. Police officers spend a significant amount of the workday managing these reports and other such documentation.

It’s now possible to deploy voice-enabled technologies to provide officers with an alternative to the traditional, manual methods of creating reports. Officers simply speak to create detailed, accurate incident reports, using the power of their voices in place of typing. Reports created this way have been shown to take a third of the time that manual data entry would typically require and provide enormous safety benefits for officers out in the field.

Public Safety Officers Are Consumed with Incident Reporting 

One law enforcement survey conducted by Nuance uncovered just how extensive the documentation burden is: More than half of the surveyed public safety officials reported spending more than three hours of each shift on paperwork. 

In addition, more than 70 percent of survey respondents said they spend at least one hour in their patrol vehicles to complete a single incident report. Human memories can be faulty, unfortunately, and over that hour, it can be easy to forget to include details that may factor into a case or outcome. Add in multiple incidents and calls, and both memory recall and the ability to decipher hastily prepared handwritten notes can fade.

In other words, officers are dedicating too much of their days to documentation and administrative work — and it’s all time they would prefer to spend on more mission-critical, proactive policework that improves the safety and security of their communities.

Spending extra time capturing accurate, comprehensive and detailed information may mean officers are “heads down” in the field, a scenario that diminishes situational awareness and can have negative consequences for their own safety and that of the public. Consider even the seemingly routine task of entering data into a records management system; if officers lose focus on their surroundings, they can be more prone to an accident or ambush.

Voice-Powered Tools Give Officers Control of Their Time

Law enforcement professionals are ready for solutions to help them regain command of their time while having a positive impact on safety, community service and report quality. Voice-enabled technologies can be the answer, and can make incident reporting faster, safer and more efficient. 

Although they’re certainly not a new technology (the first speech recognition platforms were developed in the 1950s), they have reached an inflection point in recent years, culminating in a wide range of applications and devices for use at home and at work.

Today’s speech recognition solutions continue to push the boundaries of what’s possible. Deep learning technologies help advanced speech engines achieve high levels of accuracy, even accounting for speakers’ accents and environments with background noise. Specialized platforms purpose-built for healthcare, financial services and other industries have emerged, and the same is true for law enforcement. 

The voice-enabled process can also help departments reduce their dependence on outsourced transcription services — reducing the costs associated with this process while avoiding the typical turnaround times, helping ensure that reports are available in central systems in real time. Because there’s simply no room for inaccurate, incomplete or delayed reports, police departments that use speech recognition are in a better position to meet reporting deadlines and keep criminal proceedings on track.

Some speech-enabled platforms can be integrated with departmental computer-aided dispatch and records management systems. In this way, officers can use their voices to enter incident details into the system, conduct license plate lookups and otherwise navigate within and between forms while more quickly delivering critical information out in the field.
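A minimal sketch of that kind of voice-driven navigation appears below: recognised utterances are matched against a small command grammar and dispatched to records-system actions. The command phrases and stub functions are hypothetical, not part of any dispatch or records product.

```python
import re

def run_plate_lookup(plate: str) -> None:
    print(f"(stub) querying records system for plate {plate}")

def set_report_field(field: str, value: str) -> None:
    print(f"(stub) report.{field} = {value!r}")

# Hypothetical grammar mapping recognised phrases to actions.
COMMANDS = [
    (re.compile(r"run plate (?P<plate>[A-Z0-9 ]+)", re.I),
     lambda m: run_plate_lookup(m["plate"].strip())),
    (re.compile(r"set (?P<field>location|narrative) to (?P<value>.+)", re.I),
     lambda m: set_report_field(m["field"].lower(), m["value"])),
]

def handle_utterance(text: str) -> bool:
    """Dispatch one recognised utterance; return False if nothing matched."""
    for pattern, action in COMMANDS:
        match = pattern.match(text)
        if match:
            action(match)
            return True
    return False

handle_utterance("set location to 400 block of Main Street")
handle_utterance("run plate ABC 1234")
```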

Good police work is often reflected in good police reports. By leveraging speech recognition rather than traditional keyboard entry, officers can create detailed incident reports up to three times more quickly without sacrificing any level of detail or specificity. They’ll spend less time tethered to computers — either at the station or on patrol — and more time keeping communities safe.


Evolution of Speech Recognition Technology

By Sahil Chauhan for ReadWrite

Communication plays an essential role in our lives. Humans started with signs and symbols, then progressed to communicating through languages. Later, computing and communication technologies arrived, and machines began communicating with humans and, in some cases, with one another. That communication created the world of the internet or, as we technically know it, the Internet of Things (IoT). Here is how speech recognition technology, built on machine learning, has evolved.

The Evolution of Speech Recognition Technology and Machine Learning

The internet gave rise to new ways of using data. Using it, we can communicate directly or indirectly with machines by training them, which is known as Machine Learning. Before this, we had to operate a computer directly to communicate with machines.

Research and development are beginning to eliminate much of that reliance on the computer itself. We know this technology as Automatic Speech Recognition. Based on Natural Language Processing (NLP), it allows us to interact with machines using the natural language we speak.

The initial research in the field of speech recognition was successful. Since then, speech scientists and engineers have aimed to optimize speech recognition engines further. The ultimate goal is to adapt the machine’s interaction to the situation so that error rates can be reduced and efficiency can be increased.

Automatic Speech Recognition and its Applications

Automatic Speech Recognition (ASR) technology is a combination of two different branches – computer science and linguistics: computer science to design the algorithms and programs, and linguistics to create a dictionary of words, sentences, and phrases.

Generating Speech Transcriptions

The first stage of development starts with speech transcription, where the audio is converted into text, i.e., speech-to-text conversion. After this, the system removes unwanted signals, or noise, by filtering. People speak words and sentences at different rates, so the general model of speech recognition is designed to account for those rate changes.

The signals are then further divided to identify phonemes. Phonemes are the smallest distinct units of sound in a language, such as the sounds represented by ‘b’ and ‘p.’ After this, the program tries to match the exact word by comparing the phonemes with words and sentences stored in the linguistic dictionary. Then, the speech recognition algorithm uses statistical and mathematical modeling to determine the most likely word.
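A toy version of that matching step is sketched below: candidate words are scored by how closely their dictionary pronunciations match the recognised phonemes, weighted by a simple word prior that stands in for the statistical model. The phoneme symbols, dictionary and probabilities are made up for illustration.

```python
from difflib import SequenceMatcher

PRONUNCIATIONS = {            # made-up mini pronunciation dictionary
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "bad": ["b", "ae", "d"],
}
WORD_PRIOR = {"pat": 0.2, "bat": 0.5, "bad": 0.3}   # e.g. from a language model

def best_word(observed_phonemes):
    """Pick the word whose pronunciation best explains the observed phonemes."""
    def score(word):
        acoustic = SequenceMatcher(None, PRONUNCIATIONS[word], observed_phonemes).ratio()
        return acoustic * WORD_PRIOR[word]
    return max(PRONUNCIATIONS, key=score)

print(best_word(["b", "ae", "t"]))   # -> "bat"
```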

At present, speech recognition systems are of two types: one works in a learning mode, while the other is a speaker-dependent system. With developments in Artificial Intelligence (AI) and Big Data, speech recognition technology has reached the next level. A specific neural architecture called long short-term memory (LSTM) brought a significant improvement to this field. Globally, organizations are leveraging the power of speech at their premises at different levels for a wide variety of tasks.

Speech to text software can be used for converting audio files to text files.

Speech-to-text software can also include timestamps and a confidence score for each word. Many languages do not have dedicated keyboards, and many people are not comfortable typing in a given language even though they speak it fluently. In such cases, speech transcription helps them convert speech into text in any language.
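For a concrete example of converting an audio file to text, the sketch below uses the open-source SpeechRecognition package for Python with its Google Web Speech backend. The file name is an assumption, and word-level timestamps and confidence scores depend on which recognition engine is used.

```python
import speech_recognition as sr   # the open-source "SpeechRecognition" package

recognizer = sr.Recognizer()
with sr.AudioFile("meeting.wav") as source:   # hypothetical recording
    audio = recognizer.record(source)         # read the whole file

try:
    # Other engines supported by the package expose confidence scores and
    # timestamps in their own response formats.
    print(recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Speech was unintelligible to the engine.")
```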

Real-time Captioning System — Captions on the go.

The other use of this technology is in real time, where it is known as Computer Assisted Real-Time (CART) translation. It is essentially a speech-to-text system that operates on a real-time basis. Organizations all over the world hold meetings and conferences.

For maximum participation by global audiences, they leverage the power of live captioning systems. A real-time captioning system converts the speech to text and displays it on the output screen. It can translate speech in one language into text in another and also helps in making notes of a presentation or a speech. By converting speech to text, these systems also make spoken content accessible to people who are deaf or hard of hearing.

Voice Biometric System — A Smart way to Authenticate

Apart from speech to text, the technology extends into biometrics, giving rise to voice biometrics for user authentication. Voice biometric systems analyze the voice of the speaker, which varies with factors like modulation, pronunciation, and other elements.

In these systems, a sample of the speaker’s voice is analyzed and stored as a template. Whenever the user speaks the phrase or sentence, the voice biometric system compares it with the stored template and provides authentication. However, these systems face significant challenges: our voices are always affected by physical factors and emotional state.

Recent developments in voice biometric systems operate by matching the phrase with the stored sample and then analyzing voice patterns, taking physiological and behavioral characteristics of the voice signal into consideration. These developments in voice biometrics technology are going to help enterprises where data security is a significant concern.
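The enrolment-and-comparison flow described above can be sketched as a simple one-to-one check: a stored voiceprint template is compared with a live sample and accepted only above a similarity threshold. The vectors below are toy stand-ins; in practice the embeddings would come from a trained voice-biometric model.

```python
import numpy as np

def verify(stored_template: np.ndarray, live_sample: np.ndarray,
           threshold: float = 0.8) -> bool:
    """1:1 check: does the live voiceprint match the enrolled template?"""
    similarity = float(np.dot(stored_template, live_sample) /
                       (np.linalg.norm(stored_template) * np.linalg.norm(live_sample)))
    return similarity >= threshold

# Toy vectors standing in for embeddings produced by a voice-biometric model.
template = np.array([0.7, 0.2, 0.1])
print(verify(template, np.array([0.68, 0.22, 0.12])))   # True: same speaker
print(verify(template, np.array([0.10, 0.90, 0.30])))   # False: different voice
```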

Using Speech for Analytics

Analytics plays an essential role in the development of speech recognition technology. Big data analysis created a need for storing voice data, and call centers started using recorded calls to train their employees. Since customer satisfaction is now the primary focus of organizations around the globe, organizations want to track and analyze the conversations between executives and customers.

With call analytics applications, organizations can monitor and measure the performance of calls. These call analytics solutions enhance the performance of services provided by call centers. Through them, an organization can classify its customers and serve them better by giving faster and more favorable responses.
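As a rough sketch of the kind of per-call metrics such a solution derives from transcripts, the example below computes an agent talk share and simple keyword counts. The transcript format and keyword lists are assumptions made for illustration, not any product's analytics.

```python
POSITIVE = {"thanks", "great", "resolved"}
NEGATIVE = {"cancel", "complaint", "frustrated"}

def call_metrics(turns):
    """turns: list of (speaker, text) pairs, e.g. [("agent", "..."), ("customer", "...")]."""
    words = {"agent": 0, "customer": 0}
    pos = neg = 0
    for speaker, text in turns:
        tokens = [t.strip(".,!?").lower() for t in text.split()]
        words[speaker] += len(tokens)
        pos += sum(t in POSITIVE for t in tokens)
        neg += sum(t in NEGATIVE for t in tokens)
    total = sum(words.values()) or 1
    return {"agent_talk_share": words["agent"] / total,
            "positive_hits": pos, "negative_hits": neg}

print(call_metrics([("agent", "How can I help today?"),
                    ("customer", "I want to cancel, I'm frustrated.")]))
```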

Way Ahead For Speech Recognition Technology

Research in speech recognition technology still has a long way to go. Until now, programs have been able to act on instructions only; interactions with machines do not yet feel like human communication. Researchers are trying to instill human responsiveness into machines, and there is a long way to go in the innovation of speech recognition technology.

The primary focus of research is on making speech recognition technology more accurate, because understanding human language demands greater accuracy. For example, if a person asks, “how do I change camera light settings?”, the question really means that the individual wants to adjust the camera flash. So significant concentration is on understanding the free-form language of humans before answering specific questions.

Overall, machine learning with speech recognition technology has already made its way into organizations globally and has started providing effective and efficient results. Very soon we might see a day when the automated stenographer gets promoted and starts taking an active part in organizing meetings and presentations.