By Mike Sandler for Forbes
Many organizations with an accessibility mindset (or mandate) consider real-time transcription services a must-have for live events. How else to ensure every attendee is able to follow what’s happening, particularly those who are deaf or hard of hearing?
There are other reasons to transcribe speech beyond accessibility, such as keeping records of the proceedings of courts and legislative bodies. However, converting speech to text in real time isn’t easy. Accordingly, those who do it well command premium prices and are in high demand. Where does that leave organizations that lack the budget to hire such professionals, or that can’t find one available on the day of their event?
As ever, technology has put forward a solution — in this case, automatic real-time transcription. The question on the minds of many is whether this solution can deliver what it promises.
Real-Time Vs. Post-Production
Whenever there’s talk of transcription — machine powered or otherwise — clarification is in order. Many companies have used what are called post-production transcription services. Such services work like this: You submit a media file, which is put through automatic transcription software or passed on to a human to transcribe. In either case, the final deliverable is a transcript, the quality of which will depend on the software’s capabilities or the transcriptionist’s diligence.
Most transcription solutions available today are for post-production and thus unsuitable for live events. The software behind them requires a packaged media file to work. Human transcriptionists, able to rewind speech and polish transcripts to perfection, will deliver work of a much higher quality in post-production than if tasked with the same job in real time. Yet post-production services and solutions dominate, deepening the difficulties of companies in the market for live transcription.
AI-Powered Live Transcription
Automatic transcription solutions that function in real time do exist. Their development has benefited considerably from the popularity of speech-enabled services. For example, search engines now let you speak queries rather than type them, which is often preferable on mobile devices with cramped touchscreens. Of course, there are also smart speakers, like Amazon Echo (with Alexa) and Google Home, that are able to comprehend a wide range of commands.
The technology that enables such services is the same that powers automatic transcription solutions. The difference is that a transcription solution outputs its best interpretation of what was said as text rather than acting on it.
Google, Amazon, IBM and other big names have released speech recognition APIs that developers can use to build automatic real-time transcription solutions. These APIs present an avenue for addressing the cost and supply barriers that a good number of companies face in securing live transcription services. However, whether this is practical comes down to how well the APIs perform compared with humans.
This is a topic several colleagues and I researched in early 2020 for NAB Show’s 74th annual Broadcast Engineering and Information Technology (BEIT) conference. The results provide a clear picture of the state of these cutting-edge APIs and a compelling answer to the question of whether automatic transcription technology is ready for real time.
Evaluating Real-Time Readiness
Our research compared the performance of three leading speech recognition APIs — Amazon Transcribe, Google Cloud Speech-to-Text and IBM Watson Speech to Text — to a generalized measure of human transcription performance.
I’ll spare you the methodological details (our paper discusses these in depth). Most compelling are our results, which suggest these speech recognition engines can perform similarly to humans in real-time settings while costing far less per hour of transcription (under $10 USD for the APIs versus $60 to $200 USD for humans).
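To put those hourly rates in perspective, here is a rough back-of-the-envelope sketch in Python. The rates are the ones quoted above, with $10 treated as an upper bound for the APIs. The 20-hour event length and the function name are illustrative assumptions, not figures from the research.

```python
# Hourly rates quoted in the article (USD). The API figure is an upper bound.
API_RATE_MAX = 10
HUMAN_RATE_LOW = 60
HUMAN_RATE_HIGH = 200


def transcription_cost(hours, hourly_rate):
    """Return the total cost of transcribing `hours` of live speech."""
    return hours * hourly_rate


# Hypothetical event: 20 hours of sessions in total (an assumption).
event_hours = 20

api_cost = transcription_cost(event_hours, API_RATE_MAX)
human_low = transcription_cost(event_hours, HUMAN_RATE_LOW)
human_high = transcription_cost(event_hours, HUMAN_RATE_HIGH)

print(f"API (at most): ${api_cost}")
print(f"Human: ${human_low} to ${human_high}")
```

For this hypothetical event, the API bill tops out at $200, while human transcription would run from $1,200 to $4,000.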
These results challenge an impression of AI-powered transcription that many organizations might hold — namely, that machines can’t come close to humans in terms of performance. I suspect this idea comes from past experiences with traditional transcription services, which resulted in pristine transcripts composed without the pressure of time.
Human transcriptionists still outdo machines in some respects — for example, when it comes to inferring the meaning of garbled speech. However, the overall performance gap is narrowing and will continue to do so with time. AI also offers distinct advantages today over traditional live transcription services when it comes to cost, availability and consistency.
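One common way to quantify the gap between a transcript and what was actually said is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The Python sketch below illustrates that general metric only. It is not necessarily the measure used in the research described here, and the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word ("make") and one dropped word ("more").
reference = "live transcription makes events more accessible"
hypothesis = "live transcription make events accessible"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2f}")
```

A lower score is better: a perfect transcript scores 0, and the made-up example above has two errors against six reference words.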
It’s worth emphasizing that our testing took place in January 2020. If we ran the same tests under the same conditions today, the APIs would likely perform better, since the machine learning models behind them continue to be trained and refined over time.
A New Era Of Live Transcription
What do these results mean for businesses and other organizations in need of live transcription services? Affordable and widely available transcription is now possible for conference presentations, university lectures, church sermons and other events that unfold in real time.
Solutions built on these APIs already exist, offering a remedy for long-standing live transcription challenges. For instance, it will no longer be necessary, for budgetary reasons, to transcribe only part of a conference (say, the keynote speech) rather than the entire program.
If you’re ready to add automatic live transcription to your events, what should you look for in a solution? Always keep in mind what you wish to achieve. For a typical live event, you’ll likely want to be able to display real-time transcriptions on in-room monitors or bring in audio from a venue’s PA system. Not every solution can do this or makes it easy. A mobile app, for instance, is more suitable for transcribing a one-on-one interview than for making a conference more accessible.
Also, be mindful of solution complexity. Who will be responsible for the solution on event day, and what’s their level of technical expertise? Purpose-built devices in particular can make setup and operation very simple.
More Availability Means More Accessibility
It’s not only organizations that stand to gain from more affordable and available live transcription. The real winners are those with hearing challenges, those whose native language differs from the one primarily spoken at an event they’re attending, and anyone at a live event seated near someone who’s prone to chatter. In these cases and others, live transcription can only enhance understanding and enjoyment.