The state-of-the-art technology for speech recognition

Mar 21, 2020 | Automated Transcription

Many companies today are developing speech recognition technology. Tech giants such as Google, Microsoft and IBM are at the forefront, bringing these technologies to their own platforms and services.

At the same time, they offer the technology for use in other companies’ solutions. Most automated transcription services today are built on technology from one of these tech giants.


Comparing speech recognition technologies


A study from the Florida Institute of Technology (Këpuska & Bohouta 2017) compared different speech recognition technologies based on their performance in certain voice recognition tasks.

The three technologies selected for the study were Google API, Microsoft API and Sphinx4.


Google API and Microsoft API


The two biggest players in the speech recognition industry are Google and Microsoft, which makes the comparison between the two especially interesting.

Google’s main interests in developing speech recognition technology are voice input for mobile phones, voice search on desktop, and YouTube transcription and translation.

Google API’s error rate was 8 % in 2015, which was 23 % lower than in 2013. To read more about the rapid improvement in the overall accuracy of speech recognition technology, check our other blog post.

Microsoft develops its speech API primarily to integrate the technology into its Windows operating system. Microsoft recently announced that its speech recognition technology has also reached 95 % accuracy, the threshold associated with human-made transcription.

An interesting fact about the Microsoft Speech API is that its development started as early as 1993.


The most accurate speech recognition technology


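The researchers tested the three technologies with several different audio recordings and calculated the WER (Word Error Rate) for each technology. WER measures the share of words in the reference transcript that the system got wrong (substituted, deleted or inserted), so the goal is to get as close to 0 % as possible.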
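To make the WER figure more concrete, here is a minimal Python sketch of how such a score is commonly computed: the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of words in the reference. The function name `word_error_rate` is our own, and lowercasing and splitting on whitespace are simplifying assumptions. The researchers' actual scoring tool and text normalization may well differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Assumption: words are compared case-insensitively after a plain
    whitespace split; real evaluations often normalize text further.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from output (deletion)
                curr[j - 1] + 1,     # extra word in output (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the quick brown fox jumps", "the quick brown dog jumps")` gives 0.2, i.e. a WER of 20 %, because one of the five reference words was substituted. Note that WER can exceed 100 % when the output contains many extra words.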

Google API showed a WER of 8 %, Microsoft API 18 % and Sphinx4 37 %. The differences between the voice recognition technologies were relatively large.

As can be seen from the figures, “Google API is superior”, as the researchers also state.

And of course we at Spoken ONLINE are happy about the research results, as the core of our automated transcription service is the very same Google Speech API. But as noted earlier, speech recognition technology is improving fast, so we are constantly following the technology providers and their achievements in the field.

Source: Comparing Speech Recognition Systems by Këpuska & Bohouta, Florida Institute of Technology, March 2017