Media Intelligence

How accurate are speech recognition results?

- Estimating speech recognition accuracy without references -

Abstract

The performance of an automatic speech recognition (ASR) system is measured in terms of the recognition rate (accuracy), which is calculated by aligning recognition results with manually transcribed references. However, manual transcription is very costly. In this research, we propose a technique for estimating the recognition rate without using references. To this end, we have developed an error type classification (ETC) technique, which probabilistically classifies each word in a recognition result into one of four categories: correct (C), substitution error (S), insertion error (I), or deletion error (D). With the proposed ETC technique, the recognition rate can be estimated very accurately. The technique can be used both in developing practical ASR application systems and in refining basic ASR algorithms.
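As a rough illustration of how the four categories relate to the recognition rate, the sketch below uses the conventional word accuracy definition, (N - S - D - I) / N with N = C + S + D reference words. The abstract does not say how the ETC outputs are combined into an estimate, so `estimate_accuracy`, which sums per-word category probabilities into expected counts, is an assumption made for illustration and not the authors' method. The function names and the dictionary layout of the probabilities are likewise hypothetical.

```python
from typing import Dict, List


def word_accuracy(c: float, s: float, i: float, d: float) -> float:
    """Conventional word accuracy from counts of C, S, I and D.

    N = C + S + D is the number of reference words;
    accuracy = (N - S - D - I) / N, which equals (C - I) / N.
    """
    n = c + s + d
    if n <= 0:
        raise ValueError("at least one reference word is required")
    return (n - s - d - i) / n


def estimate_accuracy(posteriors: List[Dict[str, float]]) -> float:
    """Estimate accuracy without references (illustrative assumption).

    Each item maps the labels "C", "S", "I", "D" to a probability for
    one classified word. Summing the probabilities gives expected
    counts, which stand in for the counts a reference alignment
    would produce.
    """
    expected = {
        label: sum(p.get(label, 0.0) for p in posteriors)
        for label in ("C", "S", "I", "D")
    }
    return word_accuracy(
        expected["C"], expected["S"], expected["I"], expected["D"]
    )


if __name__ == "__main__":
    # Reference-based case: 8 correct, 1 substitution, 1 insertion, 1 deletion.
    print(word_accuracy(8, 1, 1, 1))

    # Reference-free case: hypothetical probabilities for two words.
    print(estimate_accuracy([
        {"C": 1.0},
        {"C": 0.5, "S": 0.5},
    ]))
```

Working with expected counts rather than hard labels means an uncertain word contributes fractionally to each category instead of being forced into one of them.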