Media Intelligence

How accurate are speech recognition results?

- Estimating speech recognition accuracy without references -

Abstract

The performance of an automatic speech recognition (ASR) system is measured by its recognition rate (accuracy), which is calculated by aligning recognition results with manually transcribed references. However, manual transcription is very costly. In this research, we propose a technique for estimating the recognition rate without using references. To this end, we have developed an error type classification (ETC) technique, which probabilistically classifies each word in a recognition result into one of four categories: correct (C), substitution error (S), insertion error (I), or deletion error (D). With the proposed ETC, the recognition rate can be estimated very accurately. This technique can be used in developing practical ASR application systems and in refining basic ASR algorithms.
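To make the two quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the conventional word accuracy definition, (N - S - D - I) / N with N = C + S + D reference words, and a plain minimum-edit-distance alignment. The function names (`align_counts`, `word_accuracy`, `estimated_accuracy`) and the per-word probability format are hypothetical and chosen only for illustration. The reference-free estimate simply replaces the counted C, S, I, and D with expected counts summed from per-word class probabilities, which is one plausible way to use probabilistic ETC outputs.

```python
from typing import Dict, List


def align_counts(ref: List[str], hyp: List[str]) -> Dict[str, int]:
    """Count C/S/I/D along one minimum-edit-distance alignment of hyp to ref."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # only deletions
    for j in range(1, m + 1):
        cost[0][j] = j  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the end, labelling each step of the alignment.
    counts = {"C": 0, "S": 0, "I": 0, "D": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            counts["C" if ref[i - 1] == hyp[j - 1] else "S"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["D"] += 1  # reference word with no hypothesis word
            i -= 1
        else:
            counts["I"] += 1  # hypothesis word with no reference word
            j -= 1
    return counts


def word_accuracy(c: float, s: float, i: float, d: float) -> float:
    """Conventional word accuracy: (N - S - D - I) / N, with N = C + S + D."""
    n_ref = c + s + d
    if n_ref <= 0:
        raise ValueError("reference length must be positive")
    return (n_ref - s - d - i) / n_ref


def estimated_accuracy(posteriors: List[Dict[str, float]]) -> float:
    """Reference-free estimate from per-word class probabilities.

    Assumption: each entry maps "C", "S", "I", "D" to a probability,
    as a probabilistic classifier might output. Expected counts stand
    in for the counts that an alignment would give.
    """
    expected = {k: sum(p.get(k, 0.0) for p in posteriors) for k in "CSID"}
    return word_accuracy(expected["C"], expected["S"], expected["I"], expected["D"])


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on the the mat".split()
    k = align_counts(ref, hyp)
    print(k, word_accuracy(k["C"], k["S"], k["I"], k["D"]))
```

The first two functions need a manual transcription (`ref`), which is the cost the abstract refers to. `estimated_accuracy` needs only classifier outputs for the recognized words, so its quality depends entirely on how well those probabilities are calibrated.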

Presenters

Atsunori Ogawa
Media Information Laboratory
Espi Miquel
Media Information Laboratory
Marc Delcroix
Media Information Laboratory
Masakiyo Fujimoto
Media Information Laboratory
Takaaki Hori
Media Information Laboratory