Science of Media Information

Reading designed words appearing in scenes

- Optical word recognition with CNN features and WFST decoding -

Abstract

Text in natural scene images often carries rich semantic information, and recognizing it is an important step toward understanding the scene. Unlike text in printed documents, text in a natural scene is more difficult to recognize because of large variations in geographical placement, backgrounds, textures, fonts, and illumination conditions. In this work, we propose a method that first detects and recognizes characters using a Convolutional Neural Network (CNN), and then decodes the sequence of recognized characters into words with a Weighted Finite State Transducer (WFST). WFSTs have been used successfully in speech recognition, where they have been shown to incorporate a lexicon or a high-order language model efficiently in word labelling tasks. In our experiments, the proposed algorithm robustly recognizes words in scene images from public datasets such as ICDAR 2003 and SVT-WORD.
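
The decoding step described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes the CNN yields one character probability table per character position, it models the lexicon as a trie-shaped acceptor with negative log-probability costs (the tropical semiring commonly used with WFSTs), and it ignores character insertions, deletions, and any language model. The function names `build_lexicon_acceptor` and `decode`, the `floor` parameter, and the example scores are all hypothetical.

```python
import math


def build_lexicon_acceptor(words):
    """Build a trie-shaped acceptor over the lexicon.

    arcs[state][char] gives the next state; finals maps an accepting
    state to the word spelled on the path from the start state (0).
    """
    arcs = [{}]
    finals = {}
    for word in words:
        state = 0
        for ch in word:
            if ch not in arcs[state]:
                arcs.append({})
                arcs[state][ch] = len(arcs) - 1
            state = arcs[state][ch]
        finals[state] = word
    return arcs, finals


def decode(char_scores, arcs, finals, floor=1e-6):
    """Return (cost, word) for the cheapest lexicon word, or None.

    char_scores is a list with one dict per character position, mapping
    a character to its (assumed) CNN probability. The cost of a path is
    the sum of -log(probability); unseen characters get the floor value.
    """
    best = {0: 0.0}  # lowest cost found so far for each reachable state
    for scores in char_scores:
        nxt = {}
        for state, cost in best.items():
            for ch, target in arcs[state].items():
                c = cost - math.log(max(scores.get(ch, 0.0), floor))
                if c < nxt.get(target, math.inf):
                    nxt[target] = c
        best = nxt
    # Only states that end a lexicon word are valid outputs.
    candidates = [(cost, finals[s]) for s, cost in best.items() if s in finals]
    return min(candidates) if candidates else None


# Hypothetical per-position scores for a noisy reading of a four-letter word.
scores = [
    {"c": 0.6, "e": 0.4},
    {"a": 0.5, "o": 0.5},
    {"f": 0.3, "t": 0.7},
    {"e": 0.9, "c": 0.1},
]
arcs, finals = build_lexicon_acceptor(["cafe", "cats", "code", "exit"])
result = decode(scores, arcs, finals)
print(result)
```

Picking the most probable character at each position independently would choose "t" in the third position and spell a string that is not in the lexicon. Constraining the search to paths through the lexicon acceptor instead selects the lowest-cost word that the lexicon allows.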

Presenters

Xinhao Liu
Media Information Laboratory

Xiaomeng Wu
Media Information Laboratory

Nobuyoshi Matsumoto
Media Information Laboratory

Takahito Kawanishi
Media Information Laboratory

Kunio Kashino
Media Information Laboratory