Science of Media Information

Reading designed words appearing in scenes

- Optical word recognition with CNN features and WFST decoding -

Abstract

Text in natural scene images often carries rich semantic information, and recognizing it is an important step toward understanding the scene. Unlike text in printed documents, text in a natural scene is difficult to recognize because of large variations in geographical placement, backgrounds, textures, fonts, and illumination conditions. In this work, we propose a method that first detects and recognizes characters with a Convolutional Neural Network (CNN), and then decodes the sequence of recognized characters into words with a Weighted Finite-State Transducer (WFST). WFSTs have been used successfully in speech recognition, where they have been shown to incorporate a lexicon or a high-order language model efficiently in word labeling tasks. Our experiments show that the proposed algorithm robustly recognizes words in scene images from public datasets such as ICDAR 2003 and SVT-WORD.
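The decoding step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes the CNN yields one probability table per detected character, and it stands in for a full WFST with a lexicon trie whose arcs carry negative log-probability weights. The names build_lexicon_trie and decode, the floor parameter, and the toy lexicon with its priors are hypothetical and chosen only for illustration. A real WFST decoder would also compose a language model and handle character segmentation ambiguity, which this sketch omits.

```python
import math


def build_lexicon_trie(lexicon):
    """Build a trie acceptor from {word: prior probability}.

    Each state is a dict holding outgoing arcs keyed by character and,
    for word-final states, the word with its prior cost (-log prior).
    """
    root = {"arcs": {}, "final": None}
    for word, prior in lexicon.items():
        state = root
        for ch in word:
            state = state["arcs"].setdefault(ch, {"arcs": {}, "final": None})
        state["final"] = (word, -math.log(prior))
    return root


def decode(char_posteriors, trie, floor=1e-6):
    """Return (word, cost) for the cheapest lexicon word, or None.

    char_posteriors is a list of {character: probability} tables, one per
    detected character position (assumed CNN output). Characters missing
    from a table receive the small probability `floor`.
    """
    frontier = [(trie, 0.0)]  # hypotheses: (trie state, accumulated cost)
    for posterior in char_posteriors:
        next_frontier = []
        for state, cost in frontier:
            for ch, nxt in state["arcs"].items():
                p = max(posterior.get(ch, 0.0), floor)
                next_frontier.append((nxt, cost - math.log(p)))
        frontier = next_frontier

    best = None
    for state, cost in frontier:
        if state["final"] is not None:
            word, prior_cost = state["final"]
            total = cost + prior_cost
            if best is None or total < best[1]:
                best = (word, total)
    return best


# Toy input: the classifier confuses C with G and E with F.
posteriors = [
    {"G": 0.5, "C": 0.4},
    {"A": 0.9},
    {"F": 0.8, "E": 0.1},
    {"F": 0.5, "E": 0.45},
]
lexicon = {"CAFE": 0.5, "CAGE": 0.3, "GATE": 0.2}  # hypothetical priors

greedy = "".join(max(p, key=p.get) for p in posteriors)
result = decode(posteriors, build_lexicon_trie(lexicon))
print(greedy)     # GAFF
print(result[0])  # CAFE
```

Picking the most probable character at each position gives a string that is not a word, while the lexicon-constrained search recovers a valid one. This is the kind of correction that lexicon-based decoding is meant to provide.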

Presenters

Xinhao Liu
Media Information Laboratory
Xiaomeng Wu
Media Information Laboratory
Nobuyoshi Matsumoto
Media Information Laboratory
Takahito Kawanishi
Media Information Laboratory
Kunio Kashino
Media Information Laboratory