
Cognitive developmental approach to human-like visual scene understanding

How do we humans understand visual scenes so easily and quickly? This question is very difficult to answer. However, human babies naturally acquire this ability. Imitating the typical behavior of human babies may therefore be a promising approach to acquiring human-like visual scene understanding. Examples of such behavior include focusing on salient regions that may be objects or something otherwise meaningful, modeling objects from their appearance, and obtaining or correcting additional information about objects through interaction with parents.

Based on the above discussion, we propose a new framework for human-like visual scene understanding composed of the following six procedures:

  1. Visual attention estimation, which identifies visually salient positions in an input video captured by a webcam or a similar camera.
  2. Object-like region extraction from the estimated positions based on color distributions and spatial continuity. In its initial state, the system has no prior knowledge, and therefore has to rely on visual saliency to detect object-like regions.
  3. Visual feature extraction from the extracted regions.
  4. Automatic annotation, which retrieves and shows the annotations of image regions whose features are similar to those of the object-like regions.
  5. Annotation request, which asks users to reinforce and/or correct the annotations provided by the system.
  6. Model learning, which uses machine learning to construct a model representing the relationships between visual features and annotations.
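
The six procedures above can be read as one loop over incoming frames. The following is a minimal sketch of that loop, not the actual system: every function and class name is hypothetical, a grayscale grid stands in for a video frame, "difference from the mean intensity" stands in for the visual attention model, a flood fill stands in for region extraction based on color distributions and spatial continuity, and a nearest-neighbor lookup stands in for the learned model.

```python
# Toy sketch of the six-step loop. All names are hypothetical and every
# technique is a simple stand-in, not the method used in the real system.
from math import dist


def saliency_peak(frame):
    """Step 1 (stand-in): pick the pixel that differs most from the mean."""
    pixels = [v for row in frame for v in row]
    mean = sum(pixels) / len(pixels)
    positions = ((r, c) for r in range(len(frame)) for c in range(len(frame[0])))
    return max(positions, key=lambda rc: abs(frame[rc[0]][rc[1]] - mean))


def grow_region(frame, seed, tol=30):
    """Step 2 (stand-in): flood fill over 4-connected pixels whose intensity
    is within `tol` of the seed pixel."""
    h, w = len(frame), len(frame[0])
    seed_val = frame[seed[0]][seed[1]]
    region, stack = set(), [seed]
    while stack:
        r, c = stack.pop()
        if (r, c) in region or not (0 <= r < h and 0 <= c < w):
            continue
        if abs(frame[r][c] - seed_val) > tol:
            continue
        region.add((r, c))
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return region


def extract_features(frame, region):
    """Step 3 (stand-in): normalized mean intensity and relative area."""
    vals = [frame[r][c] for r, c in region]
    area = len(region) / (len(frame) * len(frame[0]))
    return (sum(vals) / len(vals) / 255.0, area)


class AnnotationModel:
    """Steps 4 and 6 (stand-in): nearest-neighbor memory of labeled features."""

    def __init__(self, max_distance=0.2):
        self.examples = []  # list of (features, label) pairs
        self.max_distance = max_distance

    def annotate(self, features):
        """Step 4: return the label of the closest stored example, or None
        if nothing is stored or the closest example is too far away."""
        if not self.examples:
            return None
        feats, label = min(self.examples, key=lambda e: dist(e[0], features))
        return label if dist(feats, features) <= self.max_distance else None

    def learn(self, features, label):
        """Step 6: store the confirmed or corrected annotation."""
        self.examples.append((features, label))


def process_frame(frame, model, ask_user):
    """Run steps 1 to 6 once. `ask_user` receives the system's guess and
    returns the confirmed or corrected label (step 5), or None to skip."""
    seed = saliency_peak(frame)
    region = grow_region(frame, seed)
    features = extract_features(frame, region)
    guess = model.annotate(features)
    label = ask_user(guess)
    if label is not None:
        model.learn(features, label)
    return guess, label


if __name__ == "__main__":
    # A dark 4x4 frame containing one bright 2x2 "object".
    frame = [
        [10, 10, 10, 10],
        [10, 200, 200, 10],
        [10, 200, 200, 10],
        [10, 10, 10, 10],
    ]
    model = AnnotationModel()
    teacher = lambda guess: guess or "cup"  # user supplies a label if none is guessed
    print(process_frame(frame, model, teacher))  # first pass: no guess yet
    print(process_frame(frame, model, teacher))  # second pass: guess comes from the model
```

On the first pass the model is empty, so the system has no annotation to show and relies entirely on the user. On the second pass the stored example is retrieved, which mirrors how the framework starts without knowledge and accumulates it through interaction.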
We have developed a prototype system for human-like visual scene understanding based on the above framework. The whole procedure can be executed in near real time (about 7 fps at 320x240 pixels) on mobile PCs with the help of stream processing using NVIDIA CUDA technology.
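
To put the reported speed in perspective, the short calculation below converts the figures quoted above (about 7 fps at 320x240 pixels) into a time budget per frame and a pixel throughput. The variable names are ours, and the numbers are only as precise as the "about 7 fps" figure.

```python
# Figures taken from the text: about 7 fps at 320x240 pixels.
FPS = 7
WIDTH, HEIGHT = 320, 240

frame_budget_ms = 1000 / FPS                # time available for all six steps
pixels_per_frame = WIDTH * HEIGHT           # pixels processed in one frame
pixels_per_second = pixels_per_frame * FPS  # sustained pixel throughput

print(f"{frame_budget_ms:.1f} ms per frame")         # 142.9 ms per frame
print(f"{pixels_per_frame} pixels per frame")        # 76800 pixels per frame
print(f"{pixels_per_second} pixels per second")      # 537600 pixels per second
```

In other words, all six procedures together have roughly 143 ms to complete for each frame.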

Demo movie

Automatic detection and learning of unregistered objects based on a visual attention model

Selected publications

Akisato Kimura, Kunio Kashino, Ken Fukuchi, Kouji Miyazato, Kazuma Akamine, Shigeru Takagi
"Cognitive developmental approach towards the realization of human-like visual scene understanding: Framework and core technologies",
IEEE Workshop on Computer Vision for Humanoid Robots in Real Environments,
Kyoto, Japan, September 2009.
[ short talk ] [ poster ]
[ copyright notice : The authors hold the copyright of the material. ]

This work is listed on NVIDIA CUDA Zone. [ Details ]