NTT Communication Science Laboratories, Media Information Laboratory

Robust Object Tracking Technology

Real-time and online tracking utilizing history

Object tracking is a technology that estimates, moment by moment, the positions and poses of target objects in video captured by a camera. It is a basic element of media information processing: when a computer tries to recognize objects and events in the external world as we humans do, it is very important to exploit temporal continuity and consistency by tracking objects over time rather than analyzing each camera frame in isolation. Object tracking is also a key function for supporting smooth and safe interactions among humans and between humans and machines. In our research, we have emphasized robustness against rapid target movement and occlusion, which is essential for applying tracking technology to real-world problems. As a solution, we developed a method called the "Memory-based Particle Filter (M-PF)". M-PF stores the long-term history of past tracking results in memory and models each target's dynamics from the probability that past states will reappear. With this approach, M-PF achieves highly robust tracking.

■How will it be used in the future?

M-PF is a generic technique that can be applied to many applications. In general, object tracking is needed wherever machines must know the positions, poses, and movements of objects of interest, for example in object extraction from video, robot vision, traffic measurement, and driving assistance. Given its principle and characteristics, M-PF is particularly suited to online or real-time applications. These include so-called "communication scene analysis" systems, both as tools for basic research and as practical systems that use detailed information about people’s positions and poses, extracted from real-time video input, to achieve smooth and safe human-human or human-computer interaction.

■Robust object tracker using M-PF

M-PF is an extension of particle filter (PF) techniques. PFs are widely used for object tracking; at each time step they predict a prior distribution over the target's position and pose and then estimate the posterior distribution from the observed image. A conventional PF employs a quite simple dynamics model for prior prediction, such as random-walk dynamics or a linear uniform (constant-velocity) motion model. As a result, the predicted prior distribution becomes inaccurate when the tracked target moves abruptly or is momentarily occluded. In contrast, M-PF predicts the prior distribution from the target's long-term history of past states.
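
To make the conventional PF cycle concrete, the following is a minimal Python sketch of one PF time step with random-walk dynamics. It assumes a one-dimensional position state and a Gaussian observation likelihood; the function names, the likelihood, and the parameter values (`sigma`, `obs_sigma`) are illustrative assumptions, not part of the original work.

```python
import math
import random


def predict_random_walk(particles, sigma, rng):
    """Prior prediction: x_t = x_{t-1} + Gaussian noise (random-walk dynamics)."""
    return [x + rng.gauss(0.0, sigma) for x in particles]


def update_weights(particles, observation, obs_sigma):
    """Posterior estimation: weight particles by an assumed Gaussian likelihood p(z_t | x_t)."""
    weights = [math.exp(-0.5 * ((observation - x) / obs_sigma) ** 2) for x in particles]
    total = sum(weights)
    if total == 0.0:  # all likelihoods underflowed; fall back to uniform weights
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in weights]


def resample(particles, weights, rng):
    """Multinomial resampling: draw particles in proportion to their weights."""
    return rng.choices(particles, weights=weights, k=len(particles))


def pf_step(particles, observation, rng, sigma=1.0, obs_sigma=1.0):
    """One PF time step: predict the prior, weight by the likelihood, estimate, resample."""
    predicted = predict_random_walk(particles, sigma, rng)
    weights = update_weights(predicted, observation, obs_sigma)
    estimate = sum(w * x for w, x in zip(weights, predicted))  # weighted posterior mean
    return resample(predicted, weights, rng), estimate
```

Run for a number of steps on a stationary target (observation 0.0), the estimate stays near zero. If the observation then jumps abruptly (for example, to 20.0), the random-walk prior places almost no particles near the new position, so the estimate lags far behind; this is the weakness of simple dynamics models described above.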

The basic version of M-PF estimates two kinds of target parameters: position and pose. To this end, it stores the history of positions and poses at each time step in memory and uses this history to predict the prior distribution at future time steps. We have verified that M-PF achieves robust object tracking under complex dynamics, including abrupt movement and occlusion.
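
The idea of predicting the prior from stored past states can be illustrated with the following simplified sketch. It assumes a state of the form (position, pose), compared with an unscaled Euclidean distance for simplicity, and it assumes that a past time step is chosen with probability proportional to a Gaussian kernel on how closely its stored state matches the current estimate, after which the state recorded `horizon` steps later is reused as a prediction. The class `StateMemory`, its methods, and all parameters are hypothetical; this illustrates reusing reappearing past states and is not the published M-PF formulation.

```python
import math
import random


class StateMemory:
    """Hypothetical memory of past target states, one (position, pose) tuple per time step."""

    def __init__(self):
        self.history = []

    def store(self, state):
        self.history.append(tuple(state))

    def predict_prior(self, current, num_particles, rng, horizon=1, kernel_sigma=1.0, noise_sigma=0.5):
        """Sample particles for the state `horizon` steps ahead from the stored history.

        Simplified assumption (not the published M-PF formulation): a past step tau is chosen
        with probability proportional to a Gaussian kernel on the distance between its stored
        state and the current estimate, and the state stored at tau + horizon is reused.
        """
        def jitter(state):
            # Small Gaussian perturbation so that particles are not exact copies of past states.
            return tuple(v + rng.gauss(0.0, noise_sigma) for v in state)

        candidates = list(range(len(self.history) - horizon))
        if not candidates:
            # Too little history: fall back to a random walk around the current estimate.
            return [jitter(current) for _ in range(num_particles)]
        weights = [
            math.exp(-0.5 * (math.dist(self.history[tau], current) / kernel_sigma) ** 2)
            for tau in candidates
        ]
        if sum(weights) == 0.0:
            weights = None  # no similar past state: sample past steps uniformly
        taus = rng.choices(candidates, weights=weights, k=num_particles)
        return [jitter(self.history[tau + horizon]) for tau in taus]
```

If the stored history shows the target repeatedly jumping from one place to another, the prior sampled from the pre-jump state already contains particles at the jump destination, which a random-walk prior would not.
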
As an extension of M-PF, we have also developed a tracker named M-PFAP (M-PF with Appearance Prediction), which uses the same memory-based approach. Assuming that past appearances reappear stochastically, M-PFAP maintains an appearance history in addition to the position and pose history, and uses these histories to predict the joint probability of position/pose and appearance. This makes tracking robust against the appearance changes that accompany pose changes, and as a consequence, tracking remains robust over a wider range of poses.
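
The joint prediction of position/pose and appearance can be sketched in the same simplified style: the memory stores (state, appearance) pairs, one past pair is reused as a joint hypothesis, and each hypothesis is weighted by comparing the observation at its state with its own predicted appearance rather than with a single fixed template. The class and function names, the feature-vector representation of appearance, the sampling rule, and the `observe` callback (a stand-in for extracting image features at a hypothesized position and pose) are all assumptions for illustration, not the M-PFAP implementation.

```python
import math
import random


class StateAppearanceMemory:
    """Hypothetical memory of (state, appearance) pairs, one pair per time step."""

    def __init__(self):
        self.history = []  # list of ((position, pose), appearance_feature_vector)

    def store(self, state, appearance):
        self.history.append((tuple(state), tuple(appearance)))

    def predict_joint(self, current_state, num_particles, rng, horizon=1, kernel_sigma=1.0):
        """Sample joint (state, appearance) hypotheses from the stored history.

        Simplified assumption: choose a past step tau by state similarity, then reuse the
        (state, appearance) pair recorded at tau + horizon as one joint hypothesis.
        """
        candidates = list(range(len(self.history) - horizon))
        if not candidates:
            raise ValueError("not enough history to predict from")
        weights = [
            math.exp(-0.5 * (math.dist(self.history[tau][0], current_state) / kernel_sigma) ** 2)
            for tau in candidates
        ]
        if sum(weights) == 0.0:
            weights = None  # no similar past state: sample past steps uniformly
        taus = rng.choices(candidates, weights=weights, k=num_particles)
        return [self.history[tau + horizon] for tau in taus]


def weight_hypotheses(hypotheses, observe, sigma=0.5):
    """Weight each hypothesis by matching observe(state) against its own predicted appearance.

    `observe` is a hypothetical callback standing in for extracting image features
    at the hypothesized position and pose.
    """
    weights = [
        math.exp(-0.5 * (math.dist(observe(state), appearance) / sigma) ** 2)
        for state, appearance in hypotheses
    ]
    total = sum(weights)
    if total == 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]
```

For example, if the history shows a frontal appearance being followed by a profile appearance after a head turn, the hypotheses carrying the profile appearance receive most of the weight as soon as the observed features look like a profile, so the weighted pose estimate follows the turn.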