NTT Communication Science Laboratories Human Information Science Laboratory 06




Human Behavior Recognition from Multimodal Information

Analyzing empathy/antipathy among people in face-to-face communications from facial expressions and eye gazes

Many information systems assume accurate interaction between the system and its human users. This calls for Ambient Intelligence, in which the system accurately perceives the user's situation and intent and provides the needed information and responses. To that end, the system must recognize the user’s status and choose an appropriate course of action based on it, instead of acting unilaterally without regard for that status.
The recognition of human status is an essential step toward Ambient Intelligence. We aim to develop a system that can recognize human status, especially invisible internal states, from visible behavior. To do this, we have been focusing on the multimodal information present in multimedia data and the recognition of human behavior by detecting, extracting, and integrating such multimodal data as images and acoustic signals. We need the following component technologies for modeling, recognition/estimation, and integration:

  • Face detection from images or videos
  • Speech activity detection from audio signals
  • Facial expression recognition from images or videos
  • Face pose estimation in meeting situations
  • Speaker detection from the direction of an arriving voice
  • Emotion recognition from facial expressions, gestures, and voices
  • Recognition of human behavior from images or videos
  • Estimation of conversation structures from speech activity

■How will it be used in the future?

Recognizing users’ behavior and status will provide useful information for designing and controlling interaction strategies between ambient intelligence systems and human users. Other important applications include archiving and retrieving past meetings and enhancing the effectiveness of telecommunication services.


As a critical element in communication scene analysis, we focus on empathy and antipathy among people in face-to-face conversations and are developing mathematical models and automatic techniques for estimating interpersonal emotions from non-verbal behaviors including facial expressions and eye gazes.


We have proposed a mathematical model that represents the relationship between empathy/antipathy and behaviors including facial expressions (FEs) and gaze (see right). To cope with the uncertainty of human emotions, empathy and antipathy are represented as probability distributions and estimated by probabilistic inference. The model takes a communicative point of view, i.e., it concerns how emotion is perceived and interpreted by its recipients. It is therefore based on the distribution of multiple observers’ interpretations of empathy/antipathy, expressed in terms of the co-occurrence of the participants’ FEs and gazes (see matrix on right).
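To make the idea of a distribution over observer interpretations concrete, here is a minimal Python sketch. It is not the model described above. It only assumes that each observer judgment can be stored as a tuple of two FE categories, a gaze state, and a label, and it estimates a smoothed label distribution for each co-occurrence pattern. The three-way label set, the gaze states, the function names, and the smoothing parameter `alpha` are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

# Assumed label set; the source only names empathy and antipathy.
STATES = ("empathy", "neutral", "antipathy")


def fit_cooccurrence_model(annotations, alpha=1.0):
    """Estimate P(label | fe_a, fe_b, gaze) from observer judgments.

    annotations: iterable of (fe_a, fe_b, gaze, label) tuples, one per
    observer judgment. alpha is an additive smoothing constant (assumed).
    """
    counts = defaultdict(Counter)
    for fe_a, fe_b, gaze, label in annotations:
        counts[(fe_a, fe_b, gaze)][label] += 1

    model = {}
    for key, label_counts in counts.items():
        total = sum(label_counts.values()) + alpha * len(STATES)
        model[key] = {s: (label_counts[s] + alpha) / total for s in STATES}
    return model


def estimate(model, fe_a, fe_b, gaze):
    """Return the label distribution for one co-occurrence pattern.

    Falls back to a uniform distribution for patterns never observed.
    """
    uniform = {s: 1.0 / len(STATES) for s in STATES}
    return model.get((fe_a, fe_b, gaze), uniform)


# Toy judgments (invented for illustration, not data from the study).
toy_annotations = [
    ("smile", "smile", "mutual", "empathy"),
    ("smile", "smile", "mutual", "empathy"),
    ("smile", "smile", "mutual", "neutral"),
    ("wry smile", "neutral", "averted", "antipathy"),
]
toy_model = fit_cooccurrence_model(toy_annotations)
toy_dist = estimate(toy_model, "smile", "smile", "mutual")
```

With these toy judgments and `alpha=1.0`, the pattern (smile, smile, mutual) yields probabilities of 1/2 for empathy, 1/3 for neutral, and 1/6 for antipathy. The output is a distribution rather than a single label, which mirrors the treatment of empathy/antipathy as uncertain.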

■Facial Expression Analysis

Participant FEs are automatically classified into such categories as neutral, smile, and wry smile from the movement of facial landmark points and changes in facial appearance. FEs and empathy/antipathy were also manually annotated by multiple observers.

■Estimation result and evaluation scheme

The figure on the right shows an example of the estimated probabilities of empathy/antipathy. Each bar chart indicates how likely observers are to ascribe empathy/antipathy to the pair. To quantitatively evaluate the results, we proposed an evaluation scheme based on the distribution of multiple human interpretations.
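The evaluation scheme itself is not specified here. As one plausible illustration, the Python sketch below compares an estimated distribution with the empirical distribution of observer votes, using the overlap (the sum of element-wise minima) as a similarity score. The choice of measure, the label set, and the function names are assumptions, not the measure proposed by the authors.

```python
from collections import Counter

# Assumed label set; the source only names empathy and antipathy.
STATES = ("empathy", "neutral", "antipathy")


def observer_distribution(votes):
    """Turn a list of observer labels into an empirical distribution."""
    if not votes:
        raise ValueError("at least one observer vote is required")
    counts = Counter(votes)
    return {s: counts[s] / len(votes) for s in STATES}


def overlap(p, q):
    """Overlap of two distributions: 1.0 if identical, 0.0 if disjoint."""
    return sum(min(p[s], q[s]) for s in STATES)


# Toy example (invented values, not results from the study).
toy_votes = ["empathy", "empathy", "empathy", "neutral"]
toy_estimated = {"empathy": 0.6, "neutral": 0.3, "antipathy": 0.1}
toy_score = overlap(observer_distribution(toy_votes), toy_estimated)
```

Scoring against the full vote distribution, rather than a majority label, rewards an estimate for reproducing disagreement among observers as well as their most common interpretation.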