NTT Communication Science Laboratories: Innovative Communication Laboratory




Statistical machine learning and data mining

Automatic interpretation of a massive amount of data

Data mining methods that analyze the massive amounts of data available from all over the world support business decisions as well as a wide range of everyday activities. Statistical machine learning techniques are effective for processing such noisy and only partially observable data. Our research focuses on extracting latent topics from observed data, uncovering the global structure of relational data, identifying the subsets of data relevant to a user's interests or queries, and constructing knowledge structures from images and videos.

■Research background

The amount of data available today is so vast that we cannot even look through all of it ourselves. Moreover, although we can intuitively understand text documents, images, and sounds, sensor data such as acceleration readings are difficult for people to interpret. We therefore need automatic ways of analyzing and interpreting such huge amounts of data. Building on statistical machine learning techniques, we design data mining methods and algorithms that are implemented as computer programs. Such automatic interpretation of data by computers may go beyond human capabilities and yield fundamentally new findings.

■Extracting latent topics

We are developing methods for extracting intrinsic structures (latent topics) from large-scale, complex data. Topic models are probabilistic generative models of text. In these models, each document is represented as a mixture of latent topics, and each topic is a probability distribution over words. We are extending topic models so that they can be applied to visualization, recommendation systems, and purchase log analysis.
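The sketch below gives a rough picture of how a topic model is fitted. It runs collapsed Gibbs sampling for latent Dirichlet allocation (LDA), the standard textbook topic model, and is not one of the extended models described above. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the word-id input format are assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, hypothetical API).

    docs: list of documents, each a list of integer word ids.
    Returns (doc_topic, topic_word) count tables after sampling.
    """
    rng = random.Random(seed)
    vocab_size = 1 + max(w for doc in docs for w in doc)
    n_dk = [[0] * num_topics for _ in docs]                 # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]   # word counts per topic
    n_k = [0] * num_topics                                  # total tokens per topic

    # Assign every token a random initial topic and record it in the counts.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the current token out of the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the token's topic and put it back into the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw


if __name__ == "__main__":
    # Toy corpus: three documents use words 0-2, three use words 3-5.
    corpus = [[0, 1, 2, 0, 1, 2] for _ in range(3)] + [[3, 4, 5, 3, 4, 5] for _ in range(3)]
    doc_topic, topic_word = lda_gibbs(corpus, num_topics=2)
    print(doc_topic)
    print(topic_word)
```

At each step, every word token is reassigned to a topic. The probability of choosing a topic is proportional to how common that topic is in the token's document and how common the word is in that topic. After sampling, the normalized rows of `topic_word` approximate the topic-word distributions.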


■Analysis of time-varying relational networks

We are developing a relational data clustering technique for understanding and analyzing the behavior of time-varying relations. Examples include friend networks in social networking services (SNSs), transactions between companies, and hyperlinks on the World Wide Web. The technique can extract hidden communities in relational networks and makes it easier to track how these networks change over time.
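The text does not specify the clustering model, so the sketch below swaps in two simple, well-known pieces purely to show the idea. It uses label propagation to find communities in each network snapshot, and Jaccard similarity to link communities across consecutive snapshots. The function names and the matching threshold are hypothetical.

```python
from collections import Counter


def label_propagation(edges, max_iters=100):
    """Find communities in one undirected snapshot by label propagation.

    Nodes are taken from the edge list. Each node repeatedly adopts the
    most frequent label among its neighbors; ties go to the smallest label
    and nodes are visited in sorted order, so the result is deterministic.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    labels = {node: node for node in adj}
    for _ in range(max_iters):
        changed = False
        for node in sorted(adj):
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            new = min(lbl for lbl, c in counts.items() if c == best)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    communities = {}
    for node, lbl in labels.items():
        communities.setdefault(lbl, set()).add(node)
    return list(communities.values())


def jaccard(a, b):
    """Overlap between two node sets, in [0, 1]."""
    return len(a & b) / len(a | b)


def track_communities(prev, curr, threshold=0.3):
    """Pair each current community index with its best-matching previous one.

    Returns (current_index, previous_index or None) pairs; None marks a
    community whose best Jaccard score is below the threshold (a "new" one).
    """
    matches = []
    for j, c in enumerate(curr):
        best_score, best_i = max(
            ((jaccard(p, c), i) for i, p in enumerate(prev)), default=(0.0, None)
        )
        matches.append((j, best_i if best_score >= threshold else None))
    return matches
```

Matching by set overlap lets a community keep its identity while members join or leave. A full time-varying model would instead share statistical strength across snapshots.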


■Fast similarity search

We are studying a similarity search method that uses a neighborhood graph as an index. The goal is fast similarity search over many kinds of data for which different similarity measures are defined, such as documents, images, symbol sequences, and speech and audio signals. Because the index is itself a graph, the method can also display the relationships among the retrieved data objects directly as a graph, which makes search results easy to understand intuitively.
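Here is a minimal sketch of searching with a neighborhood graph as the index. It assumes a brute-force k-nearest-neighbor graph and plain greedy descent, both hypothetical simplifications and not necessarily the method described above. The distance function is passed in as a parameter, which reflects the point that any similarity measure can be plugged in.

```python
def build_knn_graph(points, k, dist):
    """Brute-force k-nearest-neighbor graph: graph[i] lists i's k closest points."""
    graph = []
    for i, p in enumerate(points):
        others = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in others[:k]])
    return graph


def greedy_search(points, graph, query, dist, start=0):
    """Walk the graph, always moving to the neighbor closest to the query.

    Stops when no neighbor of the current node is closer than the node itself
    and returns (node_index, distance). The result can be a local minimum.
    """
    current = start
    current_d = dist(points[current], query)
    while True:
        best, best_d = current, current_d
        for nb in graph[current]:
            d = dist(points[nb], query)
            if d < best_d:
                best, best_d = nb, d
        if best == current:
            return current, current_d
        current, current_d = best, best_d


if __name__ == "__main__":
    # One-dimensional toy data with absolute difference as the distance.
    pts = [(x,) for x in range(10)]
    absdiff = lambda a, b: abs(a[0] - b[0])
    g = build_knn_graph(pts, k=2, dist=absdiff)
    print(greedy_search(pts, g, (7.2,), absdiff))
```

Greedy descent can stop at a local minimum. For this reason, practical graph-index methods usually keep a list of several candidates rather than a single current node.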


■Knowledge acquisition and content understanding from web-scale images/videos

One line of our research focuses on automatically building dictionaries for understanding and retrieving image and video content from complex and diverse scenes, using only a small amount of prior knowledge and supervised information.
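The text does not say what form the dictionaries take. One common reading in image retrieval is a "visual vocabulary": local descriptors are clustered, and each image is then encoded as a histogram of its nearest cluster centers. The sketch below shows that idea with plain k-means under this assumption. It is unsupervised and does not model the prior knowledge or the small amount of supervision mentioned above. All names are hypothetical.

```python
import random


def nearest(centers, v):
    """Index of the center closest to v in squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(centers[c], v)),
    )


def kmeans(vectors, k, iters=20, seed=0):
    """Lloyd's k-means; the resulting centers act as the 'visual words'."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(centers, v)].append(v)
        for c, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centers


def bag_of_words(centers, descriptors):
    """Encode one image's descriptors as a histogram over the dictionary."""
    hist = [0] * len(centers)
    for d in descriptors:
        hist[nearest(centers, d)] += 1
    return hist
```

Once images are encoded as histograms, they can be compared with any histogram distance, so the dictionary also serves retrieval.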
