NTT Communication Science Laboratories
Innovative Communication Laboratory
Learning and Intelligent Systems Research Group


  Members

Akinori Fujino (Group Leader)
Takayuki Suyama
Kazuo Aoyama
Shin Mizutani
Jun Muramatsu
Yoshinari Shirai
Yasue Kishino
Hitoshi Shimizu
Takashi Hattori
Masakazu Ishihata
Takuma Otsuka
Yoichi Chikahara
Yutaka Yanagisawa (Visiting Researcher)

  Affiliates

Naonori Ueda (NTT Fellow, Ueda Research Laboratory)
Takeshi Yamada (Director of NTT Communication Science Laboratories)
Futoshi Naya (Research Planning Section)
Hiroshi Sawada (Executive Manager)
Tomoharu Iwata (Ueda Research Laboratory)
Koh Takeuchi (Ueda Research Laboratory)
Mathieu Blondel (Ueda Research Laboratory)
Naoki Marumo (Ueda Research Laboratory)
Katsuhiko Ishiguro (Mirai Translate, Inc.)
Kenichi Arai (Signal Processing Research Group)
Akisato Kimura (Media Information Laboratory)
Machiko Toyoda (NTT Software Innovation Center)
Tatsushi Matsubayashi (NTT EV Labs.)
Akinori Abe (Chiba University)
Albert (Ching-man) Au Yeung (Hong Kong Applied Science and Technology Research Institute)
Noriaki Kawamae (NTT COMWARE)
Shuhei Kuwata (NTT DOCOMO)
Kazumi Saito (University of Shizuoka)
Hitoshi 'keen' Sakano (Shimane University)
Yasushi Sakurai (Kumamoto University)
Kazuhiko Shinozawa (Osaka Kyoiku University)
Koichi Fujiwara (Kyoto University)
Takuya Maekawa (Osaka University)
Yasuko Matsubara (Kumamoto University)
Daichi Mochihashi (The Institute of Statistical Mathematics)
Makoto Yamada (Kyoto University)

  Research Overview

The Learning and Intelligent Systems Research Group analyzes complex phenomena that emerge in the real and cyber worlds, using statistical machine learning, data mining, data stream analysis, and sensor networks.

Specific research topics are listed below.

Smart city sensing using municipal vehicles| Spatio-temporal city event detection via car-mounted sensors

We are researching and developing techniques for city event detection using environmental data collected via car-mounted sensors. Car-mounted sensors provide significantly more detailed data, in both space and time, than fixed monitoring stations. Such fine-grained environmental data help to detect spatio-temporal city events in greater depth, such as the emergence of air pollution hot spots, increases in ambient noise, and the accumulation of household garbage. We have been conducting field trials in Fujisawa City to evaluate our technologies. We have mounted several environmental sensors on garbage trucks to collect fine-grained data and have investigated several event detection techniques using those data.
openhouse 2017 exhibition 1


Optimization of real-time collective navigation| Finding efficient navigation by Bayesian optimization

We are developing a technology for finding efficient ways to navigate moving crowds of people or vehicles. This technology predicts upcoming risks of congestion caused by the crowds and searches for the collectively optimal navigation that avoids the congestion. It is difficult for humans to figure out when, where, and how moving crowds should be navigated to ease congestion. We present an algorithm that derives a collectively optimal navigation using Bayesian optimization, which evaluates through repeated simulations which navigation contributes most to relieving congestion. We further envision advanced, adaptive navigation that incorporates real-time sensing data on people and vehicles. Our technology can navigate people on the fly and help establish secure and comfortable event operations as well as stable infrastructure.
openhouse 2017 exhibition 3
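
The loop described above (simulate a navigation policy, update a surrogate model, pick the next policy to try) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the group's algorithm: the navigation policy is reduced to a single hypothetical number `split` (the fraction of the crowd sent to route A), `simulate_congestion` is a made-up stand-in for a real crowd simulator, and the surrogate is a plain Gaussian process with a lower-confidence-bound rule.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit prior variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def posterior(xs, ys, x, jitter=1e-4):
    """Gaussian-process posterior mean and standard deviation at x."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k = [rbf(a, x) for a in xs]
    mean_y = sum(ys) / len(ys)          # constant prior mean (an assumption)
    alpha = solve(K, [y - mean_y for y in ys])
    v = solve(K, k)
    mu = mean_y + sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mu, math.sqrt(var)

def simulate_congestion(split):
    """Hypothetical simulator: congestion when `split` of the crowd uses route A."""
    return (split - 0.6) ** 2

def optimize(simulator, n_iter=6, kappa=2.0):
    """Minimize the simulator output over a grid of candidate splits."""
    grid = [i / 50 for i in range(51)]
    xs = [0.0, 1.0]                      # two initial simulations
    ys = [simulator(x) for x in xs]
    for _ in range(n_iter):
        def lcb(x):
            mu, sigma = posterior(xs, ys, x)
            return mu - kappa * sigma    # low predicted congestion or high uncertainty
        nxt = min((x for x in grid if x not in xs), key=lcb)
        xs.append(nxt)
        ys.append(simulator(nxt))
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]
```

Calling `optimize(simulate_congestion)` returns the best split found and its simulated congestion. In a real setting each simulator call would be expensive, which is why the surrogate is used to choose the next policy rather than sweeping the whole grid.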

Picture books: a child's first textbook| Relation between picture book corpus and child vocabulary

We constructed a large-scale corpus of Japanese picture books and analyzed the relationship between the words that appear in those books and child vocabulary acquisition. Although previous studies have suggested that picture book reading promotes child vocabulary acquisition, the effect has not been fully examined due to the lack of picture book corpora. In our study, we found a strong correlation between the age of acquisition for basic-level nouns, such as animal names, and the frequency of the word in picture books. Verbs required a higher frequency for acquisition than basic-level nouns. The relationship between age of acquisition and word frequency in picture books can indicate how difficult a given word is to acquire. Based on these findings, we can improve educational support systems, for example by recommending picture books according to an individual's vocabulary level.
openhouse 2017 exhibition 10

Learning from a large number of feature combinations| CFM: low-rank regression with global optimality guarantees

The Convex Factorization Machine (CFM) is a high-accuracy regression model that can handle a large number of feature combinations. CFM is general-purpose and can be applied to a wide range of tasks, e.g., house price prediction, recommender systems, and genome analysis. The proposed method handles a large number of feature combinations by using a low-rank constraint. Moreover, it is guaranteed to obtain a global optimum. In future work, we plan to support higher-order feature combinations to further improve predictive accuracy. Besides recommender systems, applications include predicting combinations of genes that are responsible for diseases, which would help find effective cures.
openhouse 2016 exhibition 1
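
To make the role of the low-rank constraint concrete, here is a minimal prediction-only sketch. It assumes a common second-order form, y = w·x + xᵀZx, where the pairwise weight matrix Z is symmetric and written as a sum of a few rank-one terms λ_s p_s p_sᵀ. The exact parameterization and training procedure of CFM are not given in this text, so the function names and this form are illustrative assumptions.

```python
def predict(x, w, factors, weights):
    """Second-order prediction with a low-rank pairwise matrix.

    Assumes Z = sum_s weights[s] * outer(factors[s], factors[s]), so that
    x^T Z x = sum_s weights[s] * (factors[s] . x)^2. The d-by-d matrix Z
    is never formed; cost is O(rank * d) instead of O(d^2).
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for p, lam in zip(factors, weights):
        proj = sum(pi * xi for pi, xi in zip(p, x))  # projection of x onto p
        pairwise += lam * proj * proj
    return linear + pairwise

def dense_predict(x, w, Z):
    """Reference implementation using the full matrix Z (for checking only)."""
    d = len(x)
    linear = sum(wi * xi for wi, xi in zip(w, x))
    quad = sum(x[i] * Z[i][j] * x[j] for i in range(d) for j in range(d))
    return linear + quad
```

With d features, a dense Z has on the order of d² entries, while a rank-k representation stores only k vectors of length d, which is what makes a large number of feature combinations tractable in this kind of model.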

Find a good number of salient patterns in a matrix| Infinite plaid models for infinite bi-clustering

Our goal is to automatically find salient bi-clusters in a given relational data matrix. Salient bi-clusters are sub-matrices whose values are distinct from the other entries of the data matrix. Such bi-clusters often correspond to informative subsets of the data, e.g., "good customer groups with their best-selling items" and "specific gene clusters that react to a specific treatment or chemical". Conventionally, bi-clustering requires us to specify the number of bi-clusters to be extracted before the analysis. However, it is generally difficult to know the number of bi-clusters before conducting an actual analysis. Our proposed model removes the need to specify this number. The model automatically infers an appropriate number of bi-clusters (up to infinity!) for the given data matrix and performs effective bi-clustering. This model will help users conduct "easy-to-go" bi-clustering in a variety of situations.
openhouse 2016 exhibition 2
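
As a small illustration of what a bi-cluster is, the sketch below builds the expected matrix of a plaid-style model: each bi-cluster is a set of rows, a set of columns, and an effect that is added to every cell they share, so overlapping bi-clusters add up. The function names and data layout are assumptions made for this example, and the automatic inference of the number of bi-clusters described above is not implemented here.

```python
def plaid_mean(n_rows, n_cols, biclusters, background=0.0):
    """Expected matrix under a plaid-style model.

    Each bicluster is a tuple (rows, cols, effect). Its effect is added to
    every cell whose row is in `rows` and whose column is in `cols`.
    """
    mean = [[background] * n_cols for _ in range(n_rows)]
    for rows, cols, effect in biclusters:
        for i in rows:
            for j in cols:
                mean[i][j] += effect
    return mean

def squared_error(data, mean):
    """Sum of squared differences between a data matrix and a model mean."""
    return sum((d - m) ** 2
               for data_row, mean_row in zip(data, mean)
               for d, m in zip(data_row, mean_row))
```

Comparing `squared_error` for candidate sets of bi-clusters gives a crude fit score. When the number of bi-clusters must be fixed in advance, that number has to be chosen by such trial and error, which is the step the proposed model is meant to remove.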

Navigate people with comfortable traveling routes| Dynamic migration scheduling for greater visitor satisfaction

We have developed a technology that dynamically computes and recommends comfortable migration schedules for customers in venues such as amusement parks. It is based on spatio-temporal prediction of near-future congestion levels and resource demands, using real-time observation of people flow and information on preferred attractions. The technology aims to equalize waiting times at attraction queues in a venue and to maximize customer satisfaction through real-time spatio-temporal prediction of people flow and mathematical optimization of visitors' migration schedules. It is also expected to support stable control of infrastructure and optimal resource management in and around venues such as leisure spots, airports, and commercial facilities.
openhouse 2016 exhibition 4

Error correction, lossy compression ... as you like it| Multipotential coding method achieving the Shannon limit

We are developing methods of error correction and data compression. Conventional practical methods were developed independently for each purpose and achieve the fundamental limits only for specific channels/sources. To solve these problems, we developed a theory of coding based on constrained numbers (CoCoNuTS*) and proved mathematically that this method can achieve the limits for any channels/sources. Using this unified theory, we can construct practical codes that achieve the limits in many transmission/storage scenarios. In the future, this technology can be applied to highly reliable broadband transmission over channels such as optical fiber, wireless LAN, and mobile phone networks, and to high-quality compact formats for sound and images on media/storage such as CD, DVD, BD, and flash memory.
openhouse 2016 exhibition 8
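
For readers unfamiliar with the term, a textbook example of a Shannon limit is the capacity of a binary symmetric channel that flips each bit with probability p: C = 1 - H(p) bits per channel use, where H is the binary entropy function. The sketch below computes it. It illustrates only the limit itself, not the CoCoNuTS construction, whose details are not given in this text.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_capacity(p):
    """Capacity (bits per channel use) of a binary symmetric channel
    with crossover probability p."""
    return 1.0 - binary_entropy(p)
```

No code can transmit reliably at a rate above `bsc_capacity(p)` on such a channel, and a code is said to achieve the limit when its rate can approach this value with vanishing error probability.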

Pitarie: Find a picture book just right for a child| Picture-book search with interdisciplinary approach

PITARIE is a system that finds picture books appropriate to a child's developmental stage. Reading timely and apposite picture books not only helps advance the child's vocabulary and emotional development but also promotes communication with the child. We aim to provide evidence-based assistance for education in infancy and childhood by presenting apposite picture books on the basis of both human science and information science: we not only utilize the latest advances in similarity search and natural language processing but also draw on knowledge from developmental psychology. We are conducting joint experiments with a local government and an NPO to increase the availability of PITARIE.
openhouse 2016 exhibition 12

Automatic tailor-made data analysis| Generating probabilistic models using structure information

Probabilistic latent variable models have successfully captured the intrinsic characteristics of various data. Understanding them is helpful for discovering latent rules and facts behind data. However, it is nontrivial to design appropriate models for given data because both machine learning and domain-specific knowledge are required. We propose an automatic model generation method for data with hierarchical structure. Our method constructs an appropriate model for given data by extracting important hierarchies and preserves hierarchical and sequential information if needed or desired. We automatically extract latent structures of given data and discover hidden rules and facts behind the data.
openhouse 2015 exhibition 1

Finding various factors hidden in data| Advanced and fast high-dimensional multiple factorization

Our research focuses on finding various factors hidden in data. Suppose that we have a dataset of purchase records in which each purchase is represented by a hashed user ID, user attribute, purchased good, purchase time, and store. Our method analyses the dataset and efficiently and precisely finds specific tendencies, e.g., that business persons tend to buy burgers for lunch at convenience stores. For efficiency, we optimize the data structure and algorithm so that sparse data entries are aligned and accessed sequentially. For precision, we introduce constraints that represent the relationship between a user ID and its attribute. Our proposed method contributes to discovering new insights from the ever-increasing behavior logs of humans and machines.
openhouse 2015 exhibition 2

Infinite data analysis beyond big data| Stochastic process models for infinite-dimensional matrices

We address the problem of infinite data analysis. Big data analysis has been a very active area of research in machine learning, but so far, as implied by the term 'big data' itself, the amount of input data has been assumed to be finite. However, many types of data grow without bound, and therefore the observed data must be regarded as part of a potentially infinite amount of data. This is why we think machine learning systems must be able to handle data of unlimited size. As an example of our results, this presentation deals with relational data represented by matrices, where the rows indicate instances and the columns represent values attributed to the instances.
openhouse 2015 exhibition 3

Agile environmental sensing| CILIX: a virtual machine for wireless sensor network applications

We propose agile software development and maintenance for environmental sensing based on CILIX, our virtual machine. CILIX has three essential characteristics: (1) it enables programmers to develop sensor node software in a familiar programming language, (2) it allows sensor node software to be replaced over wireless networks, and (3) it requires no large program memory. Using CILIX, sensor node software programmers can quickly develop a minimum set of software in a familiar programming language and iteratively update it depending on practical situations. We conducted several field-sensing experiments to investigate our technologies.
openhouse 2015 exhibition 6

Satisfying Visitors with Collective Navigation While Predicting People Flow| Spatio-temporal data analysis for controlling massive people flow

We are researching and developing techniques that allow movements of entire groups to be efficiently navigated by making near-future predictions of people flow and congestion levels based on such information as event information and real-time observations of people and traffic flows. We help event-management businesses perform smart venue administration by providing comfortable personalized navigation tools based on a visiting plan suited to customer preferences. Our goal is proactive risk management to prevent serious impact on social infrastructure at sporting events, concerts, and exhibitions where there are many people moving around, generating large road traffic and network traffic fluctuations.
openhouse 2015 exhibition 7


Positions available: We are recruiting talented researchers who are interested in statistical machine learning and data mining. We have positions for new graduates and mid-career researchers as well as contract positions (RA = Post-doc and RS). For more details, please visit the Career Opportunities page.
