Head's Talk

I want to know more about you

- Getting closer to humans with AI and brain science -

Takeshi Yamada
Vice President,
Head of NTT Communication Science Laboratories

Abstract

It has already been 35 years since the privatization of Nippon Telegraph and Telephone Public Corporation and the founding of NTT in 1985. NTT Communication Science Laboratories was founded here in Keihanna Science City in 1991, so next year will mark its 30-year anniversary.

  Since the founding of NTT Communication Science Laboratories, we have undertaken fundamental research ahead of the times based not only on the principle of “conveying information accurately and efficiently” but also on the idea of “deepening mutual understanding, sharing feelings, and making heartfelt contact.” In the beginning, our main research theme was simply person-to-person communication, but today, our aim is to achieve “heart-touching communication” in both person-to-person and person-to-computer forms. To this end, we are formulating basic theories and creating innovative technologies.

  In 1985, the main role of the telephone was to provide a means of finding out something about a person one knows, as in “Where are you now?” and “What are you doing?”, as reflected in the words of a popular song at that time [1]. Today, social media has to some extent taken up this role, but it unfortunately extends to learning about people that one may not even be close to. The smartphone, meanwhile, as a device that people use day in and day out, is inherently capable of obtaining all of that information about the user, perhaps in more detail than the user knows. The traditional black telephone, on the other hand, had a magical sense of presence and a feeling of warmth. If technology continues to develop from here on, how will communication change? What will “heart-touching communication” mean? We are devoting our daily research to answering these questions while also posing these questions to society at large through collaboration with our business partners.

Research areas of NTT Communication Science Laboratories

  At NTT Communication Science Laboratories, we are pursuing technologies for “approaching and exceeding human abilities” such as “media processing” and “data analysis and machine learning” and science for “obtaining a deep understanding of people” such as “human science” and “diverse brain science” [2]. The following introduces several examples of these research endeavors.

Technologies for approaching and exceeding human abilities

  Communication is first and foremost the recognition and understanding of spoken words. Under conditions in which a number of people are speaking at the same time, a human is able to focus on the voice of a particular person and understand what that person is talking about. Our aim at NTT Communication Science Laboratories is to reproduce this human ability by computer. Of interest here is that lip movement, in addition to voice characteristics, has recently come to be used as a clue for differentiating between people having somewhat similar voices. In addition, new voice conversion technology is making it possible to freely change features such as voice quality and intonation while maintaining the content of someone’s spoken words. The further development of these technologies should, for example, enable natural communication that overcomes disabilities or age-related decline in speaking or hearing functions, or provide support for conversation in an unfamiliar foreign language.

  A human can often identify a song, and even recall its title, from a short fragment of music heard by chance on the street. At NTT Communication Science Laboratories, we are researching and developing “Robust Media Search” technology for high-speed searching and retrieval of similar items from a massive database of songs and video using fragments of audio and video signals as clues. This technology has been commercialized through NTT DATA and has come to be used by many broadcasters as a service for automatically detecting songs used in broadcast programs and generating a list of those songs for music-related rights management [3]. We have also taken up technologies for searching for objects in real space. For example, “adaptive spotting” is a technology for rapidly finding objects of a desired shape from three-dimensional point cloud data in real space. As in the case of humans, this technology can learn an efficient search method on its own.

  For several years now, we have been participating in a National Institute of Informatics (NII) AI project called “Todai Robot Project: Can a robot get into the University of Tokyo?” and have been researching the extent to which artificial intelligence can solve problems that humans can solve. Here, NTT Communication Science Laboratories was put in charge of English-related requirements for university admission and was given the challenge of passing the English written exam administered by the National Center for University Entrance Examinations. In the end, the AI technology of NTT Communication Science Laboratories achieved a high score of 185 points (64.1 T-score) in the 2019 version of this exam [4]. English problems in these university entrance examinations include many problems that integrate natural language processing and knowledge processing, and we feel that the knowledge gained in tackling these problems can be applied to achieving more natural conversation, with better mutual understanding, between AI and humans.

Science for obtaining a deep understanding of people

  At the same time, the development of AI is making it all the more important to obtain a deep understanding of people. For example, when a user is searching for something on the Internet and product advertisements that match the user’s search words suddenly appear, it is not uncommon for the user to click on one of those links and make a purchase without giving it much thought. In such a case, the user would say that he or she purchased the product in a completely voluntary manner and would hardly admit to any manipulation by a third party in making that purchase. As AI technology expands, the risk of such clever manipulation, or as one might say, the AI version of “subliminal effects,” increases.

  To avoid such risk, it is important to obtain a deep understanding of the preconceptions a human might have, what kind of behavior those preconceptions might lead to, and when. At NTT Communication Science Laboratories, we have set out to clarify brain information processing by focusing on athletes having remarkable skills. In this research, we use various types of biological information from the bodies of these athletes to determine how they obtain and judge information from the outside world. In baseball, for example, we might explore the difference between batters who are hitting well and those who are not and ask questions such as “Is it true that good hitters see the ball well?” or “Does a fastball really fly in a straight path?” The plan here is to use the knowledge gained in our research to provide feedback to athletes in the form of training techniques that can sharpen brain functions.

  We are also working to clarify the language acquisition process in children. Human children acquire language by talking with their parents. The human race has evolved in terms of language and language-based communication over a long period of time. On the other hand, the use of characters by the human race is a relatively recent phenomenon, so the ability to “read” is not an inherent function of the brain. Rather, it is achieved through a flexible combination of basic brain functions such as “seeing,” “hearing,” “language,” and “cognition” [5].

  To understand the language acquisition mechanism, we have constructed a “child vocabulary development database” by conducting a large-scale survey of what words could be understood and spoken by children at what time in their development and modeling the results. Considering that this database could be useful in fostering reading ability in children, we turned these results into a “personalized educational picture book” service in collaboration with NTT Printing Corporation in which the content of this book is customized according to the vocabulary development of each and every child. In cooperation with the village of Onna in Okinawa, we worked with NTT Printing Corporation to encourage children to visit the library by creating a personalized educational picture book for each child and presenting it on the occasion of a medical checkup. The idea here was to get children into the habit of visiting a library and reading books from an early stage [6].

Conclusion

  NTT Communication Science Laboratories is pursuing research to approach human abilities and explore human beings while seeking to clarify what ideal “heart-touching” communication really means. In this regard, the Japanese word for “happiness” originates from “shi-awase,” where “shi” means “action” and “awase” means “match,” which suggests that one’s happiness is achieved when interaction or communication with another person goes really well. Going forward, we are committed to creating technologies that contribute to human happiness, or in more modern terms, to the well-being of people, and to linking these technologies to the creation of a fulfilling and enriching society through collaboration with our business partners.


[1] “I want to know more about you” was a “TALK ON THE PHONE” PR song that came out directly after NTT privatization in 1985.
[2] “Processing Like People, Understanding People, Helping People: Toward a Future Where Humans and AI Will Coexist and Co-create,” NTT Technical Review, Vol. 17, No. 11, Nov. 2019.
https://www.ntt-review.jp/archive/ntttechnical.php?contents=ntr201911fa1.pdf&mode=show_pdf
[3] NTT DATA won the Minister of Internal Affairs and Communications Award at the 9th ASPIC Cloud Award 2015 for its “All songs reporting service for broadcasters.” (in Japanese)
https://www.nttdata.com/global/ja/news/information/2015/100901/
[4] “AI achieved a score of 185 on the English written exam of the National Center Test for University Admissions in 2019.”
https://group.ntt/en/newsrelease/2019/11/18/191118a.html
[5] Reading in the Brain: The New Science of How We Read, Stanislas Dehaene, Penguin Putnam Inc., 2010.
[6] “Three-party Joint Experiment Launched Using Personalized Educational Picture Book!” (in Japanese)
https://www.nttprint.com/company/itemid419-000048.html

Speaker

Takeshi Yamada
Vice President,
Head of NTT Communication Science Laboratories