Head's Talk

May 30th (Thu) 13:20 - 13:50

Processing like people, understanding people, helping people

- Toward the future where humans and AI will cohabitate and co-create -

Takeshi Yamada, Director of NTT Communication Science Laboratories

Abstract

NTT Communication Science Laboratories is building a theoretical foundation and developing innovative technologies toward heart-touching communication, both person-to-person and person-to-computer, that is capable of conveying even human emotions. As background to this initiative, artificial intelligence (AI) has developed remarkably in recent years. Early on, computers were especially adept at processing large amounts of data at once, a task beyond human capability, and at performing at high speed the kinds of processing that humans are not especially good at. However, thanks to recent AI developments, especially in deep learning, computers are approaching, and in some cases surpassing, human abilities in areas that come naturally to humans and where computers long lagged behind, such as speech recognition, image recognition, and natural language processing. We can expect AI to keep evolving at a rapid pace, but it is also said that AI performance exceeding the complexity of the human brain still lies somewhere in the future. At the same time, this very complexity makes humans imperfect: they can fall prey to cognitive bias and make mistakes, and they can be fooled by illusions into taking for real something that does not actually exist. With the above in mind, and with the aim of achieving heart-touching communication, the mission of NTT Communication Science Laboratories is to connect computers, which will continue to develop rapidly within a limited range, with humans in all their complexity and imperfection, so as to fill the gap between them. To this end, we place particular importance on basic research toward three kinds of technology: technology that approaches human abilities, technology that elucidates human functions and characteristics and helps us understand what it means to be human, and technology that helps people in their daily lives [1].

Technology approaching human abilities

Today, there are still many forms of processing that are a strong point of human beings but difficult for computers. Of course, the accuracy of machine translation has risen, and English fill-in-the-blank questions in a university entrance exam can be answered correctly to some extent [2], but computers are not yet able to understand the full meaning of a sentence or to exercise common sense. On the other hand, deep learning is now enabling computers to approach human abilities in certain areas such as image recognition and speech recognition. For example, conversation at a meeting or party usually takes place with more than one person talking at the same time or with music playing in the background. In such an acoustic environment, a human can pick out the voice features of the person that he or she wants to listen to and catch what that person is saying. This is a notable feature of human hearing known as “selective listening.” In the past, computers were not very good at selective listening, but at NTT Communication Science Laboratories, we have applied proprietary deep learning techniques to develop technology that enables computers to catch only the voice of a specific speaker based on the features of that voice, and we have begun to roll out this technology.

We can expect such technologies to progress even further and to come even closer to human abilities in the years ahead. One area where they could thrive is “cross-media.” In the past, different types of media such as speech, video, and text were analyzed with separate techniques and researched separately. However, thanks to the arrival of deep learning, which we might call a “common language,” it is becoming possible to perform “recognition,” “generation,” and “conversion” across multiple types of media (= cross-media). For example, just by hearing a sound, humans are capable of imagining the scene associated with that sound. This is a type of processing that comes naturally to people in their daily lives. NTT Communication Science Laboratories is developing cross-media scene analysis technology called “image recognition from sound” for implementation in computers. With this technology, it should be possible to use sound to make predictions about a location in a camera’s blind spot. In addition, noting that humans accumulate knowledge daily by watching and listening to TV, we can envision a future in which computers too can autonomously learn about objects and their associated concepts, becoming smarter from media data such as TV broadcasts by discovering co-occurrences of audio and video. We are now engaged in basic research with the aim of bringing this technology to fruition.

Technology for obtaining a deep understanding of people

In this way, computers are approaching the abilities of humans in specific areas and gradually surpassing them. Nevertheless, a level of AI performance that exceeds the sophistication of the human brain still appears to lie somewhere in the future. On the other hand, humans are sometimes subject to cognitive bias and make mistakes, as reflected, for example, in the ease with which some people are taken in by bank transfer scams. In addition, the human brain sometimes operates under an illusion. The “Illusion Forum” website managed by NTT Communication Science Laboratories presents a variety of illusions for consideration [3].

To fill the gap between complex and imperfect human beings and currently limited AI, there is a need to develop an even deeper understanding of human beings. To this end, NTT Communication Science Laboratories is working to clarify “implicit brain functions” related to the basic human senses of seeing, hearing, and moving. Here, illusions can provide important clues to understanding humans at an even deeper level. Additionally, as part of our efforts in sports brain science, we are studying top-ranking athletes to elucidate their outstanding abilities from the viewpoint of brain science and to find out how “mind, technique, and body” are interrelated. For example, we have taken up the challenge of explaining how a talented batter judges whether an incoming ball is slow or fast and moves in time with that pitch, all within an extremely short period of 0.1 second. Sports brain science is a new and ambitious undertaking that departs from conventional sports science and sports analysis techniques, which have mainly focused on body training.

Technology for helping people

Results obtained by sports brain science are not limited to sports; they can also be used to bring out latent mental and physical abilities in a person’s everyday life. To put it another way, they can be used as knowledge for improving one’s sense of well-being. With this in mind, we are looking at ways of handling the qualitatively elusive problem of human well-being in a quantitative manner from the viewpoint of human science and of establishing design guidelines for improving people’s well-being. One example of this approach is measuring the effects of the empathetic communication that arises when people come to share the same space.

At the same time, illusions, while providing clues to explaining “implicit brain functions,” also hold the key to filling in the gap between humans and AI and to developing interfaces and feedback for helping people in their daily lives. As an interface that exploits the properties of human illusions, NTT Communication Science Laboratories has developed a device called Buru-Navi that generates the illusion of being pulled by some force. We are also working on technology for making a sitting person feel as if he or she is actually walking. In fact, we have produced a series of interesting products and techniques in this area, including Hengentou, a light projection technique that makes objects in a printed picture or photograph appear to move simply by shining a light on it; “Hidden Stereo,” which lets a viewer enjoy 3D video while wearing 3D glasses and clear 2D video without them; and Ukuzo, another light projection technique that gives 2D objects such as printed matter a floating effect by projecting shadow-like patterns onto them. Going forward, we plan to propose new types of interfaces using illusions while pursuing novel forms of perception and expression that use illusions to create experiences that could not be physically achieved.

Conclusion

The amazing progress of AI technology in recent years will help foster dreams and hopes in many people, but signs of the difficult-to-predict changes that AI will bring to society can also induce anxiety, such as the fear that AI will rob people of their livelihoods. As if such anxieties were not unfortunate enough, there are also people who feel there is no scientific basis for “global warming,” to say nothing of those who believe in the “flat Earth theory.” In Japan, meanwhile, 2019 is both the year that marks the start of Reiwa, the new era name in the traditional Japanese era system, and a year on the eve of the Olympics and Paralympics. In various ways, it looks as if Japan is at a transition point between eras. As we enter a future in which technology advances at an even faster pace and competition becomes increasingly severe, NTT Communication Science Laboratories will boldly and tenaciously undertake new challenges with a focus on technologies that process like people, understand people, and help people. We would be extremely pleased if this open house gives everyone an opportunity to feel what the future will truly be like by experiencing the latest technologies in person.

[1] Shift to New Dimensions—Further Initiatives to Deepen Communication Science, NTT Technical Review, 2018.11.
https://www.ntt-review.jp/archive/ntttechnical.php?contents=ntr201811fa1.html
[2] Taking the English Exam for the “Can a Robot Get into the University of Tokyo?” Project, NTT Technical Review, 2015.07.
https://www.ntt-review.jp/archive/ntttechnical.php?contents=ntr201507ra2_s.html
[3] Experience Optical and Acoustic Illusions! Illusion Forum. (in Japanese)
http://www.kecl.ntt.co.jp/IllusionForum/
