This page is obsolete.
Knowledge Processing Research Group
Innovative Communication Science Laboratory
NTT Communication Science Laboratories
Attention: We belong to NTT Corporation (not to NTT Communications Corporation).

News

Access

Research Topics

Natural Language Processing (NLP) research is now at a major turning point. We devised a new semi-supervised machine learning method and, using one billion words of unlabeled data, achieved the world's best scores on three English benchmark tests: part-of-speech tagging, syntactic tagging, and Named Entity Recognition. In addition, we mined diverse and ambiguous causal expressions and implemented NAZEQA, a Japanese Question Answering system that answers why-questions. Natural Language Processing and Knowledge Processing are also studied in other NTT laboratories, such as Cyber Space Laboratories and Cyber Solutions Laboratories. Those laboratories are expected to contribute to NTT's business, whereas Communication Science Laboratories, as a basic research laboratory, is expected to produce innovative technology.

We are currently working on Statistical Machine Translation based on hierarchical phrases, Semi-Supervised Machine Learning methods for Natural Language Processing, and Complex Question Answering systems that answer "why" questions.

Statistical Machine Translation
Traditional machine translation systems were built by language experts on the basis of their own knowledge. However, building and maintaining such a system by hand requires an enormous amount of human effort.

More recently, a new approach called "Statistical Machine Translation" has appeared. This approach analyzes bilingual corpora to obtain word or phrase correspondences automatically.

With this approach, a translation system can be built with far less manual effort. We are pursuing research to make the translation results more accurate and more readable.
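
As a rough illustration of how word correspondences can be obtained from a bilingual corpus, the sketch below runs the EM procedure of IBM Model 1, a classic word-alignment model. It is not the hierarchical phrase-based system studied in our group; the function names and the three-sentence German-English toy corpus are assumptions made up for this example, and the NULL word of the full model is omitted for brevity.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t[(f, e)] = P(f | e) with EM.

    `pairs` is a list of (source_words, target_words) sentence pairs.
    Simplified IBM Model 1: no NULL word, uniform initialisation.
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned
        # E-step: share each source word among the target words
        # of the same sentence pair, in proportion to t.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts for each target word.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


def best_source_word(t, e):
    """Return the source word f with the highest P(f | e)."""
    candidates = [(p, f) for (f, e2), p in t.items() if e2 == e]
    return max(candidates)[1]


# Hypothetical toy corpus (German source, English target).
TOY_CORPUS = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
```

Calling `train_ibm_model1(TOY_CORPUS, iterations=20)` and then `best_source_word(t, "house")` pairs each English word with its most probable German counterpart, using only co-occurrence in the sentence pairs and no dictionary. Real systems start from alignments of this kind and then extract phrase-level correspondences from much larger corpora.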

Semi-Supervised Learning
Most state-of-the-art natural language processing tools, such as "morphological analyzers," "chunkers," and "named entity recognizers," are based on "supervised learning," and they may seem sufficiently accurate. Why, then, do we study "semi-supervised learning"?

The reason is that these tools are not yet accurate enough. When we analyze the failures of our Question Answering systems, we often find that they were caused by errors made by these tools. Therefore, we need more accurate tools.

To improve these supervised learning-based systems, we would have to prepare a huge amount of correctly labeled data. Even then, we could expect only a small improvement.

Therefore, we are studying "semi-supervised learning," which can exploit unlabeled data, so that we do not have to prepare more labeled data.

Question Answering
We previously worked on SAIQA, a factoid Question Answering system for Japanese. We also implemented a Japanese-English Cross-Lingual QA system, which searches English documents for an answer to a question asked in Japanese. We are now working on a more advanced system called NAZEQA, which answers why-questions.

Recent Publications

Member

Awards

(C) NTT Communication Science Labs.