NTT Natural Language Research Group
Linguistic Resources
Over the years we have compiled some resources that we would like to share.
They are listed below, along with brief explanations.
Downloadable resources
- MT test set
(EUC-encoded text, 518 KB)
- A test set for the evaluation of Japanese-to-English MT
systems, produced by Satoru Ikehara. It consists of 3718
Japanese sentences, with English translations. The test set is
described in Ikehara et al. (1994a).
- Linguistics and NLP term list: lingdic
(EUC-encoded text, 133 KB)
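Both downloadable files are EUC-encoded, so most current tools will need them converted before use. The following is a minimal sketch of one way to re-encode such a file as UTF-8. It assumes the encoding is EUC-JP (the page only says "euc"), and the function name and file names are hypothetical placeholders, not the actual download names. It makes no assumption about how sentences and translations are laid out inside the file.

```python
from pathlib import Path


def euc_to_utf8(src: Path, dst: Path) -> int:
    """Re-encode a text file from EUC-JP (assumed) to UTF-8.

    Returns the number of lines in the decoded text.
    """
    # Undecodable bytes are replaced with U+FFFD rather than raising,
    # so a partly damaged file can still be inspected.
    text = src.read_bytes().decode("euc_jp", errors="replace")
    dst.write_bytes(text.encode("utf-8"))
    return len(text.splitlines())


if __name__ == "__main__":
    # Hypothetical file names; substitute the files you downloaded.
    source = Path("mt_test_set.euc.txt")
    if source.exists():
        count = euc_to_utf8(source, Path("mt_test_set.utf8.txt"))
        print(f"converted {count} lines")
```

If the output contains many U+FFFD replacement characters, the EUC-JP assumption is probably wrong for that file.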
Other Resources
- GoiTaikei: A Japanese Lexicon
(Web-page)
- A Japanese dictionary of over 300,000 words, most of which are
tagged with classes from a semantic ontology of 3,000 classes. Available
from Iwanami Publishing as either a book or a CD-ROM using the EPWING interface.
- Nihongo-no Goitokusei: Lexical properties of Japanese
(Web-page)
- A lexical database of over 80,000 Japanese words. It contains
information about familiarity, orthography, pronunciation, and
frequency. Available from Sanseido as either a book or a CD-ROM
using the EPWING interface.
- Japanese Automatic Word Separator: ALTJAWS
- The morphological analyzer, segmenter, and part-of-speech
tagger used in our J-E MT system. It was once available for research
use but is no longer distributed.
For more information about the tagger, please see the brief explanation.
Natural Language Research Group
NTT
Communication Science Laboratories
2-4 Hikari-dai, Seika-cho, Soraku-gun, Kyoto, JAPAN, 619-0237
Tel: 0774-93-5313 (+81); Fax: 0774-93-5345 (+81)