NTT Natural Language Research Group

Linguistic Resources


Over the years we have compiled some resources that we would like to share. They are listed below, along with brief explanations.


Downloadable resources

MT test set (EUC-encoded text, 518 KB)
A test set for evaluating Japanese-to-English MT systems, produced by Satoru Ikehara. It consists of 3,718 Japanese sentences with English translations. The test set is described in Ikehara et al. (1994a).
Linguistics and NLP term list: lingdic (EUC-encoded text, 133 KB)
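
Both downloadable files are EUC-encoded, so they need to be decoded before use in a Unicode environment. The following is a minimal Python sketch, assuming the encoding is EUC-JP (the usual EUC variant for Japanese text; this page says only "euc"). The helper names `read_euc_jp` and `convert_to_utf8` and any file names passed to them are illustrative and are not part of the resources.

```python
from pathlib import Path


def read_euc_jp(path, errors="strict"):
    """Return the contents of an EUC-JP encoded file as a Python str.

    Assumption: the file is EUC-JP. Pass errors="replace" to substitute
    undecodable bytes instead of raising UnicodeDecodeError.
    """
    return Path(path).read_text(encoding="euc_jp", errors=errors)


def convert_to_utf8(src, dst):
    """Re-encode an EUC-JP file as UTF-8 and return the number of lines."""
    text = read_euc_jp(src)
    Path(dst).write_text(text, encoding="utf-8")
    return len(text.splitlines())
```

For example, `convert_to_utf8("testset.euc", "testset.utf8.txt")` would write a UTF-8 copy of a downloaded file (both names are hypothetical). If strict decoding fails, the file may use a different encoding than assumed here.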

Other Resources

GoiTaikei: A Japanese Lexicon (Web-page)
A Japanese dictionary of over 300,000 words, most of which are marked with classes from a semantic ontology of 3,000 classes. Available from Iwanami Publishing as either a book or a CD-ROM that uses the EPWING interface.
Nihongo-no Goitokusei: Lexical properties of Japanese (Web-page)
A lexical database of over 80,000 Japanese words. It contains information about familiarity, orthography, pronunciation, and frequency. Available from Sanseido as either a book or a CD-ROM that uses the EPWING interface.
Japanese Automatic Word Separator: ALTJAWS
The morphological analyzer, segmenter, and part-of-speech tagger used in our J-E MT system was once available for research use, but it is no longer available. For more information about the tagger, please see the brief explanation.

Natural Language Research Group
NTT Communication Science Laboratories
2-4 Hikari-dai, Seika-cho, Soraku-gun, Kyoto, JAPAN, 619-0237
Tel: 0774-93-5313 (+81); Fax: 0774-93-5345 (+81)