ALTJAWS is the morphological analyzer used in the NTT Communication Science Laboratory's Japanese-to-English machine translation system ALT-J/E.
ALTJAWS takes a Japanese sentence and (1) segments it into words; (2) assigns a part of speech and semantic feature(s) to each word; and (3) groups the words into phrases (bunsetsu). The dictionary and part-of-speech codes are the same as those used in ALT-J/E. The dictionary contains 350,000 words and over 300 parts of speech. There are around 3,000 semantic features, organized in a hierarchical ontology with is-a and has-a links. You can also add your own user dictionary.
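To make the idea of a hierarchy with is-a links concrete, here is a minimal sketch in Python. The feature ids and their arrangement are invented for illustration and are not the real ALT-J/E semantic feature codes. The names `IS_A`, `ancestors`, and `is_a` are hypothetical and not part of ALTJAWS. The has-a links are left out for brevity.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy: feature id -> parent feature id (the is-a link).
# These ids are made up for this sketch; they are NOT real ALT-J/E codes.
IS_A: Dict[int, Optional[int]] = {
    1: None,  # root of the toy ontology
    2: 1,
    3: 1,
    4: 2,
    5: 4,
}


def ancestors(feature: int) -> List[int]:
    """Return the chain of parents from `feature` up to the root."""
    chain: List[int] = []
    parent = IS_A[feature]
    while parent is not None:
        chain.append(parent)
        parent = IS_A[parent]
    return chain


def is_a(feature: int, category: int) -> bool:
    """True if `feature` equals `category` or lies below it in the hierarchy."""
    return feature == category or category in ancestors(feature)
```

With a lookup of this kind, an application could test whether a word's semantic feature falls under a broader category, rather than comparing feature numbers for exact equality.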
ALTJAWS can be used as a preprocessor for Japanese natural language processing applications or for tagging corpora.
ALTJAWS is packaged as a UNIX library, which can easily be called from C programs. The package comes with some sample programs using the library. All documentation is in Japanese.
    Input>>私は休暇を取る許可を取り、妻はホテルを取った。
    -----
    私は休暇を取る許可を取り、妻はホテルを取った。
    1. 私(1710,[8,37])/は(7530)
    2. 休暇(1100,[1932,1607])/を(7430)
    3. 取る(2387)(2387,捕る)(2387,採る)(2387,盗る)
    4. 許可(1220,[1166,1450,1735])/を(7430)
    5. 取り(2383,取る)(2383,捕る)(2383,採る)(2383,盗る)/、([P]0210)
    6. 妻(1100,[76,49,841])/は(7530)
    7. ホテル(1100,[437,374])/を(7430)
    8. 取っ(2384,取る)(2384,捕る)(2384,採る)(2384,盗る)/た(7216)/。([P]0110)

(Note: the four-digit numbers are part-of-speech codes; the numbers in square brackets are semantic features.)
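For downstream use, such as tagging a corpus, the numbered phrase lines in the sample output above can be read into a simple structure. The following Python sketch is based only on that sample and is not an official parser. It assumes that words within a phrase are separated by `/`, that each parenthesized group starts with a part-of-speech code, and that repeated groups after one word are alternative candidates with their base forms. All class and function names here are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Analysis:
    pos: str                          # part-of-speech code, e.g. "1100" or "[P]0210"
    base_form: Optional[str] = None   # e.g. "取る"; None when no base form is listed
    semantic_features: List[int] = field(default_factory=list)


@dataclass
class Word:
    surface: str
    analyses: List[Analysis]


LINE_RE = re.compile(r"^\s*(\d+)\.\s*(.+)$")            # "2. <phrase body>"
WORD_RE = re.compile(r"([^()/]+)((?:\([^()]*\))+)")     # surface + one or more (...) groups
GROUP_RE = re.compile(r"\(([^()]*)\)")
ANALYSIS_RE = re.compile(r"^((?:\[P\])?\d+)(?:,(.*))?$")


def parse_group(text: str) -> Analysis:
    """Parse the inside of one parenthesized group, e.g. '1100,[1932,1607]'."""
    m = ANALYSIS_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized analysis: {text!r}")
    pos, rest = m.group(1), m.group(2)
    if rest is None:
        return Analysis(pos)
    if rest.startswith("[") and rest.endswith("]"):
        # Bracketed list of semantic feature numbers.
        features = [int(x) for x in rest[1:-1].split(",") if x]
        return Analysis(pos, semantic_features=features)
    # Assumption: anything else after the comma is a base form.
    return Analysis(pos, base_form=rest)


def parse_phrase_line(line: str) -> Tuple[int, List[Word]]:
    """Parse one numbered phrase (bunsetsu) line into its index and words."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"not a phrase line: {line!r}")
    words = [
        Word(surface, [parse_group(g) for g in GROUP_RE.findall(groups)])
        for surface, groups in WORD_RE.findall(m.group(2))
    ]
    return int(m.group(1)), words


if __name__ == "__main__":
    index, words = parse_phrase_line("2. 休暇(1100,[1932,1607])/を(7430)")
    for word in words:
        print(index, word.surface, word.analyses)
```

The sketch keeps part-of-speech codes as strings so that the `[P]` prefix on punctuation codes is preserved. It would need adjusting if a surface form could itself contain `/` or parentheses.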
| Item | Requirement |
|---|---|
| Machine | SparcStation 20, SparcUltra 10, PC, etc. |
| OS | SunOS 4.1.x, Solaris 2.x (SPARC), Linux 2.0.x |
| Disk space | 70 MB |
| RAM | 6 MB |
ALTJAWS was once offered free for research use, but it is no longer generally available outside NTT.
However, if you purchase the CD-ROM version of Goi-Taikei --- A Japanese Lexicon, you can use it with the EB Library to look up the semantic features of Japanese words. Please make sure you obey the usage restrictions when you do so.