All the nouns from WordNet 2.1, and their relations, have been imported. That was the starting point for the MLSN dictionaries.
The biggest difference from WordNet is the scope: all the words of society need to be included. Every brand name, every movie or book title, every famous person, and so on, needs to be in MLSN's dictionaries. English WordNet has about 80,000 nouns, yet English Wikipedia has just short of a million articles and is still incomplete. When we have one million entries in both our English and Japanese noun tables, then perhaps we can start to do some worthwhile machine translation.
Synonyms and hyponyms I felt were missing in English WordNet have been added. Trying to add another language helps you discover concepts that are distinct but do not seem so in English. A simple example is that there are many different types of sushi, but WordNet has no hyponyms of sushi. Another example is the Japanese word "girichoko". This is chocolate given as an obligation (on Valentine's Day and White Day) rather than for romantic reasons. It is culture-specific, but if we want to achieve our project goals of understanding text and machine translation then it is needed in a multilingual semantic network.
A more formal word code is used: it always looks like NNwordX. NN is the noun classification in WordNet, and X is a final hex digit. In WordNet the final hex digit is sometimes dropped when it is zero. That is ambiguous when the word itself ends in a digit (not to mention words that end in a..f, which are also valid values of X). Incidentally, I expect to have to move to two hex digits, but will cross that bridge if and when I reach it.
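To make the NNwordX format concrete, here is a minimal Python sketch of how such a code could be parsed and built, and why dropping the final digit is ambiguous. It is an illustration only, not MLSN's actual code: the names WordCode, parse_word_code, format_word_code and readings_if_suffix_optional are hypothetical, and it assumes NN is exactly two decimal digits and X is a single lowercase hex digit. The codes "06mp3" and "06cafe" are made-up examples, not real entries.

```python
import re
from typing import List, NamedTuple

# Assumed shape of a formal word code: NN (two decimal digits),
# the word itself, then X (exactly one lowercase hex digit).
WORD_CODE = re.compile(r"^(\d{2})(.+)([0-9a-f])$")


class WordCode(NamedTuple):
    noun_class: str  # NN, kept as a string so leading zeros survive
    word: str
    sense: int       # X, stored as an integer 0..15


def parse_word_code(code: str) -> WordCode:
    """Parse a formal code, where the final hex digit is always present."""
    m = WORD_CODE.match(code)
    if m is None:
        raise ValueError(f"not a formal NNwordX code: {code!r}")
    nn, word, x = m.groups()
    return WordCode(nn, word, int(x, 16))


def format_word_code(noun_class: str, word: str, sense: int = 0) -> str:
    """Build a formal code; the hex digit is written even when it is zero."""
    if not 0 <= sense <= 0xF:
        raise ValueError("sense must fit in one hex digit")
    return f"{noun_class}{word}{sense:x}"


def readings_if_suffix_optional(code: str) -> List[WordCode]:
    """List every reading of a code when a zero X may have been dropped."""
    # Reading 1: X was dropped, so everything after NN is the word.
    readings = [WordCode(code[:2], code[2:], 0)]
    # Reading 2: the last character really is X.
    m = WORD_CODE.match(code)
    if m is not None:
        nn, word, x = m.groups()
        readings.append(WordCode(nn, word, int(x, 16)))
    return readings


print(parse_word_code("04dole0"))
print(format_word_code("06", "doorknob"))
print(readings_if_suffix_optional("06mp3"))
```

With the digit always present, each code has exactly one reading. With the digit optional, a code such as "06mp3" could mean the word "mp3" with X dropped or the word "mp" with X equal to 3, which is the ambiguity described above.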
product_of and attribute_of relations have been added. I just couldn't squeeze these concepts into the existing relations.
WordNet has its own inconsistencies. For instance, some British slang words are entered as hyponyms of the American English word (e.g. 04dole0 or 06boot1), though from a locale-neutral point of view they are just synonyms. When translating to another language both entries become the same. For my purposes this is fine, though that policy may change in future. However, some other British words are entered as synonyms, with a mention in the gloss (e.g. see 06doorknob0). This is not just a British English issue: in many cases a word that could be a synonym has been entered as a hyponym (or a sister relationship), and many hyponyms have been entered as synonyms.
It also has some English-language-specific relations (e.g. the links from the word "plural"). I am not sure how to handle these in a multilingual semantic network. Perhaps some of the relations should be renamed (e.g. from "usage" to "English language usage").
There is a Global WordNet Association, which I must admit I have not yet looked at properly. The scope of MLSN is more than just a Japanese translation of WordNet, so I do not know if it will be compatible or not. Incidentally, when I skimmed it before I was disappointed to see how few projects had followed WordNet's liberal license.
© Copyright 2006 Darren Cook (email@example.com)