Abstract
The well-known problem of knowledge acquisition is one of the biggest issues in Word
Sense Disambiguation (WSD), where annotated data are still scarce in English and almost absent in other languages. In this
paper we formulate the assumption of One
Sense per Wikipedia Category and present
OneSeC, a language-independent method for
the automatic extraction of hundreds of thousands of sentences in which a target word is
tagged with its meaning. Our automatically generated data consistently lead a supervised
WSD model to state-of-the-art performance
when compared with other automatic and
semi-automatic methods. Moreover, our approach outperforms its competitors in multilingual and domain-specific settings, where
it beats the existing state of the art on all
languages and most domains. All the training data are available for research purposes at
http://trainomatic.org/onesec