Abstract
Semantic compositionality (SC) refers to the
phenomenon that the meaning of a complex
linguistic unit can be composed of the meanings of its constituents. Most related works
focus on using complicated compositionality functions to model SC while few works
consider external knowledge in models. In
this paper, we verify the effectiveness of sememes, the minimum semantic units of human languages, in modeling SC by a confirmatory experiment. Furthermore, we make
the first attempt to incorporate sememe knowledge into SC models, and employ the sememe-incorporated models in learning representations of multiword expressions, a typical task
of SC. In experiments, we implement our models by incorporating knowledge from HowNet, a well-known
sememe knowledge base, and perform
both intrinsic and extrinsic evaluations. Experimental results show that our models achieve
a significant performance boost compared to
baseline methods that do not consider sememe knowledge. We further conduct quantitative analysis and case studies to demonstrate the effectiveness of applying sememe
knowledge in modeling SC. All the code and
data of this paper can be obtained at https://github.com/thunlp/Sememe-SC.