Pay attention when you pay the bills. A multilingual corpus with
dependency-based and semantic annotation of collocations
Abstract
This paper presents a multilingual corpus with
semantic annotation of collocations in English, Portuguese, and Spanish. The whole
resource contains 155k tokens and 1,526
collocations labeled in context. The annotated examples belong to three syntactic structures (adjective-noun, verb-object, and nominal compounds) and represent 60 lexical functions of the Meaning-Text Theory (e.g., Oper,
Magn, and Bon). Each collocation was annotated by three linguists, and the final resource
was revised by a team of experts. The resulting
corpora, which are freely available, can serve
as a basis to evaluate different approaches for
collocation identification and classification in
three languages, which in turn can be useful
for different NLP tasks such as natural language generation or understanding.