Abstract
The recently introduced BERT model exhibits strong performance on several language understanding benchmarks. In this paper, we describe a simple re-implementation of BERT for commonsense reasoning. We show that the attentions produced by BERT can be directly utilized for tasks such as the Pronoun Disambiguation Problem and the Winograd Schema Challenge. Our proposed attention-guided commonsense reasoning method is conceptually simple yet empirically powerful. Experimental analysis on multiple datasets demonstrates that our proposed system performs remarkably well in all cases while outperforming the previously reported state of the art. While the results suggest that BERT implicitly learns to establish complex relationships between entities, solving commonsense reasoning tasks might require more than unsupervised models learned from huge text corpora.
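To make the idea of using BERT's attentions directly concrete, the sketch below resolves a pronoun by comparing the attention mass flowing from the pronoun token to each candidate antecedent. It is a minimal illustration, assuming the HuggingFace transformers library and the bert-base-uncased checkpoint; the example sentence, the sum over all layers and heads, and the single-token candidates are illustrative assumptions rather than the paper's exact scoring procedure.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

# Illustrative Winograd-style example (not taken from the paper's datasets).
sentence = "The dog chased the cat because it was angry."
candidates = ["dog", "cat"]
pronoun = "it"

inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one (batch, heads, seq, seq) tensor per layer.
attn = torch.stack(outputs.attentions).squeeze(1)  # (layers, heads, seq, seq)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
pronoun_idx = tokens.index(pronoun)

def candidate_score(word):
    # Attention mass from the pronoun to the candidate's token positions,
    # summed over layers and heads (an illustrative aggregation choice).
    idxs = [i for i, t in enumerate(tokens) if t == word]
    return attn[:, :, pronoun_idx, idxs].sum().item()

scores = {c: candidate_score(c) for c in candidates}
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)
```

The key point is that no task-specific fine-tuning is involved: the pretrained model's attention weights alone are used to rank the candidate antecedents.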