Abstract
The interpretability of machine learning (ML)
models becomes increasingly relevant as their adoption grows. In this work, we address the
interpretability of ML-based question answering (QA) models that operate on a combination of knowledge bases (KBs) and text documents. We
adapt post hoc explanation methods such as
LIME and input perturbation (IP) and compare them with the self-explanatory attention
mechanism of the model. For this purpose, we
propose an automatic evaluation paradigm for
explanation methods in the context of QA. We
also conduct a study with human annotators to
evaluate whether explanations help them identify better QA models. Our results suggest that
IP provides better explanations than LIME or
attention, according to both automatic and human evaluation. We obtain the same ranking
of methods in both experiments, which supports the validity of our automatic evaluation
paradigm.