Towards Explainable NLP: A Generative Explanation
Framework for Text Classification
Abstract
Building explainable systems is a critical problem in Natural Language Processing (NLP), since most machine learning models provide no explanations for their predictions. Existing approaches to explainable machine learning tend to focus on interpreting the outputs or the connections between inputs and outputs. However, fine-grained information (e.g., textual explanations for the labels) is often ignored, and such systems do not explicitly generate human-readable explanations. To solve this problem, we propose a novel generative explanation framework that learns to make classification decisions and generate fine-grained explanations at the same time. More specifically, we introduce an explainable factor and a minimum risk training approach that learn to generate more reasonable explanations. We construct two new datasets that contain summaries, rating scores, and fine-grained reasons. We conduct experiments on both datasets, comparing our method with several strong neural network baselines. Experimental results show that our method surpasses all baselines on both datasets while also generating concise explanations.