Abstract
Textbook Question Answering (TQA) is the task of choosing the most appropriate answers by reading a multi-modal context of abundant essays and images. TQA serves as a favorable test bed for visual and textual reasoning. However, most current methods are incapable of reasoning over long contexts and images. To address this issue, we propose a novel approach, Instructor Guidance with Memory Networks (IGMN), which performs the TQA task by finding contradictions between the candidate answers and their corresponding context. We build the Contradiction Entity-Relationship Graph (CERG) to extend passage-level multi-modal contradictions to the essay level.
The machine thus acts as an instructor that extracts the essay-level contradictions as the Guidance. We then use memory networks to capture the information in the Guidance and apply attention mechanisms to jointly reason over the global features of the multi-modal input.
Extensive experiments demonstrate that our method outperforms state-of-the-art methods on the TQA dataset. The source code is available at https://github.com/freerailway/igmn
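The abstract does not specify how IGMN's memory networks read the Guidance. The following is a rough, hypothetical illustration only: a generic multi-hop memory read of the kind used in end-to-end memory networks. Each memory slot stands in for an encoded piece of Guidance, and a query vector stands in for an encoded question. The names `memory_read`, `multi_hop`, and `score_answers`, the toy vectors, and the `u <- u + o` state update are all assumptions chosen for clarity. They are not taken from the IGMN code.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """One attention hop (hypothetical): weight each memory slot by
    softmax(query . key), then return the weighted sum of the value vectors."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    read = [0.0] * dim
    for w, v in zip(weights, values):
        for i in range(dim):
            read[i] += w * v[i]
    return read


def multi_hop(query: Vector, keys: List[Vector], values: List[Vector], hops: int = 2) -> Vector:
    """Repeat the read, updating the controller state as u <- u + o
    (one common memory-network variant, assumed here for illustration)."""
    u = list(query)
    for _ in range(hops):
        o = memory_read(u, keys, values)
        u = [a + b for a, b in zip(u, o)]
    return u


def score_answers(state: Vector, answers: List[Vector]) -> List[float]:
    """Score each candidate-answer embedding against the final state (hypothetical)."""
    return [dot(state, a) for a in answers]


if __name__ == "__main__":
    # Toy 2-d embeddings standing in for encoded Guidance slots and answers.
    guidance = [[1.0, 0.0], [0.0, 1.0]]
    question = [1.0, 0.0]
    state = multi_hop(question, guidance, guidance, hops=2)
    scores = score_answers(state, [[1.0, 0.0], [0.0, 1.0]])
    print("predicted answer index:", scores.index(max(scores)))
```

In this toy setup, the question vector attends mostly to the first Guidance slot, so the first candidate answer receives the higher score. A real model would learn separate key, value, and query projections and would encode text and diagram features rather than use hand-written vectors.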