Abstract
In this work, we introduce a new algorithm for analyzing diagrams, which convey visual and textual information in an abstract and integrated way. Although diagrams contain richer information than purely image-based or language-based data, proper solutions
for automatically understanding them have not been proposed, owing to their inherent multi-modality
and the arbitrariness of their layouts. To tackle this problem, we
propose a unified diagram-parsing network for generating
knowledge from diagrams based on an object detector and
a recurrent neural network designed for a graphical structure. Specifically, we propose a dynamic graph-generation
network that is based on dynamic memory and graph theory. We explore the dynamics of information in a diagram
through the activations of the gates in gated recurrent unit (GRU) cells.
On publicly available diagram datasets, our model achieves state-of-the-art results, outperforming the other baselines. Moreover, further experiments on question answering
show the potential of the proposed method for various applications.
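
As a rough illustration of what inspecting gate activations in GRU cells can look like, the following minimal sketch implements a generic GRU step in plain Python and records the update and reset gates at every step. It is not the network proposed in this work: the dimensions, the random weights, the omission of bias terms, the helper names (`gru_step`, `random_matrix`, `gate_trace`), and the convention h_new = (1 - z) * h + z * h_tilde are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(x, h, params):
    """One generic GRU step (biases omitted for brevity).

    Returns the new hidden state together with the update gate z and
    the reset gate r, so that the gate activations can be inspected.
    """
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a + b)
               for a, b in zip(matvec(Wh, x), matvec(Uh, reset_h))]
    # Assumed convention: z close to 1 overwrites the old state.
    h_new = [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_tilde)]
    return h_new, z, r


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes and random weights, chosen only for illustration.
rng = random.Random(0)
input_dim, hidden_dim = 4, 3
params = (
    random_matrix(hidden_dim, input_dim, rng),   # Wz
    random_matrix(hidden_dim, hidden_dim, rng),  # Uz
    random_matrix(hidden_dim, input_dim, rng),   # Wr
    random_matrix(hidden_dim, hidden_dim, rng),  # Ur
    random_matrix(hidden_dim, input_dim, rng),   # Wh
    random_matrix(hidden_dim, hidden_dim, rng),  # Uh
)

h = [0.0] * hidden_dim
sequence = [[rng.uniform(-1.0, 1.0) for _ in range(input_dim)]
            for _ in range(5)]

gate_trace = []  # one (update gate, reset gate) pair per step
for x in sequence:
    h, z, r = gru_step(x, h, params)
    gate_trace.append((z, r))
```

Each entry of `gate_trace` holds values strictly between 0 and 1. Under the convention assumed here, an update gate near 1 means the incoming input largely overwrites the stored state, while a value near 0 means the previous state is carried forward.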