Bidirectional Beam Search: Forward-Backward Inference in
Neural Sequence Models for Fill-in-the-Blank Image Captioning
Abstract
We develop the first approximate inference algorithm for
1-Best (and M-Best) decoding in bidirectional neural sequence models by extending Beam Search (BS) to reason
about both forward and backward time dependencies.
BS is a widely used approximate inference
algorithm for decoding sequences from unidirectional neural sequence models. Interestingly, approximate inference
in bidirectional models remains an open problem, despite
their significant advantage in modeling information from
both the past and future. To enable the use of bidirectional
models, we present Bidirectional Beam Search (BiBS), an
efficient algorithm for approximate bidirectional inference.
To evaluate our method and as an interesting problem in
its own right, we introduce a novel Fill-in-the-Blank Image
Captioning task which requires reasoning about both past
and future sentence structure to reconstruct sensible image
descriptions. We use this task as well as the Visual Madlibs
dataset to demonstrate the effectiveness of our approach, which
consistently outperforms all baseline methods.
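
For readers unfamiliar with the baseline that BiBS extends, the following is a minimal sketch of standard unidirectional Beam Search, not of BiBS itself. The scoring function `toy_model`, its vocabulary, and its probabilities are hypothetical and exist only for illustration; a real system would query a trained neural sequence model instead.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]
# Assumed interface: maps a prefix to next-token log-probabilities.
LogProbFn = Callable[[Sequence], Dict[str, float]]


def beam_search(next_log_probs: LogProbFn, beam_width: int,
                max_len: int, eos: str = "</s>") -> List[Tuple[Sequence, float]]:
    """Standard left-to-right beam search; returns up to beam_width hypotheses."""
    beams: List[Tuple[Sequence, float]] = [((), 0.0)]
    finished: List[Tuple[Sequence, float]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for prefix, score in beams:
            for token, log_p in next_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + log_p))
        # Keep only the beam_width highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


def toy_model(prefix: Sequence) -> Dict[str, float]:
    """Hypothetical three-step model in which the greedy first choice is suboptimal."""
    if len(prefix) == 0:
        return {"a": math.log(0.6), "b": math.log(0.4)}
    if len(prefix) == 1:
        if prefix[0] == "a":
            return {"x": math.log(0.5), "y": math.log(0.5)}
        return {"z": math.log(0.9), "w": math.log(0.1)}
    return {eos_token: 0.0}


eos_token = "</s>"
greedy = beam_search(toy_model, beam_width=1, max_len=3)
wider = beam_search(toy_model, beam_width=2, max_len=3)
```

In this toy setting a beam of width 1 commits to the locally best first token and ends with a lower-probability sequence than a beam of width 2. Each step conditions only on the tokens already generated, which is the unidirectional assumption that the abstract says bidirectional models break.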