Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation
Abstract
We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language navigation challenge
of Anderson et al. (2018). Given a natural language instruction and photo-realistic image views of a previously
unseen environment, the agent is tasked with navigating
from source to target location as quickly as possible. While
all current approaches make local action decisions or score
entire trajectories using beam search, ours balances local
and global signals when exploring an unobserved environment. Importantly, this lets us act greedily but use global
signals to backtrack when necessary. Applying the FAST framework to existing state-of-the-art models achieved a 17% relative gain (6% absolute) on Success rate weighted
by Path Length (SPL).
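
Since the headline result is reported in SPL, a brief sketch of how that metric is commonly computed may be useful. The snippet follows the standard definition: per-episode success multiplied by the ratio of the shortest-path length to the larger of the agent's path length and the shortest-path length, averaged over episodes. The function name `spl`, its argument names, and the three example episodes are illustrative assumptions and are not taken from this paper or its released code.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Assumes every shortest-path length is positive (start != target).
    """
    n = len(successes)
    if n == 0:
        raise ValueError("need at least one episode")
    if not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if success:
            # A successful episode scores 1.0 only if the agent's path is
            # no longer than the shortest path; longer paths are penalized.
            total += shortest / max(taken, shortest)
        # Failed episodes contribute 0 regardless of path length.
    return total / n


# Hypothetical episodes: optimal success, success with a 2x detour, failure.
example_success = [True, True, False]
example_shortest = [10.0, 8.0, 12.0]
example_taken = [10.0, 16.0, 30.0]

example_score = spl(example_success, example_shortest, example_taken)
print(example_score)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

Under this definition, an agent that reaches the target along a path twice the shortest length earns only 0.5 for that episode, so long exploratory detours lower SPL even when navigation succeeds.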