Abstract
We examine the problem of joint top-down active search of multiple objects under interaction, e.g., person riding a bicycle, cups held by the table, etc. Such interacting objects can often provide contextual cues to each other that facilitate more efficient search. By treating each detector as an agent, we present the first collaborative multi-agent deep reinforcement learning algorithm to learn the optimal policy for joint active object localization, which effectively exploits such beneficial contextual information. We learn inter-agent communication through cross connections with gates between the Q-networks, facilitated by a novel multi-agent deep Q-learning algorithm with joint exploitation sampling. We verify our proposed method on multiple object detection benchmarks. Not only does our model help to improve the performance of state-of-the-art active localization models, it also reveals interesting co-detection patterns that are intuitively interpretable.
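To make the gated cross-connection idea concrete, the following is a minimal PyTorch sketch of two per-agent Q-networks that exchange gated messages. The class name, layer sizes, and the sigmoid gating form are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class GatedCrossQNetworks(nn.Module):
    """Two per-agent Q-networks coupled by gated cross connections.

    A sketch under assumed dimensions; the paper's precise wiring and
    training procedure (joint exploitation sampling) are not shown.
    """

    def __init__(self, state_dim, hidden_dim, num_actions):
        super().__init__()
        # Independent encoders, one per agent (e.g., person / bicycle detector).
        self.enc1 = nn.Linear(state_dim, hidden_dim)
        self.enc2 = nn.Linear(state_dim, hidden_dim)
        # Learned gates controlling how much of the other agent's hidden
        # features flow into each agent's Q-value head.
        self.gate1 = nn.Linear(hidden_dim, hidden_dim)
        self.gate2 = nn.Linear(hidden_dim, hidden_dim)
        # Per-agent Q-value heads over own features plus the gated message.
        self.q1 = nn.Linear(2 * hidden_dim, num_actions)
        self.q2 = nn.Linear(2 * hidden_dim, num_actions)

    def forward(self, s1, s2):
        h1 = torch.relu(self.enc1(s1))
        h2 = torch.relu(self.enc2(s2))
        # Cross connections: each agent receives the other's features,
        # modulated by a sigmoid gate computed from its own features.
        m_into_1 = torch.sigmoid(self.gate1(h1)) * h2
        m_into_2 = torch.sigmoid(self.gate2(h2)) * h1
        q1 = self.q1(torch.cat([h1, m_into_1], dim=-1))
        q2 = self.q2(torch.cat([h2, m_into_2], dim=-1))
        return q1, q2

# Example usage with assumed sizes: 16-dim states, 9 localization actions.
net = GatedCrossQNetworks(state_dim=16, hidden_dim=32, num_actions=9)
q1, q2 = net(torch.randn(4, 16), torch.randn(4, 16))

The gates let each agent learn when to attend to its partner's features (strong interaction, e.g., rider and bicycle) and when to ignore them, rather than hard-wiring the communication.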