Abstract
With the tremendous advances made by Convolutional
Neural Networks (ConvNets) on object recognition, we can
now easily obtain adequately reliable machine-labeled annotations from predictions by off-the-shelf ConvNets.
In this work, we present an “abstraction memory” based
framework for few-shot learning, building upon machine-labeled image annotations. Our method takes a large-scale
machine-annotated dataset (e.g., OpenImages) as an external memory bank. In the external memory bank, information is stored in memory slots as key-value pairs, in which the image feature is regarded as the key and the
label embedding serves as the value. When queried by the
few-shot examples, our model selects visually similar data
from the external memory bank and writes the useful information obtained from related external data into another
memory bank, i.e., the abstraction memory. Long Short-Term
Memory (LSTM) controllers and attention mechanisms are
utilized to ensure that the data written to the abstraction
memory correlates with the query example. The abstraction
memory concentrates information from the external memory bank to make few-shot recognition effective. In
the experiments, we first confirm that our model can learn
to conduct few-shot object recognition on clean human-labeled data from the ImageNet dataset. Then, we demonstrate that with our model, machine-labeled image annotations are a very effective and abundant resource for performing object recognition on novel categories. Experimental
results show that our proposed model performs nearly as well with machine-labeled
annotations as with human-labeled annotations, with only a 1% difference in accuracy
between the two.
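
As a rough illustration of the key-value read described above, the following minimal sketch retrieves the memory slots whose keys (image features) are most similar to a query feature and returns an attention-weighted combination of their values (label embeddings). It is a toy under stated assumptions, not the model itself: the cosine similarity, the top-k selection, the softmax weighting, and the names `cosine`, `read_memory`, and `top_k` are all hypothetical choices made for this example, and the LSTM controllers and the write step into the abstraction memory are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def read_memory(query, keys, values, top_k=2):
    """Soft read over the top_k slots whose keys are most similar to the query.

    keys   : list of image-feature vectors (one per memory slot)
    values : list of label-embedding vectors (one per memory slot)
    Returns (selected slot indices, attention weights, weighted value readout).
    """
    sims = [cosine(query, k) for k in keys]
    # Keep only the top_k most similar slots (an assumption of this sketch).
    top = sorted(range(len(keys)), key=lambda i: sims[i], reverse=True)[:top_k]
    # Softmax over the selected similarities, shifted by the max for stability.
    m = max(sims[i] for i in top)
    exps = [math.exp(sims[i] - m) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted sum of the selected label embeddings.
    dim = len(values[0])
    readout = [sum(w * values[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return top, weights, readout


# Toy external memory: three slots with 2-d "image features" and 3-d "label embeddings".
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
values = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# A query feature close to the first two slots.
top, weights, readout = read_memory([1.0, 0.05], keys, values, top_k=2)
```

In this toy setting the query is closest to the first two slots, so the readout is dominated by their shared label embedding; in the framework described above, such retrieved information would then be written into the abstraction memory rather than used directly.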