Abstract
There has been significant progress in image object
detection in recent years. Nevertheless, video object detection has received little attention, although it is more challenging and more important in practical scenarios.
Building on recent works [37, 36], this work proposes
a unified approach based on the principle of multi-frame
end-to-end learning of features and cross-frame motion.
Our approach extends prior work with three new techniques and steadily pushes forward the performance envelope (speed-accuracy tradeoff), towards high-performance
video object detection.