Localizing Unseen Activities in Video via Image Query

2019-10-09

Abstract: Action localization in untrimmed videos is an important topic in the field of video understanding. However, existing action localization methods are restricted to a predefined set of actions and cannot localize unseen activities. Thus, we consider a new task, named Image-Based Activity Localization, which localizes unseen activities in videos via image queries. This task faces three inherent challenges: (1) how to eliminate the influence of semantically inessential content in image queries; (2) how to deal with the fuzzy localization caused by inaccurate image queries; (3) how to determine the precise boundaries of target segments. We then propose a novel self-attention interaction localizer to retrieve unseen activities in an end-to-end fashion. Specifically, we first devise a region self-attention method with relative position encoding to learn fine-grained image region representations. Then, we employ a local transformer encoder to build multi-step fusion and reasoning of image and video contents. We next adopt an order-sensitive localizer to directly retrieve the target segment. Furthermore, we construct a new dataset, ActivityIBAL, by reorganizing the ActivityNet dataset. Extensive experiments show the effectiveness of our method.
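To make the first component more concrete, here is a minimal sketch of self-attention over image regions with a relative position term. It is not the authors' implementation: the abstract does not say how the relative position encoding is formed, so this sketch assumes an additive bias computed from the offsets between region box centers. The function name `region_self_attention` and the parameters `w_q`, `w_k`, `w_v`, and `w_pos` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def region_self_attention(regions, centers, w_q, w_k, w_v, w_pos):
    """Single-head self-attention over image regions (illustrative only).

    regions: (n, d) region features
    centers: (n, 2) normalized box centers (x, y)
    w_q, w_k, w_v: (d, d_k) projection matrices (hypothetical parameters)
    w_pos: (2,) weights mapping a relative offset to a scalar bias
           (an assumed form of relative position encoding)
    """
    q = regions @ w_q
    k = regions @ w_k
    v = regions @ w_v

    # Content-based scores, scaled as in standard dot-product attention.
    content = q @ k.T / np.sqrt(q.shape[-1])          # (n, n)

    # Pairwise offsets between region centers, reduced to one bias per pair.
    rel = centers[:, None, :] - centers[None, :, :]   # (n, n, 2)
    pos_bias = rel @ w_pos                            # (n, n)

    weights = softmax(content + pos_bias, axis=-1)    # each row sums to 1
    return weights @ v, weights


# Usage with random inputs: 5 regions with 8-dimensional features.
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))
centers = rng.uniform(size=(5, 2))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
w_pos = rng.normal(size=2)
attended, weights = region_self_attention(regions, centers, w_q, w_k, w_v, w_pos)
```

Each output row is a weighted mix of all region features, so a region's representation can draw on both the appearance and the relative placement of the other regions.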
