Documents as multiple overlapping windows into a grid of counts

2020-01-16

Abstract

In text analysis, documents are often represented as disorganized bags of words; models of such count features are typically based on mixing a small number of topics [1, 2]. Recently, it has been observed that in many text corpora documents evolve into one another smoothly, with some features dropping out and new ones being introduced. The counting grid [3] models this spatial metaphor literally: it is a grid of word distributions learned in such a way that a document's own distribution of features can be modeled as the sum of the histograms found in a window into the grid. The major drawback of this method is that it is essentially a mixture model, so all of a document's content must be generated by a single contiguous area on the grid. This can be problematic, especially for lower-dimensional grids. In this paper, we overcome this issue by introducing the Componential Counting Grid, which brings the componential nature of topic models to the basic counting grid. We evaluate our approach on document classification and multimodal retrieval, obtaining state-of-the-art results on standard benchmarks.
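The window idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it covers only the generative picture, not learning. The function names, the toroidal wrap-around at the grid edges, the normalization of the window sum into an average, and the fixed mixing weights are all assumptions made for illustration. The first function builds the word distribution for a single window, as in the basic counting grid. The second mixes several windows, which is one way to read the componential extension.

```python
import numpy as np


def window_histogram(grid, top, left, win_h, win_w):
    """Average the word distributions inside one window of the grid.

    grid has shape (H, W, V): one distribution over V words per cell.
    Indices wrap around the grid edges (toroidal layout, assumed here).
    """
    H, W, _ = grid.shape
    rows = (top + np.arange(win_h)) % H
    cols = (left + np.arange(win_w)) % W
    cells = grid[np.ix_(rows, cols)]  # shape (win_h, win_w, V)
    return cells.mean(axis=(0, 1))


def componential_histogram(grid, windows, weights, win_h, win_w):
    """Mix the histograms of several windows (hypothetical helper).

    windows is a list of (top, left) positions; weights are non-negative
    mixing proportions, normalized here so that they sum to one.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    hists = np.stack(
        [window_histogram(grid, t, l, win_h, win_w) for t, l in windows]
    )
    return weights @ hists  # shape (V,)


# Toy grid: 4 x 5 cells, each holding a random distribution over 6 words.
rng = np.random.default_rng(0)
toy_grid = rng.dirichlet(np.ones(6), size=(4, 5))

single = window_histogram(toy_grid, top=3, left=4, win_h=2, win_w=2)
mixed = componential_histogram(
    toy_grid, windows=[(0, 0), (2, 3)], weights=[1, 3], win_h=2, win_w=2
)
```

Under these assumptions, `single` is the distribution a basic counting grid would assign to a document placed at one location, while `mixed` lets a document draw its content from two separate areas of the grid.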
