Abstract Distributional word clustering merges words with similar probability distributions to attain reliable parameter estimation, compact classification models, and often better classification performance. Agglomerative Information Bottleneck (AIB) is a typical word clustering algorithm and has been applied to both traditional text classification and recent image recognition. Despite its theoretical elegance, the main drawback of AIB is its computational efficiency, especially when clustering a large number of words. Unlike existing solutions to this issue, we analyze the characteristics of its objective function, the loss of mutual information, and show that good candidate word pairs for merging can be easily identified merely from the ratio of the word-class joint probabilities of each word. Based on this finding, we propose a fast approximate AIB algorithm and show that it significantly improves the computational efficiency of AIB while maintaining, or even slightly increasing, its classification performance. Experiments on benchmark text and image classification data sets show that our algorithm achieves more than a 100-times speedup over the state-of-the-art method on large real data sets.
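To make the two quantities in the abstract concrete, the following is a minimal Python sketch (not the paper's implementation): `aib_merge_cost` computes the exact loss of mutual information I(W;C) incurred when AIB merges two words, and `ratio_candidates` is an illustrative ratio-based candidate selection for the two-class case. The table `p_wc` and the adjacency rule are assumptions for illustration only.

```python
import numpy as np

def aib_merge_cost(p_wc, i, j):
    """Loss of mutual information I(W;C) incurred by merging word rows i
    and j of the joint-probability table p_wc (num_words x num_classes)."""
    p_i, p_j = p_wc[i], p_wc[j]
    p_m = p_i + p_j  # joint row of the merged word cluster

    def plogp(x):
        x = x[x > 0]  # 0 * log 0 is taken as 0
        return float(np.sum(x * np.log2(x)))

    # The class marginal p(c) cancels in the difference, leaving a
    # Jensen-Shannon-type expression that is always nonnegative.
    joint_term = plogp(p_i) + plogp(p_j) - plogp(p_m)
    marginal_term = (p_i.sum() * np.log2(p_i.sum())
                     + p_j.sum() * np.log2(p_j.sum())
                     - p_m.sum() * np.log2(p_m.sum()))
    return joint_term - marginal_term

def ratio_candidates(p_wc, eps=1e-12):
    """For a two-class problem, sort words by the ratio p(w, c0)/p(w, c1);
    words adjacent in this order have similar class distributions and are
    proposed as merge candidates (a hypothetical selection rule)."""
    ratio = p_wc[:, 0] / (p_wc[:, 1] + eps)
    order = np.argsort(ratio)
    return [(int(a), int(b)) for a, b in zip(order[:-1], order[1:])]
```

Two words with proportional joint rows (i.e., the same class-conditional distribution p(c|w)) incur zero information loss when merged, and they land next to each other under the ratio ordering, which is why sorting by the ratio surfaces cheap merges without evaluating all word pairs.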