Unsupervised Learning of Patterns in Data Streams Using Compression and Edit Distance


2019-11-13

Abstract
Many unsupervised learning methods for recognising patterns in data streams assume fixed-length data sequences. This makes them unsuitable for applications where sequences vary in length, such as speech recognition, behaviour recognition and text classification. To apply these methods to variable-length sequences, a pre-processing step is required to manually segment the data and select appropriate features, which is often impractical in real-world applications. In this paper we propose an unsupervised learning method that handles variable-length data sequences by identifying structure in the data stream using text compression and the edit distance between 'words'. We demonstrate that this method can automatically cluster unlabelled data in a data stream and perform segmentation. We evaluate the effectiveness of the proposed method on both fixed-length and variable-length benchmark datasets, comparing it with the Self-Organising Map in the fixed-length case. The results show a promising improvement over baseline recognition systems.
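The abstract does not say which compressor is used or how the 'words' are clustered. The sketch below illustrates the general idea under stated assumptions and is not the authors' implementation. LZ78 dictionary parsing stands in for the text compressor, as one way a compressor exposes repeated substrings as 'words'. Levenshtein distance serves as the edit distance. A greedy threshold clustering with a hypothetical `max_distance` parameter groups similar words. All function names are illustrative.

```python
from typing import Dict, List


def lz78_words(stream: str) -> List[str]:
    """Split a symbol stream into phrases using LZ78 parsing (assumed compressor).

    Each new phrase is the longest previously seen phrase plus one symbol,
    so recurring structure in the stream shows up as progressively longer phrases.
    """
    dictionary: Dict[str, int] = {}
    words: List[str] = []
    current = ""
    for symbol in stream:
        candidate = current + symbol
        if candidate in dictionary:
            current = candidate  # keep extending a phrase we have already seen
        else:
            dictionary[candidate] = len(dictionary) + 1
            words.append(candidate)
            current = ""
    if current:
        words.append(current)  # flush a trailing phrase that was already known
    return words


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                    # delete ca
                current[j - 1] + 1,                 # insert cb
                previous[j - 1] + int(ca != cb),    # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def cluster_words(words: List[str], max_distance: int) -> List[List[str]]:
    """Greedy clustering (hypothetical): put each word in the first cluster whose
    first member is within max_distance edits, otherwise start a new cluster."""
    clusters: List[List[str]] = []
    for word in words:
        for cluster in clusters:
            if edit_distance(word, cluster[0]) <= max_distance:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters
```

For example, `lz78_words("abababab")` parses the stream into `['a', 'b', 'ab', 'aba', 'b']`, with phrases lengthening as the repeated pattern enters the dictionary. The single greedy pass is used here only for brevity. It depends on input order, so a hierarchical or k-medoids clustering over the pairwise edit distances would likely be more robust.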