Sparse Word Embeddings Using ℓ1 Regularized Online Learning

2019-11-25


Abstract: Recently, the Word2Vec tool has attracted a lot of interest for its promising performance on a variety of natural language processing (NLP) tasks. However, a critical issue is that the dense word representations learned by Word2Vec lack interpretability. It is natural to ask whether one could improve their interpretability while preserving their performance. Inspired by the success of sparse models in enhancing interpretability, we propose to introduce a sparsity constraint into Word2Vec. Specifically, we take the Continuous Bag of Words (CBOW) model as an example in our study and add the ℓ1 regularizer to its learning objective. One optimization challenge is that stochastic gradient descent (SGD) cannot directly produce sparse solutions with the ℓ1 regularizer in online training. To solve this problem, we employ the Regularized Dual Averaging (RDA) method, an online optimization algorithm for regularized stochastic learning. In this way, the learning process is very efficient, and our model can scale up to very large corpora to derive sparse word representations. The proposed model is evaluated on both expressive power and interpretability. The results show that, compared with the original CBOW model, the proposed model obtains state-of-the-art results with better interpretability while using fewer than 10% non-zero elements.
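To illustrate why RDA can yield exact zeros where plain SGD does not, here is a minimal sketch of the standard closed-form ℓ1-RDA step for a generic minimization problem. It is not the authors' exact CBOW update: CBOW maximizes a log-likelihood, so the signs would be adapted there. The function names `l1_rda_step` and `update_average`, the parameters `lam` and `gamma`, and the toy gradients are all assumptions made for illustration.

```python
import math

def update_average(avg_grad, grad, t):
    """Fold the t-th gradient (t >= 1) into the running average gradient."""
    return [a + (g - a) / t for a, g in zip(avg_grad, grad)]

def l1_rda_step(avg_grad, t, lam, gamma):
    """One l1-RDA update computed from the average gradient (minimization form).

    Each coordinate solves
        argmin_w  avg_grad * w + lam * |w| + (gamma / sqrt(t)) * w**2 / 2
    whose solution is a scaled soft-threshold of the average gradient.
    """
    scale = math.sqrt(t) / gamma
    weights = []
    for g in avg_grad:
        if abs(g) <= lam:
            # Average gradient cannot overcome the l1 penalty: exact zero.
            weights.append(0.0)
        else:
            # Shrink the average gradient by lam, then scale and flip the sign.
            weights.append(-scale * (g - lam * math.copysign(1.0, g)))
    return weights

# Toy run with made-up gradients (hypothetical data, three coordinates).
avg = [0.0, 0.0, 0.0]
grads = [[0.5, -0.02, 0.01], [0.3, 0.04, -0.03]]
for t, g in enumerate(grads, start=1):
    avg = update_average(avg, g, t)
    w = l1_rda_step(avg, t, lam=0.1, gamma=1.0)
```

In this toy run the second and third coordinates have average gradients smaller in magnitude than `lam`, so they are set to exactly zero, while the first stays non-zero. Thresholding the averaged gradient rather than each noisy per-step gradient is what lets the method keep weights at exactly zero during online training.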
