VQ-WAV2VEC: SELF-SUPERVISED LEARNING OF DISCRETE SPEECH REPRESENTATIONS

2019-12-31

Abstract

We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task. The algorithm uses either a Gumbel-Softmax or online k-means clustering to quantize the dense representations. Discretization enables the direct application of algorithms from the NLP community which require discrete inputs. Experiments show that BERT pre-training achieves a new state of the art on TIMIT phoneme classification and WSJ speech recognition.
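The quantization step the abstract describes can be sketched compactly. Below is a minimal PyTorch sketch of the Gumbel-Softmax variant: a linear layer maps each dense frame to codebook logits, a straight-through Gumbel-Softmax sample selects one code per frame during training, and the resulting indices are the discrete tokens a model like BERT can consume. The module name, codebook size, and dimensions here are illustrative assumptions, not the paper's implementation, which differs in detail (e.g., it uses multiple codebook groups).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelVectorQuantizer(nn.Module):
    """Minimal sketch of Gumbel-Softmax quantization in the spirit of
    vq-wav2vec. All names and sizes are illustrative assumptions."""

    def __init__(self, input_dim=512, num_codes=320, code_dim=512, temperature=2.0):
        super().__init__()
        self.logits_proj = nn.Linear(input_dim, num_codes)  # dense frame -> code logits
        self.codebook = nn.Embedding(num_codes, code_dim)   # learned code vectors
        self.temperature = temperature

    def forward(self, z):
        # z: (batch, time, input_dim) dense features from the encoder
        logits = self.logits_proj(z)
        if self.training:
            # Straight-through Gumbel-Softmax: picks one code per frame
            # (hard one-hot forward) while gradients flow through the softmax.
            one_hot = F.gumbel_softmax(logits, tau=self.temperature, hard=True)
        else:
            # At inference, select the argmax code deterministically.
            idx = logits.argmax(dim=-1)
            one_hot = F.one_hot(idx, logits.size(-1)).type_as(logits)
        # Look up the selected code vector for each frame.
        z_q = one_hot @ self.codebook.weight
        indices = one_hot.argmax(dim=-1)  # discrete token ids, usable as NLP-style input
        return z_q, indices

# Usage: quantize a batch of dense frames into discrete tokens.
quantizer = GumbelVectorQuantizer()
dense = torch.randn(4, 100, 512)      # (batch, time, dim)
quantized, tokens = quantizer(dense)
print(quantized.shape, tokens.shape)  # torch.Size([4, 100, 512]) torch.Size([4, 100])
```

The online k-means alternative mentioned in the abstract replaces the Gumbel-Softmax sample with a nearest-neighbor lookup against the codebook, passing gradients through with a straight-through estimator.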
