Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation

2019-09-19
Abstract: Previous work on end-to-end translation from speech has primarily used frame-level features as speech representations, which creates longer, sparser sequences than text. We show that a naive method to create compressed phoneme-like speech representations is far more effective and efficient for translation than traditional frame-level speech features. Specifically, we generate phoneme labels for speech frames and average consecutive frames with the same label to create shorter, higher-level source sequences for translation. We see improvements of up to 5 BLEU on both our high- and low-resource language pairs, with a reduction in training time of 60%. Our improvements hold across multiple data sizes and two language pairs.
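
The compression step described in the abstract (averaging consecutive frames that share a phoneme label) can be illustrated with a minimal sketch. This is not the authors' code: the function name `average_by_phoneme`, the toy 2-dimensional features, and the label strings are hypothetical, and the per-frame phoneme labels are assumed to come from some external alignment or recognition model that is not shown here.

```python
from itertools import groupby


def average_by_phoneme(frames, labels):
    """Collapse each run of consecutive frames sharing a label into one mean vector.

    frames: list of equal-length feature vectors (lists of floats), one per frame.
    labels: list of phoneme labels, one per frame (assumed to be given).
    Returns (compressed_frames, segment_labels).
    """
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")

    compressed = []
    segment_labels = []
    index = 0
    # groupby only merges *adjacent* equal labels, so a phoneme that
    # reappears later in the utterance starts a new segment.
    for label, run in groupby(labels):
        run_length = len(list(run))
        segment = frames[index:index + run_length]
        dim = len(segment[0])
        mean = [sum(frame[d] for frame in segment) / run_length for d in range(dim)]
        compressed.append(mean)
        segment_labels.append(label)
        index += run_length
    return compressed, segment_labels


# Toy example: five frames with 2-dimensional features (illustrative values only).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
labels = ["AH", "AH", "T", "AH", "AH"]
compressed, segment_labels = average_by_phoneme(frames, labels)
print(len(frames), "frames ->", len(compressed), "segments")
```

In this toy input the five frames collapse into three segments, which is the sense in which the source sequence becomes shorter and higher-level; the actual compression ratio on real speech depends on the frame rate and the labelling model.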
