Fitting New Speakers Based on a Short Untranscribed Sample

2020-03-16

Abstract

Learning-based text-to-speech (TTS) systems have the potential to generalize from one speaker to the next and should therefore require only a relatively short sample of any new voice. However, this promise is currently largely unrealized. We present a method designed to capture a new speaker from a short untranscribed audio sample. This is done by employing an additional network that, given an audio sample, places the speaker in the embedding space. This network is trained as part of the speech synthesis system using various consistency losses. Our results demonstrate greatly improved performance both on the speakers in the dataset and, more importantly, when fitting new voices, even from very short samples.
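The abstract does not specify the embedding network's architecture or the exact form of its consistency losses. The sketch below illustrates the general idea under assumptions of our own and is not the authors' network. A hypothetical `embed_speaker` projects each frame of frame-level audio features through a linear map, mean-pools over time, and L2-normalizes the result. A hypothetical `consistency_loss` (squared Euclidean distance) could, for example, pull together the embeddings of a short clip and a longer clip of the same speaker. The feature vectors, weights, and function names are all illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def project(frame: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Linear map: one output unit per row of the (hypothetical) weight matrix."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def l2_normalize(v: Sequence[float], eps: float = 1e-8) -> Vector:
    """Scale a vector to (approximately) unit length; eps avoids division by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def embed_speaker(frames: Sequence[Sequence[float]],
                  weights: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical speaker encoder: project every frame, average over time,
    then normalize. Mean pooling makes the output independent of clip length,
    so short and long samples land in the same embedding space."""
    if not frames:
        raise ValueError("need at least one frame")
    projected = [project(f, weights) for f in frames]
    dim = len(weights)
    pooled = [sum(p[i] for p in projected) / len(projected) for i in range(dim)]
    return l2_normalize(pooled)


def consistency_loss(e_a: Sequence[float], e_b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings; zero when they agree."""
    return sum((a - b) ** 2 for a, b in zip(e_a, e_b))


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]                     # identity projection, for illustration
    long_clip = embed_speaker([[3.0, 4.0], [3.0, 4.0]], w)
    short_clip = embed_speaker([[3.0, 4.0]], w)
    other = embed_speaker([[4.0, -3.0]], w)
    print(consistency_loss(long_clip, short_clip))   # near zero: same features
    print(consistency_loss(long_clip, other))        # about 2: orthogonal unit vectors
```

Because pooling averages over frames, the embedding is defined for a clip of any length, which is what allows a very short sample to be placed in the same space as a long one. How well that placement reflects the speaker depends entirely on training, which this sketch omits.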

Popular resources

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...