
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis


Abstract

In this work, we propose “global style tokens” (GSTs), a bank of embeddings that are jointly trained within Tacotron, a state-of-the-art end-to-end speech synthesis system. The embeddings are trained with no explicit labels, yet learn to model a large range of acoustic expressiveness. GSTs lead to a rich set of significant results. The soft interpretable “labels” they generate can be used to control synthesis in novel ways, such as varying speed and speaking style – independently of the text content. They can also be used for style transfer, replicating the speaking style of a single audio clip across an entire long-form text corpus. When trained on noisy, unlabeled found data, GSTs learn to factorize noise and speaker identity, providing a path towards highly scalable but robust speech synthesis.
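To make the mechanism concrete, below is a minimal PyTorch sketch of a GST layer as the abstract describes it: a learnable bank of token embeddings, queried by a reference embedding through multi-head attention, whose attention weights act as the soft interpretable "labels." The class name `GSTLayer` is hypothetical, the reference encoder is simplified to a single GRU (the paper uses a convolutional stack followed by a GRU), and the hyperparameter defaults are assumptions loosely following the paper's reported configuration; this is an illustrative sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GSTLayer(nn.Module):
    """Simplified global-style-token layer (illustrative sketch)."""

    def __init__(self, n_mels=80, token_dim=256, num_tokens=10, num_heads=4):
        super().__init__()
        # Reference encoder, simplified here to one GRU: compresses a
        # variable-length mel spectrogram into one fixed-length query.
        self.ref_encoder = nn.GRU(n_mels, token_dim // 2, batch_first=True)
        self.query_proj = nn.Linear(token_dim // 2, token_dim)
        # The bank of style token embeddings: trained jointly with the
        # synthesizer, with no explicit style labels.
        self.tokens = nn.Parameter(torch.randn(num_tokens, token_dim))
        # The reference embedding attends over the token bank.
        self.attn = nn.MultiheadAttention(
            token_dim, num_heads, kdim=token_dim, vdim=token_dim,
            batch_first=True)

    def forward(self, mel):                           # mel: (B, T, n_mels)
        _, h = self.ref_encoder(mel)                  # h: (1, B, token_dim//2)
        query = self.query_proj(h[-1]).unsqueeze(1)   # (B, 1, token_dim)
        keys = torch.tanh(self.tokens).unsqueeze(0).expand(
            mel.size(0), -1, -1)                      # (B, num_tokens, token_dim)
        # `weights` are the soft interpretable "labels"; `style` is the
        # style embedding that would condition the Tacotron text encoder.
        style, weights = self.attn(query, keys, keys)
        return style.squeeze(1), weights.squeeze(1)
```

Under these assumptions, the same layer supports both uses mentioned in the abstract: feeding a reference clip's spectrogram gives style transfer, while bypassing the reference encoder and mixing `torch.tanh(model.tokens)` with hand-chosen weights gives direct, label-free style control.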
