PyTorch implementation of "Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention", based partially on the following projects:
Online Text-To-Speech Demo
The following notebooks are executable on Google Colab (https://colab.research.google.com):
For audio samples and pretrained models, visit the above notebook links.
Training/Synthesizing English Text-To-Speech
The English TTS uses the LJ Speech dataset.
Download the dataset: `python dl_and_preprop_dataset.py --dataset=ljspeech`
Train the Text2Mel model: `python train-text2mel.py --dataset=ljspeech`
Train the SSRN model: `python train-ssrn.py --dataset=ljspeech`
Synthesize sentences: `python synthesize.py --dataset=ljspeech`
Training/Synthesizing Mongolian Text-To-Speech
The Mongolian text-to-speech uses 5 hours of audio from the Mongolian Bible.
Download the dataset: `python dl_and_preprop_dataset.py --dataset=mbspeech`
Train the Text2Mel model: `python train-text2mel.py --dataset=mbspeech`
Train the SSRN model: `python train-ssrn.py --dataset=mbspeech`
Synthesize sentences: `python synthesize.py --dataset=mbspeech`