Tacotron 2 (without WaveNet)
PyTorch implementation of Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.
This implementation includes distributed and fp16 support and uses the LJSpeech dataset.
Distributed and FP16 support relies on work by Christian Sarofeen and NVIDIA's Apex Library.
Visit our [website] for audio samples using our published [Tacotron 2] and [WaveGlow] models.
Pre-requisites
NVIDIA GPU + CUDA + cuDNN
Setup
Download and extract the LJ Speech dataset
Clone this repo: git clone https://github.com/NVIDIA/tacotron2.git
CD into this repo: cd tacotron2
Initialize submodule: git submodule init; git submodule update
Update .wav paths (a Python alternative is sketched after these setup steps): sed -i -- 's,DUMMY,ljs_dataset_folder/wavs,g' filelists/*.txt
Install [PyTorch 1.0]
Install python requirements or build docker image
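If sed is not available (for example on Windows), the same path update can be done with a short Python snippet. This is only a convenience sketch doing what the sed command above does; ljs_dataset_folder/wavs is the same placeholder and should point at your extracted LJ Speech wav directory.

```python
# Replace the DUMMY placeholder in every filelist with the real LJ Speech wav directory,
# mirroring the sed command in the setup steps above.
import glob

wav_dir = "ljs_dataset_folder/wavs"  # adjust to your dataset location
for path in glob.glob("filelists/*.txt"):
    with open(path, "r", encoding="utf-8") as f:
        contents = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(contents.replace("DUMMY", wav_dir))
```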
Training
python train.py --output_directory=outdir --log_directory=logdir
(OPTIONAL) tensorboard --logdir=outdir/logdir
Multi-GPU (distributed) and FP16 Training
python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True
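Before launching a multi-GPU run, it can help to confirm how many GPUs PyTorch actually sees; the distributed launcher starts one training process per visible GPU, and FP16 training benefits most from GPUs with fast half-precision support. This is a generic PyTorch check, not part of this repo:

```python
import torch

# Quick sanity check before a distributed / FP16 run.
if torch.cuda.is_available():
    print("GPUs visible:", torch.cuda.device_count())
    for i in range(torch.cuda.device_count()):
        print(i, torch.cuda.get_device_name(i))
else:
    print("No CUDA device visible to PyTorch")
```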
Inference demo
Download our published [Tacotron 2] model
Download our published [WaveGlow] model
jupyter notebook --ip=127.0.0.1 --port=31337
Load inference.ipynb
N.b. When performing Mel-Spectrogram to Audio synthesis, make sure Tacotron 2 and the Mel decoder were trained on the same mel-spectrogram representation.
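The notebook walks through loading both published checkpoints and synthesizing audio from text. Roughly, it follows the steps in the condensed sketch below; the checkpoint filenames are placeholders for the downloaded models, and module names such as load_model, create_hparams, and text_to_sequence reflect this repo's layout at the time of writing and may change between releases.

```python
# Condensed sketch of the inference.ipynb workflow.
# Checkpoint filenames below are placeholders for the published models downloaded above.
import numpy as np
import torch

from hparams import create_hparams
from train import load_model
from text import text_to_sequence

hparams = create_hparams()

# Tacotron 2: text -> mel spectrogram
model = load_model(hparams)
model.load_state_dict(torch.load("tacotron2_statedict.pt")["state_dict"])
model.cuda().eval()

# WaveGlow: mel spectrogram -> audio (must use the same mel representation as Tacotron 2)
waveglow = torch.load("waveglow_256channels.pt")["model"]
waveglow.cuda().eval()

sequence = np.array(text_to_sequence("Hello world.", ["english_cleaners"]))[None, :]
sequence = torch.from_numpy(sequence).cuda().long()

with torch.no_grad():
    _, mel_outputs_postnet, _, alignments = model.inference(sequence)
    audio = waveglow.infer(mel_outputs_postnet, sigma=0.666)
```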
Related repos
WaveGlow: a faster-than-real-time flow-based generative network for speech synthesis.
nv-wavenet: faster-than-real-time WaveNet.
Acknowledgements
This implementation uses code from the following repos: Keith Ito and Prem Seetharaman, as described in our code.
We are inspired by Ryuichi Yamamoto's Tacotron PyTorch implementation.
We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen, Yuxuan Wang, and Zongheng Yang.