Tacotron 2
A PyTorch implementation of Tacotron 2, an end-to-end text-to-speech (TTS) neural network architecture that converts a character sequence directly to speech, as described in Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.
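As a rough illustration of what "character sequence" means as model input, the sketch below maps each character to an integer ID. The function names `build_vocab` and `text_to_sequence` and the toy vocabulary are hypothetical. The symbol set this repository actually uses is not shown here and may differ (for example, it may work on a phonetic transcription of the Mandarin text).

```python
def build_vocab(texts):
    """Assign an integer ID to every distinct character; 0 is left free for padding."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def text_to_sequence(text, vocab):
    """Convert text to a list of character IDs, skipping characters not in the vocabulary."""
    return [vocab[ch] for ch in text if ch in vocab]


# Hypothetical toy data, only to show the shape of the encoder input.
vocab = build_vocab(["ab", "bc"])
sequence = text_to_sequence("cab!", vocab)
```

A sequence like this is the kind of input an encoder embeds before the decoder predicts mel-spectrogram frames.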
Dataset
The AISHELL dataset, which contains 400 speakers and over 170 hours of Mandarin speech data.
Dependencies
Python 3.5.2
PyTorch 1.0.0
Usage
Data Pre-processing
Extract data_aishell.tgz:
$ python extract.py
Extract wav files into train/dev/test folders:
$ cd data/data_aishell/wav
$ find . -name '*.tar.gz' -execdir tar -xzvf '{}' \;
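If `find` is not available (for example on Windows), the same step can be approximated in Python. This is a minimal sketch using only the standard library. The helper name `extract_nested_archives` is hypothetical and not part of this repository.

```python
import tarfile
from pathlib import Path


def extract_nested_archives(root):
    """Extract every *.tar.gz found under root into the directory that holds it."""
    extracted = []
    for archive in sorted(Path(root).rglob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            # Like `find -execdir`, unpack next to the archive itself.
            tar.extractall(path=archive.parent)
        extracted.append(archive)
    return extracted
```

Calling `extract_nested_archives("data/data_aishell/wav")` from the repository root would cover the same archives as the shell command above.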
Scan transcript data, generate features:
$ python pre_process.py
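The contents of pre_process.py are not shown here, so the following is only a sketch of what the transcript scan might look like. It assumes each line of aishell_transcript_v0.8.txt holds an utterance ID followed by space-separated words, and that a wav file's stem matches its utterance ID. The function names and the sample dictionary keys are hypothetical.

```python
from pathlib import Path


def parse_transcript(lines):
    """Map utterance ID -> transcript text with the word-separating spaces removed."""
    transcripts = {}
    for line in lines:
        parts = line.strip().split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        transcripts[parts[0]] = "".join(parts[1:])
    return transcripts


def build_samples(wav_root, transcripts):
    """Pair each wav file with its transcript; files without a transcript are skipped."""
    samples = []
    for wav in sorted(Path(wav_root).rglob("*.wav")):
        text = transcripts.get(wav.stem)
        if text is not None:
            samples.append({"audio_path": str(wav), "text": text})
    return samples
```

A list like the one returned by `build_samples` could then be serialized with `pickle.dump`. Whether aishell.pickle has this shape is an assumption.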
The folder structure under the data folder should now look something like this:
data/
    data_aishell.tgz
    data_aishell/
        transcript/
            aishell_transcript_v0.8.txt
        wav/
            train/
            dev/
            test/
    aishell.pickle
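Before training, it can help to confirm that pre-processing produced the layout listed above. The check below is a small sketch. The helper `missing_paths` is hypothetical, and placing aishell.pickle directly under data/ is an assumption.

```python
from pathlib import Path

# Relative paths taken from the listing above. The position of
# aishell.pickle directly under data/ is an assumption.
EXPECTED = [
    "data_aishell/transcript/aishell_transcript_v0.8.txt",
    "data_aishell/wav/train",
    "data_aishell/wav/dev",
    "data_aishell/wav/test",
    "aishell.pickle",
]


def missing_paths(data_dir="data"):
    """Return the expected entries that do not exist under data_dir."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty list from `missing_paths()` means every expected file and folder is present.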
Train
$ python train.py
To visualize training progress, run the following in your terminal:
$ tensorboard --logdir runs
Demo
Generate a mel-spectrogram for the text "Waveglow is really awesome!":
$ python demo.py