WaveGlow: a Flow-based Generative Network for Speech Synthesis
Ryan Prenger, Rafael Valle, and Bryan Catanzaro
In our recent paper, we propose WaveGlow: a flow-based network capable of
generating high-quality speech from mel-spectrograms. WaveGlow combines insights
from Glow and WaveNet in order to provide fast, efficient and high-quality
audio synthesis, without the need for auto-regression. WaveGlow is implemented
as a single network and trained with a single objective: maximizing the
likelihood of the training data. This makes the training procedure simple and
stable.
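
To make the single training objective concrete, here is a minimal sketch of the negative log-likelihood of a flow-based model. It is not the WaveGlow network: the elementwise affine map, the function names, and the parameters `log_scale`, `shift`, and `sigma` are illustrative assumptions. The sketch only shows the change-of-variables idea that a flow relies on, where the data likelihood is the Gaussian likelihood of the transformed sample plus the log-determinant of the transform.

```python
import math


def affine_flow_forward(x, log_scale, shift):
    """Toy invertible map z_i = x_i * exp(log_scale_i) + shift_i (hypothetical)."""
    z = [xi * math.exp(ls) + b for xi, ls, b in zip(x, log_scale, shift)]
    # For an elementwise affine map the Jacobian is diagonal, so
    # log|det dz/dx| is just the sum of the log scales.
    log_det = sum(log_scale)
    return z, log_det


def negative_log_likelihood(x, log_scale, shift, sigma=1.0):
    """NLL of x under a zero-mean spherical Gaussian prior on z = f(x)."""
    z, log_det = affine_flow_forward(x, log_scale, shift)
    n = len(z)
    log_prior = (
        -sum(zi * zi for zi in z) / (2.0 * sigma ** 2)
        - n * math.log(sigma)
        - 0.5 * n * math.log(2.0 * math.pi)
    )
    # Change of variables: log p(x) = log p(z) + log|det dz/dx|.
    return -(log_prior + log_det)


if __name__ == "__main__":
    # Identity flow on a two-sample "signal" of zeros.
    print(negative_log_likelihood([0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

Minimizing this one quantity over the flow parameters is what "maximizing the likelihood of the training data" amounts to; no second network or auxiliary loss is involved.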
Our PyTorch implementation synthesizes audio at a rate of 4850 kHz (about
4.85 million samples per second) on an NVIDIA V100 GPU. Mean Opinion Scores
show that it delivers audio
quality as good as the best publicly available WaveNet implementation.
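
As a rough illustration of what this synthesis rate means in practice, the sketch below converts it into a speed-up over real time. The 22,050 Hz output sampling rate is an assumption chosen for the example and is not stated in the text above; the function name is likewise hypothetical.

```python
def real_time_factor(synthesis_rate_hz, audio_sample_rate_hz):
    """Seconds of audio produced per second of compute."""
    if audio_sample_rate_hz <= 0:
        raise ValueError("audio_sample_rate_hz must be positive")
    return synthesis_rate_hz / audio_sample_rate_hz


if __name__ == "__main__":
    synthesis_rate_hz = 4850e3      # 4850 kHz, the figure quoted above
    audio_sample_rate_hz = 22050.0  # assumed output sampling rate
    factor = real_time_factor(synthesis_rate_hz, audio_sample_rate_hz)
    print(f"{factor:.1f}x faster than real time")
```

Under that assumed sampling rate the quoted figure corresponds to roughly 220 seconds of audio per second of compute; a different output rate scales the factor proportionally.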