C++ Code to run optimized inference in CUDA of Waveglow, this implementation gives 25% speedup over Nvidia's Pytorch implementation in full precision and 2.5-3x speedup when using TensorCore
By default, this code will use GPU's TensorCore when running on NVIDIA's Volta GPU
Waveglow, a flow-based network is capable of generating high quality speech from mel-spectograms. It combines insights from Glow and Wavenet in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression.
WaveGlow is implemented using only a single network, trained using
only a single cost function: maximizing the likelihood of the training
data, which makes the training procedure simple and stable.
Paper claims that in full-precision (32 bit float) waveglow produces speech at the 500kHz on V100 but typically it is about 300-325kHz with pytorch's implementation and 400-420kHz using our implementation in full precision and around 1000kHz using TensorCore in full precision.