
Waveglow_Inference_in_CUDA

2019-12-26

C++ code for optimized CUDA inference of WaveGlow. This implementation gives a 25% speedup over NVIDIA's PyTorch implementation in full precision and a 2.5-3x speedup when using Tensor Cores.

By default, this code uses the GPU's Tensor Cores when running on an NVIDIA Volta GPU.

Waveglow

CUDA C++ implementation of NVIDIA's WaveGlow.

The flow-based model architecture is described in the paper WaveGlow: a Flow-based Generative Network for Speech Synthesis.

WaveGlow, a flow-based network, is capable of generating high-quality speech from mel-spectrograms. It combines insights from Glow and WaveNet to provide fast, efficient, and high-quality audio synthesis without the need for auto-regression.
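To illustrate why a flow-based model needs no auto-regression at inference time, here is a minimal sketch of one affine coupling layer and its closed-form inverse. It is a toy under stated assumptions, not this repository's code: `toy_wn` is a hypothetical stand-in for the real WN network, and the inputs are short plain lists rather than audio tensors.

```python
import math

def toy_wn(x_a, mel):
    """Hypothetical stand-in for the WN network: returns (log_s, t) per element."""
    log_s = [0.1 * a + 0.05 * m for a, m in zip(x_a, mel)]
    t = [0.5 * a - 0.2 * m for a, m in zip(x_a, mel)]
    return log_s, t

def coupling_forward(x, mel):
    """Training direction (audio -> latent): only the second half is transformed."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = toy_wn(x_a, mel)
    z_b = [math.exp(ls) * b + ti for ls, b, ti in zip(log_s, x_b, t)]
    return x_a + z_b

def coupling_inverse(z, mel):
    """Inference direction (latent -> audio): the first half passes through
    unchanged, so log_s and t can be recomputed and the affine map undone."""
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    log_s, t = toy_wn(z_a, mel)
    x_b = [(b - ti) * math.exp(-ls) for ls, b, ti in zip(log_s, z_b, t)]
    return z_a + x_b

x = [0.3, -0.7, 0.2, 0.9]      # toy "audio" vector
mel = [1.0, -1.0]              # toy conditioning, one value per element of x_a
z = coupling_forward(x, mel)
x_rec = coupling_inverse(z, mel)
```

Because every element of the transformed half is recovered independently, the inverse is a set of element-wise operations that can run in parallel, which is the property a GPU implementation exploits.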

WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable.
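As a sketch of what "maximizing the likelihood" means for an invertible network, the general change-of-variables identity for a flow is given below. The notation is an assumption made for illustration: $f$ maps a latent $z$ to audio $x$, and $p_z$ is the latent prior (a zero-mean spherical Gaussian in flow models of this kind).

```latex
\log p_\theta(x) \;=\; \log p_z\!\left(f_\theta^{-1}(x)\right)
\;+\; \log \left| \det \frac{\partial f_\theta^{-1}(x)}{\partial x} \right|
```

Training maximizes this quantity over the training data. Inference samples $z$ from the prior and evaluates $x = f_\theta(z)$ in a single pass.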

The paper claims that in full precision (32-bit float) WaveGlow produces speech at 500 kHz (thousands of audio samples per second) on a V100, but typically the rate is about 300-325 kHz with PyTorch's implementation, 400-420 kHz with our implementation in full precision, and around 1000 kHz using Tensor Cores.

Repository Structure

cpp
├── common          (All common files: logger, utils, numpy reader)
│   ├── header
│   └── src
├── sys             (ML units, i.e. conv, dense, activation)
│   ├── header
│   └── src
├── Waveglow        (WN, upsample, main)
│   ├── header
│   └── src
└── tools
    ├── get_waveglow_weights.py
    └── npy_2_aud.py

Getting Started

  1. Git clone the repository

  2. Download waveglow_weights

  3. Download mel_spectrograms

  4. Update waveglow_weights path in waveglow/header/hparams.hpp file

  5. Build and run:

    make
    ls -d path_2_mel_folder/*  >  filename.txt
    ./waveglow_tts filename.txt OutputDir
    python tools/npy_2_aud.py OutputDir
  6. Audio will be stored in OutputDir in .wav format
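The last command converts the generated samples to .wav files. The sketch below shows what such a conversion step could look like using only Python's standard library. It is not the contents of tools/npy_2_aud.py: the function names, the 22050 Hz sample rate, and the assumption that samples are floats in [-1, 1] are all illustrative.

```python
import struct
import wave

def float_to_pcm16(samples):
    """Clip floats to [-1, 1] and scale them to signed 16-bit integers."""
    return [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]

def write_wav(path, samples, sample_rate=22050):
    """Write mono 16-bit PCM audio. The default sample rate is an assumption."""
    pcm = float_to_pcm16(samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        # Pack as little-endian signed shorts, the byte order WAV expects.
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
```

With NumPy installed, a saved array could be passed in as `write_wav("out.wav", numpy.load("out.npy").tolist())`, assuming the file holds a one-dimensional array of float samples.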

Training

You can also train your own model using this, then copy the tools/get_waveglow_weights.py file into the waveglow folder and run:

 python get_waveglow_weights.py <checkpoint path>

Inference and Results

Currently the code takes around 250 ms to generate 10 seconds of speech.
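To relate this timing to the sample rates quoted earlier, the arithmetic can be sketched as follows. The 22050 Hz output sample rate is an assumption (the README does not state it), and both function names are illustrative.

```python
def samples_per_second(audio_seconds, wall_seconds, sample_rate=22050):
    """Audio samples generated per second of wall-clock time."""
    return audio_seconds * sample_rate / wall_seconds

def real_time_factor(audio_seconds, wall_seconds):
    """Seconds of audio produced per second of computation."""
    return audio_seconds / wall_seconds

rate = samples_per_second(10.0, 0.25)   # 882000.0 samples/s, i.e. 882 kHz
rtf = real_time_factor(10.0, 0.25)      # 40.0, i.e. 40x faster than real time
```

Under that assumed sample rate, 250 ms for 10 seconds of speech corresponds to roughly 880 kHz, the same order of magnitude as the Tensor Core figure quoted above.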
