
Waveglow_Inference_in_CUDA

2019-12-26


C++ code to run optimized CUDA inference of WaveGlow. This implementation gives a 25% speedup over NVIDIA's PyTorch implementation in full precision, and a 2.5-3x speedup when using Tensor Cores.

By default, this code uses the GPU's Tensor Cores when running on an NVIDIA Volta GPU.

Waveglow

CUDA C++ implementation of NVIDIA's WaveGlow.

The flow-based model architecture is described in the paper WaveGlow: A Flow-based Generative Network for Speech Synthesis.

WaveGlow, a flow-based network, is capable of generating high-quality speech from mel-spectrograms. It combines insights from Glow and WaveNet to provide fast, efficient, high-quality audio synthesis without the need for auto-regression.

WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable.
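That single cost function is the standard change-of-variables log-likelihood for an invertible network: with audio x mapped through the invertible layers f to a latent z drawn from a zero-mean spherical Gaussian, training maximizes

```latex
\log p_\theta(x) = \log p_z\!\big(f(x)\big) + \sum_{k} \log \left| \det J_k \right|,
\qquad p_z = \mathcal{N}(0, \sigma^2 I)
```

where J_k is the Jacobian of the k-th invertible layer; in WaveGlow the affine coupling scales and the invertible 1x1 convolutions contribute these log-determinant terms.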

The paper claims that in full precision (32-bit float) WaveGlow produces speech at 500 kHz on a V100, but it is typically about 300-325 kHz with the PyTorch implementation, 400-420 kHz with this implementation in full precision, and around 1000 kHz using Tensor Cores.

Repository Structure

cpp
├── common          (all common files: logger, utils, numpy reader)
│   ├── header
│   └── src
├── sys             (ML units, i.e. conv, dense, activation)
│   ├── header
│   └── src
├── waveglow        (WN, upsample, main)
│   ├── header
│   └── src
└── tools
    ├── get_waveglow_weights.py
    └── npy_2_aud.py

Getting Started

  1. Clone the repository

  2. Download waveglow_weights

  3. Download mel_spectrograms

  4. Update waveglow_weights path in waveglow/header/hparams.hpp file

  5. Run the following:

    make
    ls -d path_2_mel_folder  >  filename.txt
    ./waveglow_tts filename.txt OutputDir
    python tools/npy_2_aud.py OutputDir
  6. Audio will be stored in OutputDir in .wav format
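Step 4 refers to a hardcoded weights path. A minimal sketch of what the relevant line in waveglow/header/hparams.hpp might look like (the identifier name here is an illustration, not the repo's actual name):

```cpp
// waveglow/header/hparams.hpp (illustrative; the real identifier may differ)
#include <string>

// Point this at the directory containing the downloaded waveglow_weights.
const std::string waveglow_weights_path = "/absolute/path/to/waveglow_weights/";
```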

Training

You can also train your own model using this, then copy the tools/get_waveglow_weights.py file into the waveglow folder and run:

 python get_waveglow_weights.py <checkpoint path>

Inference and Results

Currently the code takes around 250 ms to generate 10 seconds of speech.

Resources and references

