Train the model and save checkpoints. The default parameters are essentially the same as in the original paper.
Pass the flag --preprocess when running for the first time.
Decoded output files are placed in the samples folder.
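For example, using only the flags mentioned above and leaving every other option at its default:
# first run: preprocess the data, then train
python train.py --preprocess
# later runs: train directly on the cached features
python train.py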
Differences from paper
Window size: 400, which depends on minimum_f0 (because pyworld is used to extract the f0 and MCC coefficients).
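As an illustration only (not this repository's actual preprocessing code), a pyworld-based extraction could look like the sketch below; the file path, f0_floor value, and the MCC conversion via pysptk.sp2mc are assumptions:
import numpy as np
import pyworld
import pysptk
import soundfile as sf

# load mono audio as float64 (pyworld expects double precision); path is a placeholder
x, fs = sf.read('sample.wav')
x = x.astype(np.float64)

# f0 estimation; f0_floor plays the role of minimum_f0 and determines
# the analysis window length WORLD uses internally
f0, t = pyworld.harvest(x, fs, f0_floor=80.0, frame_period=5.0)

# spectral envelope, then mel-cepstral coefficients (MCC)
sp = pyworld.cheaptrick(x, f0, t, fs)
mcc = pysptk.sp2mc(sp, order=24, alpha=0.42)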
TODO
Zero padding.
Injected noise.
Voiced/unvoiced conditional sampling.
Post-synthesis denoising.
Notes
Two 1x1 convolution kernels are combined into one 1x2 dilated kernel. This removes redundant bias parameters and speeds up the model overall (see the sketch after these notes).
According to the authors, the channel size in the middle layers is 128, not 256.
The model tends to get stuck at the beginning (loss around 4.x) for thousands of steps, then drops quickly to 2.6 ~ 3.0. Using a smaller learning rate helps a little.
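A minimal PyTorch sketch of the combined kernel mentioned above; the channel size, dilation value, and tensor shapes are assumptions for illustration, not this repository's code:
import torch
import torch.nn as nn

channels, dilation = 128, 256

# paper formulation: two 1x1 convolutions, one per half of the receptive field,
# whose outputs are summed (two weight matrices, two bias vectors)
conv_l = nn.Conv1d(channels, channels, kernel_size=1)
conv_r = nn.Conv1d(channels, channels, kernel_size=1)

# combined formulation: a single 1x2 dilated convolution whose two taps play the
# roles of the two 1x1 kernels, with one bias vector and one kernel launch
conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)

x = torch.randn(1, channels, 4096)
z_two = conv_l(x[:, :, :-dilation]) + conv_r(x[:, :, dilation:])
z_one = conv(x)
# the shapes match; with the weights copied over, the outputs would be identical
assert z_two.shape == z_one.shape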
Variations of FFTNet
Radix-N FFTNet
Use the flag --radixs to specify each layer's radix.
# a radix-4 FFTNet with 1024 receptive field
python train.py --radixs 4 4 4 4 4
The original FFTNet uses a radix-2 structure. In my experiments, a radix-4 (or even radix-8) network still achieves similar results, and by reducing the number of layers it runs faster (see the example below).
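For instance, assuming the receptive field is the product of the per-layer radices (as in the radix-4 example above, 4^5 = 1024), mixing larger radices reaches the same receptive field with fewer layers:
# a mixed-radix FFTNet with 8*8*4*4 = 1024 receptive field in only 4 layers
python train.py --radixs 8 8 4 4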
Transposed FFTNet
Fig. 2 in the paper can be redrawn as a dilated structure with kernel size 2 (i.e., radix 2).
If you draw all the connections and then transpose the graph so the arrows point backward, you will find a WaveNet dilated structure.
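One way to see this (an illustration with an assumed layer count, not code from this repository): an FFTNet stack corresponds to dilations shrinking from N/2 down to 1, and reversing the graph yields WaveNet's growing dilations:
# dilation schedules for an 11-layer, radix-2 network (receptive field N = 2**11 = 2048)
n_layers = 11
fftnet_dilations = [2 ** i for i in reversed(range(n_layers))]   # [1024, 512, ..., 1]
wavenet_dilations = [2 ** i for i in range(n_layers)]            # [1, 2, ..., 1024]
# transposing the FFTNet graph reverses the order of its dilations,
# giving the WaveNet-like schedule
assert fftnet_dilations[::-1] == wavenet_dilations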
Add the flag --transpose to get a simplified version of WaveNet.
# a WaveNet-like model without gated/residual/skip units
python train.py --transpose
In my experiments, the transposed models are easier to train and reach a slightly lower training loss than FFTNet.