voicefilter
Unofficial PyTorch implementation of Google AI's VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking.
Training took about 20 hours on an AWS p3.2xlarge instance (NVIDIA V100).
Audio samples are available at: http://swpark.me/voicefilter/
| Median SDR (dB) | Paper | Ours |
| ---------------------- | ----- | ---- |
| before VoiceFilter | 2.5 | 1.9 |
| after VoiceFilter | 12.6 | 10.2 |
The median SDR converged at around 10 dB, slightly lower than the paper's.
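SDR compares a separated estimate against the clean target speech, and the table reports its median over the test set. As a rough illustration, here is a minimal NumPy sketch of a simplified SDR, assumed here to be the plain energy ratio 10*log10(||s||^2 / ||s - s_hat||^2). It is not the BSS Eval SDR that separation toolkits usually report (which first allows a distortion filter), so its values will not match the table exactly; `simple_sdr` and `median_sdr` are hypothetical names.

```python
import numpy as np


def simple_sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-12) -> float:
    """Simplified signal-to-distortion ratio in dB.

    10 * log10(||s||^2 / ||s - s_hat||^2). This is NOT the full BSS Eval SDR,
    which projects the estimate onto the reference with an allowed filter.
    """
    n = min(len(reference), len(estimate))  # compare only the overlapping samples
    s = reference[:n].astype(np.float64)
    err = s - estimate[:n].astype(np.float64)
    return float(10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(err ** 2) + eps)))


def median_sdr(pairs) -> float:
    """Median of simple_sdr over (reference, estimate) pairs."""
    return float(np.median([simple_sdr(ref, est) for ref, est in pairs]))
```

An estimate whose error has 1/100 of the target's energy scores 20 dB; the median keeps a few badly separated utterances from dominating the summary.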
Python and packages
This code was tested on Python 3.6 with PyTorch 1.0.1. Other packages can be installed with:
pip install -r requirements.txt
Miscellaneous
ffmpeg-normalize is used to resample and normalize wav files. See the README.md of ffmpeg-normalize for installation instructions.
Download LibriSpeech dataset
To replicate the VoiceFilter paper, get the LibriSpeech dataset at http://www.openslr.org/12/. train-clean-100.tar.gz (6.3G) contains speech from 252 speakers, and train-clean-360.tar.gz (23G) contains 922 speakers. You may use either, but the more speakers the dataset contains, the better VoiceFilter will perform.
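To check how many speakers an extracted subset actually contains, here is a small sketch that walks the standard LibriSpeech layout (<subset>/<speaker_id>/<chapter_id>/*.flac). `count_speakers` is a hypothetical helper, not part of this repository.

```python
import os
from collections import Counter


def count_speakers(subset_dir: str) -> Counter:
    """Count .flac utterances per speaker in an extracted LibriSpeech subset.

    Assumes the standard layout <subset_dir>/<speaker_id>/<chapter_id>/*.flac.
    """
    counts = Counter()
    for speaker in sorted(os.listdir(subset_dir)):
        speaker_dir = os.path.join(subset_dir, speaker)
        if not os.path.isdir(speaker_dir):
            continue  # skip stray files such as README or LICENSE
        for _root, _dirs, files in os.walk(speaker_dir):
            counts[speaker] += sum(1 for f in files if f.endswith(".flac"))
    return counts
```

For example, `len(count_speakers("LibriSpeech/train-clean-360"))` gives the number of speaker folders, and the counts show how many utterances each speaker can contribute to mixtures.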
Resample & Normalize wav files
First, extract the tar.gz file into the desired folder:
tar -xvzf train-clean-360.tar.gz
Next, copy utils/normalize-resample.sh to the root directory of the extracted data folder. Then:
vim normalize-resample.sh # set "N" to the number of CPU cores
chmod a+x normalize-resample.sh
./normalize-resample.sh # this may take a long time
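The idea behind the script can be sketched in Python: run ffmpeg-normalize on every .flac file, with up to N jobs in parallel. This is an illustration, not the contents of utils/normalize-resample.sh. The `-ar`/`-o` flags, the 16 kHz target rate and the `-norm.wav` output name are assumptions to verify against `ffmpeg-normalize --help` and the script itself; `build_command` and `normalize_resample` are hypothetical names.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(src: Path, sample_rate: int = 16000) -> list:
    # Assumed invocation: normalize `src`, resample to `sample_rate`, and write
    # <name>-norm.wav next to it. Check the flags your ffmpeg-normalize supports.
    dst = src.with_name(src.stem + "-norm.wav")
    return ["ffmpeg-normalize", str(src), "-ar", str(sample_rate), "-o", str(dst)]


def normalize_resample(root: str, workers: int = os.cpu_count() or 1, dry_run: bool = False) -> list:
    """Build (and unless dry_run, execute) one ffmpeg-normalize command per .flac file."""
    commands = [build_command(f) for f in sorted(Path(root).rglob("*.flac"))]
    if dry_run:
        return commands
    # Threads suffice: each job spends its time inside an external ffmpeg process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
    return commands
```

`normalize_resample("LibriSpeech", dry_run=True)` lists the commands without running anything, which is a cheap way to inspect the plan before a multi-hour job. `workers` plays the role of "N" in the shell script.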
Edit config.yaml
cd config
cp default.yaml config.yaml
vim config.yaml
Preprocess wav files
To speed up training, precompute the STFT of each file before training:
python generator.py -c [config yaml] -d [data directory] -o [output directory] -p [processes to run]
This creates 100,000 training samples and 1,000 test samples (about 160G in total).
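At about 160G for 101,000 samples, each stored sample takes roughly 1.6 MB on average. The per-sample work can be sketched as: mix a target utterance with an interfering one, then compute the magnitude and phase of the mixture's STFT. This is a minimal NumPy sketch, not generator.py itself. The STFT settings (16 kHz, 25 ms window, 10 ms hop, FFT size 1200), the 3-second length and the peak rescaling are assumptions chosen for illustration; `mix` and `stft` are hypothetical names.

```python
import numpy as np

# Hypothetical STFT settings; the real values live in config.yaml and may differ.
SR, N_FFT, HOP, WIN = 16000, 1200, 160, 400


def mix(target: np.ndarray, interference: np.ndarray, seconds: float = 3.0) -> np.ndarray:
    """Sum two utterances trimmed to a fixed length; rescale if the sum would clip."""
    n = int(SR * seconds)
    if len(target) < n or len(interference) < n:
        raise ValueError("utterance shorter than the requested length")
    mixed = target[:n] + interference[:n]
    peak = np.max(np.abs(mixed))
    return mixed / peak if peak > 1.0 else mixed


def stft(x: np.ndarray):
    """Magnitude and phase of a framed, Hann-windowed FFT (no padding)."""
    window = np.zeros(N_FFT)
    start = (N_FFT - WIN) // 2
    window[start:start + WIN] = np.hanning(WIN)  # short window centered in the FFT frame
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[t * HOP:t * HOP + N_FFT] * window for t in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)  # shape: (n_frames, N_FFT // 2 + 1)
    return np.abs(spec), np.angle(spec)
```

With these settings a 3-second mixture yields 293 frames of 601 frequency bins. Storing such arrays instead of recomputing them every epoch is what trades disk space for training speed.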
Get pretrained model for speaker recognition system
VoiceFilter relies on a speaker recognition system (d-vector embeddings). Here, we provide a pretrained model for obtaining d-vector embeddings.
This model was trained on the VoxCeleb2 dataset, with utterances randomly fit to a time length between 70 and 90 frames. Tests were done with window 80 / hop 40 and showed an equal error rate of about 1%. The test data were taken from the first 8 speakers of the VoxCeleb1 test set, with 10 randomly selected utterances per speaker.
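The two evaluation details above, window 80 / hop 40 and the equal error rate, can be sketched in NumPy. The sketch assumes a common d-vector recipe (embed each window, average, L2-normalize); the embedder in this repository may aggregate differently. `sliding_windows`, `dvector`, `equal_error_rate` and the `embed_window` callable are hypothetical names.

```python
import numpy as np


def sliding_windows(features: np.ndarray, window: int = 80, hop: int = 40) -> np.ndarray:
    """Cut a (frames, dims) feature matrix into overlapping windows of `window` frames."""
    starts = range(0, features.shape[0] - window + 1, hop)
    return np.stack([features[s:s + window] for s in starts])


def dvector(features: np.ndarray, embed_window, window: int = 80, hop: int = 40) -> np.ndarray:
    """Average per-window embeddings, then L2-normalize (assumed aggregation)."""
    emb = np.stack([embed_window(w) for w in sliding_windows(features, window, hop)])
    mean = emb.mean(axis=0)
    return mean / np.linalg.norm(mean)


def equal_error_rate(scores, labels) -> float:
    """Point where the false-accept rate equals the false-reject rate.

    `labels` is 1 for same-speaker trials and 0 for different-speaker trials.
    """
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    genuine, impostor = scores[labels == 1], scores[labels == 0]
    best_gap, eer = np.inf, 1.0
    for t in np.sort(scores):
        far = np.mean(impostor >= t)  # different-speaker trials accepted
        frr = np.mean(genuine < t)    # same-speaker trials rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)
```

A 200-frame utterance gives 4 windows (starting at frames 0, 40, 80 and 120). An EER of about 1% means that at the best threshold roughly 1 in 100 same-speaker trials is rejected and about as many different-speaker trials are accepted.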
The model can be downloaded at this GDrive link.
Run
After specifying train_dir and test_dir in config.yaml, run:
python trainer.py -c [config yaml] -e [path of embedder pt file] -m [name]
This will create chkpt/name and logs/name under the base directory (-b option, . by default).
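A pre-flight check can save a failed launch. This is a minimal sketch, assuming train_dir and test_dir sit under a `data:` section of config.yaml (verify against default.yaml) and that the file has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`. Both helpers are hypothetical and only mirror the directory layout described above.

```python
import os


def check_data_dirs(config: dict) -> None:
    """Fail early if train_dir or test_dir was left empty (assumed `data` section)."""
    data = config.get("data", {})
    for key in ("train_dir", "test_dir"):
        if not data.get(key):
            raise ValueError(f"set data.{key} in config.yaml before training")


def run_dirs(name: str, base_dir: str = ".") -> dict:
    """Paths where a run named `name` keeps checkpoints and tensorboard logs."""
    return {
        "chkpt": os.path.join(base_dir, "chkpt", name),
        "logs": os.path.join(base_dir, "logs", name),
    }
```

`run_dirs("exp1")` points at ./chkpt/exp1 and ./logs/exp1, which is also the directory to pass to tensorboard for that run.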
View tensorboardX
tensorboard --logdir ./logs
Resuming from checkpoint
python trainer.py -c [config yaml] --checkpoint_path [chkpt/name/chkpt_{step}.pt] -e [path of embedder pt file] -m [name]
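When resuming, it helps to pick the newest chkpt_{step}.pt automatically. This is a small sketch; `latest_checkpoint` is a hypothetical helper that relies only on the file naming shown in the command above. It compares steps numerically, because a plain string sort would rank chkpt_9000.pt above chkpt_10000.pt.

```python
import os
import re

CHKPT_RE = re.compile(r"^chkpt_(\d+)\.pt$")


def latest_checkpoint(chkpt_dir: str):
    """Return the chkpt_{step}.pt path with the highest step, or None if there is none."""
    best_step, best_path = -1, None
    for fname in os.listdir(chkpt_dir):
        match = CHKPT_RE.match(fname)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), os.path.join(chkpt_dir, fname)
    return best_path
```

The returned path can be passed directly as the --checkpoint_path argument.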
Inference
python inference.py -c [config yaml] -e [path of embedder pt file] --checkpoint_path [path of chkpt pt file] -m [path of mixed wav file] -r [path of reference wav file] -o [output directory]
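VoiceFilter predicts a soft mask over the mixture's magnitude spectrogram. One common way to turn the masked magnitude back into audio, assumed here, is to reuse the mixture's phase and apply an inverse STFT by weighted overlap-add. This is a minimal NumPy sketch, not inference.py. The STFT settings and the names `stft`, `istft` and `apply_mask` are assumptions for illustration.

```python
import numpy as np

# Hypothetical STFT settings; use the values from your config.yaml.
N_FFT, HOP, WIN = 1200, 160, 400


def _window() -> np.ndarray:
    w = np.zeros(N_FFT)
    start = (N_FFT - WIN) // 2
    w[start:start + WIN] = np.hanning(WIN)  # short Hann window centered in the frame
    return w


def stft(x: np.ndarray) -> np.ndarray:
    w = _window()
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[t * HOP:t * HOP + N_FFT] * w for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec: np.ndarray) -> np.ndarray:
    """Weighted overlap-add inverse of `stft` above."""
    w = _window()
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * w
    length = (spec.shape[0] - 1) * HOP + N_FFT
    out, norm = np.zeros(length), np.zeros(length)
    for t, frame in enumerate(frames):
        out[t * HOP:t * HOP + N_FFT] += frame
        norm[t * HOP:t * HOP + N_FFT] += w ** 2
    covered = norm > 1e-8  # samples never under a nonzero window stay 0
    out[covered] /= norm[covered]
    return out


def apply_mask(mixed: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Scale the mixture magnitude by a [0, 1] mask and keep the mixture phase."""
    spec = stft(mixed)
    masked = mask * np.abs(spec) * np.exp(1j * np.angle(spec))
    return istft(masked)
```

A mask of all ones reproduces the mixture (away from the edges), and a mask of all zeros yields silence, which makes a quick sanity check before plugging in a trained model's output.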
Possible improvements
These are some of my personal ideas for improvement. If you have other ideas, don't hesitate to open an issue.
- Masks performed poorly on high-frequency channels.
  - Training the embedder on linear-scale spectrograms instead of mel spectrograms might improve this.
- Replace zero-padding with partial convolution.
- Try a power-law compressed reconstruction error as the loss function instead of MSE.
  - Tried power=0.3, but it failed.
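A power-law compressed loss raises both magnitude spectrograms to a power below 1 before taking the squared error, so loud bins dominate the loss less. This is a minimal NumPy sketch for illustration; the training code would need an equivalent PyTorch expression, and `power_law_mse` is a hypothetical name.

```python
import numpy as np


def power_law_mse(estimate_mag: np.ndarray, target_mag: np.ndarray, power: float = 0.3) -> float:
    """MSE between magnitude spectrograms after compressing both with |S| ** power."""
    est = np.abs(estimate_mag) ** power
    tgt = np.abs(target_mag) ** power
    return float(np.mean((est - tgt) ** 2))
```

With power=1 this reduces to ordinary MSE on magnitudes. In a differentiable version, note that the derivative of x ** 0.3 grows without bound as x approaches 0, so adding a small epsilon before the power is a common safeguard.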
Seungwon Park at MINDsLab (yyyyy@snu.ac.kr, swpark@mindslab.ai)
Apache License 2.0
This repository contains code adapted or copied from the following sources:
- utils/adabound.py from https://github.com/Luolc/AdaBound (Apache License 2.0)
- utils/audio.py from https://github.com/keithito/tacotron (MIT License)
- utils/hparams.py from https://github.com/HarryVolek/PyTorch_Speaker_Verification (no license specified)
- utils/normalize-resample.sh from https://unix.stackexchange.com/a/216475