
Speech-Transformer


Speech Transformer: End-to-End ASR with Transformer

A PyTorch implementation of Speech Transformer [1], an end-to-end automatic speech recognition system based on the Transformer network, which directly converts acoustic features to a character sequence using a single neural network.
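
As a rough illustration of the idea only (not this repository's code), an encoder-decoder Transformer maps a sequence of acoustic feature frames to character logits in one network. The sketch below assumes a recent PyTorch with torch.nn.Transformer; all names, dimensions, and the vocabulary size are made up:

import torch
import torch.nn as nn

class TinySpeechTransformer(nn.Module):
    """Illustrative only: filterbank frames -> character logits."""
    def __init__(self, feat_dim=80, vocab_size=4000, d_model=256):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, d_model)      # project acoustic frames to model dim
        self.char_emb = nn.Embedding(vocab_size, d_model)  # embed previously emitted characters
        self.transformer = nn.Transformer(d_model=d_model, nhead=4,
                                          num_encoder_layers=3, num_decoder_layers=3,
                                          batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)          # predict the next character

    def forward(self, feats, prev_chars):
        # feats: (batch, frames, feat_dim), prev_chars: (batch, chars)
        causal = self.transformer.generate_square_subsequent_mask(prev_chars.size(1))
        dec = self.transformer(self.feat_proj(feats), self.char_emb(prev_chars), tgt_mask=causal)
        return self.out(dec)                               # (batch, chars, vocab_size)

logits = TinySpeechTransformer()(torch.randn(2, 100, 80), torch.randint(0, 4000, (2, 10)))

At decoding time, characters are produced autoregressively by feeding the growing partial hypothesis back in as prev_chars.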

Ad: Come join the Kwai Speech Team and make your career great! Send your resume to: xukaituo [at] kuaishou [dot] com!

Install

  • Python3 (recommend Anaconda)

  • PyTorch 0.4.1+

  • Kaldi (just for feature extraction; see the sketch after this list)

  • pip install -r requirements.txt

  • cd tools; make KALDI=/path/to/kaldi

  • If you want to run egs/aishell/run.sh, download the AISHELL dataset (it is freely available).
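
Kaldi is only needed to produce the acoustic features; one way to read them back in Python is the kaldi_io package. The snippet below is just a sketch under that assumption (it is not necessarily how this repository loads features, and data/train/feats.scp is a placeholder path):

import kaldi_io  # assumed helper package for reading Kaldi archives

# Iterate over utterances listed in a Kaldi script file and inspect the feature matrices.
for utt_id, feats in kaldi_io.read_mat_scp('data/train/feats.scp'):
    print(utt_id, feats.shape)   # e.g. (num_frames, num_filterbank_bins)
    break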

Usage

Quick start

$ cd egs/aishell
# Modify the aishell data path to your own path at the beginning of run.sh
$ bash run.sh

That's all!

You can change parameters with $ bash run.sh --parameter_name parameter_value, e.g., $ bash run.sh --stage 3. See the parameter names defined in egs/aishell/run.sh before the line . utils/parse_options.sh.

Workflow

Workflow of egs/aishell/run.sh:

  • Stage 0: Data Preparation

  • Stage 1: Feature Generation

  • Stage 2: Dictionary and Json Data Preparation (see the sketch after this list)

  • Stage 3: Network Training

  • Stage 4: Decoding
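
To make Stage 2 concrete: the dictionary is essentially a mapping from every character seen in the training transcripts to an integer id, plus a few special tokens, and the json files pair each utterance with its token ids. The sketch below is purely hypothetical (the sample text, token inventory, and layout are made up and may differ from what the actual scripts produce):

# Hypothetical illustration of Stage 2: build a character dictionary and json-style data entries.
transcripts = {"utt_001": "你好世界"}                 # made-up utt_id -> transcript pairs

dictionary = {"<unk>": 1, "<sos>": 2, "<eos>": 3}      # special tokens first (assumed inventory)
for text in transcripts.values():
    for char in text:
        dictionary.setdefault(char, len(dictionary) + 1)

data = {utt: {"output": {"text": text,
                         "tokenid": [dictionary.get(c, dictionary["<unk>"]) for c in text]}}
        for utt, text in transcripts.items()}
print(len(dictionary), data)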

More detail

egs/aishell/run.sh provides example usage.

# Set PATH and PYTHONPATH
$ cd egs/aishell/; . ./path.sh
# Train
$ train.py -h
# Decode
$ recognize.py -h

How to visualize loss?

If you want to visualize your loss, you can use visdom to do that:

  1. Open a new terminal on your remote server (tmux is recommended) and run $ visdom.

  2. Open a new terminal and run $ bash run.sh --visdom 1 --visdom_id "<any-string>" or $ train.py ... --visdom 1 --visdom_id "<any-string>".

  3. Open your browser and type <your-remote-server-ip>:8097, e.g., 127.0.0.1:8097.

  4. In the visdom website, choose <any-string> in Environment to see your loss.

(Example loss curve screenshot: train-k0.2-bf15000-shuffle-ls0.1.png)
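
For reference, the underlying visdom calls behind a loss plot look roughly like the following (a minimal sketch only; the environment name, window title, and loss values are placeholders, and the repository's actual plotting code may differ):

import numpy as np
import visdom

vis = visdom.Visdom(env="<any-string>")               # talks to the visdom server on port 8097 by default
win = None
for epoch, loss in enumerate([3.2, 2.1, 1.5]):        # dummy loss values
    # Append one (epoch, loss) point to a single line plot in the chosen environment.
    win = vis.line(X=np.array([epoch]), Y=np.array([loss]), win=win,
                   update=None if win is None else "append",
                   opts={"title": "train loss"})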

How to resume training?

$ bash run.sh --continue_from <model-path>
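
Under the hood, resuming generally amounts to reloading the saved checkpoint and continuing from its stored state. A minimal PyTorch sketch under that assumption (the checkpoint keys, file name, and model are hypothetical, not necessarily what this repo saves):

import torch
import torch.nn as nn

model = nn.Linear(80, 10)                      # stand-in for the real acoustic model
optimizer = torch.optim.Adam(model.parameters())

# Save a checkpoint (what --continue_from would point at), then restore it.
torch.save({"epoch": 5,
            "state_dict": model.state_dict(),
            "optim_dict": optimizer.state_dict()}, "checkpoint.pth.tar")

ckpt = torch.load("checkpoint.pth.tar", map_location="cpu")
model.load_state_dict(ckpt["state_dict"])      # hypothetical key names
optimizer.load_state_dict(ckpt["optim_dict"])
start_epoch = ckpt["epoch"] + 1                # resume from the following epoch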

How to solve out-of-memory errors?

If this happens during training, try reducing batch_size: $ bash run.sh --batch_size <lower-value>.

Results

Model                     | CER (%) | Config
--------------------------|---------|----------------------------------------------
LSTMP                     | 9.85    | 4x(1024-512). See kaldi-ktnet1
Listen, Attend and Spell  | 13.2    | See Listen-Attend-Spell's egs/aishell/run.sh
SpeechTransformer         | 12.8    | See egs/aishell/run.sh

Reference

  • [1] Yuanyuan Zhao, Jie Li, Xiaorui Wang, and Yan Li. "The SpeechTransformer for Large-scale Mandarin Chinese Speech Recognition." ICASSP 2019.

