python_speech_features

2019-12-19

This library provides common speech features for ASR, including MFCCs and filterbank energies. If you are not sure what MFCCs are and would like to know more, have a look at this MFCC tutorial.

Project Documentation

Installation

This project is on PyPI.

To install from PyPI:

pip install python_speech_features

From this repository:

git clone https://github.com/jameslyons/python_speech_features
cd python_speech_features
python setup.py develop

Usage

Supported features:

  • Mel Frequency Cepstral Coefficients

  • Filterbank Energies

  • Log Filterbank Energies

  • Spectral Subband Centroids

Example use
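
A minimal sketch of typical use, assuming a mono WAV file that SciPy can read. The wrapper function `extract_features` and the path `file.wav` are placeholders for illustration, not part of the library. It computes MFCCs and log filterbank energies with the default parameters.

```python
def extract_features(wav_path):
    """Return (mfcc_feat, fbank_feat) for the mono WAV file at wav_path."""
    # Imported inside the function so this sketch can be loaded even when
    # SciPy or python_speech_features is not installed yet.
    import scipy.io.wavfile as wav
    from python_speech_features import mfcc, logfbank

    rate, sig = wav.read(wav_path)    # sample rate in Hz, samples as an array
    mfcc_feat = mfcc(sig, rate)       # one row per frame, numcep columns
    fbank_feat = logfbank(sig, rate)  # one row per frame, nfilt columns
    return mfcc_feat, fbank_feat


# Example call (placeholder path):
# mfcc_feat, fbank_feat = extract_features("file.wav")
```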

From here you can write the features to a file etc.

MFCC Features

The default parameters should work fairly well for most cases. If you want to change the MFCC parameters, the following parameters are supported:

def mfcc(signal, samplerate=16000, winlen=0.025, winstep=0.01, numcep=13,
         nfilt=26, nfft=512, lowfreq=0, highfreq=None, preemph=0.97,
         ceplifter=22, appendEnergy=True)

| Parameter | Description |
| --- | --- |
| signal | the audio signal from which to compute features. Should be an N*1 array |
| samplerate | the sample rate of the signal we are working with, in Hz. Default is 16000 |
| winlen | the length of the analysis window in seconds. Default is 0.025s (25 milliseconds) |
| winstep | the step between successive windows in seconds. Default is 0.01s (10 milliseconds) |
| numcep | the number of cepstral coefficients to return. Default is 13 |
| nfilt | the number of filters in the filterbank. Default is 26 |
| nfft | the FFT size. Default is 512 |
| lowfreq | lowest band edge of mel filters, in Hz. Default is 0 |
| highfreq | highest band edge of mel filters, in Hz. Default is samplerate/2 |
| preemph | apply preemphasis filter with preemph as coefficient. 0 is no filter. Default is 0.97 |
| ceplifter | apply a lifter to final cepstral coefficients. 0 is no lifter. Default is 22 |
| appendEnergy | if this is true, the zeroth cepstral coefficient is replaced with the log of the total frame energy |
| returns | a numpy array of size (NUMFRAMES by numcep) containing features. Each row holds 1 feature vector |
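
The NUMFRAMES dimension follows from `winlen`, `winstep`, and the signal length. The sketch below estimates it, assuming the signal is zero-padded so that a final partial window still counts as a frame. `num_frames` is a hypothetical helper, not part of the library, and the library's own rounding may differ in edge cases.

```python
import math


def num_frames(num_samples, samplerate=16000, winlen=0.025, winstep=0.01):
    """Estimate how many analysis frames a signal of num_samples yields."""
    frame_len = int(round(winlen * samplerate))    # samples per window
    frame_step = int(round(winstep * samplerate))  # samples between window starts
    if num_samples <= frame_len:
        return 1  # a short signal is padded out to a single frame
    # One full frame, plus one more for every (possibly partial) step after it.
    return 1 + math.ceil((num_samples - frame_len) / frame_step)


# One second of 16 kHz audio: 400-sample windows every 160 samples.
print(num_frames(16000))  # 99
```

With the defaults, one second of 16 kHz audio therefore gives a feature matrix of 99 rows under these assumptions.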

Filterbank Features

These features are raw filterbank energies. For most applications you will want the logarithm of these features. The default parameters should work fairly well for most cases. If you want to change the fbank parameters, the following parameters are supported:

def fbank(signal, samplerate=16000, winlen=0.025, winstep=0.01,
          nfilt=26, nfft=512, lowfreq=0, highfreq=None, preemph=0.97)

| Parameter | Description |
| --- | --- |
| signal | the audio signal from which to compute features. Should be an N*1 array |
| samplerate | the sample rate of the signal we are working with, in Hz. Default is 16000 |
| winlen | the length of the analysis window in seconds. Default is 0.025s (25 milliseconds) |
| winstep | the step between successive windows in seconds. Default is 0.01s (10 milliseconds) |
| nfilt | the number of filters in the filterbank. Default is 26 |
| nfft | the FFT size. Default is 512 |
| lowfreq | lowest band edge of mel filters, in Hz. Default is 0 |
| highfreq | highest band edge of mel filters, in Hz. Default is samplerate/2 |
| preemph | apply preemphasis filter with preemph as coefficient. 0 is no filter. Default is 0.97 |
| returns | a numpy array of size (NUMFRAMES by nfilt) containing features. Each row holds 1 feature vector. The second return value is the energy in each frame (total energy, unwindowed) |
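
To make `lowfreq`, `highfreq`, and `nfilt` more concrete, the sketch below places filter band edges evenly on the mel scale. It assumes triangular filters and the common conversion mel = 2595 * log10(1 + f / 700). The helper names are hypothetical and the library's internals may differ in detail.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (common 2595 * log10 formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(nfilt=26, samplerate=16000, lowfreq=0, highfreq=None):
    """Return nfilt + 2 edge frequencies in Hz, evenly spaced in mel."""
    if highfreq is None:
        highfreq = samplerate / 2  # default upper edge is half the sample rate
    low_mel, high_mel = hz_to_mel(lowfreq), hz_to_mel(highfreq)
    step = (high_mel - low_mel) / (nfilt + 1)
    # Triangular filter i spans edges[i] .. edges[i + 2], peaking at edges[i + 1].
    return [mel_to_hz(low_mel + i * step) for i in range(nfilt + 2)]


edges = mel_band_edges()
print(len(edges))  # 28
```

Because the spacing is uniform in mel rather than in Hz, the bands are narrow at low frequencies and widen towards `highfreq`.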

Reference

The sample english.wav was obtained from:

wget http://voyager.jpl.nasa.gov/spacecraft/audio/english.au
sox english.au -e signed-integer english.wav

