Project DeepSpeech
DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier.
NOTE: This documentation applies to the master branch
of DeepSpeech only. If you're using a stable release, you must use the
documentation for the corresponding version by using GitHub's branch
switcher button above.
To install and use deepspeech all you have to do is:
# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-venv/
source $HOME/tmp/deepspeech-venv/bin/activate

# Install DeepSpeech
pip3 install deepspeech

# Download pre-trained English model and extract
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz
tar xvf deepspeech-0.6.0-models.tar.gz

# Download example audio files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/audio-0.6.0.tar.gz
tar xvf audio-0.6.0.tar.gz

# Transcribe an audio file
deepspeech --model deepspeech-0.6.0-models/output_graph.pbmm --lm deepspeech-0.6.0-models/lm.binary --trie deepspeech-0.6.0-models/trie --audio audio/2830-3980-0043.wav
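To transcribe several files from a script, the same command line can be assembled and run from Python. The following is only a minimal sketch: the helper names `build_transcribe_command` and `transcribe` are hypothetical, the flags are the ones shown in the command above, and it assumes the `deepspeech` executable is on the PATH and writes the transcript to standard output.

```python
import os
import subprocess


def build_transcribe_command(model_dir, audio_path):
    """Build the argument list for the deepspeech CLI.

    Hypothetical helper: it assumes model_dir has the layout of the
    extracted deepspeech-0.6.0-models archive (output_graph.pbmm,
    lm.binary, trie).
    """
    return [
        "deepspeech",
        "--model", os.path.join(model_dir, "output_graph.pbmm"),
        "--lm", os.path.join(model_dir, "lm.binary"),
        "--trie", os.path.join(model_dir, "trie"),
        "--audio", audio_path,
    ]


def transcribe(model_dir, audio_path):
    """Run the CLI and return whatever it printed to standard output.

    Assumption: the transcript is written to stdout. A non-zero exit
    status raises subprocess.CalledProcessError.
    """
    result = subprocess.run(
        build_transcribe_command(model_dir, audio_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

With the files downloaded above, `transcribe("deepspeech-0.6.0-models", "audio/2830-3980-0043.wav")` would run the same command as the last step of the shell session.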
A pre-trained English model is available for use and can be downloaded using the instructions below.
Currently, only 16-bit, 16 kHz, mono-channel WAVE audio files are
supported in the Python client. A package with some example audio files
is available for download in our release notes.
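Because of this format restriction, it can help to check a file before passing it to the client. The sketch below uses only Python's standard `wave` module; the function name `wav_format_problems` is hypothetical and not part of DeepSpeech.

```python
import wave


def wav_format_problems(path):
    """Return a list of reasons why the file is not 16-bit, 16 kHz, mono.

    An empty list means the WAVE header matches the supported format.
    Hypothetical helper, not part of the deepspeech package.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("expected 1 channel, found %d" % wav.getnchannels())
        # getsampwidth() is in bytes, so 2 bytes means 16-bit samples.
        if wav.getsampwidth() != 2:
            problems.append("expected 16-bit samples, found %d-bit" % (8 * wav.getsampwidth()))
        if wav.getframerate() != 16000:
            problems.append("expected 16000 Hz, found %d Hz" % wav.getframerate())
    return problems
```

A file that fails the check would need to be converted with an audio tool of your choice before transcription.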
Quicker inference can be performed using a supported NVIDIA GPU on Linux. See the release notes to find which GPUs are supported. To run deepspeech on a GPU, install the GPU-specific package:
# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
source $HOME/tmp/deepspeech-gpu-venv/bin/activate

# Install DeepSpeech CUDA enabled package
pip3 install deepspeech-gpu

# Transcribe an audio file.
deepspeech --model deepspeech-0.6.0-models/output_graph.pbmm --lm deepspeech-0.6.0-models/lm.binary --trie deepspeech-0.6.0-models/trie --audio audio/2830-3980-0043.wav
Please ensure you have the required CUDA dependencies.
See the output of deepspeech -h for more information on the use of deepspeech. (If you experience problems running deepspeech, please check required runtime dependencies).
Table of Contents