Standard version: exactly the same as a DNN, feed the LSTM nnet into
nnet-forward as the AM scorer. Remember that nnet-forward has no
mechanism to delay the targets, so a "TimeShift" component is needed
inside the nnet to do this.
Google version: convert it to the standard version first:
- convert the binary nnet into text format via nnet-copy, and open the text nnet with your text editor
- change the "Transmit" component to "TimeShift", keeping its shift consistent with the "--targets-delay" you used in nnet-train-lstm-streams
- edit "LstmProjectedStreams" to "LstmProjected", and remove the "NumStream" tag

Now the "google version" is converted to the "standard version", and
you can perform AM scoring via nnet-forward, e.g.:
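(The command below is only a sketch; final.nnet.txt, ali_train_pdf.counts, final.mdl, HCLG.fst and the feature pipeline in $feats are placeholders for your own setup.)

    # forward-pass the standard-version LSTM, subtracting log-priors computed
    # from the training-alignment counts, then decode the pseudo-likelihoods
    nnet-forward --no-softmax=true --prior-scale=1.0 --use-gpu=no \
        --class-frame-counts=ali_train_pdf.counts \
        final.nnet.txt "$feats" ark:- | \
      latgen-faster-mapped --acoustic-scale=0.1 --beam=13.0 \
        final.mdl HCLG.fst ark:- "ark:|gzip -c > lat.1.gz"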
In google's paper, two layers of medium-sized LSTMs are the best setup to beat the DNN on WER. You can build this by text-level editing:
- use some of your training data to train a one-layer LSTM nnet
- convert it into text format with nnet-copy, using "--binary=false"
- insert the text of a pre-initialized LSTM component between the softmax and your pretrained LSTM

Then you can feed all your training data to the stacked LSTM, e.g.:
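(This is only a rough sketch; the paths and the proto file below are hypothetical, and the actual splicing of the component text is still done by hand in an editor.)

    # dump the pretrained one-layer LSTM to text so it can be edited by hand
    nnet-copy --binary=false exp/lstm1/final.nnet exp/lstm2/nnet.txt

    # build a freshly initialized one-layer LSTM from a proto and dump it to text;
    # copy its LSTM component out of lstm_layer.txt and paste it into
    # exp/lstm2/nnet.txt between the pretrained LSTM and the softmax layers
    nnet-initialize --binary=false lstm_layer.proto lstm_layer.txt

    # the edited exp/lstm2/nnet.txt is then trained on the full data set,
    # e.g. with nnet-train-lstm-streams, exactly as before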
Q3. How do I know when to use "Transmit" or "TimeShift"?
The key is how you apply "target-delay".
standard version: the nnet should be trained with a "TimeShift"
component, because the default nnet1 training tools (nnet-train-frmshuff &
nnet-train-perutt) don't provide target delay.
google version: due to the complexity of multi-stream training, the
training tool "nnet-train-lstm-streams" provides the option
"--targets-delay", so in multi-stream training a dummy "Transmit"
component is used instead, for a trivial reason related to how nnet1 calls
Backpropagate(). But at testing time, the google version is first
converted to the standard version, so the "Transmit" should also be switched
to "TimeShift" during the conversion.
Q4. Why are the "dropout" codes commented out?
I implemented the "forward-connection dropout" according to another
paper from google, but later I didn't implement dropout retention, so
the effects of dropout are not tested at all, and I left the code commented
out.