Training RNNs as Fast as CNNs

2019-09-11

About

SRU is a recurrent unit that can run over 10 times faster than cuDNN LSTM, with no loss of accuracy on the many tasks tested.

Average processing time of LSTM, conv2d and SRU, tested on GTX 1070

For example, the figure above presents the processing time of a single mini-batch of 32 samples. SRU achieves 10 to 16 times speed-up compared to LSTM, and operates as fast as (or faster than) word-level convolution using conv2d.
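A comparison like the one in the figure could be reproduced with a small timing harness. The sketch below is not the authors' benchmark script: `time_per_batch` and `speedup` are hypothetical helper names, and the two workloads are CPU stand-ins for real LSTM and SRU steps. When timing GPU code, the timed function would also need to call `torch.cuda.synchronize()` before returning, because CUDA kernels are launched asynchronously.

```python
import statistics
import time


def time_per_batch(step, warmup=2, repeats=10):
    """Return the median wall-clock time (seconds) of one call to `step`.

    `step` is any zero-argument callable that processes one mini-batch.
    For GPU code it should end with torch.cuda.synchronize(); otherwise
    only the kernel launch is timed, not the computation itself.
    """
    for _ in range(warmup):          # warm-up calls are not timed
        step()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        step()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Hypothetical stand-in workloads; replace with LSTM / SRU mini-batch steps.
    slow = lambda: sum(i * i for i in range(200_000))
    fast = lambda: sum(i * i for i in range(20_000))
    ratio = speedup(time_per_batch(slow), time_per_batch(fast))
    print("speed-up: %.1fx" % ratio)
```

Taking the median over several repeats, after a few untimed warm-up calls, keeps one-off costs such as runtime kernel compilation out of the reported number.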

Reference:

Training RNNs as Fast as CNNs

@article{lei2017sru,
  title={Training RNNs as Fast as CNNs},
  author={Tao Lei and Yu Zhang and Yoav Artzi},
  journal={arXiv preprint arXiv:1709.02755},
  year={2017}
}

Requirements

GPU and CUDA 8 are required
PyTorch
CuPy
pynvrtc

Install requirements via pip install -r requirements.txt. CuPy and pynvrtc are needed to compile the CUDA code into a callable function at runtime. Only single-GPU training is supported.
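Before running anything, it can help to confirm that the Python packages listed above are importable. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and the import names are an assumption based on the package names (PyTorch is imported as `torch`).

```python
import importlib.util

# Assumed import names for the packages listed in the requirements.
REQUIRED_MODULES = ("torch", "cupy", "pynvrtc")


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that the modules can be located; it does not verify that a GPU or a compatible CUDA toolkit is present.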

Examples

The usage of SRU is similar to nn.LSTM. SRU likely requires more stacked layers than LSTM. We recommend starting with 2 layers and using more if necessary (see our report for more experimental details).

import torch
from torch.autograd import Variable
from cuda_functional import SRU, SRUCell

# input has length 20, batch size 32 and dimension 128
x = Variable(torch.FloatTensor(20, 32, 128).cuda())

input_size, hidden_size = 128, 128

rnn = SRU(input_size, hidden_size,
    num_layers = 2,          # number of stacking RNN layers
    dropout = 0.0,           # dropout applied between RNN layers
    rnn_dropout = 0.0,       # variational dropout applied on linear transformation
    use_tanh = 1,            # use tanh?
    use_relu = 0,            # use ReLU?
    use_selu = 0,            # use SeLU?
    bidirectional = False,   # bidirectional RNN ?
    weight_norm = False,     # apply weight normalization on parameters
    layer_norm = False,      # apply layer normalization on the output of each layer
    highway_bias = 0         # initial bias of highway gate (<= 0)
)
rnn.cuda()

output_states, c_states = rnn(x)      # forward pass

# output_states is (length, batch size, number of directions * hidden size)
# c_states is (layers, batch size, number of directions * hidden size)
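The shape comments in the example can be turned into a small sanity check. The sketch below only restates those comments as arithmetic; `sru_output_shapes` is a hypothetical helper, not part of the SRU module, and it assumes a bidirectional model has two directions.

```python
def sru_output_shapes(length, batch_size, hidden_size,
                      num_layers=2, bidirectional=False):
    """Expected shapes of (output_states, c_states), per the usage comments."""
    num_directions = 2 if bidirectional else 1
    output_shape = (length, batch_size, num_directions * hidden_size)
    c_shape = (num_layers, batch_size, num_directions * hidden_size)
    return output_shape, c_shape


if __name__ == "__main__":
    out_shape, c_shape = sru_output_shapes(20, 32, 128, num_layers=2)
    print(out_shape)  # (20, 32, 128)
    print(c_shape)    # (2, 32, 128)
```

With a GPU available, the result could be compared against `tuple(output_states.size())` and `tuple(c_states.size())` from the forward pass.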

Make sure cuda_functional.py and the shared library cuda/lib64 can be found by the system, e.g.

export LD_LIBRARY_PATH=/usr/local/cuda/lib64
export PYTHONPATH=path_to_repo/sru

Instead of using PYTHONPATH, the SRU module can now be installed as a regular package via python setup.py install or pip install. See this PR.
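If the CUDA code fails to compile or load at runtime, a first thing to check is whether the CUDA library directory is actually on LD_LIBRARY_PATH. The following is a minimal sketch using only the standard library; `path_listed` is a hypothetical helper, and `/usr/local/cuda/lib64` is simply the location used in the export example and may differ on other systems.

```python
import os


def path_listed(env_value, directory):
    """True if `directory` is one of the entries in a PATH-style string."""
    wanted = os.path.normpath(directory)
    entries = [entry for entry in env_value.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


if __name__ == "__main__":
    cuda_lib = "/usr/local/cuda/lib64"   # assumed location, as in the example
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    print("CUDA lib dir on LD_LIBRARY_PATH:", path_listed(ld_path, cuda_lib))
    print("PYTHONPATH:", os.environ.get("PYTHONPATH", "(not set)"))
```

Paths are normalized before comparison, so a trailing slash in the environment variable does not cause a false negative.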

classification
question answering (SQuAD)
language modelling on PTB
speech recognition (Note: implemented in CNTK instead of PyTorch)

machine translation: SRU has been included in OpenNMT-py by Jianyu Zhan and Sasha Rush. Also thanks to @jingxil for testing. See results here.

Contributors

https://github.com/taolei87/sru/graphs/contributors

Other Implementations

@musyoku has a very nice SRU implementation in Chainer.

@adrianbg implemented the CPU version.

To-do

[x] ReLU activation
[x] support multi-GPU via nn.DataParallel (see example here)
[x] layer normalization
[x] weight normalization
[x] SeLU activation
[ ] residual
[ ] support packed sequence
