# nninit

Parameter initialisation schemes for Torch7 neural network modules. Works with nn, and therefore nngraph. Allows arbitrary indexing of weights/biases/parameters.

Supported modules:
- nn.Linear / nn.LinearNoBias
- nn.LookupTable
- nn.TemporalConvolution
- nn.SpatialConvolution / cudnn.SpatialConvolution
- nn.VolumetricConvolution / cudnn.VolumetricConvolution
## Installation

```
luarocks install nninit
```
## Usage

nninit adds an init method to nn.Module, with the following API:

```lua
module:init(accessor, initialiser, ...)
```
The accessor argument is used to extract the tensor to be initialised from the module. The initialiser argument is a function that takes the module, the tensor, and further options; it adjusts the tensor and returns the module, allowing init calls to be chained. nninit comes with several initialiser functions. ... represents additional arguments for the initialiser function.
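For instance, a minimal sketch of chaining (the layer sizes and values here are arbitrary):

```lua
local nn = require 'nn'
local nninit = require 'nninit'

-- Each init call returns the module, so calls can be chained
local linear = nn.Linear(10, 5)
  :init('weight', nninit.normal, 0, 0.01) -- weight ~ N(0, 0.01)
  :init('bias', nninit.constant, 0)       -- zero the biases
```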
## Accessors

The accessor argument is used to extract the tensor to be initialised from the module. It can either be a string, a table, or a function.
### Strings

The tensor is accessed as a property of the module. For example:

```lua
module:init('weight', nninit.constant, 1)
```
### Tables

The tensor is first accessed as a property of the module from the first element, and a subtensor is then extracted using Torch's indexing operator applied to the second element. For example:

```lua
module:init({'weight', {{1, 5}, {}}}, nninit.uniform, -1, 1)
```
### Functions

The tensor must be returned as the result of the function applied to the module. For example:

```lua
module:init(function(m) return m.weight:narrow(1, 1, 10) end, nninit.normal, 0, 0.01)
```
## Initialisers

### nninit.copy(init)

Copies the init tensor to the tensor to be initialised.
### nninit.constant(val)

Fills the tensor with the constant val.
### nninit.addConstant(val)

Adds the constant val to the current tensor.
### nninit.mulConstant(val)

Multiplies the current tensor by the constant val.
### nninit.normal(mean, stdv)

Fills the tensor with values drawn from N(mean, stdv).
### nninit.addNormal(mean, stdv)

Adds values drawn from N(mean, stdv) to the current tensor.
### nninit.uniform(a, b)

Fills the tensor with values drawn from U(a, b).
### nninit.addUniform(a, b)

Adds values drawn from U(a, b) to the current tensor.
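As a quick sketch of the fill/add/mul initialisers together (the layer size and constants here are arbitrary):

```lua
local nn = require 'nn'
local nninit = require 'nninit'

local layer = nn.Linear(4, 4)
  :init('weight', nninit.uniform, -0.1, 0.1) -- fill weight ~ U(-0.1, 0.1)
  :init('weight', nninit.mulConstant, 0.5)   -- then halve every weight
  :init('weight', nninit.addNormal, 0, 0.01) -- then add noise ~ N(0, 0.01)
  :init('bias', nninit.constant, 0)          -- zero the biases
```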
### nninit.eye

Only supports the module weights as the tensor, and relies on the module type to determine the appropriate identity. Fills the weights of linear layers/lookup tables with the identity matrix, and the filters of convolutional layers with the Dirac delta function, normalised by the number of input layers.
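For example, an identity-style convolution (the channel/filter counts here are arbitrary):

```lua
-- Dirac delta filters for a conv layer; see also the full example below
local conv = nn.SpatialConvolution(3, 8, 3, 3)
  :init('weight', nninit.eye)
  :init('bias', nninit.constant, 0)
```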
### nninit.xavier

Fills the tensor with stdv = gain * sqrt(2 / (fanIn + fanOut)). Uses the uniform distribution by default. Optional named parameters dist and gain can be passed in via a table. Also known as Glorot initialisation.

Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In International Conference on Artificial Intelligence and Statistics.
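For example (this mirrors the usage in the full example below):

```lua
local fc = nn.Linear(6, 4)
  :init('weight', nninit.xavier, {dist = 'normal', gain = 1.1})
```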
### nninit.kaiming

Fills the tensor with stdv = gain * sqrt(1 / fanIn). Uses the normal distribution by default. Optional named parameters dist and gain can be passed in via a table. The initialisation scheme typically includes the gain for ReLU units, which has to be manually specified in nninit.kaiming with the option {gain = 'relu'}. Also known as He initialisation.

He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. arXiv preprint arXiv:1502.01852.
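A minimal sketch with the ReLU gain (the layer sizes are arbitrary):

```lua
local fc = nn.Linear(64, 64)
  :init('weight', nninit.kaiming, {gain = 'relu'}) -- He initialisation for ReLU units
```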
### nninit.orthogonal

Only supports tensors with at least 2 dimensions. Fills the tensor with a (normally distributed) random orthogonal matrix. Optional named parameter gain can be passed in via a table.

Saxe, A. M., McClelland, J. L., & Ganguli, S. (2013). Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120.
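For example (mirroring the full example below):

```lua
local fc = nn.Linear(8, 6)
  :init('weight', nninit.orthogonal, {gain = 'relu'})
```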
### nninit.sparse(sparsity)

Sets a fraction (1 - sparsity) of the tensor's elements to 0, where sparsity is between 0 and 1. For example, a sparsity of 0.2 drops out 80% of the tensor.

Martens, J. (2010). Deep learning via Hessian-free optimization. In Proceedings of the 27th International Conference on Machine Learning (ICML-10).
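For example (mirroring the full example below; the layer sizes are arbitrary):

```lua
local fc = nn.Linear(4, 2)
  :init('weight', nninit.sparse, 0.2) -- 80% of the weights are set to 0
```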
### nninit.convolutionAware

Only supports 2D convolutions with a symmetric filter size. Fills the convolution tensor with matrices that are orthogonal in frequency space. The initialisation scheme described in the paper includes the gain for ReLU units, which has to be manually specified with the option {gain = 'relu'}. The optional named parameter std can be passed in via a table; it specifies the noise used to break symmetry in the inverse Fourier transform.

Aghajanyan, A. (2017). Convolution Aware Initialization. arXiv preprint arXiv:1702.06295.
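A minimal usage sketch, assuming the initialiser is exposed as nninit.convolutionAware and using arbitrary option values:

```lua
-- nninit.convolutionAware and the std value here are assumptions, not confirmed by the text above
local conv = nn.SpatialConvolution(3, 8, 3, 3) -- symmetric 3x3 filters
  :init('weight', nninit.convolutionAware, {gain = 'relu', std = 0.05})
```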
## Distributions

The two types of distribution supported are 'normal' and 'uniform'.
## Gains

Gains can be calculated depending on the succeeding nonlinearity. If gain is a number, it is used directly; if gain is a string, the following mapping is used. By default, gains (where applicable) are set to 1.
| Gain | Parameters | Mapping |
|---|---|---|
| 'linear' | | 1 |
| 'sigmoid' | | 1 |
| 'tanh' | | 5 / 3 |
| 'relu' | | sqrt(2) |
| 'lrelu' | leakiness | sqrt(2 / (1 + leakiness^2)) |
If the gain must be calculated from additional parameters, gain must be passed as a table with the string as the first element, as well as named parameters. For example:

```lua
module:init('weight', nninit.kaiming, {gain = {'lrelu', leakiness = 0.3}})
```
## Example

```lua
local nn = require 'nn'
require 'cunn'
local cudnn = require 'cudnn'
require 'rnn'
local nninit = require 'nninit'

local getBias = function(module)
  return module.bias
end

local batchSize = 5
local imgSize = 16
local nChannels = 3
local nFilters = 8
local rho = 6
local hiddenSize = 2

local cnn = nn.Sequential()
cnn:add(cudnn.SpatialConvolution(nChannels, nFilters, 2, 2):init('weight', nninit.eye)
                                                           :init('weight', nninit.mulConstant, 1/2)
                                                           :init('weight', nninit.addNormal, 0, 0.01)
                                                           :init(getBias, nninit.constant, 0))
cnn:add(nn.View(nFilters*15*15))
cnn:add(nn.Linear(nFilters*15*15, nFilters):init('weight', nninit.kaiming, {
  dist = 'uniform',
  gain = {'lrelu', leakiness = 0.3}
}))
cnn:add(nn.RReLU(1/3, 1/3))
cnn:add(nn.Linear(nFilters, 6):init('weight', nninit.orthogonal, {gain = 'relu'}))
cnn:add(cudnn.ReLU())
cnn:add(nn.Linear(6, 4):init('weight', nninit.xavier, {dist = 'normal', gain = 1.1}))
cnn:add(nn.Linear(4, hiddenSize):init('weight', nninit.sparse, 0.2)
                                :init(getBias, nninit.constant, 0))

local model = nn.Sequential()
model:add(nn.Sequencer(cnn))

local lstm = nn.FastLSTM(hiddenSize, hiddenSize, rho)
-- Note that chaining will pass through the module initialised, never parents
lstm.i2g:init({'bias', {{2*hiddenSize + 1, 3*hiddenSize}}}, nninit.constant, 1) -- High forget gate bias
model:add(nn.Sequencer(lstm))
model:cuda()

local inputs = {}
for i = 1, rho do
  table.insert(inputs, torch.ones(batchSize, nChannels, imgSize, imgSize):cuda())
end
print(model:forward(inputs))
```
## Development

To develop nninit/use it to test new initialisation schemes, git clone/download this repo and use luarocks make rocks/nninit-scm-1.rockspec to install nninit locally.