macarico
An implementation of the imperative learning to search framework [1] in pytorch, compatible with automatic differentiation, for deep learning-based structured prediction and reinforcement learning.
[1] http://hal3.name/docs/daume16compiler.pdf
The basic structure is:
macarico/
    base.py        defines the abstract classes used for maçarico, such as Env, Policy, Features, Learner and Attention
    annealing.py   tools for annealing, useful for e.g. DAgger
    util.py        basic utility functions
    tasks/         example tasks, such as: sequence_labeler, dependency_parser, sequence2sequence, etc. all of these define an Env that can be run
    features/      contains example static features and dynamic features
        sequence.py  defines two types of static features: RNNFeatures (obtained by running an RNN over the input) and BOWFeatures (a simple bag of words); also defines useful attention models over sequences
        actor.py     defines two types of dynamic features: TransitionRNN (an actor with an RNN-like hidden state) and TransitionBOW (which has no hidden state and instead conditions only on the previous actions)
    policies/      currently only implements a linear policy
    lts/           various learning to search algorithms, such as: maximum_likelihood, dagger, reinforce, aggrevate and LOLS
tests/
    run_tests.sh   runs all (or some) of the tests and compares the outputs to previous versions to make sure you didn't botch anything. (*please* run this before pushing changes.)
    test_util.py   some utilities for running tests, such as train/eval loops, printing outputs, etc.
    nlp_data.py    generates or loads data for various natural language processing tasks (requires external data)
    test_X.py      various tests for different parts of maçarico. if you develop something new, please create a test!
    output/        outputs from previous runs
Take a look at existing tasks. Then create a new mytask.py file in macarico/tasks that defines the following (a minimal skeleton is sketched after this list):

- an Example class, which contains the labeled examples for your task. This class must define a mk_env function that returns an environment (Env) particular to this task.
- an Env class, which defines how your environment works. It must provide run_episode and loss at the minimum. For some learning algorithms, it must also provide rewind (mostly for efficiency) and/or reference.
- if none of the existing attention models makes sense for your task (as is the case for, e.g., DependencyParser), define your own Attention mechanism.
- make a test case in tests/test_mytask.py that tests it.
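Here is a minimal sketch of what mytask.py might look like for a toy sequence-labeling-style task. This is a hedged illustration, not real maçarico code: the MyTaskEnv name, the constructor signatures, and the attributes stored on the env are all hypothetical; check base.py and the existing tasks for the actual interfaces.

import macarico

class Example(object):
    # hypothetical container for a single labeled example
    def __init__(self, tokens, labels, n_labels):
        self.tokens = tokens
        self.labels = labels
        self.n_labels = n_labels

    def mk_env(self):
        # must return a fresh environment for this example
        return MyTaskEnv(self)

class MyTaskEnv(macarico.Env):
    def __init__(self, example):
        self.example = example
        self.T = len(example.tokens)  # maximum number of time steps
        self.t = 0                    # current time step
        self.output = []

    def run_episode(self, policy):
        # ask the policy (or learner) for one action per time step
        self.output = []
        for self.t in range(self.T):
            self.output.append(policy(self))
        return self.output

    def loss(self):
        # e.g., Hamming loss: the number of mispredicted labels
        return sum(p != y for p, y in zip(self.output, self.example.labels))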
There are two types of features: static features (things that can be precomputed on the input before the environment starts running) and dynamic features (things that depend on the status of the environment).
For static features (like RNNFeatures), create a class that derives from macarico.Features (and probably also from nn.Module if it has any of its own parameters). At a minimum, this must define its dimensionality and give a name to itself (called the field). This field can then be referenced either by other features or by Attention modules. The class should define a _forward method that computes the static features; the result will be cached automatically for you. It should return a tensor of dimension (M, dim), where M is arbitrary (but must be compatible with Attention) and where dim is the pre-declared dimensionality.
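As a hedged illustration, a bag-of-words static feature class might look roughly like the sketch below. The real BOWFeatures lives in features/sequence.py and may differ; in particular, the Features(field, dim) constructor signature and the state.example attribute are assumptions here.

import torch
import torch.nn as nn
import macarico

class MyBOWFeatures(macarico.Features, nn.Module):
    def __init__(self, n_types, input_field='tokens', output_field='tokens_feats'):
        nn.Module.__init__(self)
        self.input_field = input_field
        self.n_types = n_types
        # declare our name (field) and dimensionality; the (field, dim)
        # signature is assumed, check base.py for the real one
        macarico.Features.__init__(self, output_field, n_types)

    def _forward(self, state):
        # one one-hot row per token: an (M, dim) tensor with M = len(tokens)
        # and dim = n_types; assumes the env stores its example on state.example
        tokens = getattr(state.example, self.input_field)
        f = torch.zeros(len(tokens), self.n_types)
        for m, w in enumerate(tokens):
            f[m, w] = 1
        return f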
For dynamic features (like TransitionRNN), create a class as before. However, instead of defining the static _forward function, you must define your own dynamic forward function. This can peek at state.t and state.T to get the current and maximum time steps. It should return features *just* for the current timestep, state.t.
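In the same spirit, here is a hedged sketch of a dynamic feature class: a one-hot encoding of the previous action. The class name, the (field, dim) constructor, and the state.output attribute are hypothetical; the real dynamic features are the actors in features/actor.py.

import torch
import torch.nn as nn
import macarico

class PrevActionFeatures(macarico.Features, nn.Module):
    def __init__(self, n_actions, output_field='prev_action_feats'):
        nn.Module.__init__(self)
        self.n_actions = n_actions
        macarico.Features.__init__(self, output_field, n_actions)  # (field, dim) assumed

    def forward(self, state):
        # dynamic features define forward (not _forward), so nothing is cached;
        # return a (1, dim) tensor for the current time step, state.t
        f = torch.zeros(1, self.n_actions)
        if state.t > 0:
            f[0, state.output[state.t - 1]] = 1  # previous action, one-hot
        return f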
An Attention mechanism tells a dynamic model where to look to access its features. There are two types: hard attention and soft attention.

A hard attention mechanism defines its field (which features it is attending to) and its arity (how many feature vectors it attends to at any given time). Then, at runtime, given a state, it must return the indices into the corresponding field based on the state, where the number of indices is exactly equal to its arity (a sketch follows below).

A soft attention mechanism still defines its field, but declares its arity to be None. This means that instead of returning an *index* into its input, it must return a *distribution* over its input as a torch Variable tensor.
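For concreteness, a hard attention mechanism over a sequence could look roughly like this. This is a sketch under assumptions: the real Attention interface lives in base.py, and the class below (names included) is hypothetical.

class AttendAtCurrentWord(object):
    # hypothetical hard attention: arity 1, always look at the current word
    arity = 1

    def __init__(self, field='tokens_feats'):
        self.field = field

    def __call__(self, state):
        # must return exactly arity-many indices into self.field;
        # assumes the env stores its current position in state.n
        return [state.n]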
The most basic type of Learner behaves like a Policy, but additionally provides an update function that, for instance, does backprop.

Perhaps the simplest example is MaximumLikelihood, which just behaves according to a reference policy, but accumulates an objective function that is the sum of the losses on the individual predictions. At update time, it runs backprop on this objective.
One very important thing about Learners: even if they do not use the return value of their underlying policy, they must call the underlying policy every time they run. Why? Because the underlying policy may accumulate state (as in the case of TransitionRNN), and if it is "skipped," the policy will become very confused because it will have missed some input.
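Putting those two points together, a MaximumLikelihood-style learner might look roughly like the sketch below. This is a hedged reconstruction, not the real lts/maximum_likelihood.py; in particular, calling policy.forward(state, a_ref) to get a differentiable per-step loss is an assumption.

import macarico

class MyMaximumLikelihood(macarico.Learner):
    def __init__(self, reference, policy):
        self.reference = reference
        self.policy = policy
        self.objective = 0.0

    def __call__(self, state):
        a_ref = self.reference(state)
        # call the underlying policy even though we act like the reference,
        # so that stateful actors (e.g., TransitionRNN) see every time step
        self.objective += self.policy.forward(state, a_ref)  # assumed API
        return a_ref

    def update(self, loss):
        # ignores the episode loss; backprops through the accumulated objective
        if not isinstance(self.objective, float):
            self.objective.backward()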
A slightly different example is Reinforce, which implements the REINFORCE RL algorithm. This Learner does not explicitly accumulate an objective that it then backprops on; instead, it uses the fact that stochastic choices can be backpropped through automatically using torch's .reinforce function.
Because we have designed maçarico to be as modular as possible, there are some places where the different pieces need to "talk" to each other.
Let's take test_sequence_labeler.py as an example. In test_demo, we have code that looks like:

data = [Example(x, y, n_labels) for x, y in ...]

This constructs sequence_labeler.Example data structures and calls them data. If you look at the Example data structure, you will find it has two main components: tokens and labels, corresponding to x and y above, respectively.
Next, we build some static features:

features = RNNFeatures(n_types, input_field='tokens', output_field='tokens_feats')

This constructs a biLSTM over the inputs. Where does it look? It looks in tokens, because that's the specified input field. And it stores the features generated by the biLSTM in tokens_feats. You can therefore think of features as something that maps from tokens to tokens_feats. (Note: those two arguments are the defaults and could have been left off for convenience, but here we're trying to make everything explicit.)
Next, we need an actor. The actor is the thing that takes a state of the world and produces a feature representation of it. (This feature representation will later be consumed by the Policy to predict an action.) However, the actor needs to attend somewhere when making predictions. In this case, when the environment (the sequence labeler) is predicting the label of the nth word, the actor should look at that word! This can be done with the AttendAt attention mechanism.

attention = AttendAt(field='tokens_feats', get_position=lambda state: state.n)

This constructs an attention mechanism that essentially returns tokens_feats[state.n] when the environment state is on word n. Note that this hinges on the fact that we know the environment stores the current position in state.n. (Again, these arguments are the defaults and could be left off.)
Next, we can construct the actor. The actor is itself an RNN (not bidirectional this time), which uses the biLSTM features we built above, together with the simple attention mechanism.
actor = TransitionRNN([features], [attention], n_labels)
Finally, we can construct the policy. In this case, it's just a linear function that maps from the actor's feature representation to one of n_labels actions:
policy = LinearPolicy(actor, n_labels)
Tracing back through this: the policy maps from a state feature representation to an action. This mapping is done by LinearPolicy. But where does the state feature representation come from? It comes from the actor, which, when labeling word n, asks its attention model(s) which features to use. In this case, the attention tells it to look at tokens_feats[n], where tokens_feats[n] is the output of the biLSTM.
We will now train this model with DAgger. In order to do this, we need to anneal the degree to which rollin is done according to the reference policy versus the learned policy:
p_rollin_ref = stochastic(ExponentialAnnealing(0.99))
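To see what this is doing, here is a rough sketch of how stochastic and ExponentialAnnealing might behave (the real classes are in annealing.py and may differ): calling p_rollin_ref() flips a coin that comes up True with probability 0.99^t after t steps, and p_rollin_ref.step() advances t.

import random

class ExponentialAnnealing(object):
    def __init__(self, alpha):
        self.alpha = alpha  # value after t steps is alpha ** t

class stochastic(object):
    def __init__(self, schedule):
        self.schedule = schedule
        self.t = 0

    def __call__(self):
        # True with the schedule's current probability
        return random.random() < self.schedule.alpha ** self.t

    def step(self):
        self.t += 1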
Next, we construct an optimizer. This is exactly like you would do in pytorch, extracting the parameters from the policy:
optimizer = torch.optim.Adam(policy.parameters(), lr=0.01)
And now we can train:
for epoch in xrange(5):
    # train on each example, one at a time
    for ex in data:
        optimizer.zero_grad()
        learner = DAgger(ref, policy, p_rollin_ref)
        env = ex.mk_env()
        env.run_episode(learner)
        learner.update(env.loss())
        optimizer.step()
        p_rollin_ref.step()

# now make some predictions
for ex in data:
    env = ex.mk_env()
    out = env.run_episode(policy)
    print 'prediction = %s' % out