Abstract
Given that in practice training data is scarce for all but a small set of problems, a core question is how to incorporate prior knowledge into a model. In this paper, we consider the case of prior procedural knowledge for neural networks, such as knowing how a program should traverse a sequence, but not what local actions should be performed at each step. To this end, we present an end-to-end differentiable interpreter for the programming language Forth which enables programmers to write program sketches with slots that can be filled with behaviour trained from program input-output data. We can optimise this behaviour directly through gradient descent techniques on user-specified objectives, and also integrate the program into any larger neural computation graph. We show empirically that our interpreter is able to effectively leverage different levels of prior program structure and learn complex behaviours such as sequence sorting and addition. When connected to outputs of an LSTM and trained jointly, our interpreter achieves state-of-the-art accuracy for end-to-end reasoning about quantities expressed in natural language stories.