Abstract
We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar (PCFG). In contrast to traditional formulations, which learn a single stochastic grammar, the grammar's rule probabilities are modulated by a per-sentence continuous latent variable, which induces marginal dependencies beyond the traditional context-free assumptions.
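As a concrete sketch, one instantiation consistent with this description (the Gaussian prior, rule-scoring function $f_r$, and rule set $\mathcal{R}$ below are illustrative assumptions, not details fixed by this abstract) is
\begin{align*}
  z &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}) && \text{per-sentence latent variable} \\
  \pi_{z,r} &\propto \exp f_r(z; \theta), \quad r \in \mathcal{R} && \text{rule probabilities modulated by } z \\
  t &\sim \mathrm{PCFG}(\pi_z), \qquad x = \mathrm{yield}(t) && \text{tree and observed sentence}
\end{align*}
where each $\pi_{z,r}$ is normalized over rules sharing a left-hand side. Marginalizing $z$ couples rule choices across the sentence, so $p(x) = \int p(z) \sum_{t:\, \mathrm{yield}(t) = x} p(t \mid z)\, dz$ is itself no longer context-free.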
Inference in this grammar is performed by collapsed variational inference, in which an amortized variational posterior is placed on the continuous variable, and the latent trees are marginalized out exactly with dynamic programming.
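As a sketch, writing the amortized posterior as $q_\phi(z \mid x)$ (its parameterization, e.g. a sentence encoder, is an assumption not specified here), the resulting collapsed evidence lower bound is
\begin{align*}
  \log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\Big[ \log \sum_{t \in \mathcal{T}(x)} p_\theta(t \mid z) \Big] \;-\; \mathrm{KL}\big( q_\phi(z \mid x) \,\big\|\, p(z) \big),
\end{align*}
where $\mathcal{T}(x)$ is the set of trees whose yield is $x$; the inner sum is the collapsed term, computed exactly by the inside algorithm rather than approximated by sampling trees.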
Experiments on English and Chinese show the effectiveness of our approach compared to recent state-of-the-art methods that use neural language models for grammar induction from words.