Abstract
Unsupervised PCFG inducers hypothesize sets of compact context-free rules as explanations for sentences. These models not only provide tools for low-resource languages, but also play an important role in modeling language acquisition (Bannard et al., 2009; Abend et al., 2017). However, current PCFG induction models, which use word tokens as input, are unable to incorporate semantics and morphology into induction, and may encounter sparse-vocabulary issues when applied to morphologically rich languages. This paper describes a neural PCFG inducer that employs context embeddings (Peters et al., 2018) in a normalizing flow model (Dinh et al., 2015) to extend PCFG induction to use semantic and morphological information. A linguistically motivated similarity penalty and categorical distance constraints are imposed on the inducer as regularization. Experiments show that the PCFG induction model with normalizing flow produces grammars with state-of-the-art accuracy on a variety of languages. Ablation studies further show the positive effects of normalizing flow, context embeddings, and the proposed regularizers.