Curate and Generate: A Corpus and Method for Joint Control of Semantics and Style in Neural NLG
Abstract
Neural natural language generation (NNLG) from structured meaning representations has become increasingly popular in recent years. While we have seen progress with generating syntactically correct utterances that preserve semantics, various shortcomings of NNLG systems are clear: new tasks require new training data that is not always available or straightforward to acquire, and model outputs are simple and may be dull and repetitive. This paper addresses these two critical challenges in NNLG by: (1) scalably (and at no cost) creating training datasets of parallel meaning representations and reference texts with rich style markup by using data from freely available and naturally descriptive user reviews, and (2) systematically exploring how the style markup enables joint control of semantic and stylistic aspects of neural model output. We present YELPNLG, a corpus of 300,000 rich, parallel meaning representations and highly stylistically varied reference texts spanning different restaurant attributes, and describe a novel methodology that can be scalably reused to generate NLG datasets for other domains. The experiments show that the models control important aspects, including lexical choice of adjectives, output length, and sentiment, allowing the models to successfully hit multiple style targets without sacrificing semantics.