Abstract
We present an approach for recursively splitting and rephrasing complex English sentences
into a novel semantic hierarchy of simplified
sentences, with each of them presenting a more regular structure that may facilitate a wide variety of artificial intelligence
tasks, such as machine translation (MT) or
information extraction (IE). Using a set of
hand-crafted transformation rules, input sentences are recursively transformed into a two-layered hierarchical representation in the form
of core sentences and accompanying contexts
that are linked via rhetorical relations. In
this way, the semantic relationship of the decomposed constituents is preserved in the output, maintaining its interpretability for downstream applications. Both a thorough manual
analysis and automatic evaluation across three
datasets from two different domains demonstrate that the proposed syntactic simplification approach outperforms the state of the art
in structural text simplification. Moreover, an
extrinsic evaluation shows that, when our framework is applied
as a preprocessing step, the performance of state-of-the-art Open IE systems
can be improved by up to 346% in precision
and 52% in recall. To enable reproducible research, all code is provided online.
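
To make the shape of the output concrete, the following is a minimal sketch of a core/context hierarchy built by recursive rule application. It is not the transformation rule set described here: the two regular-expression rules, the relation labels "Contrast" and "Cause", and the names `Node`, `RULES`, and `simplify` are all hypothetical stand-ins chosen for illustration, and real rules would operate on syntactic parses rather than surface strings.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A simplified sentence plus its contexts, each tagged with a relation label."""
    text: str
    contexts: List[Tuple[str, "Node"]] = field(default_factory=list)


# Hypothetical toy rules: each pattern names a `core` and a `context` group,
# and is paired with an illustrative rhetorical relation label.
RULES = [
    (re.compile(r"^Although (?P<context>.+?), (?P<core>.+)$"), "Contrast"),
    (re.compile(r"^(?P<core>.+?) because (?P<context>.+)$"), "Cause"),
]


def _sentence(fragment: str) -> str:
    """Normalize a fragment into a capitalized sentence ending in a period."""
    fragment = fragment.strip().rstrip(".")
    if not fragment:
        return fragment
    return fragment[0].upper() + fragment[1:] + "."


def simplify(sentence: str) -> Node:
    """Apply the first matching rule, then recurse into both resulting parts."""
    body = sentence.strip().rstrip(".")
    for pattern, relation in RULES:
        match = pattern.match(body)
        if match:
            core = simplify(_sentence(match.group("core")))
            context = simplify(_sentence(match.group("context")))
            # Attach the context to the core, keeping the relation that links them.
            core.contexts.append((relation, context))
            return core
    # No rule applies: the sentence is treated as already minimal.
    return Node(_sentence(body))


root = simplify(
    "Although it rained, the match went ahead because the pitch was covered."
)
print(root.text)
for relation, ctx in root.contexts:
    print(f"  [{relation}] {ctx.text}")
```

In this toy example the core sentence is "The match went ahead." and the two split-off clauses are kept as contexts with their relation labels, so the link between the decomposed parts is not lost.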