Abstract
Lexically and syntactically simpler sentences result
in shorter reading times and better understanding for
many people. However, no reliable systems for automatic assessment of sentence complexity have been
proposed so far. Instead, the assessment is usually
done manually, requiring expert human annotators.
To address this problem, we first define sentence
complexity assessment as a five-level classification
task and build a ‘gold standard’ dataset. Next, we
propose robust systems for sentence complexity assessment, using a novel set of features that
leverage lexical properties of freely available corpora, and investigate the impact of feature type
and corpus size on classification performance