Abstract
Several linguistic studies have shown the
prevalence of various lexical and grammatical
patterns in texts authored by a person of a particular gender, but models for part-of-speech
tagging and dependency parsing have not yet been adapted to account for these differences. To
address this, we annotate the Wall Street Journal part of the Penn Treebank with the gender
information of the articles’ authors, and build
taggers and parsers trained on this data that
show performance differences in text written
by men and women. Further analyses reveal
numerous part-of-speech tags and syntactic relations whose prediction performance benefits from the prevalence of a specific gender
in the training data. The results underscore
the importance of accounting for gendered differences in syntactic tasks, and outline future
avenues for developing more accurate taggers and parsers. We release our data to the research community.