Abstract
In this paper we propose a novel neural approach to the automatic decipherment of lost languages. To compensate for the lack of a strong supervision signal, our model design is informed by patterns of language change documented in historical linguistics. The model uses an expressive sequence-to-sequence architecture to capture character-level correspondences between cognates. To train the model effectively in an unsupervised manner, we formalize the training procedure as a minimum-cost flow problem. Applied to the decipherment of Ugaritic, our method achieves a 5.5% absolute improvement over state-of-the-art results. We also report the first automatic results in deciphering Linear B, a syllabic script used to write an ancient form of Greek, where our model correctly translates 67.3% of cognates.
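The minimum-cost flow formulation can be illustrated on a toy cognate-matching instance. The sketch below is generic and not the paper's implementation: it assumes a hypothetical cost matrix (e.g. negative log-probabilities a seq2seq model might assign to each pairing) and solves the resulting bipartite flow problem with a standard successive-shortest-path algorithm.

```python
from collections import defaultdict

def min_cost_flow(n_nodes, edges, source, sink, flow_target):
    """Successive-shortest-path min-cost flow with Bellman-Ford
    on the residual graph. edges: list of (u, v, capacity, cost)."""
    cap, cost, flow = defaultdict(int), {}, defaultdict(int)
    adj = defaultdict(set)
    for u, v, c, w in edges:
        cap[(u, v)] = c
        cost[(u, v)] = w
        cost[(v, u)] = -w              # residual (reverse) edge cost
        adj[u].add(v)
        adj[v].add(u)
    sent, total_cost = 0, 0
    while sent < flow_target:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [float('inf')] * n_nodes
        parent = {}
        dist[source] = 0
        for _ in range(n_nodes - 1):
            for u in range(n_nodes):
                if dist[u] == float('inf'):
                    continue
                for v in adj[u]:
                    if cap[(u, v)] - flow[(u, v)] > 0 and \
                            dist[u] + cost[(u, v)] < dist[v]:
                        dist[v] = dist[u] + cost[(u, v)]
                        parent[v] = u
        if dist[sink] == float('inf'):
            break                      # no more augmenting paths
        # Bottleneck capacity along the path, then push that much flow.
        push, v = float('inf'), sink
        while v != source:
            u = parent[v]
            push = min(push, cap[(u, v)] - flow[(u, v)])
            v = u
        v = sink
        while v != source:
            u = parent[v]
            flow[(u, v)] += push
            flow[(v, u)] -= push
            v = u
        sent += push
        total_cost += push * dist[sink]
    return total_cost, flow

# Hypothetical costs[i][j]: cost of pairing lost word i with known word j
# (lower = more cognate-like).
costs = [[1, 3],
         [2, 1]]
n = 2
source, sink = 0, 2 * n + 1
edges = [(source, 1 + i, 1, 0) for i in range(n)]        # source -> lost
edges += [(1 + i, 1 + n + j, 1, costs[i][j])             # lost -> known
          for i in range(n) for j in range(n)]
edges += [(1 + n + j, sink, 1, 0) for j in range(n)]     # known -> sink
total, flow = min_cost_flow(2 * n + 2, edges, source, sink, n)
pairs = [(i, j) for i in range(n) for j in range(n)
         if flow[(1 + i, 1 + n + j)] == 1]
print(total, sorted(pairs))   # 2 [(0, 0), (1, 1)]
```

Unit capacities on the word nodes force a one-to-one matching, so the optimal flow selects the globally cheapest set of cognate pairs rather than greedy per-word choices.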