Poetry to Prose Conversion in Sanskrit as a Linearisation Task: A case for
Low-Resource Languages
Abstract
The word ordering in a Sanskrit verse is often not aligned with its corresponding prose
order. Converting the verse to its corresponding prose aids comprehension of the construction. Owing to resource constraints, we formulate this task as
a word ordering (linearisation) task. In doing so, we completely ignore the word arrangement on the verse side. kāvya guru, the
approach we propose, essentially consists of
a pipeline of two pretraining steps followed
by a seq2seq model. The first pretraining
step learns task specific token embeddings
from pretrained embeddings. In the next step,
we generate multiple hypotheses for possible
word arrangements of the input (Wang et al.,
2018). We then use them as inputs to a neural seq2seq model for the final prediction. We
empirically show that the hypotheses generated by our pretraining step result in predictions that consistently outperform predictions
based on the original order in the verse. Overall, kāvya guru outperforms current state-of-the-art
models in linearisation for the poetry to
prose conversion task in Sanskrit.
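
As a rough illustration of the linearisation framing described above, the following minimal sketch treats the verse as an unordered bag of tokens and scores a candidate ordering against the prose order. It is not the kāvya guru pipeline: the helper names (`to_bag`, `is_linearisation`, `lcs_ratio`), the English placeholder tokens, and the use of a longest-common-subsequence ratio as the score are all assumptions made for illustration only.

```python
from collections import Counter
from typing import List, Sequence


def to_bag(tokens: Sequence[str]) -> List[str]:
    """Discard the verse-side arrangement by putting tokens in a canonical (sorted) order."""
    return sorted(tokens)


def is_linearisation(candidate: Sequence[str], bag: Sequence[str]) -> bool:
    """A valid output is a permutation of the input: same tokens, same counts."""
    return Counter(candidate) == Counter(bag)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via standard dynamic programming."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, start=1):
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def lcs_ratio(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Share of the reference order recovered by the prediction (illustrative score only)."""
    if not reference:
        return 0.0
    return lcs_length(predicted, reference) / len(reference)


# Hypothetical placeholder tokens standing in for a verse and its prose order.
verse = ["forest", "the", "went", "hero", "to"]
prose = ["the", "hero", "went", "to", "forest"]

bag = to_bag(verse)                  # model input: verse order is ignored
valid = is_linearisation(prose, bag)  # the prose order is a permutation of the bag
baseline = lcs_ratio(verse, prose)    # score of simply keeping the verse order
```

Under this framing, any model output can be checked with `is_linearisation` before scoring, and the verse order itself serves as a simple baseline for comparison.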