Towards Automating Healthcare Question Answering
in a Noisy Multilingual Low-Resource Setting
Abstract
We discuss ongoing work on automating a
multilingual digital helpdesk service available
via text messaging to pregnant and breastfeeding mothers in South Africa. Our anonymized
dataset consists of short informal questions,
often in low-resource languages, with unreliable language labels, spelling errors and
code-mixing, as well as template answers
with some inconsistencies. We explore cross-lingual word embeddings, and train parametric and non-parametric models on 90K samples for answer selection from a set of 126
templates. Preliminary results indicate that
LSTMs trained end-to-end perform best, with
a test accuracy of 62.13% and a recall@5 of
89.56%, and demonstrate that we can accelerate response time by several orders of magnitude.
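
To make the two reported metrics concrete, the following is a minimal sketch of how accuracy and recall@k could be computed for answer selection, assuming each model produces one score per candidate template for every question. The function name `recall_at_k`, the toy scores and the toy labels are illustrative assumptions and are not taken from our system; accuracy is treated here as the special case k = 1.

```python
from typing import Sequence


def recall_at_k(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of questions whose gold template is among the k highest-scoring ones.

    Hypothetical helper for illustration; `scores[i][j]` is the model score of
    template j for question i, and `labels[i]` is the index of the gold template.
    """
    hits = 0
    for row, gold in zip(scores, labels):
        # Indices of the k highest-scoring templates for this question.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += gold in top_k
    return hits / len(labels)


# Toy data (assumed): 3 questions and 4 candidate templates instead of 126.
scores = [
    [0.1, 0.7, 0.2, 0.0],  # gold template 1 is ranked first
    [0.4, 0.3, 0.2, 0.1],  # gold template 2 is ranked third
    [0.0, 0.1, 0.2, 0.7],  # gold template 0 is ranked last
]
labels = [1, 2, 0]

accuracy = recall_at_k(scores, labels, k=1)  # only the top-ranked template counts
recall_3 = recall_at_k(scores, labels, k=3)  # gold template anywhere in the top 3
```

Under this reading, a high recall@5 means the correct template is usually among the five highest-ranked candidates even when the top-ranked one is wrong.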