Abstract
A challenging problem in on-device text classification is to build highly accurate neural
models that fit in a small memory footprint and have low latency. To address this
challenge, we propose SGNN++, an on-device neural network that dynamically learns
compact projection vectors from raw text using structured and context-dependent partition
projections. We show that this results in accelerated inference and performance improvements.
We conduct extensive evaluation on multiple
conversational tasks and languages such as English, Japanese, Spanish and French. Our
SGNN++ model significantly outperforms all
baselines, improves upon existing on-device
neural models and even surpasses RNN, CNN
and BiLSTM models on dialog act and intent prediction. Through a series of ablation
studies, we show that the partitioned projections and structured information lead to
a 10% improvement. We study the impact of model size on accuracy and introduce
quantization-aware training for SGNN++ to further reduce the model size while
preserving quality. Finally, we show fast inference on mobile phones.
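
As a rough illustration of what "compact projection vectors from raw text" can mean, the following minimal sketch maps a string to a fixed-length bit vector with hashed character n-grams, with no vocabulary or embedding table. It is an assumption-laden toy, not the SGNN++ architecture: the function names (`char_ngrams`, `project`), the use of MD5 as the hash, the trigram features, and the 16-bit output size are all hypothetical choices made for illustration, and the structured, context-dependent partitioning described above is not reproduced here.

```python
import hashlib


def char_ngrams(text, n=3):
    """Return character n-grams of the lowercased, boundary-padded text."""
    padded = f"#{text.lower()}#"
    return [padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))]


def _signed_hash(feature, bit_index):
    """Deterministically map (feature, bit_index) to +1 or -1.

    MD5 is used only as a convenient stable hash (an illustrative choice).
    """
    digest = hashlib.md5(f"{bit_index}:{feature}".encode("utf-8")).digest()
    return 1 if digest[0] & 1 else -1


def project(text, num_bits=16):
    """Project raw text to a list of num_bits binary values.

    Each bit is the sign of a sum of hashed +/-1 votes over the n-gram
    features, so no vocabulary or embedding matrix needs to be stored.
    """
    features = char_ngrams(text)
    bits = []
    for bit_index in range(num_bits):
        total = sum(_signed_hash(f, bit_index) for f in features)
        bits.append(1 if total > 0 else 0)
    return bits


if __name__ == "__main__":
    print(project("book a table for two"))
    print(project("book a table for 2"))
```

Because the projection is computed on the fly from hashes, the memory cost of this step is independent of vocabulary size, which is the property that makes projection-style models attractive on device.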