Abstract
Predictive phenotyping aims to accurately predict which phenotypes will occur at the next clinical
visit based on longitudinal Electronic Health Record
(EHR) data. While deep learning (DL) models have
recently demonstrated strong performance in predictive phenotyping, they require access to large
amounts of labeled data, which are expensive to acquire. To address this label-scarcity challenge,
we propose a deep dictionary learning framework
(DDL) for phenotyping, which utilizes unlabeled
data as a complementary source of information to
generate a better, more succinct data representation.
Our empirical evaluations on multiple EHR datasets
demonstrate that DDL outperforms existing predictive phenotyping methods on a wide variety of
clinical tasks that require patient phenotyping. The
results also show that unlabeled data can be used
to generate a better data representation that helps improve DDL’s phenotyping performance over existing
methods that use only labeled data.