Abstract
Numerous semi-supervised learning methods have been proposed to augment Multinomial Naive Bayes (MNB) with unlabeled documents, but their practical use is often limited by implementation difficulty, inconsistent predictive performance, or high computational cost. In this paper, we propose a new and very simple semi-supervised extension of MNB, called Semi-supervised Frequency Estimate (SFE). Our experiments show that SFE consistently improves MNB in terms of AUC and accuracy when additional data (labeled or unlabeled) is provided, which is not the case when MNB is combined with Expectation Maximization (EM). We attribute this to the fact that SFE consistently produces better conditional log likelihood values on the labeled training data than both EM+MNB and MNB.