Abstract
In Bayesian machine learning, conjugate priors are popular, mostly due to mathematical convenience. In this paper, we show that there are deeper reasons for choosing a conjugate prior. Specifically, we formulate the conjugate prior in the form of Bregman divergence and show that it is the inherent geometry of conjugate priors that makes them appropriate and intuitive. This geometric interpretation allows one to view the hyperparameters of conjugate priors as effective sample points, thus providing additional intuition. We use this geometric understanding of conjugate priors to derive the hyperparameters and expression of the prior used to couple the generative and discriminative components of a hybrid model for semi-supervised learning.
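The "hyperparameters as effective sample points" view can be illustrated with the standard Beta-Bernoulli conjugate pair (an illustrative sketch, not part of the paper's derivation): a Beta(α, β) prior behaves as if α pseudo-heads and β pseudo-tails had already been observed, and the conjugate posterior update simply adds the real counts to these pseudo-counts.

```python
def beta_posterior(alpha, beta, observations):
    """Update a Beta(alpha, beta) prior with 0/1 Bernoulli observations.

    Conjugacy means the posterior is again a Beta distribution, and the
    hyperparameters act as effective sample points (pseudo-counts):
    observed heads and tails are simply added to them.
    """
    heads = sum(observations)
    tails = len(observations) - heads
    return alpha + heads, beta + tails

# Prior Beta(2, 2) combined with data [1, 1, 0] yields posterior Beta(4, 3),
# exactly as if the prior's pseudo-counts were extra data points.
a, b = beta_posterior(2, 2, [1, 1, 0])
print(a, b)
```

The function names and values here are hypothetical; the paper's contribution is the general geometric (Bregman-divergence) version of this pseudo-count intuition.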