Abstract
Modelling natural images with sparse coding (SC) has faced two main challenges: flexibly representing varying pixel intensities and realistically representing low-level image components. This paper proposes a novel multiple-cause generative model of low-level image statistics that generalizes the standard SC model in two crucial respects: (1) it uses a spike-and-slab prior distribution for a more realistic representation of component absence/intensity, and (2) the model uses the highly nonlinear combination rule of maximal causes analysis (MCA) instead of a linear combination. The major challenge is parameter optimization, because a model with either (1) or (2) results in strongly multimodal posteriors. We show for the first time that a model combining both improvements can be trained efficiently while retaining the rich structure of the posteriors. We design an exact piecewise Gibbs sampling method and combine this with a variational method based on preselection of latent dimensions. This combined training scheme tackles both analytical and computational intractability and enables application of the model to a large number of observed and hidden dimensions. Applying the model to image patches, we study the optimal encoding of images by simple cells in V1 and compare the model's predictions with in vivo neural recordings. In contrast to standard SC, we find that the optimal prior favors asymmetric and bimodal activity of simple cells. Testing our model for consistency, we find that the average posterior is approximately equal to the prior. Furthermore, we find that the model predicts a high percentage of globular receptive fields alongside Gabor-like fields. Similarly high percentages are observed in vivo. Our results thus argue in favor of improving the standard sparse coding model for simple cells by using flexible priors and nonlinear combinations.
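The two generalizations named above can be illustrated with a minimal generative sketch: each hidden cause is switched on or off by a Bernoulli "spike" and, when on, takes a Gaussian "slab" intensity; the active causes are then combined by a pointwise maximum (MCA) rather than a sum. All dimensions, parameter values, and the random generative fields below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and parameters (not taken from the paper)
D, H = 64, 10                # observed pixels, hidden causes
pi = 0.2                     # spike: probability a cause is active
mu, sigma_slab = 1.0, 0.5    # slab: mean and std of cause intensity
sigma_noise = 0.1            # observation noise

W = rng.standard_normal((H, D))   # hypothetical generative fields

def sample_patch():
    s = rng.random(H) < pi                  # spike variables (presence)
    z = rng.normal(mu, sigma_slab, H)       # slab variables (intensity)
    contrib = (s * z)[:, None] * W          # per-cause contribution, H x D
    mean = contrib.max(axis=0)              # MCA rule: pointwise max, not a sum
    return rng.normal(mean, sigma_noise)    # Gaussian pixel noise

patch = sample_patch()                      # one sampled image patch, shape (D,)
```

Replacing `contrib.max(axis=0)` with `contrib.sum(axis=0)` would recover the linear combination of standard sparse coding, which is exactly the modelling choice the paper argues against.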