Abstract. In this paper, we introduce a random forest semantic hashing scheme that embeds tiny convolutional neural networks (CNNs) into shallow random forests. A binary hash code for a data point is obtained from a set of decision trees by setting ‘1’ for the visited tree leaf and ‘0’ for the rest. We propose to first randomly group the classes arriving at each tree split node into two groups, obtaining a significantly simplified two-class classification problem that can be handled with a light-weight CNN weak learner. Code uniqueness is achieved via the random class grouping, whilst code consistency is achieved using a low-rank loss in the CNN weak learners that encourages intra-class compactness for the two random class groups. Finally, we introduce an information-theoretic approach for aggregating the codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. The proposed approach significantly outperforms state-of-the-art hashing methods for image retrieval tasks on large-scale public datasets, and is comparable to image classification methods while utilizing a more compact, efficient and scalable representation. This work proposes a principled and robust procedure to train and deploy in parallel an ensemble of light-weight CNNs, instead of simply going deeper.
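
As a rough illustration of two steps named above, the following minimal sketch shows a random two-way class grouping at a split node and a leaf-indicator code built from the visited leaves. It is a sketch under stated assumptions, not the implementation used in this work: the function names `random_class_split` and `leaf_indicator_code` are hypothetical, the random choice of group sizes is an assumption, and plain concatenation of per-tree codes stands in for the information-theoretic aggregation, which is not reproduced here. The CNN weak learner and the low-rank loss are also omitted.

```python
import random


def random_class_split(classes, rng):
    """Randomly partition the classes arriving at a split node into two
    non-empty groups (hypothetical helper; needs at least two classes).
    The random cut point is an assumption made for illustration."""
    shuffled = list(classes)
    rng.shuffle(shuffled)
    cut = rng.randint(1, len(shuffled) - 1)  # both groups stay non-empty
    return set(shuffled[:cut]), set(shuffled[cut:])


def leaf_indicator_code(leaf_indices, leaves_per_tree):
    """Build a binary code with '1' at the visited leaf of each tree and
    '0' elsewhere. Per-tree codes are simply concatenated here, which is
    a simplification and not the aggregation proposed in the paper."""
    code = []
    for leaf in leaf_indices:
        bits = [0] * leaves_per_tree
        bits[leaf] = 1
        code.extend(bits)
    return code


rng = random.Random(0)
group_a, group_b = random_class_split(range(10), rng)
code = leaf_indicator_code([2, 0], leaves_per_tree=4)
```

With two trees of four leaves each, a data point that reaches leaf 2 in the first tree and leaf 0 in the second receives the code `[0, 0, 1, 0, 1, 0, 0, 0]`. Each tree contributes exactly one set bit.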