Abstract
Convolutional Neural Networks (CNN) are state-of-the-art models for many image classification tasks. However, to recognize cancer subtypes automatically, training a CNN on gigapixel resolution Whole Slide Tissue Images (WSI) is currently computationally impossible. The differentiation of cancer subtypes is based on cellular-level visual features observed at the image patch scale. Therefore, we argue that in this situation, training a patch-level classifier on image patches will perform better than or similarly to an image-level classifier. The challenge becomes how to intelligently combine patch-level classification results and model the fact that not all patches will be discriminative. We propose to train a decision fusion model to aggregate patch-level predictions given by patch-level CNNs, which to the best of our knowledge has not been shown before. Furthermore, we formulate a novel Expectation-Maximization (EM) based method that automatically locates discriminative patches robustly by utilizing the spatial relationships of patches. We apply our method to the classification of glioma and non-small-cell lung carcinoma cases into subtypes. The classification accuracy of our method is similar to the inter-observer agreement between pathologists. Although it is impossible to train CNNs on WSIs, we experimentally demonstrate, using a comparable non-cancer dataset of smaller images, that a patch-based CNN can outperform an image-based CNN.