A Unifying Framework for Learning Bag Labels from Generalized Multiple-Instance Data
Abstract
We study the problem of bag-level classification from generalized multiple-instance (GMI) data. GMI learning extends the popular multiple-instance (MI) setting: in GMI data, a bag is labeled positive if it contains instances of certain types and avoids instances of other types. For example, an image of a “sunny beach” should contain sand and sea, but not clouds. We formulate a novel generative process for the GMI setting in which bags are distributions over instances. Under this model, we show that a broad class of distribution-distance kernels is sufficient to represent arbitrary GMI concepts. Further, we show that a variety of previously proposed kernel approaches to the standard MI and GMI settings can be unified under the distribution-kernel framework. An extensive empirical study indicates that distribution-distance kernels are accurate on a wide variety of real-world MI and GMI tasks and are efficient compared with a large set of baselines. Our theoretical and empirical results indicate that distribution-distance kernels can serve as a unifying framework for learning bag labels from GMI (and therefore MI) problems.
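The abstract does not fix a particular distribution distance. As a minimal sketch, assuming the maximum mean discrepancy (MMD) with a Gaussian RBF instance kernel as the distance, a bag-level kernel can be computed by treating each bag as the empirical distribution of its instances and then applying an RBF to the squared distance between bags. This MMD choice is swapped in for illustration and is not necessarily the construction used in the paper; the function names and the parameters `gamma` and `delta` are hypothetical.

```python
import math
from typing import List, Sequence

Instance = Sequence[float]
Bag = List[Instance]


def rbf(x: Instance, y: Instance, gamma: float) -> float:
    """Gaussian RBF kernel between two instances (assumed instance kernel)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def mean_kernel(bag_a: Bag, bag_b: Bag, gamma: float) -> float:
    """Average instance-kernel value: the inner product of the two bags'
    empirical mean embeddings. Bags may have different sizes."""
    total = sum(rbf(x, y, gamma) for x in bag_a for y in bag_b)
    return total / (len(bag_a) * len(bag_b))


def mmd_squared(bag_a: Bag, bag_b: Bag, gamma: float) -> float:
    """Squared MMD between the empirical instance distributions of two bags."""
    return (mean_kernel(bag_a, bag_a, gamma)
            + mean_kernel(bag_b, bag_b, gamma)
            - 2.0 * mean_kernel(bag_a, bag_b, gamma))


def distribution_kernel(bag_a: Bag, bag_b: Bag,
                        gamma: float = 1.0, delta: float = 1.0) -> float:
    """Hypothetical bag kernel K(A, B) = exp(-delta * MMD^2(A, B))."""
    # Clamp tiny negative values caused by floating-point rounding.
    d2 = max(mmd_squared(bag_a, bag_b, gamma), 0.0)
    return math.exp(-delta * d2)


if __name__ == "__main__":
    a = [[0.0, 0.0], [1.0, 0.0]]
    b = [[0.0, 0.1], [1.0, 0.1]]   # nearly the same distribution as a
    c = [[5.0, 5.0], [6.0, 5.0]]   # far from a
    print(distribution_kernel(a, b), distribution_kernel(a, c))
```

Bags whose instance distributions are close receive a kernel value near 1, and distant bags receive a smaller value. The resulting Gram matrix over bags could then be passed to any kernel classifier, such as an SVM with a precomputed kernel.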