Abstract
In many machine learning applications, labeling every instance of data is burdensome. Multiple Instance Learning (MIL), in which training data is provided in the form of labeled bags rather than labeled instances, is one approach for a more relaxed form of supervised learning. Though much progress has been made in analyzing MIL problems, existing work considers bags that have a finite number of instances. In this paper we argue that in many applications of MIL (e.g. image, audio, etc.) the bags are better modeled as low dimensional manifolds in high dimensional feature space. We show that the geometric structure of such manifold bags affects PAC-learnability. We discuss how a learning algorithm that is designed for finite sized bags can be adapted to learn from manifold bags. Furthermore, we propose a simple heuristic that reduces the memory requirements of such algorithms. Our experiments on real-world data validate our analysis and show that our approach works well.