Abstract
A generalized formulation of the multiple instance learning problem is considered. Under this formulation, both positive and negative bags are soft, in the sense that negative bags can also contain positive instances. This reflects a problem setting commonly found in practical applications, where labeling noise appears on both positive and negative training samples. A novel bag-level representation is introduced, using instances that are most likely to be positive (denoted top instances), and its ability to separate soft bags, depending on their relative composition in terms of positive and negative instances, is studied. This study inspires a new large-margin algorithm for soft-bag classification, based on a latent support vector machine that efficiently explores the combinatorial space of bag compositions. Empirical evaluation on three datasets confirms the main findings of the theoretical analysis and the effectiveness of the proposed soft-bag classifier.