Learning Homophily Couplings from Non-IID Data for
Joint Feature Selection and Noise-Resilient Outlier Detection
Abstract
This paper introduces a novel wrapper-based
outlier detection framework (WrapperOD) and its
instance (HOUR) for identifying outliers in noisy
data (i.e., data with noisy features) with strong
couplings between outlying behaviors. Existing
subspace or feature selection-based methods are
significantly challenged by such data, as their
search for feature subset(s) is independent of
outlier scoring and thus can be misled by noisy
features. In contrast, HOUR takes a wrapper
approach to iteratively optimize the feature subset
selection and outlier scoring using a top-k outlier
ranking evaluation measure as its objective
function. HOUR learns homophily couplings
between outlying behaviors (i.e., abnormal behaviors
are not independent; they bond together)
in constructing a noise-resilient outlier scoring
function to produce a reliable outlier ranking in
each iteration. We show that HOUR (i) retains
an outlier ranking that is a 2-approximation of the optimal
one; and (ii) significantly outperforms five
state-of-the-art competitors on 15 real-world data sets
with different noise levels in terms of AUC and/or
P@n. The source code of HOUR is available at
https://sites.google.com/site/gspangsite/sourcecode
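
The two evaluation measures named above, AUC and P@n, can be illustrated with a minimal sketch. This is not the HOUR source code: the function names, the toy scores and labels, and the convention that n defaults to the number of true outliers are all assumptions made for illustration only.

```python
def precision_at_n(scores, labels, n=None):
    """P@n: fraction of true outliers among the n top-scored objects.

    Assumption: when n is not given, it defaults to the number of
    true outliers in the data (a common convention, not confirmed
    by the abstract).
    """
    if n is None:
        n = sum(labels)
    # Rank object indices by outlier score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:n]) / n


def auc(scores, labels):
    """AUC: probability that a random outlier is scored above a random
    inlier, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: higher score means more outlying; label 1 = outlier.
scores = [0.9, 0.8, 0.3, 0.7, 0.1, 0.2]
labels = [1, 0, 0, 1, 0, 0]

p_at_n = precision_at_n(scores, labels)  # 0.5: one of the top-2 objects is an outlier
auc_value = auc(scores, labels)          # 0.875: 7 of 8 outlier/inlier pairs ordered correctly
```

Both measures depend only on the ranking induced by the scores, which is why a ranking-based objective of this kind can be evaluated without calibrating the scores themselves.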