Out-of-sample Node Representation Learning for Heterogeneous Graph in
Real-time Android Malware Detection
Abstract
The increasingly sophisticated Android malware
calls for new defensive techniques that are capable of protecting mobile users against novel threats.
In this paper, we first extract the runtime Application Programming Interface (API) call sequences
from Android apps, and then analyze higher-level
semantic relations within the ecosystem to comprehensively characterize the apps. To model the different types of entities (i.e., app, API, device, signature, affiliation) and the rich relations among them,
we present a structured heterogeneous graph (HG).
To efficiently classify nodes (e.g.,
apps) in the constructed HG, we propose the HG-Learning method to first obtain in-sample node embeddings and then learn representations of out-of-sample nodes without rerunning/adjusting HG embeddings at the first attempt. We then design a
deep neural network classifier taking the learned
HG representations as inputs for real-time Android
malware detection. We perform comprehensive experiments on
large-scale, real sample collections from Tencent Security Lab. Promising results
demonstrate that our developed system AiDroid,
which integrates our proposed method, outperforms
others in real-time Android malware detection.
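
To make the in-sample versus out-of-sample distinction concrete, the following is a minimal sketch and not the HG-Learning method itself. It assumes that in-sample embeddings have already been learned, and it stands in a plain neighbor average for the out-of-sample step, so that a newly arriving app is embedded without recomputing any existing embedding. The class HeteroGraph, the function embed_out_of_sample, the node identifiers, and the vector values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Entity types named in the abstract.
NODE_TYPES = {"app", "API", "device", "signature", "affiliation"}


class HeteroGraph:
    """Toy heterogeneous graph: typed nodes joined by undirected edges."""

    def __init__(self):
        self.node_type = {}                # node id -> entity type
        self.neighbors = defaultdict(set)  # node id -> adjacent node ids

    def add_node(self, node_id, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.node_type[node_id] = node_type

    def add_edge(self, u, v):
        if u not in self.node_type or v not in self.node_type:
            raise KeyError("both endpoints must be added before linking them")
        self.neighbors[u].add(v)
        self.neighbors[v].add(u)


def embed_out_of_sample(graph, node_id, in_sample_emb):
    """Embed a new node as the mean of its already-embedded neighbors.

    ASSUMPTION: neighbor averaging is an illustrative stand-in, not the
    paper's method. The in-sample embeddings are read, never modified.
    """
    known = [in_sample_emb[n] for n in sorted(graph.neighbors[node_id])
             if n in in_sample_emb]
    if not known:
        raise ValueError("node has no in-sample neighbors to aggregate")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical in-sample graph: one app linked to an API and a signature.
g = HeteroGraph()
g.add_node("app_1", "app")
g.add_node("api_a", "API")
g.add_node("sig_s", "signature")
g.add_edge("app_1", "api_a")
g.add_edge("app_1", "sig_s")

# Hypothetical embeddings, standing in for a learned in-sample result.
in_sample_emb = {
    "app_1": [0.5, 0.5],
    "api_a": [1.0, 0.0],
    "sig_s": [0.0, 1.0],
}

# A new app arrives at detection time and links to existing entities.
g.add_node("app_new", "app")
g.add_edge("app_new", "api_a")
g.add_edge("app_new", "sig_s")

new_vec = embed_out_of_sample(g, "app_new", in_sample_emb)
print(new_vec)
```

In this sketch the vector for app_new is computed from the stored embeddings of api_a and sig_s alone, which is the property that matters for real-time use: the cost of handling a new app depends on its neighborhood, not on the size of the whole graph. The resulting vector would then be passed to a downstream classifier.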