The Android ecosystem has witnessed a surge in malware, which not only puts mobile devices at risk but also increases the burden on malware analysts assessing and categorizing threats. In this paper, we show how to use machine learning to automatically classify Android malware samples into families with high accuracy, while observing only their runtime behavior. We focus exclusively on dynamic analysis of runtime behavior to provide a clean point of comparison that is dual to static approaches. Specific challenges in the use of dynamic analysis on Android are the limited information gained from tracking low-level events and the imperfect coverage when testing apps, e.g., due to inactive command and control servers. We observe that on Android, pure system calls do not carry enough semantic content for classification and instead rely on lightweight virtual machine introspection to also reconstruct Android-level inter-process communication. To address the sparsity of data resulting from low coverage, we introduce a novel classification method that fuses Support Vector Machines with Conformal Prediction to generate high-accuracy prediction sets where the information is insufficient to pinpoint a single family.
|Title of host publication||Mobile Security Technologies (MoST)|
|Publication status||Published - 26 May 2016|