Deakin University

File(s) under permanent embargo

Static malware clustering using enhanced deep embedding method

journal contribution
posted on 2019-01-01, 00:00 authored by Chee Keong (Allan) Ng, Frank Jiang, Leo Zhang, Wanlei Zhou
Malware refers to any software, program, or file that is intentionally used to compromise a system and cause unexpected losses to end‐users, such as economic losses or privacy breaches. The rapid growth of malware makes it impossible to keep up merely through human intervention or manual analysis: human‐oriented approaches create a backlog and cannot keep pace with how malware evolves. Hence, an efficient method is urgently needed to analyse and accurately identify malware. Malware clustering has been extensively studied in the machine learning area with regard to distance functions, grouping algorithms and cluster validation. Many studies have clustered malware via behavioral analysis to achieve high detection performance. However, behavioral approaches trade better detection performance for a high computational cost. To date, little work has focused on deep learning representations for malware clustering. Therefore, in this paper, we propose an enhanced deep embedded clustering method to facilitate an effective and efficient malware clustering process. The new method takes advantage of linear dimensionality reduction and a customised deep neural network to learn malware representations in an orthogonal space and perform cluster assignments. Our experimental results demonstrate that the proposed clustering model outperforms the traditional K‐means method with regard to the enhanced features using various auto‐encoders, pre‐trained weights and principal component analysis (PCA).
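
For orientation, the traditional K‐means baseline named in the abstract can be sketched as follows. This is a minimal, standard-library illustration of Lloyd's algorithm and not the authors' implementation: the function name `kmeans`, the toy `samples`, and the seed are assumptions made for the example, and this record does not specify the paper's features, network architecture, or PCA settings.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random (seeded for repeatability).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical 2-D feature vectors standing in for two malware families.
samples = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
labels, centroids = kmeans(samples, k=2)
print(labels)  # the first three samples share one label, the last three the other
```

In the setting the abstract describes, the input vectors would presumably be learned or PCA-reduced representations of static malware features instead of raw toy points; that substitution is left out here because its details are not given in this record.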



Concurrency and computation: practice & experience






Special Issue: Special Issue on Algorithmic Advances in Parallel Architectures and Energy Efficient Computing (PPAM2017) and Recent Advances in Machine Learning for Cyber‐security (MLCSec2018)

Article number



1 - 16




Chichester, Eng.





Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

2019, John Wiley & Sons