Deakin University

File(s) under permanent embargo

A fast malware feature selection approach using a hybrid of multi-linear and stepwise binary logistic regression

journal contribution
posted on 2017-12-01, 00:00 authored by Shamsul Huda, Jemal Abawajy, M Abdollahian, R Islam, John Yearwood
Malware replicates itself and produces offspring with the same characteristics but different signatures by using code obfuscation techniques. Current-generation anti-virus engines employ a signature-template detection approach, in which malware can easily evade the existing signatures in the database. This reduces the capability of current anti-virus engines to detect malware. In this paper, we propose a stepwise binary logistic regression-based dimensionality reduction technique for malware detection using application program interface (API) call statistics. Finding the most significant malware features with traditional wrapper-based approaches has complexity exponential in the dimension (m) of the dataset under a brute-force search strategy, and complexity of order (m-1) under a backward elimination filter heuristic. The novelty of the proposed approach is that its worst-case computational complexity is less than order (m-1). The proposed approach uses multi-linear regression and the p-value of each individual API feature to select the most uncorrelated and significant features, in order to reduce the dimensionality of the large malware data and to ensure the absence of multi-collinearity. The stepwise logistic regression approach is then employed to test the significance of each individual malware feature based on its Wald statistic and to construct the binary decision model. When the selected most significant APIs are used in a decision rule generation system, this approach not only reduces the tree size but also improves classification performance. Exhaustive experiments on a large malware data set show that the proposed approach clearly outperforms the existing standard decision rule and support vector machine-based template approaches trained on the complete data, and provides a better statistical fit.
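The Wald-statistic step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes plain backward elimination as the stepwise rule, a Newton-Raphson fit written in NumPy, a 5% significance level (chi-square critical value 3.841 for one degree of freedom), and synthetic data standing in for API call statistics. The function names fit_logistic, wald_statistics and backward_wald_selection are hypothetical, and the multi-linear regression pre-filter is not shown.

```python
import numpy as np

# 95th percentile of the chi-square distribution with 1 degree of freedom.
CHI2_CRIT_1DF_05 = 3.841


def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    """Fit binary logistic regression by Newton-Raphson.

    Returns (beta, cov); index 0 of beta is the intercept.
    The tiny ridge term only guards against a singular Hessian.
    """
    A = np.column_stack([np.ones(len(X)), X])
    eye = ridge * np.eye(A.shape[1])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        w = p * (1.0 - p)
        hessian = A.T @ (A * w[:, None]) + eye
        step = np.linalg.solve(hessian, A.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    p = 1.0 / (1.0 + np.exp(-A @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(A.T @ (A * w[:, None]) + eye)
    return beta, cov


def wald_statistics(X, y):
    """Wald statistic (beta_j / se_j)^2 for each feature, intercept excluded."""
    beta, cov = fit_logistic(X, y)
    return beta[1:] ** 2 / np.diag(cov)[1:]


def backward_wald_selection(X, y, crit=CHI2_CRIT_1DF_05):
    """Drop the least significant feature until all remaining ones pass."""
    keep = list(range(X.shape[1]))
    while keep:
        w = wald_statistics(X[:, keep], y)
        worst = int(np.argmin(w))
        if w[worst] >= crit:
            break
        keep.pop(worst)
    return keep


# Synthetic stand-in for API call statistics (an assumption, not real data):
# only columns 0 and 2 influence the malware/benign label.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
prob = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - 2.0 * X[:, 2])))
y = (rng.random(2000) < prob).astype(float)

selected = backward_wald_selection(X, y)
print("selected feature indices:", selected)
```

Each pass refits the model on the surviving features, so this naive loop needs up to m fits in the worst case; the abstract claims a lower worst-case cost for the proposed hybrid, which this sketch does not reproduce.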

History

Journal

Concurrency and Computation: Practice and Experience

Volume

29

Issue

23

Season

Special Issue

Article number

e3912

Pagination

1 - 18

Publisher

Wiley

Location

Chichester, Eng.

ISSN

1532-0626

eISSN

1532-0634

Language

eng

Notes

Special Issue: Combined Special issues on Applications and techniques in information and network security (CSTA2015) and International conference on innovative network systems and applications held under the federated conference on computer science and information systems (FedCSis‐INetSApp2015)

Publication classification

C Journal article; C1 Refereed article in a scholarly journal

Copyright notice

2016, Wiley