
A fast malware feature selection approach using a hybrid of multi-linear and stepwise binary logistic regression

Huda, MD Shamsul, Abawajy, Jemal, Abdollahian, Mali, Islam, Rafiqul and Yearwood, John Leighton 2016, A fast malware feature selection approach using a hybrid of multi-linear and stepwise binary logistic regression, Concurrency and Computation: Practice and Experience, In Press, pp. 1-18, doi: 10.1002/cpe.3912.


Title A fast malware feature selection approach using a hybrid of multi-linear and stepwise binary logistic regression
Author(s) Huda, MD Shamsul
Abawajy, Jemal
Abdollahian, Mali
Islam, Rafiqul
Yearwood, John Leighton
Journal name Concurrency and Computation: Practice and Experience
Season In Press
Start page 1
End page 18
Total pages 18
Publisher Wiley
Place of publication Chichester, Eng.
Publication date 2016-01-01
ISSN 1532-0626
1532-0634
Keyword(s) malware detection
binary logistic regression
stepwise regression
API call statistics
AIC criteria
chi-square
Summary Malware replicates itself and produces offspring with the same characteristics but different signatures by using code obfuscation techniques. Current-generation anti-virus engines employ a signature-template detection approach in which malware can easily evade the existing signatures in the database. This reduces the ability of current anti-virus engines to detect malware. In this paper, we propose a stepwise binary logistic regression-based dimensionality reduction technique for malware detection using application program interface (API) call statistics. Finding the most significant malware features with traditional wrapper-based approaches has a complexity that is exponential in the dimension (m) of the dataset under a brute-force search strategy, and of order (m-1) under a backward-elimination filter heuristic. The novelty of the proposed approach is that its worst-case computational complexity is less than order (m-1). The proposed approach uses multi-linear regression and the p-value of each individual API feature to select the most uncorrelated and significant features, in order to reduce the dimensionality of the large malware data and to ensure the absence of multicollinearity. Stepwise logistic regression is then employed to test the significance of each individual malware feature based on its Wald statistic and to construct the binary decision model. When the selected most significant APIs are used in a decision rule generation system, this approach not only reduces the tree size but also improves classification performance. Exhaustive experiments on a large malware data set show that the proposed approach clearly outperforms the existing standard decision rule and support vector machine-based template approaches applied to the complete data, and provides a better statistical fit.
Language eng
DOI 10.1002/cpe.3912
Field of Research 080303 Computer System Security
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category C1 Refereed article in a scholarly journal
ERA Research output type C Journal article
Copyright notice ©2016, Wiley
Persistent URL http://hdl.handle.net/10536/DRO/DU:30089582
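The Summary describes a two-stage pipeline: a multi-linear regression screen that removes collinear API features, followed by stepwise binary logistic regression that keeps features whose Wald statistic is significant. The sketch below is a hypothetical illustration of that idea in Python with NumPy, not the authors' implementation. Assumptions: the collinearity screen uses the variance inflation factor (VIF, obtained by regressing each feature on the others) with a threshold of 10 in place of the paper's p-value criteria; the stepwise stage is a simplified forward-entry variant that admits a feature when its Wald p-value is below 0.05; and the function names (`vif`, `drop_collinear`, `fit_logit`, `forward_stepwise`) and the synthetic data are invented for the example.

```python
import math

import numpy as np


def vif(X, j):
    """Variance inflation factor 1 / (1 - R^2) of column j, where R^2 comes from an
    ordinary least-squares (multi-linear) regression of column j on the other columns."""
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = float(np.sum((y - A @ coef) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    if ss_tot == 0.0 or ss_res == 0.0:
        return math.inf  # constant or perfectly explained column
    return ss_tot / ss_res  # algebraically equal to 1 / (1 - R^2)


def drop_collinear(X, threshold=10.0):
    """Repeatedly drop the column with the largest VIF until all VIFs <= threshold.
    Returns the indices of the surviving columns (hypothetical screening rule)."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        scores = [vif(X[:, keep], i) for i in range(len(keep))]
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        keep.pop(worst)
    return keep


def fit_logit(A, y, iters=50, tol=1e-8):
    """Binary logistic regression by Newton-Raphson (IRLS).
    A must already contain an intercept column. Returns (coefficients, standard errors)."""
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        H = A.T @ (A * (p * (1.0 - p))[:, None])  # Fisher information matrix
        step = np.linalg.solve(H, A.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(A @ beta)))
    H = A.T @ (A * (p * (1.0 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(H)))


def forward_stepwise(X, y, alpha=0.05):
    """Forward entry: at each step add the feature whose coefficient has the smallest
    Wald p-value, as long as that p-value is below alpha."""
    n, m = X.shape
    selected = []
    while True:
        best, best_p = None, alpha
        for j in range(m):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            beta, se = fit_logit(A, y)
            wald = (beta[-1] / se[-1]) ** 2          # Wald statistic, chi-square with 1 df
            p_value = math.erfc(math.sqrt(wald / 2.0))  # upper tail of chi-square(1)
            if p_value < best_p:
                best, best_p = j, p_value
        if best is None:
            return selected
        selected.append(best)


# Synthetic demo (hypothetical data): x2 nearly duplicates x0, x3 is pure noise.
rng = np.random.default_rng(0)
n = 2000
x0, x1, x3 = rng.normal(size=(3, n))
x2 = x0 + 0.01 * rng.normal(size=n)
X = np.column_stack([x0, x1, x2, x3])
prob = 1.0 / (1.0 + np.exp(-(1.5 * x0 - 2.0 * x1)))
y = (rng.random(n) < prob).astype(float)

keep = drop_collinear(X)                    # one of x0/x2 is removed
selected = forward_stepwise(X[:, keep], y)  # indices into X[:, keep]
chosen = [keep[s] for s in selected]        # map back to original column indices
print("kept after VIF screen:", keep, "selected:", chosen)
```

The Wald p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(W/2)), which avoids a SciPy dependency. Every candidate in `forward_stepwise` requires a full logistic fit, so shrinking the feature set with the cheaper least-squares screen first reduces the number of fits, which is in the spirit of the complexity argument in the Summary.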

Document type: Journal Article
Collection: School of Information Technology
 
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Created: Mon, 28 Nov 2016, 16:00:33 EST
