
Application of rank correlation, clustering and classification in information security

Beliakov, Gleb, Yearwood, John and Kelarev, Andrei 2012, Application of rank correlation, clustering and classification in information security, Journal of networks, vol. 7, no. 6, pp. 935-945.

Attached Files
beliakov-applicationofrank-2012.pdf: published version (application/pdf, 671.81 KB)

Title Application of rank correlation, clustering and classification in information security
Author(s) Beliakov, Gleb
Yearwood, John
Kelarev, Andrei
Journal name Journal of networks
Volume number 7
Issue number 6
Start page 935
End page 945
Total pages 11
Publisher Academy Publisher
Place of publication Oulu, Finland
Publication date 2012-06
ISSN 1796-2056
Keyword(s) classification
clustering
consensus functions
phishing websites
Summary This article presents an experimental investigation of a novel application of a clustering technique, recently introduced by the authors, that uses robust and stable consensus functions in information security. In this field it is often necessary to process large data sets and monitor outcomes in real time, as required, for example, in intrusion detection. Here we concentrate on one particular application: profiling phishing websites. First, we apply several independent clustering algorithms to a randomized sample of the data to obtain independent initial clusterings; the silhouette index is used to determine the number of clusters. Second, rank correlation is used to select a subset of features for dimensionality reduction. We investigate the effectiveness of the Pearson linear correlation coefficient, the Spearman rank correlation coefficient and the Goodman–Kruskal correlation coefficient in this application. Third, we use a consensus function to combine the independent initial clusterings into one consensus clustering. Fourth, we train fast supervised classification algorithms on the resulting consensus clustering so that they can process the whole large data set as well as new data. The precision and recall of the classifiers at this final stage are critical to the effectiveness of the whole procedure. We investigate various combinations of several correlation coefficients, consensus functions and supervised classification algorithms.
Language eng
Field of Research 109999 Technology not elsewhere classified
Socio Economic Objective 970110 Expanding Knowledge in Technology
HERDC Research category C1 Refereed article in a scholarly journal
Copyright notice ©2012, The Author
Persistent URL http://hdl.handle.net/10536/DRO/DU:30046944
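The Summary compares three correlation coefficients for the feature-selection step. The following is a minimal, standard-library-only sketch of how those coefficients can be computed; it is not the authors' code. It assumes that "Goodman–Kruskal correlation coefficient" refers to Goodman–Kruskal gamma, and the function names (pearson, average_ranks, spearman, goodman_kruskal_gamma) are illustrative only. The record does not state how the coefficients were turned into a feature subset, so no selection rule is shown.

```python
"""Sketch: the three correlation coefficients named in the abstract (stdlib only)."""
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(v):
    """1-based ranks of v; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma = (C - D) / (C + D).

    C and D count concordant and discordant pairs; pairs tied in x or y
    are ignored. Returns nan if every pair is tied.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    if concordant + discordant == 0:
        return float("nan")
    return (concordant - discordant) / (concordant + discordant)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [v * v for v in x]  # monotonic but not linear in x
    print(pearson(x, y), spearman(x, y), goodman_kruskal_gamma(x, y))
```

On the toy data in the main block, Spearman and gamma both equal 1 while Pearson is about 0.981, because the relationship is monotonic but not linear. This is one reason rank-based coefficients may behave differently from Pearson when features are related monotonically rather than linearly.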

Document type: Journal Article
Collections: School of Information Technology
Open Access Collection
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.

Created: Mon, 13 Aug 2012, 12:52:58 EST
