Application of rank correlation, clustering and classification in information security
journal contribution
Posted on 2012-06-01, authored by Gleb Beliakov, John Yearwood, Andrei Kelarev
This article is devoted to an experimental investigation of a novel application of a clustering technique recently introduced by the authors in order to use robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, in intrusion detection. Here we concentrate on one particular application, the profiling of phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. The silhouette index is used to determine the number of clusters. Second, rank correlation is used to select a subset of features for dimensionality reduction. We investigate the effectiveness of the Pearson Linear Correlation Coefficient, the Spearman Rank Correlation Coefficient and the Goodman-Kruskal Correlation Coefficient in this application. Third, we use a consensus function to combine the independent initial clusterings into one consensus clustering. Fourth, we train fast supervised classification algorithms on the resulting consensus clustering in order to enable them to process the whole large data set as well as new data. The precision and recall of the classifiers at the final stage of this scheme are critical for the effectiveness of the whole procedure. We investigated various combinations of several correlation coefficients, consensus functions, and a variety of supervised classification algorithms.
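The three correlation coefficients named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`pearson`, `average_ranks`, `spearman`, `goodman_kruskal_gamma`) are hypothetical, the formulas are the standard textbook definitions, and the record does not specify how the coefficients are turned into a feature-selection rule, so that step is left out.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma: (C - D) / (C + D) over all pairs of observations.

    C counts concordant pairs, D discordant pairs; pairs tied in x or y are
    ignored. Raises ZeroDivisionError if every pair is tied.
    """
    concordant = discordant = 0
    for (a1, b1), (a2, b2) in combinations(zip(x, y), 2):
        s = (a1 - a2) * (b1 - b2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Toy data (made up for illustration): y grows monotonically but not linearly.
feature = [1, 2, 3, 4, 5]
target = [1, 4, 9, 16, 25]
print(pearson(feature, target))                # below 1: relation is not linear
print(spearman(feature, target))               # rank order is identical
print(goodman_kruskal_gamma(feature, target))  # every pair is concordant
```

On monotone but non-linear data such as this toy example, the two rank-based coefficients reach their maximum while the Pearson coefficient does not, which is one reason rank correlations are attractive for comparing features measured on different scales.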
History
Journal: Journal of networks
Volume: 7
Issue: 6
Pagination: 935 - 945
Publisher: Academy Publisher
Location: Oulu, Finland
ISSN: 1796-2056
Language: eng
Publication classification: C1 Refereed article in a scholarly journal
Copyright notice: 2012, The Author