Deakin University

Application of rank correlation, clustering and classification in information security

journal contribution
posted on 2012-06-01, 00:00 authored by Gleb Beliakov, John Yearwood, Andrei Kelarev
This article experimentally investigates a novel application of a clustering technique that the authors recently introduced in order to use robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, in intrusion detection. Here we concentrate on the particular case of profiling phishing websites. First, we apply several independent clustering algorithms to a randomized sample of the data to obtain independent initial clusterings. The silhouette index is used to determine the number of clusters. Second, rank correlation is used to select a subset of features for dimensionality reduction. We investigate the effectiveness of the Pearson linear correlation coefficient, the Spearman rank correlation coefficient and the Goodman-Kruskal correlation coefficient in this application. Third, we use a consensus function to combine the independent initial clusterings into one consensus clustering. Fourth, we train fast supervised classification algorithms on the resulting consensus clustering so that they can process the whole large data set as well as new data. The precision and recall of the classifiers at the final stage of this scheme are critical to the effectiveness of the whole procedure. We investigated various combinations of correlation coefficients, consensus functions, and supervised classification algorithms.
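
The three correlation coefficients named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the coefficients are turned into a feature subset, so the `top_features` helper is an assumption made only for illustration: it scores each feature by the absolute value of its correlation with a label vector (for example, cluster labels from an initial clustering) and keeps the `k` highest-scoring features. All function names and the toy data are hypothetical, and the Goodman-Kruskal coefficient is taken here to be the usual gamma statistic over concordant and discordant pairs.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant input: coefficient undefined, treated as 0 here
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D) over concordant (C) and discordant (D) pairs."""
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
            # pairs tied in x or y are ignored
    total = concordant + discordant
    return 0.0 if total == 0 else (concordant - discordant) / total


def top_features(columns, labels, k, coefficient):
    """Assumed selection rule: keep the k features with the largest
    absolute correlation with the label vector."""
    scores = {name: abs(coefficient(col, labels)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: three features for four samples, two cluster labels.
columns = {"a": [1, 2, 3, 4], "b": [4, 1, 3, 2], "c": [4, 3, 2, 1]}
labels = [0, 0, 1, 1]
selected = top_features(columns, labels, 2, goodman_kruskal_gamma)
```

On this toy data, features `a` and `c` order the samples perfectly with respect to the labels (in one direction or the other), while `b` has as many concordant as discordant pairs, so `a` and `c` are kept. The pairwise loop in `goodman_kruskal_gamma` is quadratic in the number of samples, which is one reason a scheme like the one described would compute it on a sample rather than on the whole data set.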

History

Journal

Journal of Networks

Volume

7

Issue

6

Pagination

935 - 945

Publisher

Academy Publisher

Location

Oulu, Finland

ISSN

1796-2056

Language

eng

Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

2012, The Author
