Deakin University

File(s) under permanent embargo

A rapid hybrid clustering algorithm for large volumes of high dimensional data

journal contribution
posted on 2019-04-01, authored by P Rathore, D Kumar, J C Bezdek, Sutharshan Rajasegarar, M S Palaniswami
Clustering large volumes of high-dimensional data is a challenging task. Many clustering algorithms have been developed to handle datasets with either a very large sample size or a very high number of dimensions, but they are often impractical when the data are large in both respects. To overcome both the ‘curse of dimensionality’ arising from high dimensionality and the scalability problems arising from large sample size, we propose a new fast clustering algorithm called FensiVAT. FensiVAT is a hybrid, ensemble-based clustering algorithm that uses fast data-space reduction and an intelligent sampling strategy. In addition to clustering, FensiVAT provides visual evidence that is used to estimate the number of clusters in the data (cluster tendency assessment). In our experiments, we compare FensiVAT with seven state-of-the-art approaches that are popular for clustering data with large sample sizes or high dimensionality. Experimental results suggest that FensiVAT, which can cluster large volumes of high-dimensional data in a few seconds, is the fastest and most accurate of the methods tested.

Journal

IEEE Transactions on Knowledge and Data Engineering

Volume

31

Issue

4

Pagination

641 - 654

Publisher

Institute of Electrical and Electronics Engineers

Location

Piscataway, N.J.

ISSN

1041-4347

Language

English

Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

2018, IEEE