File(s) under permanent embargo
A rapid hybrid clustering algorithm for large volumes of high dimensional data
journal contribution
posted on 2019-04-01, authored by P Rathore, D Kumar, J C Bezdek, Sutharshan Rajasegarar, M S Palaniswami

Clustering large volumes of high-dimensional data is a challenging task. Many clustering algorithms have been developed to handle datasets with either a very large sample size or a very high number of dimensions, but they are often impractical when the data is large in both respects. To overcome both the ‘curse of dimensionality’ caused by high dimensionality and the scalability problems caused by large sample size, we propose a new fast clustering algorithm called FensiVAT. FensiVAT is a hybrid, ensemble-based clustering algorithm that uses fast data-space reduction and an intelligent sampling strategy. In addition to clustering, FensiVAT provides visual evidence that is used to estimate the number of clusters in the data (cluster tendency assessment). In our experiments, we compare FensiVAT with seven state-of-the-art approaches that are popular for clustering data with large sample sizes or high dimensionality. Experimental results suggest that FensiVAT, which can cluster large volumes of high-dimensional data in a few seconds, is the fastest and most accurate of the methods tested.
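This record does not spell out FensiVAT's steps, so the following is only a simplified, hypothetical sketch of the kind of pipeline the abstract and keywords describe: random projection for data-space reduction, maximin sampling as an assumed stand-in for the "intelligent sampling strategy", VAT reordering of the sample, a single-linkage cut of the VAT/MST order, and nearest-sample labelling of the remaining points. It omits the ensemble step entirely and is not the authors' implementation; all function names and defaults (`q`, `k`, `seed`) are illustrative assumptions.

```python
import math
import random

def random_project(X, q, seed=0):
    """Map each p-dim row of X to q dims with a Gaussian matrix scaled by 1/sqrt(q)."""
    rng = random.Random(seed)
    p = len(X[0])
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(q) for _ in range(q)] for _ in range(p)]
    return [[sum(x[i] * R[i][j] for i in range(p)) for j in range(q)] for x in X]

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def maximin_sample(Y, k, seed=0):
    """Greedy maximin (assumed sampler): start at a random point, then keep adding the farthest point."""
    rng = random.Random(seed)
    chosen = [rng.randrange(len(Y))]
    d = [dist(y, Y[chosen[0]]) for y in Y]  # distance from each point to its nearest chosen point
    while len(chosen) < k:
        nxt = max(range(len(Y)), key=lambda i: d[i])
        chosen.append(nxt)
        d = [min(d[i], dist(Y[i], Y[nxt])) for i in range(len(Y))]
    return chosen

def vat_order(D):
    """VAT reordering via Prim's MST: returns the order and the MST edge weight each object joined with."""
    n = len(D)
    start = max(range(n), key=lambda i: max(D[i]))  # an endpoint of the largest dissimilarity
    order, joined = [start], [0.0]
    best = {j: D[start][j] for j in range(n) if j != start}
    while best:
        j = min(best, key=best.get)  # closest unvisited object to the current tree
        order.append(j)
        joined.append(best.pop(j))
        for r in best:
            best[r] = min(best[r], D[j][r])
    return order, joined

def cluster(X, c, q=10, k=20, seed=0):
    """Hypothetical pipeline: project, sample, VAT-order, cut c-1 largest MST edges, extend labels."""
    Y = random_project(X, q, seed)
    S = maximin_sample(Y, min(k, len(Y)), seed)
    D = [[dist(Y[a], Y[b]) for b in S] for a in S]
    order, joined = vat_order(D)
    # positions (in VAT order) of the c-1 largest MST edges start new single-linkage clusters
    cuts = set(sorted(range(1, len(order)), key=lambda t: joined[t], reverse=True)[: c - 1])
    sample_label, label = {}, 0
    for t, s in enumerate(order):
        if t in cuts:
            label += 1
        sample_label[S[s]] = label
    # each point inherits the label of its nearest sampled point in the projected space
    return [sample_label[min(S, key=lambda s: dist(y, Y[s]))] for y in Y]

if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(30)] + \
        [[rng.gauss(20, 1) for _ in range(50)] for _ in range(30)]
    print(cluster(X, c=2))
```

The single-linkage cut works on the VAT order because Prim's algorithm adds a whole low-dissimilarity group before crossing a large MST edge, so (barring ties) each cluster occupies a contiguous block of the reordered sample; the `joined` weights are also what a VAT image would display as dark diagonal blocks separated by bright boundaries.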
Journal
IEEE transactions on knowledge and data engineering
Volume
31
Issue
4
Pagination
641 - 654
Publisher
Institute of Electrical and Electronics Engineers
Location
Piscataway, N.J.
Publisher DOI
ISSN
1041-4347
Language
eng
Publication classification
C1 Refereed article in a scholarly journal
Copyright notice
2018, IEEE
Keywords
Big Data Cluster Analysis; Random Projection; Ensemble Clustering; Visual Assessment of Cluster Tendency; Single Linkage; Curse of Dimensionality; Science & Technology; Technology; Computer Science, Artificial Intelligence; Computer Science, Information Systems; Engineering, Electrical & Electronic; Computer Science; Engineering; VISUAL ASSESSMENT; INTERNET; NUMBER