File(s) under permanent embargo
ClusiVAT: a mixed visual/numerical clustering algorithm for big data
Conference contribution, posted on 2013-01-01, 00:00, authored by D Kumar, M Palaniswami, Sutharshan Rajasegarar, C Leckie, J C Bezdek, T C Havens
Recent algorithmic and computational improvements have reduced the time it takes to build a minimal spanning tree (MST) for big data sets. In this paper we compare single linkage clustering based on MSTs built with the Filter-Kruskal method to the proposed clusiVAT algorithm. clusiVAT samples the data, images the sample to estimate the number of clusters, and then extends the labels non-iteratively to the rest of the big data with the nearest prototype rule. Numerical experiments with both synthetic and real data confirm the theory that clusiVAT produces true single linkage clusters in compact, separated data. We also show cases where single linkage fails while clusiVAT finds high-quality partitions that match the ground-truth labels very well. Finally, clusiVAT is fast: it recovers the preferred c = 3 Gaussian clusters in a mixture of 1 million two-dimensional data points with 100% accuracy in 3.1 seconds.
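The abstract describes clusiVAT as a three-step pipeline: sample the data, estimate the number of clusters from an image of the sample, and extend the sample's labels to every other point with the nearest prototype rule. The following is a minimal plain-Python sketch of that pipeline, not the authors' implementation. It makes several assumptions that do not come from the paper. Maximin (farthest-point) sampling stands in for the paper's sampling scheme. The number of clusters c is passed in directly rather than read off a VAT image. The sample is partitioned by cutting the c - 1 heaviest edges of a Prim MST, which gives single linkage clusters of the sample. All function names (`maximin_sample`, `prim_mst`, `single_linkage_labels`, `clusivat_sketch`) are hypothetical.

```python
import math
import random


def maximin_sample(data, k, seed=0):
    """Indices of up to k points chosen by maximin (farthest-point) sampling.

    ASSUMPTION: plain maximin sampling stands in for the paper's sampling step.
    """
    k = min(k, len(data))
    rng = random.Random(seed)
    chosen = [rng.randrange(len(data))]
    # nearest[i] = distance from data[i] to its closest chosen point so far
    nearest = [math.dist(p, data[chosen[0]]) for p in data]
    while len(chosen) < k:
        nxt = max(range(len(data)), key=nearest.__getitem__)
        if nearest[nxt] == 0.0:  # only duplicates of chosen points remain
            break
        chosen.append(nxt)
        for i, p in enumerate(data):
            nearest[i] = min(nearest[i], math.dist(p, data[nxt]))
    return chosen


def prim_mst(points):
    """Prim's MST on the complete Euclidean graph, as (weight, u, v) edges."""
    n = len(points)
    in_tree = [False] * n
    in_tree[0] = True
    best = [math.dist(points[0], p) for p in points]  # cheapest link to the tree
    parent = [0] * n
    edges = []
    for _ in range(n - 1):
        j = min((v for v in range(n) if not in_tree[v]), key=best.__getitem__)
        in_tree[j] = True
        edges.append((best[j], parent[j], j))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[j], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, j
    return edges


def single_linkage_labels(n, edges, c):
    """Drop the c - 1 heaviest MST edges; the components are single linkage clusters."""
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    keep = sorted(edges)[: max(0, len(edges) - (c - 1))]
    for _, u, v in keep:
        root[find(u)] = find(v)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def clusivat_sketch(data, c, k=50, seed=0):
    """Sample, partition the sample, then extend labels by the nearest prototype rule.

    ASSUMPTION: c is supplied by the caller instead of being read off a VAT image.
    """
    idx = maximin_sample(data, k, seed)
    sample = [data[i] for i in idx]
    sample_labels = single_linkage_labels(len(sample), prim_mst(sample), c)
    labels = []
    for p in data:
        # nearest prototype rule: copy the label of the closest sampled point
        j = min(range(len(sample)), key=lambda s: math.dist(p, sample[s]))
        labels.append(sample_labels[j])
    return labels


# Usage: three compact, well-separated 2-D Gaussian clusters, 100 points each.
rng = random.Random(1)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in centers for _ in range(100)]
labels = clusivat_sketch(data, c=3, k=30)
```

In this sketch only the k sampled points go through the quadratic MST step. The remaining n points each need k distance evaluations in the extension step. That split is why a sample-then-extend design can avoid building an MST over all n points.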
Event: IEEE Computer Society. Conference (2013 : Silicon Valley, California)
Series: IEEE Computer Society Conference
Pagination: 112-117
Publisher: Institute of Electrical and Electronics Engineers
Location: Silicon Valley, Calif.
Place of publication: Piscataway, N.J.
Publication classification: E Conference publication; E1.1 Full written paper - refereed
Copyright notice: 2013, IEEE
Title of proceedings: Big Data 2013: Proceedings of the 2013 IEEE International Conference on Big Data
Keywords: Information management; Data handling; Data storage systems; Clustering algorithms; Partitioning algorithms; Couplings; Visualization; Cluster Analysis; Pattern Recognition; Single Linkage; Big Data; Filter-Kruskal MST; Science & Technology; Technology; Computer Science, Information Systems; Computer Science, Theory & Methods; Engineering, Electrical & Electronic; Computer Science; Engineering; VISUAL ASSESSMENT