File(s) under permanent embargo
ClusiVAT: a mixed visual/numerical clustering algorithm for big data
conference contribution
posted on 2013-01-01, 00:00 authored by D Kumar, M Palaniswami, Sutharshan Rajasegarar, C Leckie, J C Bezdek, T C Havens

Recent algorithmic and computational improvements have reduced the time it takes to build a minimal spanning tree (MST) for big data sets. In this paper we compare single linkage clustering based on MSTs built with the Filter-Kruskal method to the proposed clusiVAT algorithm, which is based on sampling the data, imaging the sample to estimate the number of clusters, followed by non-iterative extension of the labels to the rest of the big data with the nearest prototype rule. Numerical experiments with both synthetic and real data confirm the theory that clusiVAT produces true single linkage clusters in compact, separated data. We also show that single linkage fails, while clusiVAT finds high quality partitions that match ground truth labels very well. And clusiVAT is fast: it recovers the preferred c = 3 Gaussian clusters in a mixture of 1 million two-dimensional data points with 100% accuracy in 3.1 seconds.
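The sample-then-extend idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the sample is drawn, so uniform random sampling is assumed here, and the number of clusters c is passed in directly rather than estimated from an image of the sample. The function names (mst_edges, single_linkage_labels, cluster_by_sampling) are hypothetical. The sketch builds an MST on the sample with Prim's algorithm, cuts the c - 1 longest edges to obtain single linkage clusters, and then labels every point in the full data set by its nearest sampled point (the nearest prototype rule).

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the n - 1 tree edges as (weight, i, j) tuples.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def single_linkage_labels(points, c):
    """Cut the c - 1 longest MST edges; the remaining components are the clusters."""
    edges = sorted(mst_edges(points))
    kept = edges[:len(points) - c]   # drop the c - 1 longest of the n - 1 edges
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, i, j in kept:
        root[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


def cluster_by_sampling(data, c, sample_size, seed=0):
    """Cluster a random sample, then extend labels by the nearest prototype rule."""
    rng = random.Random(seed)
    sample = [data[i] for i in rng.sample(range(len(data)), sample_size)]
    sample_labels = single_linkage_labels(sample, c)
    labels = []
    for x in data:
        # Non-iterative extension: each point takes the label of its nearest sample point.
        nearest = min(range(sample_size), key=lambda k: math.dist(x, sample[k]))
        labels.append(sample_labels[nearest])
    return labels
```

Calling cluster_by_sampling(data, c=3, sample_size=30) on a list of 2-D tuples returns one integer label per point. The expensive MST step only ever sees the sample, so its quadratic cost depends on sample_size, while the extension step is a single pass over the full data.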
History
Event
IEEE Computer Society. Conference (2013 : Silicon Valley, California)
Series
IEEE Computer Society Conference
Pagination
112 - 117
Publisher
Institute of Electrical and Electronics Engineers
Location
Silicon Valley, Calif.
Place of publication
Piscataway, N.J.
Start date
2013-10-06
End date
2013-10-09
ISBN-13
978-1-4799-1293-3
Language
eng
Publication classification
E Conference publication; E1.1 Full written paper - refereed
Copyright notice
2013, IEEE
Editor/Contributor(s)
[Unknown]
Title of proceedings
Big Data 2013 : Proceedings of the 2013 IEEE International Conference on Big Data
Keywords
Information management; Data handling; Data storage systems; Clustering algorithms; Partitioning algorithms; Couplings; Visualization; Cluster Analysis; Pattern Recognition; Single Linkage; Big Data; Filter-Kruskal MST; Science & Technology; Technology; Computer Science, Information Systems; Computer Science, Theory & Methods; Engineering, Electrical & Electronic; Computer Science; Engineering; VISUAL ASSESSMENT