Deakin University

File(s) under permanent embargo

ClusiVAT: a mixed visual/numerical clustering algorithm for big data

conference contribution
posted on 2013-01-01, 00:00 authored by D Kumar, M Palaniswami, Sutharshan Rajasegarar, C Leckie, J C Bezdek, T C Havens
Recent algorithmic and computational improvements have reduced the time it takes to build a minimal spanning tree (MST) for big data sets. In this paper we compare single linkage clustering based on MSTs built with the Filter-Kruskal method to the proposed clusiVAT algorithm, which is based on sampling the data, imaging the sample to estimate the number of clusters, followed by non-iterative extension of the labels to the rest of the big data with the nearest prototype rule. Numerical experiments with both synthetic and real data confirm the theory that clusiVAT produces true single linkage clusters in compact, separated data. We also show that single linkage fails, while clusiVAT finds high quality partitions that match ground truth labels very well. And clusiVAT is fast: it recovers the preferred c = 3 Gaussian clusters in a mixture of 1 million two-dimensional data points with 100% accuracy in 3.1 seconds.
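The sample-then-extend idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions: the sample is drawn uniformly at random (the abstract does not say how sampling is done), the number of clusters c is passed in directly instead of being read from an image of the sample, and the prototypes for the nearest prototype rule are taken to be the sampled points themselves. The function names single_linkage_labels and extend_labels are hypothetical.

```python
import numpy as np


def single_linkage_labels(sample, c):
    """Single linkage on a small sample: build an MST with Prim's algorithm,
    drop the c - 1 longest edges, and label the connected components."""
    n = len(sample)
    dist = np.linalg.norm(sample[:, None, :] - sample[None, :, :], axis=2)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known link from the tree to each point
    parent = np.zeros(n, dtype=int)  # tree vertex that provides that link
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((float(best[j]), int(parent[j]), j))
        in_tree[j] = True
        closer = dist[j] < best
        best[closer] = dist[j][closer]
        parent[closer] = j

    # Keeping the n - c shortest MST edges leaves exactly c components.
    edges.sort(key=lambda e: e[0])
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, a, b in edges[: n - c]:
        ra, rb = find(a), find(b)
        if ra != rb:
            root[rb] = ra
    roots = np.array([find(i) for i in range(n)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels


def extend_labels(data, sample, sample_labels, chunk=10_000):
    """Non-iterative extension: each point takes the label of its nearest
    sampled point (assumed form of the nearest prototype rule)."""
    out = np.empty(len(data), dtype=int)
    for start in range(0, len(data), chunk):
        block = data[start:start + chunk]
        d = np.linalg.norm(block[:, None, :] - sample[None, :, :], axis=2)
        out[start:start + chunk] = sample_labels[np.argmin(d, axis=1)]
    return out


# Illustrative data: three compact, well-separated 2-D Gaussian clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
truth = np.repeat(np.arange(3), 1000)
data = centers[truth] + rng.normal(scale=0.5, size=(3000, 2))

# Assumption: uniform random sampling stands in for the paper's sampling step.
sample_idx = rng.choice(len(data), size=150, replace=False)
sample = data[sample_idx]
sample_labels = single_linkage_labels(sample, c=3)
labels = extend_labels(data, sample, sample_labels)
```

Only the small sample ever needs a full pairwise distance matrix; the remaining points are labelled in a single pass, which is what makes this kind of scheme attractive for large data sets.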

History

Event

IEEE Computer Society. Conference (2013 : Silicon Valley, California)

Series

IEEE Computer Society Conference

Pagination

112 - 117

Publisher

Institute of Electrical and Electronics Engineers

Location

Silicon Valley, Calif.

Place of publication

Piscataway, N.J.

Start date

2013-10-06

End date

2013-10-09

ISBN-13

978-1-4799-1293-3

Language

eng

Publication classification

E Conference publication; E1.1 Full written paper - refereed

Copyright notice

2013, IEEE

Editor/Contributor(s)

[Unknown]

Title of proceedings

Big Data 2013 : Proceedings of the 2013 IEEE International Conference on Big Data