Enhancing the effectiveness of clustering with spectra analysis

Li, Wenyuan, Ng, Wee-Keong, Liu, Ying and Ong, Kok-Leong 2007, Enhancing the effectiveness of clustering with spectra analysis, IEEE transactions on knowledge and data engineering, vol. 19, no. 7, pp. 887-902, doi: 10.1109/TKDE.2007.1066.


Title Enhancing the effectiveness of clustering with spectra analysis
Author(s) Li, Wenyuan
Ng, Wee-Keong
Liu, Ying
Ong, Kok-Leong
Journal name IEEE transactions on knowledge and data engineering
Volume number 19
Issue number 7
Start page 887
End page 902
Publisher Institute of Electrical and Electronics Engineers
Place of publication New York, N.Y.
Publication date 2007-07
ISSN 1041-4347
Keyword(s) clustering
spectral methods
Summary For many clustering algorithms, such as K-Means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters, that is, k, to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set. This is especially true in text collections such as Web documents, images, or biological data. In an effort to improve the effectiveness of clustering, we seek the answer to a fundamental question: How can we effectively estimate the number of clusters in a given data set? We propose an efficient method based on spectra analysis of eigenvalues (not eigenvectors) of the data set as the solution to the above. We first present the relationship between a data set and its underlying spectra with theoretical and experimental results. We then show how our method is capable of suggesting a range of k that is well suited to different analysis contexts. Finally, we conclude with further empirical results to show how the answer to this fundamental question enhances the clustering process for large text collections.
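This record does not describe the authors' algorithm in detail. As a rough illustration of the general idea of reading k off an eigenvalue spectrum, the sketch below uses the well-known eigengap heuristic on a symmetric normalized graph Laplacian. This is a swapped-in, standard technique and not the method of Li et al. The function names `estimate_k_eigengap` and `candidate_ks`, the `k_max` and `top` parameters, and the toy affinity matrix are hypothetical and chosen only for illustration.

```python
import numpy as np


def laplacian_spectrum(affinity: np.ndarray) -> np.ndarray:
    """Eigenvalues (ascending) of the symmetric normalized Laplacian
    L = I - D^{-1/2} A D^{-1/2} for a symmetric, non-negative affinity A.
    Assumes every row of A has a positive sum (no isolated points)."""
    degrees = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    normalized = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    laplacian = np.eye(len(affinity)) - normalized
    # eigvalsh handles symmetric matrices and returns eigenvalues in ascending order
    return np.linalg.eigvalsh(laplacian)


def _gaps(affinity: np.ndarray, k_max: int) -> np.ndarray:
    """Differences between consecutive eigenvalues lambda_0 .. lambda_{k_max}."""
    eigenvalues = laplacian_spectrum(affinity)
    k_max = min(k_max, len(eigenvalues) - 1)
    # gaps[i] = lambda_{i+1} - lambda_i; a large gap after index i suggests k = i + 1
    return np.diff(eigenvalues[: k_max + 1])


def estimate_k_eigengap(affinity: np.ndarray, k_max: int = 10) -> int:
    """Hypothetical helper: pick k at the largest eigengap (eigengap heuristic)."""
    return int(np.argmax(_gaps(affinity, k_max))) + 1


def candidate_ks(affinity: np.ndarray, k_max: int = 10, top: int = 3) -> list:
    """Hypothetical helper: several k values ranked by gap size, loosely
    mirroring the idea of suggesting a range of k rather than a single value."""
    order = np.argsort(_gaps(affinity, k_max))[::-1]
    return [int(i) + 1 for i in order[:top]]


if __name__ == "__main__":
    # Toy data: three dense groups of sizes 4, 5 and 6 with weak cross-group affinity.
    sizes = [4, 5, 6]
    labels = np.repeat(np.arange(len(sizes)), sizes)
    affinity = np.where(labels[:, None] == labels[None, :], 1.0, 0.01)
    print(estimate_k_eigengap(affinity))  # → 3
```

In this toy matrix the Laplacian has three eigenvalues near zero (one per dense group) and the rest close to one, so the largest gap falls after the third eigenvalue. On real text collections the spectrum is usually far less clean, which is why ranking several candidate values of k can be more useful than returning one.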
Language eng
DOI 10.1109/TKDE.2007.1066
Field of Research 080604 Database Management
HERDC Research category C1 Refereed article in a scholarly journal
Copyright notice ©2007, IEEE
Persistent URL http://hdl.handle.net/10536/DRO/DU:30007065

Citation counts: Cited 18 times in TR Web of Science; cited 26 times in Scopus
Access Statistics: 609 Abstract Views, 5 File Downloads
Created: Mon, 29 Sep 2008, 08:47:52 EST
