Enhancing the effectiveness of clustering with spectra analysis

Li, W; Ng, W K; Liu, Y; Ong, Kok-Leong

journal contribution

posted on 2007-07-01, 00:00 authored by W Li, W K Ng, Y Liu, Kok-Leong Ong

For many clustering algorithms, such as K-Means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters, that is, k, to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set. This is especially true in text collections such as Web documents, images, or biological data. In an effort to improve the effectiveness of clustering, we seek the answer to a fundamental question: How can we effectively estimate the number of clusters in a given data set? We propose an efficient method based on spectra analysis of eigenvalues (not eigenvectors) of the data set as the solution to the above. We first present the relationship between a data set and its underlying spectra with theoretical and experimental results. We then show how our method is capable of suggesting a range of k that is well suited to different analysis contexts. Finally, we conclude with further empirical results to show how the answer to this fundamental question enhances the clustering process for large text collections.
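The abstract does not spell out the procedure, so the following is only a minimal sketch of one common eigenvalue-based heuristic (the eigengap). It is not the authors' method. It assumes a symmetric, non-negative pairwise similarity matrix is already available. The function name `estimate_k_from_spectrum`, the `max_k` parameter, and the toy matrix are hypothetical names chosen for illustration. The idea is that a data set with k well-separated groups tends to produce k large eigenvalues in its normalized similarity matrix, followed by a visible drop.

```python
import numpy as np

def estimate_k_from_spectrum(similarity, max_k=10):
    """Suggest a cluster count from the largest gap in the eigenvalue spectrum.

    Assumes `similarity` is symmetric and non-negative (illustrative heuristic only).
    """
    s = np.asarray(similarity, dtype=float)
    degree = s.sum(axis=1)
    if np.any(degree <= 0):
        raise ValueError("every row of the similarity matrix must have a positive sum")
    # Symmetric normalization D^{-1/2} S D^{-1/2}; its largest eigenvalue is 1.
    inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = s * inv_sqrt[:, None] * inv_sqrt[None, :]
    # eigvalsh returns eigenvalues in ascending order; reverse to descending.
    eigenvalues = np.linalg.eigvalsh(normalized)[::-1]
    # Gap i separates the (i+1)-th and (i+2)-th largest eigenvalues.
    gaps = eigenvalues[:-1] - eigenvalues[1:]
    k = int(np.argmax(gaps[:max_k])) + 1
    return k, eigenvalues

# Toy data (hypothetical): three groups of five items, strongly similar
# within a group (0.9) and weakly similar across groups (0.05).
blocks = np.kron(np.eye(3), np.ones((5, 5)))
demo = 0.1 * np.eye(15) + 0.85 * blocks + 0.05 * np.ones((15, 15))
k, spectrum = estimate_k_from_spectrum(demo)
print(k)
```

Only eigenvalues are computed here, never eigenvectors, which matches the emphasis in the abstract. Inspecting the several largest gaps rather than just the single largest would give a range of candidate values for k instead of one number.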

History

Journal

IEEE transactions on knowledge and data engineering

Volume

19

Pagination

887 - 902

Location

New York, N.Y.

ISSN

1041-4347

eISSN

1558-2191

Language

eng

Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

2007, IEEE

Keywords

clustering; spectral methods; eigenvalues; eigenvectors; Science & Technology; Technology; Computer Science, Artificial Intelligence; Computer Science, Information Systems; Engineering, Electrical & Electronic; Computer Science; Engineering
