A spectroscopy of texts for effective clustering

Li, Wenyuan, Ng, Wee-Keong, Ong, Kok-Leong and Lim, Ee-Peng 2004, A spectroscopy of texts for effective clustering, Lecture notes in computer science, vol. 3202, pp. 301-312, doi: 10.1007/b100704.

Title A spectroscopy of texts for effective clustering
Author(s) Li, Wenyuan; Ng, Wee-Keong; Ong, Kok-Leong; Lim, Ee-Peng
Journal name Lecture notes in computer science
Volume number 3202
Start page 301
End page 312
Publisher Springer-Verlag
Place of publication Berlin, Germany
Publication date 2004
ISSN 0302-9743
Summary For many clustering algorithms, such as k-means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set. This is especially true in text collections such as Web documents, images or biological data. The fundamental question this paper addresses is: “How can we effectively estimate the natural number of clusters in a given text collection?” We propose to use spectral analysis, which analyzes the eigenvalues (not eigenvectors) of the collection, as the solution to the above. We first present the relationship between a text collection and its underlying spectra. We then show how the answer to this question enhances the clustering process. Finally, we conclude with empirical results and related work.
Notes Book title: "Knowledge Discovery in Databases: PKDD 2004"
Language eng
DOI 10.1007/b100704
Field of Research 080604 Database Management
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category C1 Refereed article in a scholarly journal
Copyright notice ©2004, Springer-Verlag Berlin Heidelberg
Persistent URL http://hdl.handle.net/10536/DRO/DU:30008663
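
Illustrative sketch (not from the paper): the summary says the number of clusters is estimated from the eigenvalues of the collection, but this record does not give the procedure. The Python below shows one common stand-in, the eigengap heuristic on a normalized cosine-similarity matrix, to make the idea concrete. The function name `estimate_num_clusters`, the `max_k` parameter, and the toy document-term matrix are all assumptions introduced here and may differ from what the authors actually do.

```python
import numpy as np

def estimate_num_clusters(doc_term, max_k=10):
    """Guess a cluster count from the eigenvalue spectrum (eigengap heuristic).

    Assumption: doc_term is a documents x terms matrix of non-negative weights.
    This is a generic heuristic, not the method of the cited paper.
    """
    X = np.asarray(doc_term, dtype=float)
    # L2-normalize rows so that dot products are cosine similarities.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    X = X / norms
    S = X @ X.T
    # Symmetric normalization D^{-1/2} S D^{-1/2}; the largest eigenvalue is 1.
    d = S.sum(axis=1)
    d[d == 0] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Only eigenvalues are needed, sorted from largest to smallest.
    eigvals = np.linalg.eigvalsh(M)[::-1]
    top = eigvals[:max_k + 1]
    gaps = top[:-1] - top[1:]
    # The position of the largest drop is taken as the cluster count.
    return int(np.argmax(gaps)) + 1, eigvals

# Toy collection: three groups of documents with disjoint vocabularies.
docs = [
    [2, 1, 0, 0, 0, 0],
    [1, 2, 0, 0, 0, 0],
    [3, 1, 0, 0, 0, 0],
    [0, 0, 1, 2, 0, 0],
    [0, 0, 2, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 2, 1],
    [0, 0, 0, 0, 1, 3],
]
k, spectrum = estimate_num_clusters(docs)
print(k, np.round(spectrum, 3))
```

In this toy case each vocabulary group contributes one eigenvalue close to 1, so the largest drop in the sorted spectrum falls after the third eigenvalue. Real text collections overlap in vocabulary, so the gap is usually much less sharp.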

Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Created: Mon, 13 Oct 2008, 15:38:26 EST
