File(s) under permanent embargo
Document clustering in correlation similarity measure space
journal contribution
posted on 2012-06-01, 00:00 authored by T Zhang, Y Tang, B Fang, Yong Xiang
This paper presents a new spectral clustering method called correlation preserving indexing (CPI), which is performed in the correlation similarity measure space. In this framework, the documents are projected into a low-dimensional semantic space in which the correlations between the documents in the local patches are maximized while the correlations between the documents outside these patches are minimized simultaneously. Since the intrinsic geometrical structure of the document space is often embedded in the similarities between the documents, correlation as a similarity measure is more suitable for detecting the intrinsic geometrical structure of the document space than Euclidean distance. Consequently, the proposed CPI method can effectively discover the intrinsic structures embedded in high-dimensional document space. The effectiveness of the new method is demonstrated by extensive experiments conducted on various data sets and by comparison with existing document clustering methods.
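The abstract's claim that correlation suits document data better than Euclidean distance can be illustrated with a small sketch. This is not the CPI algorithm itself. It only compares the two measures on toy data, and it assumes that correlation means the normalized inner product of two document vectors, a definition the abstract does not spell out. The vectors and the names `doc_short`, `doc_long`, and `doc_other` are hypothetical.

```python
import math

def correlation(u, v):
    """Normalized inner product of two term vectors (assumed definition)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    """Ordinary Euclidean distance between two term vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical term-count vectors over a four-word vocabulary.
doc_short = [2, 1, 0, 0]    # short document on topic A
doc_long = [20, 10, 0, 0]   # same term proportions, ten times longer
doc_other = [0, 0, 2, 1]    # short document on topic B

corr_same = correlation(doc_short, doc_long)    # same topic
corr_diff = correlation(doc_short, doc_other)   # different topic
dist_same = euclidean(doc_short, doc_long)      # same topic
dist_diff = euclidean(doc_short, doc_other)     # different topic
```

In this toy case the correlation between the two topic-A documents is essentially 1 and the correlation with the topic-B document is 0, whereas Euclidean distance places the short topic-A document closer to the topic-B document than to its longer topic-A counterpart, because distance is sensitive to document length.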
History
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 24
Issue: 6
Pagination: 1002 - 1013
Publisher: IEEE
Location: Piscataway, N. J.
ISSN: 1041-4347
eISSN: 1558-2191
Language: eng
Publication classification: C1 Refereed article in a scholarly journal