Sparse subspace representation for spectral document clustering

Saha, Budhaditya, Phung, Dinh, Pham, Duc Son and Venkatesh, Svetha 2012, Sparse subspace representation for spectral document clustering, in ICDM 2012 : Proceedings of the 12th IEEE International Conference on Data Mining, IEEE, Piscataway, N.J., pp. 1092-1097.

Title Sparse subspace representation for spectral document clustering
Author(s) Saha, Budhaditya
Phung, Dinh
Pham, Duc Son
Venkatesh, Svetha
Conference name Data Mining. Conference (12th : 2012 : Brussels, Belgium)
Conference location Brussels, Belgium
Conference dates 10-13 Dec. 2012
Title of proceedings ICDM 2012 : Proceedings of the 12th IEEE International Conference on Data Mining
Editor(s) Zaki, Mohammed J.
Siebes, Arno
Xu, Yu Jeffrey
Goethals, Bart
Wu, Xindong
Publication date 2012
Conference series Data Mining Conference
Start page 1092
End page 1097
Total pages 6
Publisher IEEE
Place of publication Piscataway, N.J.
Keyword(s) document clustering
sparse representation
Summary We present a novel method for document clustering using sparse representation of documents in conjunction with spectral clustering. An ℓ1-norm optimization formulation is posed to learn the sparse representation of each document, allowing us to characterize the affinity between documents by considering the overall information instead of traditional pairwise similarities. This document affinity is encoded through a graph on which spectral clustering is performed. The decomposition into multiple subspaces allows documents to be part of a sub-group that shares a smaller set of similar vocabulary, thus allowing for cleaner clusters. Extensive experimental evaluations on two real-world datasets from the Reuters-21578 and 20Newsgroup corpora show that our proposed method consistently outperforms state-of-the-art algorithms. Significantly, the performance improvement over other methods is prominent for these datasets.
ISBN 9780769549057
Language eng
Field of Research 109999 Technology not elsewhere classified
Socio Economic Objective 970110 Expanding Knowledge in Technology
HERDC Research category E1 Full written paper - refereed
Persistent URL http://hdl.handle.net/10536/DRO/DU:30051784

Document type: Conference Paper
Collection: Centre for Pattern Recognition and Data Analytics
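
The pipeline outlined in the Summary (sparse ℓ1 representation of each document in terms of the others, an affinity graph built from the coefficients, then spectral clustering) can be illustrated with a minimal NumPy sketch. This is not the authors' formulation or code, which this record does not reproduce. The penalized Lasso objective, the ISTA solver, the symmetrization W = |C| + |C|ᵀ, the normalized Laplacian, the small k-means step, and all function names and parameter values (`sparse_codes`, `spectral_labels`, `cluster_documents`, `lam`, `n_iter`) are assumptions chosen for illustration.

```python
import numpy as np

def sparse_codes(X, lam=0.05, n_iter=300):
    """Sparse self-representation (assumed Lasso form, solved with ISTA).

    X: (n_docs, n_terms) array with L2-normalised rows.
    Returns C (n_docs, n_docs) with zero diagonal, where row i approximately
    minimises 0.5 * ||x_i - c X||^2 + lam * ||c||_1 subject to c_i = 0.
    """
    n = X.shape[0]
    G = X @ X.T                                # Gram matrix of documents
    step = 1.0 / np.linalg.norm(G, 2)          # 1 / Lipschitz constant
    C = np.zeros((n, n))
    for _ in range(n_iter):
        C = C - step * (C @ G - G)             # gradient step on the fit term
        C = np.sign(C) * np.maximum(np.abs(C) - step * lam, 0.0)  # soft threshold
        np.fill_diagonal(C, 0.0)               # a document may not represent itself
    return C

def spectral_labels(W, k, n_iter=50):
    """Normalised spectral clustering on a symmetric affinity matrix W."""
    n = W.shape[0]
    d = np.maximum(W.sum(axis=1), 1e-12)       # node degrees (guard against 0)
    s = 1.0 / np.sqrt(d)
    L_sym = np.eye(n) - s[:, None] * W * s[None, :]
    _, vecs = np.linalg.eigh(L_sym)            # eigenvalues in ascending order
    U = vecs[:, :k]                            # k smallest eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

    # Small k-means with deterministic farthest-point initialisation.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def cluster_documents(X, k, lam=0.05):
    """Term-count matrix -> cluster labels via sparse codes + spectral clustering."""
    X = np.asarray(X, dtype=float)
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    C = sparse_codes(X, lam=lam)
    W = np.abs(C) + np.abs(C).T                # symmetric affinity graph
    return spectral_labels(W, k)
```

Calling `cluster_documents(X, k)` on a document-term matrix `X` returns one integer label per row. Because each document is encoded against all others at once, the affinity reflects which documents jointly explain it, rather than isolated pairwise similarities.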
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Created: Thu, 04 Apr 2013, 12:59:55 EST