Deakin University

File(s) under permanent embargo

Sparse subspace representation for spectral document clustering

conference contribution
posted on 2012-01-01, 00:00 authored by Budhaditya Saha, Quoc-Dinh Phung, D Pham, Svetha Venkatesh
We present a novel method for document clustering that combines sparse representation of documents with spectral clustering. An ℓ1-norm optimization problem is formulated to learn the sparse representation of each document, which lets us characterize the affinity between documents from the overall information in the collection rather than from traditional pairwise similarities. This document affinity is encoded in a graph on which spectral clustering is performed. The decomposition into multiple subspaces allows each document to belong to a sub-group that shares a smaller set of similar vocabulary, which yields cleaner clusters. Extensive experimental evaluations on two real-world datasets drawn from the Reuters-21578 and 20 Newsgroups corpora show that our proposed method consistently outperforms state-of-the-art algorithms. Notably, the performance improvement over other methods is prominent on these datasets.
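The pipeline outlined in the abstract (sparse coding of each document, an affinity graph built from the codes, then spectral clustering) can be illustrated with a minimal NumPy sketch. This is not the authors' formulation, which the abstract does not spell out. It assumes a Lasso-style self-representation solved by iterative soft thresholding, a symmetric affinity |C| + |C|^T, a normalized graph Laplacian, and a small k-means step. The function names `sparse_codes`, `spectral_labels`, and `cluster_documents`, the parameter `lam`, and the toy term-count matrix are all hypothetical.

```python
import numpy as np

def sparse_codes(X, lam=0.05, n_iter=500):
    """Assumed formulation: for every column x_i of X, approximately solve
    min_c 0.5*||x_i - X c||^2 + lam*||c||_1 with c_i = 0, using ISTA."""
    G = X.T @ X                                    # Gram matrix of documents
    step = 1.0 / (np.linalg.norm(G, 2) + 1e-12)    # 1 / Lipschitz constant
    C = np.zeros_like(G)
    for _ in range(n_iter):
        Z = C - step * (G @ C - G)                 # gradient step on the fit term
        C = np.sign(Z) * np.maximum(np.abs(Z) - lam * step, 0.0)  # soft threshold
        np.fill_diagonal(C, 0.0)                   # a document may not code itself
    return C

def spectral_labels(W, k, n_iter=50):
    """Spectral clustering on affinity W: normalized Laplacian + simple k-means."""
    n = W.shape[0]
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                    # eigenvalues in ascending order
    U = vecs[:, :k]                                # k smallest eigenvectors
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    # Farthest-point initialisation keeps the toy example deterministic.
    centers = [U[0]]
    for _ in range(1, k):
        d2 = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def cluster_documents(X, k, lam=0.05):
    """X is a terms-by-documents matrix; returns one cluster label per document."""
    X = X / (np.linalg.norm(X, axis=0, keepdims=True) + 1e-12)  # unit-length docs
    C = sparse_codes(X, lam=lam)
    W = np.abs(C) + np.abs(C).T                    # symmetric document affinity
    return spectral_labels(W, k)

# Toy data (made up): documents 0-2 use terms 0-3, documents 3-5 use terms 4-7.
block = np.array([[2, 1, 1],
                  [1, 2, 1],
                  [0, 1, 2],
                  [1, 0, 1]], dtype=float)
X_demo = np.block([[block, np.zeros((4, 3))],
                   [np.zeros((4, 3)), block]])
labels = cluster_documents(X_demo, k=2)
print(labels)
```

In this sketch, documents that share no vocabulary receive zero coefficients for one another, so the affinity graph splits along the vocabulary groups before spectral clustering is applied. A real corpus would need TF-IDF features, a tuned `lam`, and a more scalable solver than the dense loop used here.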

History

Event

Data Mining. Conference (12th : 2012 : Brussels, Belgium)

Pagination

1092 - 1097

Publisher

IEEE

Location

Brussels, Belgium

Place of publication

Piscataway, N.J.

Start date

2012-12-10

End date

2012-12-13

ISBN-13

9780769549057

Language

eng

Publication classification

E1 Full written paper - refereed

Editor/Contributor(s)

M Zaki, A Siebes, Y Xu, B Goethals, X Wu

Title of proceedings

ICDM 2012 : Proceedings of the 12th IEEE International Conference on Data Mining
