Data clustering using side information dependent Chinese restaurant processes

Li, Cheng, Rana, Santu, Phung, D and Venkatesh, S 2016, Data clustering using side information dependent Chinese restaurant processes, Knowledge and Information Systems, vol. 47, no. 2, pp. 463-488, doi: 10.1007/s10115-015-0834-7.

Title Data clustering using side information dependent Chinese restaurant processes
Author(s) Li, Cheng
Rana, Santu
Phung, D
Venkatesh, S
Journal name Knowledge and Information Systems
Volume number 47
Issue number 2
Start page 463
End page 488
Total pages 26
Publisher Springer
Place of publication Berlin, Germany
Publication date 2016-05
ISSN 0219-1377
Keyword(s) Side information
Data clustering
Bayesian nonparametric models
Summary Side information, that is, auxiliary information associated with documents or image content, provides hints for clustering. We propose a new model, the side information dependent Chinese restaurant process, which exploits side information in a Bayesian nonparametric model to improve data clustering. We introduce side information into the framework of the distance dependent Chinese restaurant process (ddCRP) using a robust decay function to handle noisy side information. The threshold parameter of the decay function is updated automatically during Gibbs sampling. We also propose a fast inference algorithm. We evaluate our approach on four datasets: Cora, 20 Newsgroups, NUS-WIDE and a medical dataset. The types of side information explored in this paper include citations, authors, tags, keywords and auxiliary clinical information. Comparisons with state-of-the-art approaches on standard performance measures (normalized mutual information (NMI) and F1) show that our approach outperforms them.
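The ddCRP framework named in the summary can be illustrated with a minimal sketch of its prior. In the standard ddCRP, each item i links to another item j with probability proportional to f(d_ij), or to itself with probability proportional to a concentration parameter alpha, and clusters are the connected components of the resulting link graph. This sketch makes several assumptions that do not come from this record. First, it uses a plain window decay f(d) = 1 if d < a else 0 in place of the paper's robust decay function, whose exact form is not given here. Second, it uses a hypothetical Jaccard distance over tag sets as the side-information distance. Third, it draws only from the prior, with no likelihood, Gibbs sampling or threshold updates. All function names are illustrative.

```python
import random

def jaccard_distance_matrix(tag_sets):
    """Hypothetical side-information distance: 1 - Jaccard similarity of tag sets."""
    n = len(tag_sets)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = tag_sets[i] | tag_sets[j]
            inter = tag_sets[i] & tag_sets[j]
            dist[i][j] = 0.0 if not union else 1.0 - len(inter) / len(union)
    return dist

def window_decay(d, a):
    """Assumed threshold decay: full weight below threshold a, zero otherwise."""
    return 1.0 if d < a else 0.0

def link_prior(i, dist, alpha, a):
    """ddCRP prior p(c_i = j): alpha for a self-link, f(d_ij) for any other j."""
    weights = [alpha if j == i else window_decay(dist[i][j], a)
               for j in range(len(dist))]
    total = sum(weights)  # > 0 because alpha > 0
    return [w / total for w in weights]

def clusters_from_links(links):
    """Clusters are connected components of the graph formed by customer links."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(links))]

def sample_prior(dist, alpha, a, seed=0):
    """Draw one link per item from the prior, then read off the clustering."""
    rng = random.Random(seed)
    n = len(dist)
    links = [rng.choices(range(n), weights=link_prior(i, dist, alpha, a))[0]
             for i in range(n)]
    return links, clusters_from_links(links)

if __name__ == "__main__":
    tags = [{"a", "b"}, {"a", "b"}, {"c"}, {"c", "d"}]
    dist = jaccard_distance_matrix(tags)
    print(link_prior(0, dist, alpha=0.5, a=0.6))
    print(sample_prior(dist, alpha=0.5, a=0.6, seed=1))
```

With threshold a = 0.6, items whose tag distance is 0.6 or more get zero link weight, so the first two items can never share a cluster with the last two. This hard cutoff is why a noise-tolerant ("robust") decay and an adaptively updated threshold matter when side information is unreliable.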
Language eng
DOI 10.1007/s10115-015-0834-7
Field of Research 080109 Pattern Recognition and Data Mining
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category C1 Refereed article in a scholarly journal
ERA Research output type C Journal article
Copyright notice ©2016, Springer
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: cited 3 times in Thomson Reuters Web of Science; cited 6 times in Scopus
Access statistics: 685 abstract views, 2 file downloads
Created: Mon, 07 Mar 2016, 14:59:31 EST

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact