Data clustering using side information dependent Chinese restaurant processes
Li, Cheng, Rana, Santu, Phung, D and Venkatesh, S 2016, Data clustering using side information dependent Chinese restaurant processes, Knowledge and information systems, vol. 47, no. 2, pp. 463-488, doi: 10.1007/s10115-015-0834-7.
Side information, that is, auxiliary information associated with documents or image content, provides hints for clustering. We propose a new model, the side information dependent Chinese restaurant process, which exploits side information in a Bayesian nonparametric model to improve data clustering. We introduce side information into the framework of the distance dependent Chinese restaurant process, using a robust decay function to handle noisy side information. The threshold parameter of the decay function is updated automatically during Gibbs sampling. A fast inference algorithm is also proposed. We evaluate our approach on four datasets: Cora, 20 Newsgroups, NUS-WIDE and one medical dataset. The types of side information explored in this paper include citations, authors, tags, keywords and auxiliary clinical information. Comparison with state-of-the-art approaches on standard performance measures (NMI, F1) shows that our approach outperforms them.
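To make the distance dependent Chinese restaurant process concrete, the following is a minimal sketch of its prior only, not the model or inference algorithm of the paper. It assumes a simple window decay (weight 1 when the side-information distance is within the threshold, 0 otherwise); the abstract does not specify the paper's robust decay function, so `window_decay`, `sample_links`, `clusters_from_links`, the distance matrix and all parameter values here are hypothetical. Each item links to itself with weight `alpha` or to another item with weight given by the decay, and clusters are the connected components of the resulting links.

```python
import random


def window_decay(distance, threshold):
    """Hypothetical decay: full weight within the threshold, none beyond it."""
    return 1.0 if distance <= threshold else 0.0


def sample_links(distances, alpha, threshold, rng):
    """Draw one link per item from a ddCRP-style prior.

    distances[i][j] is an assumed side-information distance between i and j.
    Item i links to itself with weight alpha (alpha > 0) and to j != i with
    weight window_decay(distances[i][j], threshold).
    """
    n = len(distances)
    links = []
    for i in range(n):
        weights = [
            alpha if j == i else window_decay(distances[i][j], threshold)
            for j in range(n)
        ]
        links.append(rng.choices(range(n), weights=weights, k=1)[0])
    return links


def clusters_from_links(links):
    """Label items by connected component of the undirected link graph."""
    parent = list(range(len(links)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(links):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(links))]


# Toy data: items 0 and 1 are close, items 2 and 3 are close, groups are far apart.
toy_distances = [
    [0.0, 0.1, 10.0, 10.0],
    [0.1, 0.0, 10.0, 10.0],
    [10.0, 10.0, 0.0, 0.1],
    [10.0, 10.0, 0.1, 0.0],
]
toy_links = sample_links(toy_distances, alpha=0.5, threshold=1.0, rng=random.Random(0))
toy_labels = clusters_from_links(toy_links)
```

With this window decay, items whose distance exceeds the threshold can never link directly, so the two toy groups never merge. In the paper the threshold is resampled within Gibbs sampling rather than fixed as it is in this sketch.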