Deakin University
Data clustering using side information dependent Chinese restaurant processes

Version 2 2024-06-06, 01:30
Version 1 2015-08-26, 14:46
journal contribution
posted on 2024-06-06, 01:30 authored by C Li, Santu Rana, D Phung, Svetha Venkatesh
Side information, or auxiliary information associated with documents or image content, provides hints for clustering. We propose a new model, side information dependent Chinese restaurant process, which exploits side information in a Bayesian nonparametric model to improve data clustering. We introduce side information into the framework of distance dependent Chinese restaurant process using a robust decay function to handle noisy side information. The threshold parameter of the decay function is updated automatically in the Gibbs sampling process. A fast inference algorithm is proposed. We evaluate our approach on four datasets: Cora, 20 Newsgroups, NUS-WIDE and one medical dataset. Types of side information explored in this paper include citations, authors, tags, keywords and auxiliary clinical information. The comparison with the state-of-the-art approaches based on standard performance measures (NMI, F1) clearly shows the superiority of our approach.
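To make the abstract's central idea concrete, the following is a minimal sketch of the link prior in a distance dependent Chinese restaurant process, where each item links to another item with probability proportional to a decay of their distance, or to itself with weight alpha. It is not the authors' model: the window decay, the distance matrix built from side information, and the names `ddcrp_link_probs`, `clusters_from_links`, `alpha` and `threshold` are all assumptions chosen for illustration, and the paper's robust decay function and its automatic threshold update are not reproduced here.

```python
import numpy as np


def ddcrp_link_probs(distances, alpha, threshold):
    """Prior link probabilities under a ddCRP with an assumed window decay.

    distances: (n, n) array of pairwise distances, hypothetically derived
               from side information such as shared authors or tags.
    alpha:     positive self-link weight (concentration parameter).
    threshold: window width; items at distance >= threshold get weight 0.
    """
    d = np.asarray(distances, dtype=float)
    # Assumed window decay: weight 1 inside the window, 0 outside it.
    weights = (d < threshold).astype(float)
    # A self-link starts a new cluster and carries weight alpha.
    np.fill_diagonal(weights, alpha)
    # Normalise each row so it is a distribution over link targets.
    return weights / weights.sum(axis=1, keepdims=True)


def clusters_from_links(links):
    """Map item-to-item links to cluster labels (connected components)."""
    parent = list(range(len(links)))

    def find(x):
        # Follow parents to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(links):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    # Relabel roots as consecutive integers in order of first appearance.
    label_of = {}
    return [label_of.setdefault(find(i), len(label_of)) for i in range(len(links))]


# Toy example: items 0 and 1 are close, item 2 is far from both.
toy_distances = [[0, 1, 5], [1, 0, 5], [5, 5, 0]]
probs = ddcrp_link_probs(toy_distances, alpha=1.0, threshold=2.0)
labels = clusters_from_links([1, 0, 2])
```

In this toy setting, item 2 can only link to itself, so it forms its own cluster regardless of the sampled links. A Gibbs sampler would combine this prior with a data likelihood when resampling each link, which the sketch leaves out.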

History

Journal

Knowledge and Information Systems

Volume

47

Pagination

463-488

Location

Berlin, Germany

ISSN

0219-1377

eISSN

0219-3116

Language

English

Publication classification

C Journal article, C1 Refereed article in a scholarly journal

Copyright notice

2016, Springer

Issue

2

Publisher

SPRINGER LONDON LTD