Exploiting side information in distance dependent Chinese restaurant processes for data clustering

Li, Cheng, Phung, Dinh, Rana, Santu and Venkatesh, Svetha 2013, Exploiting side information in distance dependent Chinese restaurant processes for data clustering, in ICME 2013 : Proceedings of the 14th IEEE International Conference on Multimedia and Expo, IEEE, Piscataway, N.J., pp. 1-6, doi: 10.1109/ICME.2013.6607475.

Title Exploiting side information in distance dependent Chinese restaurant processes for data clustering
Author(s) Li, Cheng
Phung, Dinh (ORCID iD: orcid.org/0000-0002-9977-8247)
Rana, Santu (ORCID iD: orcid.org/0000-0003-2247-850X)
Venkatesh, Svetha (ORCID iD: orcid.org/0000-0001-8675-6631)
Conference name Multimedia and Expo. IEEE International Conference (14th : 2013 : San Jose, California)
Conference location San Jose, California
Conference dates 15-19 Jul. 2013
Title of proceedings ICME 2013 : Proceedings of the 14th IEEE International Conference on Multimedia and Expo
Editor(s) [Unknown]
Publication date 2013
Conference series IEEE International Conference on Multimedia and Expo
Start page 1
End page 6
Total pages 6
Publisher IEEE
Place of publication Piscataway, N.J.
Keyword(s) side information
annotated data
clustering
distance dependent Chinese restaurant processes
multimedia
Summary Multimedia content often comes with weakly annotated data such as tags, links and interactions. This weakly annotated data is called side information: auxiliary information that provides hints for exploring the link structure of the data. Most clustering algorithms use the pure data alone. A model that combines pure data with side information, such as images with tags or documents with keywords, can better capture the underlying structure of the data. We demonstrate how to incorporate different types of side information into a recently proposed Bayesian nonparametric model, the distance dependent Chinese restaurant process (DD-CRP). When side information takes the form of subsets of discrete labels, our algorithm embeds the affinity of this information into the decay function of the DD-CRP. Distance can therefore be measured from arbitrary side information rather than only from the spatial layout or time stamps of observations. For noisy and incomplete side information, we set the decay function so that the DD-CRP reduces to the traditional Chinese restaurant process, which avoids the adverse effects of such side information. Experimental evaluations on two real-world datasets, NUS-WIDE and 20 Newsgroups, show that exploiting side information in the DD-CRP significantly improves clustering performance.
ISBN 9781479900152
Language eng
DOI 10.1109/ICME.2013.6607475
Field of Research 080109 Pattern Recognition and Data Mining
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category E1 Full written paper - refereed
HERDC collection year 2013
Copyright notice ©2013, IEEE
Persistent URL http://hdl.handle.net/10536/DRO/DU:30057163
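
Illustrative note on the Summary: the idea of embedding label affinity into the DD-CRP decay function can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The Jaccard distance between tag sets, the exponential decay, the `scale` parameter, the use of `None` for missing side information, and the restriction to earlier customers (a sequential DD-CRP) are all assumptions made for illustration; the record does not specify the paper's actual affinity or decay functions.

```python
import math


def jaccard_distance(tags_a, tags_b):
    """Distance between two label sets: 1 - |A & B| / |A | B|.

    Hypothetical affinity measure chosen for illustration only.
    """
    union = tags_a | tags_b
    if not union:
        return 1.0
    return 1.0 - len(tags_a & tags_b) / len(union)


def link_probabilities(i, tags, alpha=1.0, scale=0.5):
    """Prior probability that customer i links to each customer j.

    Index j == i is the self-link, weighted by the concentration alpha.
    Only earlier customers (j < i) can be linked to (sequential assumption).
    If either customer has no side information (None), the decay is the
    constant 1; when this holds for every pair, the prior is that of the
    traditional Chinese restaurant process.
    """
    weights = []
    for j in range(len(tags)):
        if j == i:
            w = alpha                      # self-link starts a new table
        elif j > i:
            w = 0.0                        # later customers are unreachable
        elif tags[i] is None or tags[j] is None:
            w = 1.0                        # constant decay: CRP behaviour
        else:
            d = jaccard_distance(tags[i], tags[j])
            w = math.exp(-d / scale)       # closer label sets, larger weight
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    tagged = [{"sky", "sea"}, {"car"}, {"sky", "sea", "boat"}]
    print(link_probabilities(2, tagged))
    untagged = [None, None, None]
    print(link_probabilities(2, untagged))
```

With tag sets present, the third item is more likely to link to the first item, which shares two of its tags, than to the second, which shares none. With no side information, every earlier item is equally likely, which matches the traditional Chinese restaurant process prior.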

Document type: Conference Paper
Collection: Centre for Pattern Recognition and Data Analytics
 
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: TR Web of Science: cited 0 times; Scopus: cited 3 times
Access statistics: 260 abstract views, 3 file downloads
Created: Wed, 23 Oct 2013, 10:02:30 EST

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.