Scalable nonparametric Bayesian multilevel clustering

Huynh, Viet, Phung, Dinh, Venkatesh, Svetha, Nguyen, Xuan Long, Hoffman, Matt and Bui, Hung Hai 2016, Scalable nonparametric Bayesian multilevel clustering, in UAI 2016: Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence, AUAI Press, Corvallis, Or., pp. 289-298.

Title Scalable nonparametric Bayesian multilevel clustering
Author(s) Huynh, Viet
Phung, Dinh
Venkatesh, Svetha
Nguyen, Xuan Long
Hoffman, Matt
Bui, Hung Hai
Conference name Uncertainty in Artificial Intelligence. Conference (32nd : 2016 : New York, N.Y.)
Conference location New York, N.Y.
Conference dates 25-29 Jun. 2016
Title of proceedings UAI 2016: Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence
Editor(s) Ihler, A.
Janzing, D.
Publication date 2016
Start page 289
End page 298
Total pages 10
Publisher AUAI Press
Place of publication Corvallis, Or.
Summary Multilevel clustering problems, where content and contextual information are jointly clustered, are ubiquitous in modern datasets. Existing work on this problem is limited to small datasets because it relies on the Gibbs sampler. We address the problem of scaling up multilevel clustering in a Bayesian nonparametric setting, extending the MC2 model proposed by Nguyen et al. (2014). We ground our approach in structured mean-field and stochastic variational inference (SVI) and develop a tree-structured SVI algorithm that exploits the interplay between content and context modeling. Our new algorithm avoids the need to pass repeatedly through the corpus, as the Gibbs sampler does. More crucially, our method is immediately amenable to parallelization, facilitating a scalable distributed implementation on the Apache Spark platform. We conduct extensive experiments in a variety of domains including text, images, and real-world user application activities. Direct comparison with the Gibbs sampler demonstrates that our method is an order of magnitude faster without loss of model quality. Our Spark-based implementation gains another order-of-magnitude speedup and can scale to large real-world datasets containing millions of documents and groups.
ISBN 9780996643115
Language eng
Field of Research 080109 Pattern Recognition and Data Mining
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category E1 Full written paper - refereed
ERA Research output type E Conference publication
Copyright notice ©2016, AUAI Press

Created: Tue, 26 Jul 2016, 17:11:42 EST
