Scalable nonparametric Bayesian multilevel clustering

Huynh, V; Phung, Q; Venkatesh, Svetha; Nguyen, XL; Hoffman, M; Bui, HH

conference contribution

posted on 2024-06-05, 04:36 authored by V Huynh, Q Phung, Svetha Venkatesh, XL Nguyen, M Hoffman, HH Bui

Multilevel clustering problems, in which content and contextual information are jointly clustered, are ubiquitous in modern datasets. Existing work on this problem is limited to small datasets because it relies on the Gibbs sampler. We address the problem of scaling up multilevel clustering in a Bayesian nonparametric setting, extending the MC2 model proposed in (Nguyen et al., 2014). We ground our approach in structured mean-field and stochastic variational inference (SVI) and develop a tree-structured SVI algorithm that exploits the interplay between content and context modeling. Our new algorithm avoids the repeated passes through the corpus that the Gibbs sampler requires. More crucially, our method is immediately amenable to parallelization, facilitating a scalable distributed implementation on the Apache Spark platform. We conduct extensive experiments in a variety of domains including text, images, and real-world user application activities. Direct comparison with the Gibbs sampler demonstrates that our method is an order of magnitude faster without loss of model quality. Our Spark-based implementation gains another order-of-magnitude speedup and can scale to large real-world datasets containing millions of documents and groups.

History

Pagination

289-298

Location

New York, N.Y.

Start date

2016-06-25

End date

2016-06-29

ISBN-13

9780996643115

Language

eng

Publication classification

E Conference publication, E1 Full written paper - refereed

Copyright notice

2016, AUAI Press

Editor/Contributor(s)

Ihler A, Janzing D

Title of proceedings

UAI 2016: Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence

Event

Uncertainty in Artificial Intelligence. Conference (32nd : 2016 : New York, N.Y.)

Publisher

AUAI Press

Place of publication

Corvallis, Or.

Publication URL

http://www.auai.org/uai2016/proceedings.php

Keywords

080109 Pattern Recognition and Data Mining; 970108 Expanding Knowledge in the Information and Computing Sciences; Centre for Pattern Recognition and Data Analytics; Pattern Recognition and Data Analytics; 4605 Data management and data science
