File(s) under permanent embargo
A Bayesian framework for learning shared and individual subspaces from multiple data sources
conference contribution
posted on 2011-01-01, 00:00 authored by Sunil Gupta, Quoc-Dinh Phung, B Adams, Svetha Venkatesh
This paper presents a novel Bayesian formulation to exploit shared structures across multiple data sources, constructing foundations for effective mining and retrieval across disparate domains. We jointly analyze diverse data sources using a unifying piece of metadata (textual tags). We propose a method based on Bayesian Probabilistic Matrix Factorization (BPMF) which is able to explicitly model the partial knowledge common to the datasets using shared subspaces and the knowledge specific to each dataset using individual subspaces. For the proposed model, we derive an efficient algorithm for learning the joint factorization based on Gibbs sampling. The effectiveness of the model is demonstrated by social media retrieval tasks across single and multiple media. The proposed solution is applicable to a wider context, providing a formal framework suitable for exploiting individual as well as mutual knowledge present across heterogeneous data sources of many kinds.
History
Event
Knowledge Discovery and Data Mining. Pacific-Asia Conference (15th : 2011 : Shenzhen, China)
Source
Advances in knowledge discovery and data mining : 15th Pacific-Asia Conference, PAKDD 2011, Shenzhen, China, May 24-27, 2011, proceedings, part II
Series
Lecture notes in artificial intelligence : 6635
Pagination
136 - 147
Publisher
Springer-Verlag
Location
Shenzhen, China
Place of publication
Berlin, Germany
Start date
2011-05-24
End date
2011-05-27
ISSN
0302-9743
ISBN-13
9783642208461
ISBN-10
3642208460
Language
eng
Publication classification
E1.1 Full written paper - refereed; E Conference publication
Copyright notice
2011, Springer-Verlag Berlin Heidelberg
Extent
45
Editor/Contributor(s)
J Huang, L Cao, J Srivastava
Title of proceedings
PAKDD 2011 : Advances in knowledge discovery and data mining : 15th Pacific-Asia Conference, Shenzhen, China, May 24-27, 2011, proceedings, part II
Keywords
Bayesian formulation
Bayesian frameworks
data sets
data source
efficient algorithm
formal framework
Gibbs sampling
heterogeneous data sources
matrix factorizations
multiple data sources
mutual knowledge
partial knowledge
social media
Science & Technology
Technology
Computer Science, Artificial Intelligence
Computer Science, Information Systems
Computer Science, Theory & Methods
Computer Science