Deakin University

Graph-induced restricted Boltzmann machines for document modeling

Version 2 2024-06-05, 11:49
Version 1 2016-02-19, 13:11
journal contribution
posted on 2024-06-05, 11:49 authored by TD Nguyen, Truyen Tran, D Phung, Svetha Venkatesh
Discovering knowledge from unstructured texts is a central theme in data mining and machine learning. We focus on fast discovery of thematic structures from a corpus. Our approach is based on a versatile probabilistic formulation, the restricted Boltzmann machine (RBM), whose underlying graphical model is an undirected bipartite graph. Inference is efficient: a document representation can be computed with a single matrix projection, making RBMs suitable for the massive text corpora available today. Standard RBMs, however, operate under the bag-of-words assumption, ignoring the inherent relational structures among words. This results in less coherent thematic grouping of words. We introduce graph-based regularization schemes that exploit linguistic structures, which in turn can be constructed from either corpus statistics or domain knowledge. We demonstrate that the proposed technique improves group coherence, facilitates visualization, provides a means of estimating intrinsic dimensionality, reduces overfitting, and possibly leads to better classification accuracy.
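
As a rough illustration of two ideas in the abstract, the sketch below computes a document representation by a single projection of the word-count vector through the RBM weights, and evaluates a generic graph-smoothness penalty over a word graph. It is a minimal sketch under stated assumptions, not the paper's formulation: the abstract does not specify the regularizer, so the squared-difference penalty used here is an assumed, commonly used form, and the names `hidden_representation`, `graph_penalty`, `W`, `b`, and `edges` are all hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function used for the hidden-unit activation."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_representation(v, W, b):
    """Project a word-count vector v (length N) through weights W (N x K)
    and hidden biases b (length K) to get K hidden activations.
    Assumption: a plain RBM with binary hidden units."""
    return [
        sigmoid(b[k] + sum(v[i] * W[i][k] for i in range(len(v))))
        for k in range(len(b))
    ]


def graph_penalty(W, edges):
    """Assumed smoothness penalty: for each word-graph edge (i, j, weight),
    add weight * squared distance between the weight rows of words i and j.
    Linked words are thereby encouraged to have similar weights."""
    return sum(
        w * sum((a - c) ** 2 for a, c in zip(W[i], W[j]))
        for i, j, w in edges
    )


# Toy data (hypothetical): 3 words, 2 hidden units.
W = [[0.5, -0.5], [0.5, -0.5], [-1.0, 1.0]]
b = [0.0, 0.0]
v = [2, 1, 0]                          # word counts for one document
edges = [(0, 1, 1.0), (1, 2, 0.5)]     # word graph, e.g. from co-occurrence

h = hidden_representation(v, W, b)
penalty = graph_penalty(W, edges)
```

In training, a penalty of this kind would typically be added to the negative log-likelihood with a tunable strength; how the paper actually combines the two is not stated in the abstract.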

History

Journal

Information Sciences

Volume

328

Pagination

60-75

Location

Amsterdam, The Netherlands

ISSN

0020-0255

eISSN

1872-6291

Language

English

Publication classification

C Journal article, C1 Refereed article in a scholarly journal

Copyright notice

2016, Elsevier

Publisher

ELSEVIER SCIENCE INC