Graph-induced restricted Boltzmann machines for document modeling

Nguyen, Tu Dinh, Truyen, Tran, Phung, Dinh and Venkatesh, Svetha 2016, Graph-induced restricted Boltzmann machines for document modeling, Information sciences, vol. 328, pp. 60-75, doi: 10.1016/j.ins.2015.08.023.


Title Graph-induced restricted Boltzmann machines for document modeling
Author(s) Nguyen, Tu Dinh
Truyen, Tran
Phung, Dinh
Venkatesh, Svetha
Journal name Information sciences
Volume number 328
Start page 60
End page 75
Total pages 16
Publisher Elsevier
Place of publication Amsterdam, The Netherlands
Publication date 2016-01
ISSN 0020-0255
Keyword(s) Document modeling
Restricted Boltzmann machine
Feature group discovery
Topic coherence
Word graphs
Summary Discovering knowledge from unstructured texts is a central theme in data mining and machine learning. We focus on fast discovery of thematic structures from a corpus. Our approach is based on a versatile probabilistic formulation, the restricted Boltzmann machine (RBM), where the underlying graphical model is an undirected bipartite graph. Inference is efficient: a document representation can be computed with a single matrix projection, making RBMs suitable for the massive text corpora available today. Standard RBMs, however, operate under the bag-of-words assumption, ignoring the inherent relational structures among words. This results in less coherent thematic grouping of words. We introduce graph-based regularization schemes that exploit linguistic structures, which in turn can be constructed from either corpus statistics or domain knowledge. We demonstrate that the proposed technique improves group coherence, facilitates visualization, provides a means of estimating intrinsic dimensionality, reduces overfitting, and possibly leads to better classification accuracy.
Language eng
DOI 10.1016/j.ins.2015.08.023
Field of Research 080109 Pattern Recognition and Data Mining
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category C1 Refereed article in a scholarly journal
ERA Research output type C Journal article
Copyright notice ©2016, Elsevier
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: TR Web of Science: cited 7 times
Scopus: cited 7 times
Access Statistics: 677 Abstract Views, 4 File Downloads
Created: Fri, 19 Feb 2016, 12:12:19 EST
