File(s) under permanent embargo
Regularizing topic discovery in EMRs with side information by using hierarchical Bayesian models
conference contribution
posted on 2014-12-04, 00:00 authored by Cheng Li, Santu Rana, Quoc-Dinh Phung, Svetha Venkatesh
We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wd-dCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). Typically, an EMR dataset consists of several patients (documents), and each patient record contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient inference method for the wd-dCRF using an MCMC technique. We evaluate on a real-world medical dataset consisting of about 1000 patients with PolyVascular disease. Compared with the popular topic analysis tool, the hierarchical Dirichlet process (HDP), our model discovers topics that are superior in terms of both qualitative and quantitative measures.
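To make the idea of a tree-derived word-to-word distance concrete, the following is a minimal sketch and not the distance functions proposed in the paper, which the abstract does not specify. It assumes a plain path-length distance (number of edges through the lowest common ancestor) over a hypothetical toy hierarchy; the `PARENT` mapping, the ICD-10-style labels, and the function names are illustrative assumptions only.

```python
# Hypothetical toy hierarchy (child -> parent). The labels are illustrative
# ICD-10-style codes and are NOT taken from the dataset used in the paper.
PARENT = {
    "I20": "I20-I25",
    "I21": "I20-I25",
    "I63": "I60-I69",
    "I20-I25": "IX",
    "I60-I69": "IX",
    "IX": None,  # root of the toy tree
}


def ancestors(code):
    """Return the path from `code` up to the root, starting with `code` itself."""
    path = []
    while code is not None:
        path.append(code)
        code = PARENT[code]
    return path


def tree_distance(a, b):
    """Count the edges between a and b via their lowest common ancestor.

    This path-length measure is an assumed stand-in for the paper's own
    distance functions.
    """
    path_a = ancestors(a)
    steps_from_a = {node: i for i, node in enumerate(path_a)}
    for steps_from_b, node in enumerate(ancestors(b)):
        if node in steps_from_a:
            return steps_from_a[node] + steps_from_b
    raise ValueError("codes do not share a root")
```

Under this assumed measure, two codes in the same block (for example `tree_distance("I20", "I21")`) are closer than two codes that only share a chapter (for example `tree_distance("I20", "I63")`), which is the kind of semantic signal a distance-dependent prior could use.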
Event
Pattern Recognition. Conference (22nd : 2014 : Stockholm, Sweden)
Pagination
1307 - 1312
Publisher
IEEE
Location
Stockholm, Sweden
Place of publication
Piscataway, N.J.
Start date
2014-08-24
End date
2014-08-28
ISSN
1051-4651
ISBN-13
9781479952083
Language
eng
Publication classification
E1 Full written paper - refereed; E Conference publication
Copyright notice
2014, IEEE
Editor/Contributor(s)
[Unknown]
Title of proceedings
ICPR 2014 : Proceedings of the 22nd International Conference on Pattern Recognition