Deakin University
File(s) under permanent embargo

Regularizing topic discovery in EMRs with side information by using hierarchical Bayesian models

conference contribution
posted on 2014-12-04, 00:00 authored by Cheng Li, Santu Rana, Quoc-Dinh Phung, Svetha Venkatesh
We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wd-dCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). Typically, an EMR dataset consists of several patients (documents), and each patient record contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient inference method for the wd-dCRF using an MCMC technique. We evaluate on a real-world medical dataset consisting of about 1000 patients with PolyVascular disease. Compared with the popular topic analysis tool, the hierarchical Dirichlet process (HDP), our model discovers topics that are superior in terms of both qualitative and quantitative measures.
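
To illustrate what a word-to-word distance derived from a tree structure might look like, here is a minimal sketch. It is not the distance function from the paper, which the abstract does not specify; it simply counts the edges on the path between two codes through their lowest common ancestor. The hierarchy `PARENT`, the code labels, and the function names `path_to_root` and `tree_distance` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy of diagnosis codes: child -> parent.
# None marks the root. These labels are illustrative assumptions only.
PARENT: Dict[str, Optional[str]] = {
    "ROOT": None,
    "I": "ROOT",
    "E": "ROOT",
    "I20": "I",
    "I21": "I",
    "I20.0": "I20",
    "I20.1": "I20",
    "E11": "E",
}


def path_to_root(code: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the list of nodes from `code` up to the root, inclusive."""
    path = [code]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def tree_distance(a: str, b: str, parent: Dict[str, Optional[str]]) -> int:
    """Number of edges between `a` and `b` via their lowest common ancestor."""
    # Map each ancestor of `a` (including `a`) to its distance from `a`.
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a, parent))}
    # Walk up from `b`; the first shared node is the lowest common ancestor.
    for j, node in enumerate(path_to_root(b, parent)):
        if node in steps_from_a:
            return steps_from_a[node] + j
    raise ValueError("codes do not share a common root")


if __name__ == "__main__":
    for pair in [("I20.0", "I20.1"), ("I20.0", "I21"), ("I20.0", "E11")]:
        print(pair, tree_distance(*pair, PARENT))
```

Under this sketch, sibling codes are closer than codes in different branches, which is the kind of signal a distance-dependent prior could use to encourage semantically related codes to share a topic.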

History

Event

Pattern Recognition. Conference (22nd : 2014 : Stockholm, Sweden)

Pagination

1307 - 1312

Publisher

IEEE

Location

Stockholm, Sweden

Place of publication

Piscataway, N.J.

Start date

2014-08-24

End date

2014-08-28

ISSN

1051-4651

ISBN-13

9781479952083

Language

eng

Publication classification

E1 Full written paper - refereed; E Conference publication

Copyright notice

2014, IEEE

Editor/Contributor(s)

[Unknown]

Title of proceedings

ICPR 2014 : Proceedings of the 22nd International Conference on Pattern Recognition