Deakin University
File(s) under permanent embargo

Regularizing topic discovery in EMRs with side information by using hierarchical Bayesian models

Version 2 2024-06-06, 01:30
Version 1 2015-04-20, 12:09
conference contribution
posted on 2024-06-06, 01:30 authored by C Li, Santu Rana, D Phung, Svetha Venkatesh
We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wd-dCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). Typically, an EMR dataset consists of several patients (documents), and each patient record contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient inference method for the wd-dCRF using MCMC techniques. We evaluate on a real-world medical dataset consisting of about 1000 patients with PolyVascular disease. Compared with the popular topic analysis tool, the hierarchical Dirichlet process (HDP), our model discovers topics that are superior in terms of both qualitative and quantitative measures.
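To illustrate what a word-to-word distance over a tree of diagnosis codes might look like, here is a minimal Python sketch. It is not the distance function introduced in the paper, which this record does not reproduce. It assumes a plain path-length distance (the number of edges between two codes via their lowest common ancestor), and the `parent` map, the code labels, and the function names are all hypothetical.

```python
def path_to_root(code, parent):
    """Return the nodes from `code` up to the root, inclusive."""
    path = [code]
    while code in parent:
        code = parent[code]
        path.append(code)
    return path


def tree_distance(a, b, parent):
    """Count the edges between a and b via their lowest common ancestor.

    Assumption: a simple path-length distance, used only for illustration.
    """
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a, parent))}
    for steps_from_b, node in enumerate(path_to_root(b, parent)):
        if node in steps_from_a:
            # First shared ancestor on b's path is the lowest common ancestor.
            return steps_from_a[node] + steps_from_b
    raise ValueError("codes do not share a common root")


# Hypothetical toy hierarchy (child -> parent); not taken from the paper's data.
parent = {
    "heart.angina": "heart",
    "heart.infarction": "heart",
    "heart": "circulatory",
    "brain.stroke": "brain",
    "brain": "circulatory",
}

same_branch = tree_distance("heart.angina", "heart.infarction", parent)
cross_branch = tree_distance("heart.angina", "brain.stroke", parent)
print(same_branch, cross_branch)
```

Under this assumed distance, codes in the same branch of the hierarchy come out closer than codes in different branches. That is the kind of signal a distance-dependent prior could use to favor grouping semantically related codes into one topic.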

History

Pagination

1307-1312

Location

Stockholm, Sweden

Start date

2014-08-24

End date

2014-08-28

ISSN

1051-4651

ISBN-13

9781479952083

Language

eng

Publication classification

E1 Full written paper - refereed, E Conference publication

Copyright notice

2014, IEEE

Editor/Contributor(s)

[Unknown]

Title of proceedings

ICPR 2014 : Proceedings of the 22nd International Conference on Pattern Recognition

Event

Pattern Recognition. Conference (22nd : 2014 : Stockholm, Sweden)

Publisher

IEEE

Place of publication

Piscataway, N.J.