Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records

Li, Cheng, Rana, Santu, Phung, Dinh and Venkatesh, Svetha 2016, Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records, Knowledge-based systems, vol. 99, pp. 168-182, doi: 10.1016/j.knosys.2016.02.005.


Title Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records
Author(s) Li, Cheng
Rana, Santu (ORCID iD: orcid.org/0000-0003-2247-850X)
Phung, Dinh (ORCID iD: orcid.org/0000-0002-9977-8247)
Venkatesh, Svetha (ORCID iD: orcid.org/0000-0001-8675-6631)
Journal name Knowledge-based systems
Volume number 99
Start page 168
End page 182
Total pages 15
Publisher Elsevier
Place of publication Amsterdam, The Netherlands
Publication date 2016-05-01
ISSN 0950-7051
Keyword(s) Bayesian nonparametric models
correspondence models
word distances
disease topics
readmission prediction
procedure codes prediction
Summary The Electronic Medical Record (EMR) has established itself as a valuable resource for large-scale analysis of health data. A hospital EMR dataset typically consists of the medical records of hospitalized patients. Each medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the composition of patient diseases and offers a tool for better treatment planning. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically coherent disease topics. We are motivated by the fact that diagnosis codes are connected in the ICD-10 tree structure, which encodes semantic relationships between codes. We use a decay function to incorporate distances between words at the bottom level of the wddCRF. Efficient inference for the wddCRF is derived using MCMC techniques. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore the conditional relationships of procedure codes given a disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets: PolyVascular disease and Acute Myocardial Infarction disease. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that features from the Corr-wddCRF outperform the baselines on 14-day readmission prediction. In addition, procedure code prediction based on the Corr-wddCRF shows considerable accuracy.
Language eng
DOI 10.1016/j.knosys.2016.02.005
Field of Research 080109 Pattern Recognition and Data Mining
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category C1 Refereed article in a scholarly journal
ERA Research output type C Journal article
Copyright notice ©2016, Elsevier
Persistent URL http://hdl.handle.net/10536/DRO/DU:30083446

Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: Cited 10 times in TR Web of Science
Cited 11 times in Scopus
Access Statistics: 598 Abstract Views, 2 File Downloads
Created: Fri, 13 May 2016, 14:48:02 EST
