Although the randomized controlled trial is the gold standard in medical research, researchers are increasingly looking to alternative data sources for hypothesis generation and early-stage evidence collection. Coded clinical data are collected routinely in most hospitals. While these data contain rich information directly related to real clinical settings, they are both noisy and semantically diverse, which makes them difficult to analyze with conventional statistical tools. This paper presents a novel application of Bayesian nonparametric modeling to uncover latent information in coded clinical data. For a patient cohort, a Bayesian nonparametric model is used to reveal the common comorbidity groups shared by the patients and the proportion in which each comorbidity group is reflected in each individual patient. To demonstrate the method, we present a case study based on hospitalization coding from an Australian hospital. The model recovered 15 comorbidity groups among 1012 patients hospitalized during one month. When patients from two areas of unequal socio-economic status were compared, the model revealed a higher prevalence of diverticular disease in the region of lower socio-economic status. The study builds a convincing case for using routinely coded data to speed up hypothesis generation.
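The abstract does not name the specific Bayesian nonparametric model. One common construction with the structure it describes (comorbidity groups shared across a cohort, with patient-specific proportions) is a hierarchical Dirichlet process (HDP) mixed-membership model, and the sketch below assumes that choice purely for illustration; it is not presented as the paper's method. It only simulates the generative story under a finite truncation: each group is a distribution over diagnosis codes, the cohort shares group weights `beta` through stick-breaking, and each patient draws proportions around `beta`. The function names, the truncation level, the hyperparameters `gamma`, `alpha` and `eta`, and the code labels are all hypothetical.

```python
import random

def dirichlet(concentrations, rng):
    """Draw one Dirichlet sample by normalizing independent gamma variates."""
    # Floor each shape parameter so gammavariate never receives a non-positive value.
    draws = [rng.gammavariate(max(a, 1e-12), 1.0) for a in concentrations]
    total = sum(draws)
    if total == 0.0:  # guard against extreme underflow: fall back to uniform weights
        return [1.0 / len(draws)] * len(draws)
    return [d / total for d in draws]

def stick_breaking(gamma, truncation, rng):
    """Truncated stick-breaking (GEM) weights for the cohort-level groups."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, gamma)   # fraction of the remaining stick to break off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)             # last piece absorbs the leftover mass
    return weights

def simulate_cohort(n_patients, codes_per_patient, vocab, truncation=10,
                    gamma=3.0, alpha=5.0, eta=0.1, seed=0):
    """Simulate patients from a truncated HDP-style mixed-membership model (hypothetical)."""
    rng = random.Random(seed)
    beta = stick_breaking(gamma, truncation, rng)        # shared group prevalence
    groups = [dirichlet([eta] * len(vocab), rng)         # each group: distribution over codes
              for _ in range(truncation)]
    patients = []
    for _ in range(n_patients):
        # Finite-truncation analogue of a DP centred on beta: Dirichlet(alpha * beta).
        pi = dirichlet([alpha * b for b in beta], rng)
        codes = []
        for _ in range(codes_per_patient):
            k = rng.choices(range(truncation), weights=pi)[0]        # pick a comorbidity group
            codes.append(rng.choices(vocab, weights=groups[k])[0])   # pick a code from that group
        patients.append({"proportions": pi, "codes": codes})
    return beta, groups, patients

# Illustrative usage with hypothetical ICD-10-style category labels.
vocab = ["I10", "E11", "K57", "N18", "J44"]
beta, groups, patients = simulate_cohort(n_patients=3, codes_per_patient=6, vocab=vocab)
for p in patients:
    print([round(x, 2) for x in p["proportions"]], p["codes"])
```

Analyzing real coded data runs this story in reverse: an inference procedure such as Gibbs sampling or variational inference (not shown here) estimates the groups and each patient's proportions from the observed codes. In a fully nonparametric model the number of groups is not fixed in advance; the truncation above is only a computational convenience.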
Location: Stockholm, Sweden
Language: English
Publication classification: E Conference publication, E1.1 Full written paper - refereed
Copyright notice: 2014, IAPR
Pagination: 1-4
Start date: 2014-08-24
End date: 2014-08-24
Title of proceedings: IAPR 2014 : Proceedings of 2nd International Workshop on Pattern Recognition for Healthcare Analytics
Event: IAPR Pattern Recognition for Healthcare Workshop (2nd : 2014 : Stockholm, Sweden)