Deakin University

File(s) under permanent embargo

Speed up health research through topic modeling of coded clinical data

conference contribution
posted on 2014-08-24, 00:00 authored by Wei Luo, Quoc-Dinh Phung, Tien Vu Nguyen, Truyen Tran, Svetha Venkatesh
Although the randomized controlled trial is the gold standard in medical research, researchers are increasingly looking to alternative data sources for hypothesis generation and early-stage evidence collection. Coded clinical data are collected routinely in most hospitals. While they contain rich information directly related to the real clinical setting, they are both noisy and semantically diverse, making them difficult to analyze with conventional statistical tools. This paper presents a novel application of Bayesian nonparametric modeling to uncover latent information in coded clinical data. For a patient cohort, a Bayesian nonparametric model is used to reveal the common comorbidity groups shared by the patients and the proportion in which each comorbidity group is reflected in each individual patient. To demonstrate the method, we present a case study based on hospitalization coding from an Australian hospital. The model recovered 15 comorbidity groups among 1012 patients hospitalized during a month. When patients from two areas of unequal socio-economic status were compared, the model revealed a higher prevalence of diverticular disease in the region of lower socio-economic status. The study builds a convincing case for using routine coded data to speed up hypothesis generation.



IAPR Pattern Recognition for Healthcare Workshop (2nd : 2014 : Stockholm, Sweden)


1 - 4


International Association of Pattern Recognition


Stockholm, Sweden

Place of publication

[Stockholm, Sweden]



Publication classification

E Conference publication; E1.1 Full written paper - refereed

Copyright notice

2014, IAPR

Title of proceedings

IAPR 2014 : Proceedings of 2nd International Workshop on Pattern Recognition for Healthcare Analytics
