File(s) under permanent embargo

Diabetic complication prediction using a similarity-enhanced latent Dirichlet allocation model

journal contribution
posted on 2019-10-01, 00:00 authored by S Ding, Z Li, Xiao Liu, H Huang, S Yang
Diabetes and its complications have been recognized worldwide as a major public health threat. Predicting diabetic complications is regarded as a highly effective technique for increasing the survival rate of diabetic patients. While many studies currently use medical images and structured medical records, very limited efforts have been dedicated to applying data mining techniques for unstructured textual medical records, such as admission and discharge records. Moreover, the similarities among medical records that are overlooked by existing approaches could potentially improve the accuracy of prediction models. In this paper, we propose an approach for diabetic complication prediction based on a similarity-enhanced latent Dirichlet allocation (seLDA) model. Specifically, we first estimate the similarity between textual medical records after data preprocessing, and then we perform seLDA-based diabetic complication topic mining based on similarity constraints. Finally, we construct a prediction model by solving a multilabel classification problem with support vector machines (SVMs). The experimental results show that our approach outperforms the conventional LDA-based approach in similarity indices by 22.49%. Additionally, our approach shows significant improvements in prediction accuracy over four other representative seLDA-based approaches, including random forests (RF), k-nearest neighbors (KNN), logistic regression (LR) and deep neural networks (DNNs).
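The first step named in the abstract, estimating similarity between textual records, can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors use, so cosine similarity over raw term counts is an assumption here. The tokenizer, the function names, and the toy records are likewise hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercase, split on whitespace, and count tokens (toy preprocessing)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty record is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(records):
    """Pairwise similarity between every pair of records."""
    vectors = [term_counts(r) for r in records]
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical toy records, not real patient data.
records = [
    "blurred vision retinopathy insulin",
    "retinopathy blurred vision laser treatment",
    "foot ulcer neuropathy numbness",
]
sim = similarity_matrix(records)
```

In this sketch the first two records share three terms and therefore score higher with each other than with the third. A matrix of this kind is one plausible form for the similarity constraints that the abstract says are passed to topic mining.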



Journal

Information Sciences

Pagination

12-24

Location

Amsterdam, The Netherlands

Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

2019, Elsevier