Deakin University

File(s) under permanent embargo

ALEC: Active learning with ensemble of classifiers for clinical diagnosis of coronary artery disease

journal contribution
posted on 2023-04-26, 05:05 authored by Fahime KhozeimehFahime Khozeimeh, Roohallah AlizadehsaniRoohallah Alizadehsani, M Shirani, M Tartibi, A Shoeibi, H Alinejad-Rokny, C Harlapur, SJ Sultanzadeh, Abbas KhosraviAbbas Khosravi, Saeid Nahavandi, RS Tan, UR Acharya
Invasive angiography is the reference standard for coronary artery disease (CAD) diagnosis but is expensive and associated with certain risks. Machine learning (ML) using clinical and noninvasive imaging parameters can be used for CAD diagnosis to avoid the side effects and cost of angiography. However, ML methods require labeled samples for efficient training. The labeled data scarcity and high labeling costs can be mitigated by active learning. This is achieved through selective query of challenging samples for labeling. To the best of our knowledge, active learning has not been used for CAD diagnosis yet. An Active Learning with Ensemble of Classifiers (ALEC) method is proposed for CAD diagnosis, consisting of four classifiers. Three of these classifiers determine whether a patient's three main coronary arteries are stenotic or not. The fourth classifier predicts whether the patient has CAD or not. ALEC is first trained using labeled samples. For each unlabeled sample, if the outputs of the classifiers are consistent, the sample along with its predicted label is added to the pool of labeled samples. Inconsistent samples are manually labeled by medical experts before being added to the pool. The training is performed once more using the samples labeled so far. The interleaved phases of labeling and training are repeated until all samples are labeled. Compared with 19 other active learning algorithms, ALEC combined with a support vector machine classifier attained superior performance with 97.01% accuracy. Our method is justified mathematically as well. We also comprehensively analyze the CAD dataset used in this paper. As part of dataset analysis, features pairwise correlation is computed. The top 15 features contributing to CAD and stenosis of the three main coronary arteries are determined. The relationship between stenosis of the main arteries is presented using conditional probabilities. The effect of considering the number of stenotic arteries on sample discrimination is investigated. The discrimination power over dataset samples is visualized, assuming each of the three main coronary arteries as a sample label and considering the two remaining arteries as sample features.



Computers in Biology and Medicine



Article number





United States








Elsevier BV