Deakin University

Explainable Machine Learning for Efficient Diabetes Prediction Using Hyperparameter Tuning, SHAP Analysis, Partial Dependency, and LIME

journal contribution
posted on 2025-01-15, 02:41 authored by Md Manowarul Islam, Habibur Rahman Rifat, Md Shamim Bin Shahid, Arnisha Akhter, Md Ashraf Uddin, Khandaker Mohammad Mohi Uddin
ABSTRACT

Diabetes is a chronic metabolic disease characterized by elevated blood glucose levels, and it poses significant health risks such as cardiovascular disease and cognitive damage. Understanding the causes of diabetes is crucial to managing it and preventing complications. The clinical community holds a large amount of diabetes diagnostic data, and machine learning algorithms can help find hidden patterns in it, retrieve data from databases, and predict outcomes. To design a more accurate diabetes classification algorithm, this study uses random oversampling and hyperparameter tuning. Whereas most existing methods were built on a single dataset, the proposed model is designed on two benchmark datasets to make it more generally applicable: the BRFSS dataset, which has multiple classes, and the Diabetes 2019 dataset, which has binary classes. To make the proposed model easier to understand, several explainability methods are applied, namely SHapley Additive Explanations (SHAP), Partial Dependency, and Local Interpretable Model-agnostic Explanations (LIME), which are seldom used in previous work. The detailed explainability charts let end users and practitioners see which factors drive a given diagnostic report. This research focuses on classifying type 2 diabetes with machine learning and on explaining the outcomes derived from the model predictions. Random oversampling and a quantile transform are used to rectify imbalances in the datasets and to make model training robust. Model parameters were tuned with GridSearchCV to attain high accuracy on both the binary and the multi-class dataset. The proposed model is evaluated on the two datasets using standard performance metrics. The extra trees (ET) classifier achieved 97.23% accuracy on the multi-class dataset and 97.45% on the binary dataset.
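The random oversampling step mentioned in the abstract can be illustrated with a minimal sketch that uses only the Python standard library. It is not the authors' implementation: the function name `random_oversample`, the seed, and the toy data are assumptions made for illustration, and a real pipeline would more likely rely on an existing library. The idea is to duplicate randomly chosen rows of each smaller class until every class has as many rows as the largest one.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from the original data; originals are never dropped.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement until the class reaches the target size
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Made-up toy data: three classes with 14, 4, and 2 rows
rows = [[i, i % 7] for i in range(20)]
labels = [0] * 14 + [1] * 4 + [2] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)
print(class_counts)
```

Oversampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.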

Journal

Engineering Reports

Location

Chichester, Eng.

ISSN

2577-8196

eISSN

2577-8196

Language

Eng

Publication classification

C1.1 Refereed article in a scholarly journal

Publisher

Wiley
