Openly accessible

Effect of data scaling methods on machine learning algorithms and model performance

Ahsan, M M, Mahmud, M A Parvez, Saha, P K, Gupta, K D and Siddique, Z 2021, Effect of data scaling methods on machine learning algorithms and model performance, Technologies, vol. 9, no. 3, pp. 1-17, doi: 10.3390/technologies9030052.


Title Effect of data scaling methods on machine learning algorithms and model performance
Author(s) Ahsan, M M
Mahmud, M A Parvez (ORCID iD: orcid.org/0000-0002-1905-6800)
Saha, P K
Gupta, K D
Siddique, Z
Journal name Technologies
Volume number 9
Issue number 3
Article ID 52
Start page 1
End page 17
Total pages 17
Publisher MDPI AG
Place of publication Basel, Switzerland
Publication date 2021
ISSN 2227-7080
Summary Heart disease, one of the main causes of high mortality worldwide, requires a sophisticated and expensive diagnostic process. Much recent literature has presented machine learning approaches as a way to diagnose heart disease patients efficiently. However, dataset problems such as missing data, inconsistent data, and mixed data (numerical and categorical values with inconsistent or missing entries) are often obstacles in medical diagnosis. Such inconsistency raises the probability of misprediction and misleading results. Data preprocessing steps such as feature reduction, data conversion, and data scaling are used to form a standard dataset, and these measures play a crucial role in reducing inaccuracy in the final prediction. This paper evaluates eleven machine learning (ML) algorithms, namely Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Naive Bayes (NB), Support Vector Machine (SVM), XGBoost (XGB), Random Forest Classifier (RF), Gradient Boost (GB), AdaBoost (AB), and Extra Tree Classifier (ET), together with six data scaling methods, namely Normalization (NR), Standscale (SS), MinMax (MM), MaxAbs (MA), Robust Scaler (RS), and Quantile Transformer (QT), on a dataset comprising information on patients with heart disease. The results show that CART, combined with RS or QT, outperforms all other ML algorithms with 100% accuracy, 100% precision, 99% recall, and a 100% F1 score. The study outcomes demonstrate that a model’s performance varies depending on the data scaling method.
Language eng
DOI 10.3390/technologies9030052
Indigenous content off
HERDC Research category C1 Refereed article in a scholarly journal
Free to Read? Yes
Persistent URL http://hdl.handle.net/10536/DRO/DU:30154112
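
Four of the scaling methods named in the Summary (MinMax, Standscale, MaxAbs, and Robust Scaler) can be illustrated with a minimal sketch using only the Python standard library. This is not the authors' code, and this record does not state which implementation they used. The function names, the textbook column-wise formulas, and the `cholesterol` values are assumptions made up for illustration.

```python
import statistics


def min_max_scale(values):
    """MinMax (MM): map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def standard_scale(values):
    """Standscale (SS): subtract the mean, divide by the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]


def max_abs_scale(values):
    """MaxAbs (MA): divide by the largest absolute value."""
    peak = max(abs(v) for v in values)
    return [v / peak if peak else 0.0 for v in values]


def robust_scale(values):
    """Robust Scaler (RS): subtract the median, divide by the IQR."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr if iqr else 0.0 for v in values]


# Hypothetical feature column with one outlier (not data from the paper).
cholesterol = [180, 200, 220, 240, 600]

for name, scaler in [
    ("MinMax", min_max_scale),
    ("Standscale", standard_scale),
    ("MaxAbs", max_abs_scale),
    ("Robust", robust_scale),
]:
    print(name, [round(x, 3) for x in scaler(cholesterol)])
```

In this toy column the single outlier pushes the other MinMax values toward 0, while the robust variant keeps them spread between -1 and 0.5 because the median and interquartile range are unaffected by the extreme value. Differences of this kind are one plausible reason a model's performance can depend on the scaler.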

Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.

Created: Tue, 03 Aug 2021, 10:03:40 EST
