Openly accessible

On the effectiveness of discretizing quantitative attributes in linear classifiers

Zaidi, Nayyar A., Du, Yang and Webb, Geoffrey I. 2020, On the effectiveness of discretizing quantitative attributes in linear classifiers, IEEE Access, vol. 8, pp. 198856-198871, doi: 10.1109/access.2020.3034955.


Title On the effectiveness of discretizing quantitative attributes in linear classifiers
Author(s) Zaidi, Nayyar A. (ORCID: orcid.org/0000-0003-4024-2517)
Du, Yang
Webb, Geoffrey I.
Journal name IEEE Access
Volume number 8
Start page 198856
End page 198871
Total pages 16
Publisher Institute of Electrical and Electronics Engineers (IEEE)
Place of publication Piscataway, N. J.
Publication date 2020-10
ISSN 2169-3536
Keyword(s) discretization
classification
logistic regression
support vector classifier
artificial neuron
big datasets
bias-variance analysis
Summary Linear models in machine learning are extremely computationally efficient, but they have high representation bias because many real-world datasets are non-linear in nature. In this article, we show that this representation bias can be greatly reduced by discretization. Discretization is a common procedure in machine learning that converts a quantitative attribute into a qualitative one. It is often motivated by the inability of some learners to handle quantitative data. Since discretization loses information (fewer distinctions among instances are possible with discretized data than with undiscretized data), it might appear desirable to avoid it where it is not essential, and typically it is avoided. However, discretization has previously been shown to lead to superior performance with generative linear models, e.g., naive Bayes. This motivates a systematic study of the effects of discretizing quantitative attributes for discriminative linear models as well. In this article, we demonstrate that, contrary to prevalent belief, discretization of quantitative attributes is a beneficial pre-processing step for discriminative linear models: it leads to far superior classification performance, especially on bigger datasets, and, surprisingly, much better convergence, which reduces training time. We substantiate our claims with an empirical study on 52 benchmark datasets, using three linear models that optimize different objective functions.
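The summary describes discretization as mapping each quantitative attribute to a qualitative one. A minimal sketch of one common scheme — equal-frequency binning followed by one-hot encoding, which lets a linear model fit piecewise-constant (i.e., non-linear) functions of each attribute — is given below. The bin count and function names here are illustrative assumptions, not the paper's exact discretizer.

```python
import numpy as np

def equal_frequency_cuts(x, n_bins=4):
    """Cut points that split attribute x into n_bins equal-frequency bins."""
    inner_quantiles = np.linspace(0, 100, n_bins + 1)[1:-1]
    return np.percentile(x, inner_quantiles)

def discretize_one_hot(X, n_bins=4):
    """Replace each quantitative column of X with one-hot bin indicators.

    An n-column quantitative matrix becomes an (n * n_bins)-column
    qualitative (indicator) matrix, suitable as input to any linear model.
    """
    columns = []
    for j in range(X.shape[1]):
        cuts = equal_frequency_cuts(X[:, j], n_bins)
        bin_index = np.digitize(X[:, j], cuts)   # index in [0, n_bins)
        columns.append(np.eye(n_bins)[bin_index])  # one-hot encode the bin
    return np.hstack(columns)
```

A linear classifier trained on the indicator matrix assigns one weight per bin, so its decision function can vary non-linearly across an attribute's range — one intuition for the reduced representation bias reported in the article.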
Language eng
DOI 10.1109/access.2020.3034955
Indigenous content off
Field of Research 08 Information and Computing Sciences
09 Engineering
10 Technology
HERDC Research category C1 Refereed article in a scholarly journal
Copyright notice ©2020, The Authors
Free to Read? Yes
Persistent URL http://hdl.handle.net/10536/DRO/DU:30145726

Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.

Created: Wed, 25 Nov 2020, 04:35:22 EST