Proportional k-interval discretization for naive-Bayes classifiers
conference contribution
Posted on 2001-01-01, 00:00. Authored by Ying Yang, G. Webb
This paper argues that two commonly used discretization approaches, fixed k-interval discretization and entropy-based discretization, have suboptimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts both the number and the size of discretized intervals to the number of training instances, thus seeking an appropriate trade-off between the bias and variance of probability estimation for naive-Bayes classifiers. We justify PKID in theory and test it on a wide cross-section of datasets. Our experimental results suggest that, in comparison to its alternatives, PKID gives naive-Bayes classifiers competitive classification performance on smaller datasets and better classification performance on larger datasets.
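The proportional idea in the abstract can be sketched in a few lines: given N training values, use equal-frequency intervals whose count and size both grow with N (roughly the square root of N each), so that larger datasets yield both more intervals and more instances per interval. This is an illustrative sketch, not the paper's reference implementation; function and variable names are invented here.

```python
import math

def pkid_discretize(values):
    """Sketch of Proportional k-Interval Discretization.

    Sorts the numeric attribute values and splits them into roughly
    sqrt(N) equal-frequency intervals of roughly sqrt(N) instances each,
    so interval number and interval size both scale with N.
    Returns the list of cut points separating consecutive intervals.
    """
    n = len(values)
    ordered = sorted(values)
    size = max(1, int(math.sqrt(n)))  # target instances per interval
    cuts = []
    for i in range(size, n, size):
        # place each cut midway between the neighbouring sorted values
        cuts.append((ordered[i - 1] + ordered[i]) / 2)
    return cuts

def assign_interval(x, cuts):
    """Map a value to its interval index given the cut points."""
    for i, c in enumerate(cuts):
        if x <= c:
            return i
    return len(cuts)
```

With 16 training values this yields about four intervals of four instances each; with 10,000 values, about 100 intervals of 100 instances each, which is the bias-variance trade-off the abstract describes.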
History
Title of proceedings
ECML 2001 : Machine Learning : 12th European Conference on Machine Learning
Event
European Conference on Machine Learning (12th : 2001 : Freiburg, Germany)