Deakin University

Unsupervised discretization algorithm based on mixture probabilistic model

journal contribution
posted on 2002-01-01, 00:00 authored by Gang Li, F Tong
A theoretically rigorous algorithm for discretizing continuous attributes is presented, based on mixture probabilistic models. The algorithm automatically divides the range of a specified attribute into intervals without prior knowledge and without reference to other attributes. All values of the attribute are represented by a mixture probabilistic model in which each mixture component corresponds to a different interval. The parameters of the mixture model are determined by maximum likelihood using the Expectation-Maximization (EM) algorithm. One advantage of the mixture-model approach to discretization is that it allows approximate Bayes factors to be used to compare models. To determine the most suitable number of intervals, maximum likelihood parameters are calculated for mixture models with different numbers of components, and the BIC (Bayesian Information Criterion) values of these models are compared. The model with the highest BIC is chosen as the resulting generative probabilistic model, and its number of components gives the number of intervals. Choosing the best model therefore simultaneously determines the number of intervals and the way the range is divided. Experimental results show that this form of discretization can have distinct advantages over competing non-probabilistic approaches (such as the K-means algorithm): it allows uncertainty in interval membership, it gives direct control over the variability allowed within each interval, and it permits an objective treatment of the ever-thorny question of how many intervals the data suggest.
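The procedure described in the abstract (fit mixtures with different numbers of components by EM, compare their BIC, keep the best) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Gaussian components, which the abstract does not specify, and it assumes the sign convention BIC = 2 log L - p log n, under which a higher BIC is better. The function names (`fit_gmm_1d`, `discretize`, and so on), the quantile-based initialisation, the fixed iteration count, and the variance floor are all hypothetical choices made for illustration.

```python
import math


def component_densities(x, weights, means, variances):
    """Weighted Gaussian density of x under each mixture component."""
    return [
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    ]


def log_likelihood(xs, weights, means, variances):
    """Log-likelihood of the data under the mixture."""
    tiny = 1e-300  # guards against log(0) if every density underflows
    return sum(
        math.log(sum(component_densities(x, weights, means, variances)) or tiny)
        for x in xs
    )


def fit_gmm_1d(xs, k, iters=100, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture with a fixed number of EM steps."""
    n = len(xs)
    srt = sorted(xs)
    # Assumed initialisation: means at evenly spaced quantiles of the data.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, var_floor)
    weights, variances = [1.0 / k] * k, [spread] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            dens = component_densities(x, weights, means, variances)
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)
    return weights, means, variances


def bic(loglik, k, n):
    """BIC = 2 log L - p log n with p = 3k - 1 free parameters (higher is better)."""
    return 2.0 * loglik - (3 * k - 1) * math.log(n)


def discretize(xs, max_k=5):
    """Choose k by BIC, then label each value with its most probable component."""
    best = None
    for k in range(1, max_k + 1):
        params = fit_gmm_1d(xs, k)
        score = bic(log_likelihood(xs, *params), k, len(xs))
        if best is None or score > best[0]:
            best = (score, k, params)
    _, k, (weights, means, variances) = best
    # Relabel components by increasing mean so label 0 is the lowest interval.
    order = sorted(range(k), key=lambda j: means[j])
    rank = {j: r for r, j in enumerate(order)}
    labels = []
    for x in xs:
        dens = component_densities(x, weights, means, variances)
        labels.append(rank[max(range(k), key=lambda j: dens[j])])
    return k, labels
```

Calling `discretize(values, max_k=5)` returns the selected number of components and one interval label per value. Labels here come from the most probable component for each value; with unequal component variances the resulting regions are not guaranteed to be contiguous intervals, so a practical version would likely derive explicit cut points between adjacent components instead.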

Journal

Jisuanji Xuebao/Chinese Journal of Computers

Volume

25

Pagination

158-164

Location

China

ISSN

0254-4164

Publication classification

CN.1 Other journal article

Issue

2

Publisher

Chinese Academy of Sciences
