Accurate parameter estimation for Bayesian network classifiers using hierarchical Dirichlet processes

Petitjean, François; Buntine, Wray; Webb, Geoffrey I; Zaidi, Nayyar

conference contribution

posted on 2018-01-01, 00:00 authored by François Petitjean, Wray Buntine, Geoffrey I Webb, Nayyar Zaidi

This paper introduces a novel parameter estimation method for the probability tables of Bayesian network classifiers (BNCs), using hierarchical Dirichlet processes (HDPs). The main result of this paper is that improved parameter estimation allows BNCs to outperform leading learning methods such as random forest for both 0–1 loss and RMSE, albeit only on categorical datasets. As data assets become larger, entering the hyped world of “big”, efficient, accurate classification requires three main elements: (1) classifiers with low bias that can capture the fine detail of large datasets; (2) out-of-core learners that can learn from data without having to hold it all in main memory; and (3) models that can classify new data very efficiently. The latest BNCs satisfy these requirements. Their bias can be controlled easily by increasing the number of parents of the nodes in the graph. Their structure can be learned out of core with a limited number of passes over the data. However, as the bias is lowered to model classification tasks accurately, the accuracy of their parameter estimates drops as well, because each parameter is estimated from ever-decreasing quantities of data. In this paper, we introduce the use of HDPs for accurate BNC parameter estimation even with lower bias. We conduct an extensive set of experiments on 68 standard datasets and demonstrate that our resulting classifiers perform very competitively with random forest in terms of prediction, while keeping the out-of-core capability and superior classification time.
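The trade-off described in the abstract can be illustrated with a small sketch. The code below is not the HDP estimator of the paper; it is a simplified stand-in that uses m-estimate back-off, in which the estimate for a full parent configuration is shrunk toward the estimate for a shorter one. The function names `rows_per_context` and `backoff_estimate`, the smoothing weight `m`, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def rows_per_context(n_rows: int, n_classes: int, arity: int, n_parents: int) -> float:
    """Average number of training rows per (class, parent-values) configuration.

    Assumes every attribute has the same number of values (`arity`) and that
    rows are spread evenly, which real data will not satisfy exactly.
    """
    return n_rows / (n_classes * arity ** n_parents)


def backoff_estimate(
    rows: List[Tuple[str, Sequence[str]]],
    value: str,
    context: Sequence[str],
    n_values: int,
    m: float = 1.0,
) -> float:
    """Estimate P(value | context) by shrinking toward shorter contexts.

    Simplified stand-in for hierarchical smoothing (NOT an HDP): start from a
    uniform distribution, then refine it one parent at a time, each level
    using the previous level's estimate as its prior with weight `m`.
    """
    estimate = 1.0 / n_values  # uniform prior at the root of the hierarchy
    for depth in range(len(context) + 1):
        prefix = tuple(context[:depth])
        # Rows whose first `depth` parent values match the query context.
        matching = [v for v, ctx in rows if tuple(ctx[:depth]) == prefix]
        count = sum(1 for v in matching if v == value)
        estimate = (count + m * estimate) / (len(matching) + m)
    return estimate


# Hypothetical toy data: (attribute value, (parent 1 value, parent 2 value)).
toy_rows = [("a", ("x", "p")), ("a", ("x", "q")), ("b", ("y", "p"))]

# With more parents, each probability-table cell sees far fewer rows.
print(rows_per_context(1_000_000, n_classes=2, arity=10, n_parents=0))
print(rows_per_context(1_000_000, n_classes=2, arity=10, n_parents=5))

# A context seen in the data versus one never seen (falls back to the marginal).
print(backoff_estimate(toy_rows, "a", ("x", "p"), n_values=2))
print(backoff_estimate(toy_rows, "a", ("z", "p"), n_values=2))
```

In this sketch, a context that never occurs in the data inherits the estimate of its nearest observed ancestor instead of collapsing to zero or to a uniform guess, which is the general intuition behind hierarchical smoothing of sparse probability tables.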

History

Event

European Machine Learning and Data Mining. Conference (2018 : Dublin, Ireland)

Volume

107

Series

Machine Learning

Pagination

1303 - 1331

Publisher

Springer

Location

Dublin, Ireland

Place of publication

Berlin, Germany

Publisher DOI

https://doi.org/10.1007/s10994-018-5718-0

Link to full text

https://link.springer.com/content/pdf/10.1007/s10994-018-5718-0.pdf

Start date

2018-09-10

End date

2018-09-14

ISSN

0885-6125

eISSN

1573-0565

Language

eng

Publication classification

E1.1 Full written paper - refereed

Editor/Contributor(s)

M Berlingerio, F Bonchi, T Gartner, N Hurley, G Ifrim

Title of proceedings

ECML-PKDD 2018 : Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

Keywords

Bayesian network; Parameter estimation; Graphical models; Dirichlet processes; Smoothing; Classification; Science & Technology; Technology; Computer Science, Artificial Intelligence; Computer Science; PROBABILITY-DISTRIBUTIONS; Information Systems; Artificial Intelligence and Image Processing
