Deakin University

The need for low bias algorithms in classification learning from large data sets

conference contribution
posted on 2002-01-01, authored by Damien Brain, G Webb
This paper reviews whether standard machine learning algorithms, which were mainly developed in the context of small data sets, are appropriate for application to large data sets. Sampling and parallelisation have proved useful means of reducing computation time when learning from large data sets. However, such methods assume that algorithms designed for use with what are now considered small data sets are also fundamentally suitable for large data sets. It is plausible that optimal learning from large data sets requires a different type of algorithm from that required for optimal learning from small data sets. This paper investigates one respect in which data set size may affect the requirements of a learning algorithm: the bias plus variance decomposition of classification error. Experiments show that learning from large data sets may be more effective when using an algorithm that places greater emphasis on bias management rather than variance management.
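The bias plus variance decomposition splits a classifier's expected error into a part due to systematic mistakes (bias) and a part due to sensitivity to the particular training sample (variance). The sketch below is a minimal illustration, assuming Kohavi and Wolpert's (1996) definitions for zero-one loss and assuming zero noise, so the observed test label is treated as the true class with probability 1. It is not necessarily the exact protocol used in the paper. The function name `kohavi_wolpert` and its input layout (one row of predicted labels per training trial) are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def kohavi_wolpert(
    predictions: Sequence[Sequence[Hashable]],
    y_true: Sequence[Hashable],
) -> tuple[float, float, float]:
    """Return (error, bias2, variance) averaged over the test points.

    Hypothetical helper. predictions[t][i] is the label predicted for test
    point i by the classifier trained in trial t (e.g. on a different sample).
    Assumes zero noise: y_true[i] is treated as the true class with probability 1.
    """
    n_trials = len(predictions)
    n_test = len(y_true)
    error = bias2 = variance = 0.0
    for i, y in enumerate(y_true):
        counts = Counter(trial[i] for trial in predictions)
        # q[c]: estimated probability that the learner predicts class c at x_i
        q = {c: k / n_trials for c, k in counts.items()}
        sum_q2 = sum(p * p for p in q.values())
        q_true = q.get(y, 0.0)
        # bias^2_x = 1/2 * sum_c (P(Y=c|x) - q[c])^2, with P(Y=y|x) = 1 and 0 otherwise
        bias2 += 0.5 * ((1.0 - q_true) ** 2 + (sum_q2 - q_true ** 2))
        # variance_x = 1/2 * (1 - sum_c q[c]^2)
        variance += 0.5 * (1.0 - sum_q2)
        # zero-one error at x_i: probability the learner does not predict y
        error += 1.0 - q_true
    return error / n_test, bias2 / n_test, variance / n_test


# Four training trials, two test points with true labels "a" and "b".
preds = [["a", "a"], ["a", "b"], ["b", "b"], ["a", "b"]]
err, b2, var = kohavi_wolpert(preds, ["a", "b"])
print(err, b2, var)  # 0.25 0.0625 0.1875
```

Under the zero-noise assumption the two components sum exactly to the error rate at each test point, so `err` equals `b2 + var`. In an experiment of the kind the abstract describes, each row of `predictions` would come from a classifier trained on a different sample, and one could compare how the bias and variance terms change as the training set grows.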

History

Pagination

62 - 73

Location

Helsinki, Finland

Open access

  • Yes

Start date

2002-08-19

End date

2002-08-23

ISBN-10

3540440372

Language

eng

Publication classification

E1 Full written paper - refereed

Copyright notice

2002, PKDD

Editor/Contributor(s)

T Elomaa, H Mannila, H Toivonen
