The need for low bias algorithms in classification learning from large data sets
Brain, Damien and Webb, Geoffrey I. 2002, The need for low bias algorithms in classification learning from large data sets, in PKDD 2002 : Principles of Data mining and Knowledge Discovery : 6th European Conference Proceedings, PKDD, [Helsinki, Finland], pp. 62-73.
Title
The need for low bias algorithms in classification learning from large data sets
PKDD 2002 : Principles of Data mining and Knowledge Discovery : 6th European Conference Proceedings
Editor(s)
Elomaa, Tapio; Mannila, Heikki; Toivonen, Hannu
Publication date
2002
Start page
62
End page
73
Publisher
PKDD
Place of publication
[Helsinki, Finland]
Summary
This paper reviews whether standard machine learning algorithms, which were mainly developed in the context of small data sets, are appropriate for application to large data sets. Sampling and parallelisation have proved useful means of reducing computation time when learning from large data sets. However, such methods assume that algorithms designed for use with what are now considered small data sets are also fundamentally suitable for large data sets. It is plausible that optimal learning from large data sets requires a different type of algorithm from optimal learning from small data sets. This paper investigates one respect in which data set size may affect the requirements of a learning algorithm: the bias plus variance decomposition of classification error. Experiments show that learning from large data sets may be more effective when using an algorithm that places greater emphasis on bias management than on variance management.
ISBN
3-540-44037-2
Language
eng
Field of Research
080110 Simulation and Modelling
Socio Economic Objective
970108 Expanding Knowledge in the Information and Computing Sciences