Deakin University

File(s) under permanent embargo

A novel data pre-processing technique robust to units and scales of measurement

conference contribution
posted on 2019-01-01, 00:00 authored by Arbind Agrahari Baniya, Sunil Aryal, K C Santosh
Many existing data mining algorithms use feature values directly in their models, making them sensitive to the units/scales used to measure/represent data. Pre-processing of data based on rank transformation has been suggested as a potential solution to overcome this issue. However, the data resulting from rank transformation is uniformly distributed, which may not be very useful in many data mining applications. In this paper, we present a more effective alternative based on ranks over multiple sub-samples of data. We call the proposed pre-processing technique ARES (Average Rank over an Ensemble of Sub-samples). Our empirical results with widely used data mining algorithms for classification and anomaly detection on a wide range of data sets suggest that ARES yields more consistent task-specific outcomes across various algorithms and data sets. In addition, it yields better or competitive outcomes most of the time compared to the most widely used min-max normalisation and the traditional rank transformation.
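
The abstract describes ARES only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It replaces each value of a single feature with its average rank across several random sub-samples of that feature. The function name `ares_transform`, the parameters `n_subsamples` and `subsample_size`, their defaults, sampling without replacement, and the tie rule (rank = number of sub-sample values less than or equal to the value) are all assumptions made for illustration.

```python
import random
from bisect import bisect_right

def ares_transform(column, n_subsamples=10, subsample_size=3, seed=0):
    """Average rank of each value over random sub-samples of one feature.

    Hypothetical sketch: names, defaults, and the tie rule are assumptions,
    not details taken from the paper.
    """
    rng = random.Random(seed)
    size = min(subsample_size, len(column))
    totals = [0.0] * len(column)
    for _ in range(n_subsamples):
        # Draw a sub-sample without replacement and sort it once.
        ref = sorted(rng.sample(column, size))
        for i, x in enumerate(column):
            # Rank of x = number of sub-sample values <= x (assumed tie rule).
            totals[i] += bisect_right(ref, x)
    return [t / n_subsamples for t in totals]

# The same heights expressed in metres and in centimetres.
heights_m = [1.52, 1.60, 1.68, 1.75, 1.81, 1.90, 2.05]
heights_cm = [100 * h for h in heights_m]

ranks_m = ares_transform(heights_m, seed=1)
ranks_cm = ares_transform(heights_cm, seed=1)
print(ranks_m == ranks_cm)
```

Because only the ordering of values within each sub-sample matters, changing the unit of measurement leaves the transformed values unchanged when the same sub-samples are drawn, which is the robustness property the abstract refers to. Averaging over several small sub-samples, rather than ranking against the full data set once, is what lets the result depart from the uniform distribution produced by a plain rank transformation.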

Event

Neural Information Processing. Conference (2019 : 26th : Sydney, New South Wales)

Volume

16

Issue

3

Pagination

1 - 8

Publisher

Australian Journal of Intelligent Information Processing Systems

Location

Sydney, New South Wales

Place of publication

Nedlands, W.A.

Start date

2019-12-12

End date

2019-12-15

ISSN

1321-2133

Language

eng

Publication classification

E1 Full written paper - refereed

Editor/Contributor(s)

Tom Gedeon, Kevin Wong, Minho Lee

Title of proceedings

Special Issue of the Australian Journal of Intelligent Information Processing Systems: 26th International Conference on Neural Information Processing