Deakin University

Distributed data augmented support vector machine on spark

Version 2 2024-06-05, 11:50
Version 1 2017-06-08, 23:00
conference contribution
posted on 2016-01-01, 00:00 authored by Tu Dinh Nguyen, Tien Vu Nguyen, Trung Minh Le, Quoc-Dinh Phung
Support vector machines (SVMs) are widely used for classification in machine learning and data mining tasks. However, they have traditionally been applied to small- to medium-sized datasets. The recent need to scale with data size has drawn research attention to new methods and implementations that let SVMs operate at scale. Distributed SVMs are relatively new and have only recently been studied, and a distributed implementation of SVM with data augmentation has not yet been developed. This paper introduces a distributed data augmentation implementation for SVM on Apache Spark, a recent, advanced and popular platform for distributed computing that has been widely employed in both research and industry. We term our implementation the sparkling vector machine (SkVM); it supports both classification and regression tasks by scanning through the data exactly once. In addition, we develop a framework to handle data with new classes arriving in an online classification setting, where new data points can have labels that have not previously been seen, a problem we term label-drift classification. We demonstrate the scalability of our proposed method on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performance of our method is comparable to or better than that of the baselines, whilst execution is faster by an order of magnitude.

History

Event

Pattern Recognition. Conference (23rd : 2016 : Cancun, Mexico)

Pagination

498 - 503

Publisher

IEEE

Location

Cancun, Mexico

Place of publication

Piscataway, N.J.

Start date

2016-12-04

End date

2016-12-08

ISSN

1051-4651

ISBN-13

9781509048472

Language

eng

Publication classification

E Conference publication; E1 Full written paper - refereed

Copyright notice

2016, IEEE

Editor/Contributor(s)

[Unknown]

Title of proceedings

2016 23rd International Conference on Pattern Recognition (ICPR 2016)