Outlier detection on mixed-type data: an energy-based approach

Do, K, Tran, Truyen, Phung, Quoc-Dinh and Venkatesh, Svetha 2016, Outlier detection on mixed-type data: an energy-based approach, in ADMA 2016 : Proceedings of the 12th International Conference of Advanced Data Mining and Applications, Springer International Publishing, Cham, Switzerland, pp. 111-125, doi: 10.1007/978-3-319-49586-6_8.

Title Outlier detection on mixed-type data: an energy-based approach
Author(s) Do, K
Tran, Truyen (ORCID iD: orcid.org/0000-0001-6531-8907)
Phung, Quoc-Dinh (ORCID iD: orcid.org/0000-0002-9977-8247)
Venkatesh, Svetha (ORCID iD: orcid.org/0000-0001-8675-6631)
Conference name Advanced Data Mining and Applications. International Conference (12th : 2016 : Gold Coast, Queensland)
Conference location Gold Coast, Queensland
Conference dates 2016/12/12 - 2016/12/15
Title of proceedings ADMA 2016 : Proceedings of the 12th International Conference of Advanced Data Mining and Applications
Editor(s) Li, J
Li, X
Wang, S
Li, J
Sheng, QZ
Publication date 2016
Series Lecture notes in artificial intelligence
Start page 111
End page 125
Total pages 15
Publisher Springer International Publishing
Place of publication Cham, Switzerland
Summary Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for a single data type, such as continuous or discrete. However, real-world data is increasingly heterogeneous, where a data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. In this paper, we propose a new unsupervised outlier detection method for mixed-type data based on the Mixed-variate Restricted Boltzmann Machine (Mv.RBM). The Mv.RBM is a principled probabilistic method that models data density. We propose to use the free energy derived from Mv.RBM as an outlier score, detecting outliers as those data points lying in low-density regions. The method is fast to learn and compute, and is scalable to massive datasets. At the same time, the outlier score is identical to the data negative log-density up to an additive constant. We evaluate the proposed method on synthetic and real-world datasets and demonstrate that (a) proper handling of mixed types is necessary in outlier detection, and (b) the free energy of Mv.RBM is a powerful and efficient outlier scoring method, which is highly competitive against state-of-the-art methods.
ISBN 9783319495859
ISSN 0302-9743
Language eng
DOI 10.1007/978-3-319-49586-6_8
Field of Research 080109 Pattern Recognition and Data Mining
Socio Economic Objective 0 Not Applicable
HERDC Research category E1 Full written paper - refereed
Copyright notice ©2016, Springer International Publishing
Persistent URL http://hdl.handle.net/10536/DRO/DU:30091399
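
The scoring idea in the Summary (free energy as an outlier score, equal to the negative log-density up to an additive constant) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a textbook restricted Boltzmann machine with unit-variance Gaussian visible units for continuous attributes, binary visible units for discrete attributes, and binary hidden units. The names `PARAMS`, `energy`, `free_energy` and `brute_force_free_energy` are hypothetical, and the parameter values are hand-set for illustration, not learned.

```python
import math
from itertools import product

# Hypothetical, hand-set parameters (NOT trained, NOT taken from the paper):
# two continuous inputs, one binary input, two binary hidden units.
PARAMS = {
    "a": [0.0, 1.0],                 # means of the Gaussian visible units
    "c": [0.5],                      # biases of the binary visible units
    "d": [0.0, -0.1],                # biases of the hidden units
    "W": [[0.3, -0.2], [0.1, 0.4]],  # continuous-to-hidden weights W[i][k]
    "U": [[-0.5, 0.2]],              # binary-to-hidden weights U[j][k]
}


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def visible_energy(x, b, params):
    """Part of the energy that depends only on the visible units."""
    quad = sum((xi - ai) ** 2 / 2.0 for xi, ai in zip(x, params["a"]))
    bias = sum(cj * bj for cj, bj in zip(params["c"], b))
    return quad - bias


def hidden_inputs(x, b, params):
    """Total input to each hidden unit given continuous x and binary b."""
    W, U, d = params["W"], params["U"], params["d"]
    return [
        d[k]
        + sum(W[i][k] * x[i] for i in range(len(x)))
        + sum(U[j][k] * b[j] for j in range(len(b)))
        for k in range(len(d))
    ]


def energy(x, b, h, params):
    """Joint energy E(v, h) of a visible/hidden configuration."""
    z = hidden_inputs(x, b, params)
    return visible_energy(x, b, params) - sum(hk * zk for hk, zk in zip(h, z))


def free_energy(x, b, params):
    """F(v) = -log sum_h exp(-E(v, h)), in closed form; used as the score."""
    z = hidden_inputs(x, b, params)
    return visible_energy(x, b, params) - sum(softplus(zk) for zk in z)


def brute_force_free_energy(x, b, params):
    """Same quantity by enumerating every hidden state (tiny models only)."""
    n_hidden = len(params["d"])
    total = sum(
        math.exp(-energy(x, b, list(h), params))
        for h in product([0, 1], repeat=n_hidden)
    )
    return -math.log(total)


# A point near the visible means versus a point far away from them.
print("typical point score:", free_energy([0.0, 1.0], [1], PARAMS))
print("unusual point score:", free_energy([8.0, -7.0], [1], PARAMS))
```

Because the model density is p(v) = exp(-F(v)) / Z, the score F(v) differs from -log p(v) only by the constant log Z, so ranking points by free energy requires no estimate of the partition function. A higher score corresponds to a lower-density point under the assumed model.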

Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: TR Web of Science: cited 0 times
Scopus: cited 5 times
Access Statistics: 461 Abstract Views, 3 File Downloads
Created: Fri, 17 Feb 2017, 16:38:32 EST
