CDF Transform-and-Shift: An effective way to deal with datasets of inhomogeneous cluster densities

Zhu, Ye, Ting, KM, Carman, MJ and Angelova Turkedjieva, Maia 2021, CDF Transform-and-Shift: An effective way to deal with datasets of inhomogeneous cluster densities, Pattern Recognition, pp. 1-41, doi: 10.1016/j.patcog.2021.107977.

Title CDF Transform-and-Shift: An effective way to deal with datasets of inhomogeneous cluster densities
Author(s) Zhu, Ye
Ting, KM
Carman, MJ
Angelova Turkedjieva, Maia
Journal name Pattern Recognition
Article ID 107977
Start page 1
End page 41
Total pages 41
Publisher Elsevier
Place of publication Amsterdam, The Netherlands
Publication date 2021-04-08
ISSN 0031-3203
Keyword(s) Density-ratio
Density-based clustering
kNN anomaly detection
inhomogeneous cluster densities
Science & Technology
Computer Science, Artificial Intelligence
Engineering, Electrical & Electronic
Computer Science
Summary The problem of inhomogeneous cluster densities has been a long-standing issue for distance-based and density-based algorithms in clustering and anomaly detection. These algorithms implicitly assume that all clusters have approximately the same density. As a result, they often exhibit a bias towards dense clusters in the presence of sparse clusters. Many remedies have been suggested; yet, we show that they are partial solutions which do not address the issue satisfactorily. To match the implicit assumption, we propose to transform a given dataset such that the transformed clusters have approximately the same density while all regions of locally low density become globally low density—homogenising cluster density while preserving the cluster structure of the dataset. We show that this can be achieved by using a new multi-dimensional Cumulative Distribution Function in a transform-and-shift method. The method can be applied to every dataset, before the dataset is used in many existing algorithms to match their implicit assumption without algorithmic modification. We show that the proposed method performs better than existing remedies.
Notes In Press
Language eng
DOI 10.1016/j.patcog.2021.107977
Indigenous content off
Field of Research 0801 Artificial Intelligence and Image Processing
0806 Information Systems
0906 Electrical and Electronic Engineering
HERDC Research category C1 Refereed article in a scholarly journal

Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Created: Mon, 13 Sep 2021, 15:13:29 EST