This paper examines how the choice of similarity measure affects sensitivity to outliers in clustering problems formulated as mathematical programming problems. Specifically, we study norms (norm-based similarity measures) and convex functions of norms (function-norm-based similarity measures). The study consists of two parts: an analysis of theoretical models and numerical experiments. The main result is a criterion for outlier sensitivity with respect to the corresponding similarity measure. In particular, the results show that norm-based similarity measures are not sensitive to outliers, whilst the very widely used squared Euclidean norm similarity measure (least squares) is sensitive to outliers.
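The contrast between the two kinds of measure can be illustrated with a minimal one-dimensional sketch. It is not taken from the paper: the data, the single-cluster setting, and the function names `center_abs` and `center_sq` are assumptions made purely for illustration. In one dimension, a center minimizing the sum of absolute deviations is a median, and the center minimizing the sum of squared deviations is the mean.

```python
from statistics import mean, median

def center_abs(points):
    """Hypothetical helper: a center minimizing sum(|x - c|), i.e. a median."""
    return median(points)

def center_sq(points):
    """Hypothetical helper: the center minimizing sum((x - c)**2), i.e. the mean."""
    return mean(points)

# Illustrative data (an assumption, not from the paper): one cluster plus one outlier.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = clean + [1000.0]

# How far each center moves when the outlier is added.
shift_abs = abs(center_abs(with_outlier) - center_abs(clean))
shift_sq = abs(center_sq(with_outlier) - center_sq(clean))

print("shift of norm-based center:   ", shift_abs)
print("shift of least-squares center:", shift_sq)
```

In this toy case the norm-based center moves by half a unit, while the least-squares center is dragged far outside the original cluster. This only hints at the behaviour described above; it does not reproduce the paper's criterion.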
Journal
Dynamics of continuous, discrete and impulsive systems series B: applications and algorithms