Deakin University

Self-adaptive k-means based on a covering algorithm

Version 2 2024-06-06, 04:27
Version 1 2018-09-18, 17:56
journal contribution
posted on 2024-06-06, 04:27 authored by Y Zhang, Y Zhou, X Guo, J Wu, Q He, Xiao Liu, Y Yang
The K-means algorithm is one of the ten classic algorithms in the area of data mining and has long been studied by researchers in numerous fields. However, the number of clusters k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm not only produces accurate clustering results efficiently but also self-adaptively provides a reasonable number of clusters based on the data features. It includes two phases: initialization by the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA. The CA self-organizes and determines the number of clusters k based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. It therefore has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration starting from the results of the first phase. The C-K-means algorithm thus combines the advantages of the CA and K-means. Experiments carried out on the Spark platform verify the good scalability of the C-K-means algorithm, which can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the C-K-means algorithm outperforms existing algorithms in accuracy and efficiency under both sequential and parallel conditions.
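The two-phase structure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of the covering algorithm, so the covering rule below is an assumption made purely for illustration: each cover is centered on the uncovered point nearest the centroid of the uncovered points, and its radius is the mean distance of all points to the global centroid. The function names (`covering_init`, `lloyd`, `c_k_means_sketch`) are hypothetical and do not come from the paper, and the sketch is sequential, not a Spark implementation.

```python
import math


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def covering_init(points):
    """Phase 1 (simplified stand-in, NOT the paper's exact CA).

    Derives both k and the initial centers from the data, so k is never
    passed in. The radius rule is an illustrative assumption.
    """
    if not points:
        return []
    global_centroid = mean_point(points)
    # Assumed radius: mean distance of all points to the global centroid.
    radius = sum(math.dist(p, global_centroid) for p in points) / len(points)
    uncovered = list(points)
    centers = []
    while uncovered:
        centroid = mean_point(uncovered)
        # Cover center: the uncovered point nearest the current centroid.
        seed = min(uncovered, key=lambda p: math.dist(p, centroid))
        covered = [p for p in uncovered if math.dist(p, seed) <= radius]
        centers.append(mean_point(covered))
        # The seed is always covered, so this list shrinks every pass.
        uncovered = [p for p in uncovered if math.dist(p, seed) > radius]
    return centers


def lloyd(points, centers, max_iter=100):
    """Phase 2: standard Lloyd iteration from the given initial centers."""
    centers = list(centers)

    def assign(current):
        return [
            min(range(len(current)), key=lambda j: math.dist(p, current[j]))
            for p in points
        ]

    for _ in range(max_iter):
        labels = assign(centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(mean_point(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(centers)


def c_k_means_sketch(points):
    """Run the covering-style initialization, then refine with Lloyd."""
    return lloyd(points, covering_init(points))


# Two well-separated groups; note that k is not supplied by the caller.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11)]
final_centers, final_labels = c_k_means_sketch(data)
print(len(final_centers), final_centers, final_labels)
```

The point of the sketch is the division of labor: the first function decides how many clusters there are and where they start, and the second only refines those centers. On less cleanly separated data this simplified covering rule may produce more covers than a careful implementation would.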

History

Journal

Complexity

Volume

2018

Article number

ARTN 7698274

Location

Cairo, Egypt

Open access

  • Yes

ISSN

1076-2787

eISSN

1099-0526

Language

English

Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

2018, Yiwen Zhang et al.

Publisher

WILEY-HINDAWI