Deakin University

Scalable bottom-up subspace clustering using FP-trees for high dimensional data

Version 2 2024-06-04, 06:15
Version 1 2019-04-15, 15:23
conference contribution
posted on 2024-06-04, 06:15 authored by MT Doan, J Qi, Sutharshan Rajasegarar, C Leckie
Subspace clustering aims to find groups of similar objects (clusters) that exist in lower-dimensional subspaces of a high-dimensional dataset. It has a wide range of applications, such as analysing high-dimensional sensor data or DNA sequences. However, existing algorithms have difficulty finding clusters in non-disjoint subspaces and scaling to large data, which limits their applicability in areas such as bioinformatics and the Internet of Things. We aim to address these limitations by proposing a subspace clustering algorithm that uses a bottom-up strategy. Our algorithm first searches for base clusters in low-dimensional subspaces. It then forms clusters in higher-dimensional subspaces from these base clusters, a step we formulate as a frequent pattern mining problem. This formulation enables an efficient search for clusters in higher-dimensional subspaces, which is done using FP-trees. The proposed algorithm is evaluated against traditional bottom-up clustering algorithms and state-of-the-art subspace clustering algorithms. The experimental results show that the proposed algorithm produces clusters with high accuracy and scales well to large volumes of data. We also demonstrate the algorithm's performance on ten real-life genomic datasets.
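The bottom-up idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name and parameter in it (`base_clusters_1d`, `fp_growth`, `subspace_clusters`, `eps`, `min_pts`) is hypothetical. It assumes that base clusters are found per dimension by a simple gap rule, that each point becomes a "transaction" listing the base clusters it belongs to, and that any set of base clusters co-occurring in at least `min_pts` points is a candidate cluster in the subspace spanned by their dimensions. The paper's actual base-clustering method and thresholds may differ.

```python
from collections import defaultdict


def base_clusters_1d(values, eps, min_pts):
    """Assumed base-cluster rule: runs of sorted values whose gaps are <= eps."""
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters, current = [], [order[0]]
    for prev, idx in zip(order, order[1:]):
        if values[idx] - values[prev] <= eps:
            current.append(idx)
        else:
            if len(current) >= min_pts:
                clusters.append(current)
            current = [idx]
    if len(current) >= min_pts:
        clusters.append(current)
    return clusters


class Node:
    """One FP-tree node: an item, a count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return header table and supports."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in items:
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, count in transactions:
        # Insert frequent items in descending-support order so prefixes are shared.
        kept = sorted((i for i in items if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in kept:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Yield (itemset, support) for every frequent itemset (standard FP-growth)."""
    header, frequent = build_tree(transactions, min_support)
    for item, support in frequent.items():
        pattern = suffix | {item}
        yield pattern, support
        # Conditional pattern base: the prefix path above each occurrence of item.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_support, pattern)


def subspace_clusters(data, eps, min_pts):
    """Map each multi-dimensional pattern of base clusters to its point count."""
    transactions = [set() for _ in data]
    for d in range(len(data[0])):
        column = [row[d] for row in data]
        for c, members in enumerate(base_clusters_1d(column, eps, min_pts)):
            for i in members:
                transactions[i].add((d, c))  # item = (dimension, base-cluster id)
    mined = fp_growth([(t, 1) for t in transactions], min_pts)
    return {p: s for p, s in mined if len(p) > 1}
```

Each returned key is a set of `(dimension, base-cluster id)` pairs, so the dimensions in a key name the subspace and the value is the number of points supporting the candidate cluster. Because a point contributes items from every dimension in which it falls into a base cluster, the same point can support patterns in overlapping (non-disjoint) subspaces.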

History

Pagination

106-111

Location

Seattle, Wash.

Start date

2018-12-10

End date

2018-12-13

ISBN-13

9781538650356

Language

eng

Publication classification

E1 Full written paper - refereed

Copyright notice

2018, IEEE

Editor/Contributor(s)

Abe N, Liu H, Pu C, Hu X, Ahmed N, Qiao M, Song Y, Kossman D, Liu B, Lee K, Tang J, He J, Saltz J

Title of proceedings

Big Data : Proceedings of the 2018 IEEE International Conference on Big Data

Event

IEEE Computer Society. Conference (2018 : Seattle, Wash.)

Publisher

Institute of Electrical and Electronics Engineers

Place of publication

Piscataway, N.J.

Series

IEEE Computer Society Conference
