File(s) under permanent embargo
Scalable bottom-up subspace clustering using FP-trees for high dimensional data
conference contribution
posted on 2018-01-01, 00:00 authored by M T Doan, J Qi, Sutharshan Rajasegarar, C Leckie

Subspace clustering aims to find groups of similar objects (clusters) that exist in lower-dimensional subspaces of a high-dimensional dataset. It has a wide range of applications, such as analysing high-dimensional sensor data or DNA sequences. However, existing algorithms have limitations in finding clusters in non-disjoint subspaces and in scaling to large data, which restrict their applicability in areas such as bioinformatics and the Internet of Things. We aim to address these limitations by proposing a subspace clustering algorithm that uses a bottom-up strategy. Our algorithm first searches for base clusters in low-dimensional subspaces. It then forms clusters in higher-dimensional subspaces from these base clusters, a step we formulate as a frequent pattern mining problem. This formulation enables an efficient search for clusters in higher-dimensional subspaces, which is performed using FP-trees. The proposed algorithm is evaluated against traditional bottom-up clustering algorithms and state-of-the-art subspace clustering algorithms. The experimental results show that the proposed algorithm produces clusters with high accuracy and scales well to large volumes of data. We also demonstrate the algorithm's performance on ten real-life genomic datasets.
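The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch of the general idea under explicit assumptions, not the authors' implementation. Assumptions: (1) base clusters are found with equal-width bins per dimension, in the style of a CLIQUE-like grid, as a hypothetical stand-in because the abstract does not say how base clusters are formed; (2) each point becomes a "transaction" listing the base clusters it belongs to; (3) a standard FP-growth pass over an FP-tree mines sets of base clusters that share at least `min_pts` points, and each such set is read as a candidate cluster in the subspace spanned by its dimensions. All function names and the parameters `n_bins` and `min_pts` are hypothetical.

```python
from collections import defaultdict


def base_clusters_1d(data, n_bins=4, min_pts=3):
    """Hypothetical base-cluster step: split each dimension into equal-width
    bins and treat every bin holding >= min_pts points as a 1-D base cluster.
    Returns one transaction per point: the base clusters (dim, bin) it is in."""
    transactions = [[] for _ in data]
    for d in range(len(data[0])):
        col = [row[d] for row in data]
        lo, hi = min(col), max(col)
        width = (hi - lo) / n_bins or 1.0  # avoid zero width on a constant column
        bins = defaultdict(list)
        for i, v in enumerate(col):
            bins[min(int((v - lo) / width), n_bins - 1)].append(i)
        for b, members in bins.items():
            if len(members) >= min_pts:
                for i in members:
                    transactions[i].append((d, b))
    return transactions


class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_fp_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return header table and item supports."""
    support = defaultdict(int)
    for items, c in weighted_transactions:
        for it in items:
            support[it] += c
    frequent = {it: s for it, s in support.items() if s >= min_support}
    root, header = FPNode(None, None), defaultdict(list)
    for items, c in weighted_transactions:
        # Keep frequent items only, most frequent first, so shared prefixes merge.
        path = sorted((it for it in items if it in frequent), key=lambda it: (-frequent[it], it))
        node = root
        for it in path:
            if it not in node.children:
                node.children[it] = FPNode(it, node)
                header[it].append(node.children[it])
            node = node.children[it]
            node.count += c
    return header, frequent


def fp_growth(weighted_transactions, min_support, suffix=()):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    header, frequent = build_fp_tree(weighted_transactions, min_support)
    results = {}
    for item, s in frequent.items():
        pattern = suffix + (item,)
        results[tuple(sorted(pattern))] = s
        # Conditional pattern base: the prefix path above each node holding `item`.
        conditional = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                conditional.append((path, node.count))
        results.update(fp_growth(conditional, min_support, pattern))
    return results


def subspace_clusters(data, n_bins=4, min_pts=3):
    """Each frequent itemset of >= 2 base clusters is a candidate cluster in
    the subspace spanned by their dimensions; members are the shared points."""
    transactions = base_clusters_1d(data, n_bins, min_pts)
    patterns = fp_growth([(t, 1) for t in transactions], min_pts)
    clusters = []
    for itemset in patterns:
        if len(itemset) < 2:
            continue  # 1-D patterns are just the base clusters themselves
        dims = tuple(d for d, _ in itemset)
        members = [i for i, t in enumerate(transactions) if set(itemset) <= set(t)]
        clusters.append((dims, itemset, members))
    return clusters


if __name__ == "__main__":
    # Toy data: points 0-3 are close in dims 0 and 1 but scattered in dim 2.
    data = [[0.0, 0.0, 0.0], [0.1, 0.1, 9.0], [0.2, 0.2, 4.0],
            [0.1, 0.0, 6.0], [9.0, 9.0, 1.0], [8.0, 1.0, 8.0]]
    for dims, itemset, members in subspace_clusters(data):
        print("subspace", dims, "base clusters", itemset, "points", members)
```

On this toy data the sketch reports a single candidate cluster: points 0 to 3 in the subspace of dimensions 0 and 1. Dimension 2 produces no dense bin at these settings. The frequent-pattern framing means combinations of subspaces are only explored where base clusters actually share points, and the FP-tree compresses transactions with common prefixes. Both properties are plausible sources of the scalability the abstract reports, though this sketch makes no claim about the paper's exact design.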
History
Event: IEEE Computer Society. Conference (2018 : Seattle, Wash.)
Series: IEEE Computer Society Conference
Pagination: 106 - 111
Publisher: Institute of Electrical and Electronics Engineers
Location: Seattle, Wash.
Place of publication: Piscataway, N.J.
Start date: 2018-12-10
End date: 2018-12-13
ISBN-13: 9781538650356
Language: eng
Publication classification: E1 Full written paper - refereed
Copyright notice: 2018, IEEE
Editor/Contributor(s): N Abe, H Liu, C Pu, X Hu, N Ahmed, M Qiao, Y Song, D Kossman, B Liu, K Lee, J Tang, J He, J Saltz
Title of proceedings: Big Data : Proceedings of the 2018 IEEE International Conference on Big Data