Deakin University
File(s) under permanent embargo

Visual Structural Assessment and Anomaly Detection for High-Velocity Data Streams

Version 2 2024-06-06, 07:07
Version 1 2020-06-03, 09:34
journal contribution
posted on 2024-06-06, 07:07 authored by P Rathore, D Kumar, JC Bezdek, Sutharshan Rajasegarar, M Palaniswami
The widespread use of Internet-of-Things (IoT) technologies, smartphones, and social media services generates huge amounts of data streaming at high velocity. Automatic interpretation of these rapidly arriving data streams is required for the timely detection of interesting events that usually emerge in the form of clusters. This article proposes a new relative of the visual assessment of cluster tendency (VAT) model, which produces a record of structural evolution in the data stream by building a cluster heat map of the entire processing history in the stream. The existing VAT-based algorithms for streaming data, called inc-VAT/inc-iVAT and dec-VAT/dec-iVAT, are not suitable for high-velocity and high-volume streaming data because their memory requirements grow and their processing speed drops as the accumulated data increases. The scalable iVAT (siVAT) algorithm can handle big batch data, but for streaming data, it needs to be (re)applied every time a new data point arrives, which is not feasible because of the associated computational cost. To address this problem, we propose an incremental siVAT algorithm, called inc-siVAT, which deals with the streaming data in chunks. It first extracts a small smart sample using an intelligent sampling scheme, called maximin random sampling (MMRS), then incrementally updates the smart sample points on the fly, using our novel incremental MMRS (inc-MMRS) algorithm, to reflect changes in the data stream after each chunk is processed, and finally produces an incrementally built iVAT image of the updated smart sample, using the inc-VAT/inc-iVAT and dec-VAT/dec-iVAT algorithms. These images can be used to visualize the evolving cluster structure and for anomaly detection in streaming data. Our method is illustrated with one synthetic and four real datasets, two of which evolve significantly over time. Our numerical experiments demonstrate the algorithm's ability to successfully identify anomalies and visualize changing cluster structure in streaming data.
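
To make the two building blocks named in the abstract more concrete, the following is a minimal batch-mode sketch in Python of (a) plain maximin sampling, the deterministic core on which MMRS is based, and (b) the basic VAT reordering of a dissimilarity matrix. It is an illustration under stated assumptions, not the authors' inc-siVAT implementation: the function names `maximin_sample` and `vat_order`, the choice of Euclidean distance, the fixed starting index, and the toy data are all hypothetical, and the incremental updates (inc-MMRS, inc-VAT/dec-VAT) and the iVAT path-based transform are not shown.

```python
import math


def maximin_sample(points, k, start=0):
    """Pick up to k indices; each new pick is the point farthest from those already chosen.

    Assumption: Euclidean distance and a fixed starting index (hypothetical choices).
    """
    k = min(k, len(points))
    chosen = [start]
    # nearest[i] = distance from point i to its closest chosen point so far
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


def vat_order(D):
    """Return the VAT ordering (a permutation of indices) for a square dissimilarity matrix D."""
    n = len(D)
    # Start from a row that contains the largest dissimilarity.
    first = max(range(n), key=lambda i: max(D[i]))
    order = [first]
    remaining = set(range(n)) - {first}
    # closest[j] = smallest dissimilarity between j and any already ordered point
    closest = {j: D[first][j] for j in remaining}
    while remaining:
        # Prim-like step: append the unordered point nearest to the ordered set.
        j = min(remaining, key=lambda r: closest[r])
        order.append(j)
        remaining.remove(j)
        for r in remaining:
            closest[r] = min(closest[r], D[j][r])
    return order


# Toy data (hypothetical): two well-separated groups, interleaved by index.
points = [(0, 0), (10, 10), (0.1, 0.2), (10.3, 9.9), (0.2, 0.1), (9.8, 10.1)]

sample = maximin_sample(points, 2)

D = [[math.dist(p, q) for q in points] for p in points]
order = vat_order(D)
# Reordered matrix: displayed as an image, clusters appear as dark diagonal blocks.
D_star = [[D[a][b] for b in order] for a in order]
```

In this toy case, the reordering places the points of each group next to one another, so the reordered matrix `D_star` shows two low-dissimilarity blocks along its diagonal. This block structure is what a VAT-style image makes visible.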

History

Journal

IEEE Transactions on Cybernetics

Volume

51

Pagination

5979-5992

Location

United States

ISSN

2168-2267

eISSN

2168-2275

Language

English

Notes

In Press

Publication classification

C1 Refereed article in a scholarly journal

Issue

12

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC