An EM-based algorithm for clustering data streams in sliding windows
journal contribution
posted on 2009-01-01, 00:00 authored by X Dang, V Lee, Weng Keet Ng, A Ciptadi, Kok-Leong Ong
Cluster analysis has played a key role in data understanding. When this important data mining task is extended to the context of data streams, it becomes more challenging because the data arrive at the mining system in a one-pass manner. The problem is even more difficult when clustering is performed in a sliding window model, which requires that outdated data be eliminated properly. We propose the SWEM algorithm, which exploits the Expectation Maximization technique to address these challenges. SWEM is not only able to process the stream incrementally, but is also capable of adapting to changes in the underlying stream distribution.
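
To make the sliding-window idea concrete, the following is a minimal illustrative sketch and not the SWEM algorithm itself, whose details are not given in this abstract. It assumes a one-dimensional Gaussian mixture, a stream delivered in fixed-size chunks, and a window measured in chunks. The class name SlidingWindowEM, its parameters, and the choice to keep per-chunk sufficient statistics and subtract the oldest chunk's statistics when the window slides are all assumptions made for illustration.

```python
import math
from collections import deque


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


class SlidingWindowEM:
    """Hypothetical 1-D Gaussian mixture fitted over the last `window_chunks` chunks."""

    def __init__(self, means, window_chunks=2, var=1.0, min_var=1e-3):
        k = len(means)
        self.means = list(means)
        self.vars = [var] * k
        self.weights = [1.0 / k] * k
        self.min_var = min_var
        self.window_chunks = window_chunks
        self.window = deque()  # per-chunk sufficient statistics
        # Running totals over the window: [sum of resp, sum resp*x, sum resp*x^2]
        self.totals = [[0.0, 0.0, 0.0] for _ in range(k)]

    def _e_step(self, chunk):
        """One pass over the chunk: accumulate responsibility-weighted statistics."""
        stats = [[0.0, 0.0, 0.0] for _ in self.means]
        k = len(self.means)
        for x in chunk:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(self.weights, self.means, self.vars)]
            total = sum(dens)
            # If every density underflows to zero, fall back to uniform responsibilities.
            resp = [d / total for d in dens] if total > 0.0 else [1.0 / k] * k
            for j, r in enumerate(resp):
                stats[j][0] += r
                stats[j][1] += r * x
                stats[j][2] += r * x * x
        return stats

    def _m_step(self):
        """Re-estimate weights, means and variances from the window totals."""
        n_all = sum(t[0] for t in self.totals)
        for j, (n, sx, sx2) in enumerate(self.totals):
            if n > 1e-9:
                self.means[j] = sx / n
                self.vars[j] = max(sx2 / n - self.means[j] ** 2, self.min_var)
            self.weights[j] = max(n, 0.0) / n_all if n_all > 0.0 else 1.0 / len(self.totals)

    def update(self, chunk):
        """Absorb a new chunk, expire the oldest one if needed, and refit."""
        stats = self._e_step(chunk)
        self.window.append(stats)
        for tot, s in zip(self.totals, stats):
            for i in range(3):
                tot[i] += s[i]
        if len(self.window) > self.window_chunks:
            old = self.window.popleft()  # outdated data leave the window
            for tot, s in zip(self.totals, old):
                for i in range(3):
                    tot[i] -= s[i]
        self._m_step()


# Usage: two clusters whose centres shift partway through the stream.
chunk_a = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]
chunk_b = [2.5, 3.0, 3.5, 12.5, 13.0, 13.5]
model = SlidingWindowEM(means=[1.0, 9.0], window_chunks=2)
for chunk in [chunk_a, chunk_a, chunk_a, chunk_b, chunk_b]:
    model.update(chunk)
print(model.means)
```

Each data point is read once, and only a small summary per chunk is retained, so expiring old data costs a subtraction rather than a rescan. This sketch runs a single EM pass per chunk and would not recover well from a shift large enough to leave a component without data; a practical method would need additional safeguards.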