EWNStream +: effective and real-time clustering of short text streams using evolutionary word relation network

Yang, Shuiqiao, Huang, Guangyan, Zhou, Xiangmin, Mak, Vicky and Yearwood, John 2021, EWNStream +: effective and real-time clustering of short text streams using evolutionary word relation network, International journal of information technology and decision making, pp. 1-30, doi: 10.1142/S0219622021500024.


Title EWNStream +: effective and real-time clustering of short text streams using evolutionary word relation network
Author(s) Yang, Shuiqiao
Huang, Guangyan (ORCID iD: orcid.org/0000-0002-1821-8644)
Zhou, Xiangmin
Mak, Vicky (ORCID iD: orcid.org/0000-0002-9306-5780)
Yearwood, John (ORCID iD: orcid.org/0000-0002-7562-6767)
Journal name International journal of information technology and decision making
Start page 1
End page 30
Total pages 30
Publisher World Scientific Publishing Company
Place of publication Singapore
Publication date 2021-01-18
ISSN 0219-6220
1793-6845
Keyword(s) short text stream
clustering
topic discovery
event detection
Summary The real-time clustering of short text streams has various applications, such as event tracking, text summarization and sentiment analysis. However, clustering short text streams accurately and efficiently is challenging for three reasons: the sparsity problem (i.e., the limited information contained in a single short text document leads to high-dimensional and sparse vectors when short texts are represented using traditional vector space models), topic drift, and the high speed at which text streams are generated. In this paper, we propose an effective, real-time method for clustering short text streams based on an Evolutionary Word relation Network (EWNStream+). The EWNStream+ method constructs a bi-weighted word relation network from term frequencies and term co-occurrence statistics aggregated at the corpus level to overcome the sparsity problem and topic drift of short texts. Moreover, as the query window in the stream shifts to newly arriving data, EWNStream+ incrementally updates the word relation network by incorporating new word statistics and decaying the old ones, which naturally captures the underlying topic drift in the data stream and reduces the size of the network. Experimental results on a real-world dataset show that EWNStream+ achieves better clustering accuracy and time efficiency than several competing methods.
Language eng
DOI 10.1142/S0219622021500024
Indigenous content off
Field of Research 0801 Artificial Intelligence and Image Processing
1503 Business and Management
HERDC Research category C1 Refereed article in a scholarly journal
Persistent URL http://hdl.handle.net/10536/DRO/DU:30147840
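
The incremental update described in the Summary can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `DecayingWordNetwork`, the multiplicative decay factor, and the pruning threshold are all assumptions made for illustration, since the record does not state the actual weighting or decay functions. Node weights stand in for aggregated term frequencies and edge weights for term co-occurrence counts.

```python
from collections import defaultdict
from itertools import combinations


class DecayingWordNetwork:
    """Toy bi-weighted word network (illustrative only).

    Node weight = aggregated term frequency.
    Edge weight = aggregated co-occurrence count of a word pair.
    The decay factor and pruning threshold are hypothetical parameters.
    """

    def __init__(self, decay=0.5, min_weight=0.1):
        self.decay = decay            # assumed multiplicative decay per window shift
        self.min_weight = min_weight  # assumed threshold below which entries are dropped
        self.node_weight = defaultdict(float)
        self.edge_weight = defaultdict(float)

    def update(self, batch):
        """Decay old statistics, prune weak entries, then add a new batch.

        `batch` is a list of tokenized short texts (lists of words).
        """
        # Step 1: decay existing weights and prune those that fall too low.
        for store in (self.node_weight, self.edge_weight):
            for key in list(store):
                store[key] *= self.decay
                if store[key] < self.min_weight:
                    del store[key]
        # Step 2: incorporate statistics from the newly arrived texts.
        for tokens in batch:
            for word in tokens:
                self.node_weight[word] += 1.0
            # Each unordered pair of distinct words in a text co-occurs once.
            for pair in combinations(sorted(set(tokens)), 2):
                self.edge_weight[pair] += 1.0


# Usage: an older topic fades while a newer one accumulates weight.
net = DecayingWordNetwork(decay=0.5, min_weight=0.3)
net.update([["flood", "rescue"], ["flood", "rain"]])
net.update([["election", "vote"]])
net.update([["election", "poll"]])
```

After the three updates in the usage example, words that appeared only in the first batch have either decayed or been pruned, which is how a decaying network of this kind both follows topic drift and stays small.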

Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Created: Mon, 08 Feb 2021, 09:56:05 EST
