File(s) under permanent embargo
EWNStream+: Effective and Real-time Clustering of Short Text Streams Using Evolutionary Word Relation Network
journal contribution
posted on 2021-01-01, 00:00 authored by S Yang, Guangyan Huang, X Zhou, Vicky Mak, John Yearwood
The real-time clustering of short text streams has various applications, such as event tracking, text summarization and sentimental analysis. However, accurately and efficiently clustering short text streams is challenging due to the sparsity problem (i.e., the limited information comprised in a single short text document leads to high-dimensional and sparse vectors when we represent short texts using traditional vector space models), topic drift and the fast generated text streams. In this paper, we provide an effective and real-time Evolutionary Word relation Network for short text streams clustering (EWNStream+) method. The EWNStream+ method constructs a bi-weighted word relation network using the aggregated term frequencies and term co-occurrence statistics at corpus level to overcome the sparsity problem and topic drift of short texts. Better still, as the query window in the stream shifts to the newly arriving data, EWNStream+ is capable of incrementally updating the word relation network by incorporating new word statistics and decaying the old ones to naturally capture the underlying topic drift in the data streams and reduce the size of the network. The experimental results on a real-world dataset show that EWNStream+ can achieve better clustering accuracy and time efficiency than several counterpart methods.
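As a rough illustration of the idea described in the abstract, the sketch below keeps a word network with two kinds of weights and updates it as the window shifts. It is not the authors' implementation: the class name `WordRelationNetwork`, the multiplicative `decay` factor, the `prune_below` threshold, whitespace tokenization, and the use of raw counts as weights are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


class WordRelationNetwork:
    """Toy bi-weighted word network (illustrative assumptions only).

    Node weight = aggregated term frequency.
    Edge weight = aggregated within-document co-occurrence count.
    """

    def __init__(self, decay=0.5, prune_below=0.1):
        self.decay = decay              # hypothetical decay factor per window shift
        self.prune_below = prune_below  # hypothetical threshold for dropping stale entries
        self.node_weight = defaultdict(float)
        self.edge_weight = defaultdict(float)

    def update(self, documents):
        """Shift the window: decay old statistics, prune, then add new ones."""
        # Decay every existing weight and drop entries that have faded out,
        # which keeps the network small as topics drift.
        for table in (self.node_weight, self.edge_weight):
            for key in list(table):
                table[key] *= self.decay
                if table[key] < self.prune_below:
                    del table[key]
        # Incorporate statistics from the newly arrived short texts.
        for doc in documents:
            tokens = doc.lower().split()
            for tok in tokens:
                self.node_weight[tok] += 1.0
            # One undirected edge per distinct word pair in the document.
            for a, b in combinations(sorted(set(tokens)), 2):
                self.edge_weight[(a, b)] += 1.0


net = WordRelationNetwork(decay=0.5, prune_below=0.3)
net.update(["storm hits coast", "storm warning"])
net.update(["election results"])
net.update(["election night"])
```

After the three window shifts, words that keep appearing (here "election") retain high weight, while words and edges seen only in the first batch fade and are eventually pruned. Clustering over such a network is a separate step that this sketch does not attempt.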
Journal
International Journal of Information Technology and Decision Making
Volume
20
Issue
1
Pagination
341 - 370
Publisher
WORLD SCIENTIFIC PUBL CO PTE LTD
ISSN
0219-6220
eISSN
1793-6845
Language
English
Publication classification
C1 Refereed article in a scholarly journal
Keywords
Science & Technology; Technology; Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Interdisciplinary Applications; Operations Research & Management Science; Computer Science; Short text stream; clustering; topic discovery; event detection; SENTIMENT; ALGORITHM; SYSTEM; MODEL; Artificial Intelligence and Image Processing