Dynamic Clustering of Stream Short Documents Using Evolutionary Word Relation Network
Version 2 2024-06-04, 00:07
Version 1 2020-03-19, 09:37
conference contribution
posted on 2024-06-04, 00:07, authored by S Yang, Guangyan Huang, X Zhou, Y Xiang
© Springer Nature Singapore Pte Ltd 2020. The explosive growth of Web 2.0 applications (e.g., social networks, question answering forums and blogs) leads to the continuous generation of short texts. Clustering analysis, which automatically categorizes streams of short texts, has proved to be a critical unsupervised learning technique. However, the unique attributes of short texts (e.g., few meaningful keywords, noisy features and lack of context) and the temporal dynamics of data in the stream make this task challenging. To tackle the problem, in this paper we propose a stream clustering algorithm, EWNStream, which explores the Evolutionary Word relation Network. The word relation network is constructed from the aggregated word co-occurrence patterns of batches of short texts in the stream, which overcomes the sparsity of features in short texts at the document level. To cope with the temporal dynamics of data in the stream, the word relation network is incrementally updated with newly arriving batches of data. Changes in the word relation network indicate the evolution of the underlying clusters in the stream. Based on the evolutionary word relation network, we propose a keyword group discovery strategy to extract the representative terms for the underlying short text clusters. The keyword groups are used as cluster centers to group the short texts in the stream. The experimental results on a real-world Twitter dataset show that our method can achieve much better clustering accuracy and time efficiency.
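The pipeline outlined in the abstract (aggregate word co-occurrences per batch, update the network incrementally, extract keyword groups, use them as cluster centers) can be illustrated with a small sketch. This is not the EWNStream algorithm itself, whose details the abstract does not give. The sketch makes several assumptions of its own: edge weights are document co-occurrence counts, older batches are down-weighted by a hypothetical `decay` factor, keyword groups are taken to be connected components of edges above a hypothetical `min_weight` threshold, and a text is assigned to the group with which it shares the most words. All class, function and parameter names are illustrative.

```python
from collections import Counter
from itertools import combinations


class WordRelationNetwork:
    """Toy co-occurrence graph; edge weight = decayed count of texts containing both words."""

    def __init__(self, decay=0.5):
        self.decay = decay      # assumed forgetting factor, applied before each new batch
        self.edges = Counter()  # (word_a, word_b) with word_a < word_b -> weight

    def update(self, batch):
        """Fold one batch of tokenized short texts into the network."""
        for pair in list(self.edges):
            self.edges[pair] *= self.decay
        for tokens in batch:
            for a, b in combinations(sorted(set(tokens)), 2):
                self.edges[(a, b)] += 1.0

    def keyword_groups(self, min_weight=1.5):
        """Connected components of the subgraph whose edges weigh at least min_weight."""
        parent = {}

        def find(word):
            # Union-find lookup with path halving.
            parent.setdefault(word, word)
            while parent[word] != word:
                parent[word] = parent[parent[word]]
                word = parent[word]
            return word

        for (a, b), weight in self.edges.items():
            if weight >= min_weight:
                parent[find(a)] = find(b)

        groups = {}
        for word in list(parent):
            groups.setdefault(find(word), set()).add(word)
        # Sort for a deterministic group order.
        return sorted(groups.values(), key=lambda group: sorted(group))


def assign(tokens, groups):
    """Index of the keyword group sharing the most words with the text, or -1 if none."""
    overlaps = [len(set(tokens) & group) for group in groups]
    if not overlaps or max(overlaps) == 0:
        return -1
    return overlaps.index(max(overlaps))


if __name__ == "__main__":
    net = WordRelationNetwork(decay=0.5)
    net.update([
        ["apple", "iphone", "launch"],
        ["apple", "iphone", "price"],
        ["storm", "flood", "warning"],
        ["storm", "flood", "rain"],
    ])
    groups = net.keyword_groups()
    print(groups, assign(["iphone", "review"], groups))

    # A later batch mentions only the storm topic, so the other group fades out.
    net.update([["storm", "flood", "damage"]])
    print(net.keyword_groups())
```

In this toy version the decay step is what makes the network "evolutionary": a topic that stops appearing in new batches loses edge weight and eventually drops out of the keyword groups, while a topic that keeps appearing is reinforced.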
History
Volume: 1179
Pagination: 418-428
Location: Ningbo, China
Start date: 2019-05-15
End date: 2019-05-20
ISSN: 1865-0929
eISSN: 1865-0937
ISBN-13: 9789811528095
Language: eng
Publication classification: E1 Full written paper - refereed
Title of proceedings: ICDS 2019 : Data science : 6th International Conference, ICDS 2019, Ningbo, China, May 15-20, 2019, revised selected papers
Event: Data Science. Conference (2019 : 6th : Ningbo, China)
Publisher: Springer
Place of publication: Berlin, Germany
Series: Communications in Computer and Information Science