Mining streams of short text for analysis of world-wide event evolutions

Huang, Guangyan, He, Jing, Zhang, Yanchun, Zhou, Wanlei, Liu, Hai, Zhang, Peng, Ding, Zhiming, You, Yue and Cao, Jian 2015, Mining streams of short text for analysis of world-wide event evolutions, World wide web, vol. 18, no. 5, pp. 1201-1217, doi: 10.1007/s11280-014-0293-1.


Title Mining streams of short text for analysis of world-wide event evolutions
Author(s) Huang, Guangyan
He, Jing
Zhang, Yanchun
Zhou, Wanlei
Liu, Hai
Zhang, Peng
Ding, Zhiming
You, Yue
Cao, Jian
Journal name World wide web
Volume number 18
Issue number 5
Start page 1201
End page 1217
Total pages 17
Publisher Springer
Place of publication Berlin, Germany
Publication date 2015-09
ISSN 1386-145X
Keyword(s) Clustering
Event evolutions
Streams of short text
Text mining
Topic discovery
Summary Streams of short text, such as news titles, enable us to learn effectively and efficiently about real-world events that occur anywhere and at any time. Short text messages, which are accompanied by timestamps and generally describe events in only a few words, differ from longer text documents such as web pages, news stories, blogs, technical papers and books. For example, few words repeat within a single news title, so term frequency (TF) is not as important in a short text corpus as in a corpus of longer texts. Therefore, analysis of short text faces new challenges. Also, detecting and tracking events through short text analysis needs to reliably identify events from constant topic clusters; however, existing methods, such as Latent Dirichlet Allocation (LDA), generate different topic results for the same corpus on different executions. In this paper, we provide a Finding Topic Clusters using Co-occurring Terms (FTCCT) algorithm to automatically generate topics from a short text corpus, and develop an Event Evolution Mining (EEM) algorithm to discover hot events and their evolutions (i.e., how the popularity of events changes over time). In FTCCT, a term (i.e., a single word or a multi-word phrase) belongs to only one topic in a corpus. Experiments on news titles from 157 countries over 4 months (July to October 2013) demonstrate that our FTCCT-based method (combining FTCCT and EEM) achieves far higher quality of event content and description words than the LDA-based method (combining LDA and EEM) for analysis of streams of short text. Our method also visualizes the evolutions of the hot events. The discovered world-wide event evolutions reveal some interesting correlations between world-wide events; for example, successive extreme weather phenomena occurred in different locations - a typhoon in Hong Kong and the Philippines followed a hurricane and storm flood in Mexico in September 2013. © 2014 Springer Science+Business Media New York.
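Illustration (not from the paper): the summary does not give the details of FTCCT or EEM, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. Terms are grouped by how often they co-occur in the same title, using a union-find so that each term ends up in exactly one cluster, and an event's popularity is then counted per time step. The function names, the `min_cooccur` threshold, the "at least two cluster terms" matching rule, the sample titles and the day numbers are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def topic_clusters(titles, min_cooccur=2):
    """Partition terms into disjoint clusters based on co-occurrence.

    Assumption: two terms are linked when they appear together in at
    least `min_cooccur` titles. This is a stand-in, not the FTCCT rule.
    """
    pair_counts = Counter()
    for title in titles:
        terms = sorted(set(title.lower().split()))
        pair_counts.update(combinations(terms, 2))

    parent = {}

    def find(term):
        # Union-find lookup with path halving.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for (a, b), n in pair_counts.items():
        if n >= min_cooccur:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for term in list(parent):
        clusters[find(term)].add(term)
    return list(clusters.values())


def popularity(dated_titles, cluster, min_hits=2):
    """Count, per time step, the titles that share >= min_hits terms with a cluster."""
    counts = Counter()
    for day, title in dated_titles:
        if len(cluster & set(title.lower().split())) >= min_hits:
            counts[day] += 1
    return dict(counts)


# Invented sample titles keyed by an arbitrary day number.
dated_titles = [
    (1, "typhoon usagi nears hong kong"),
    (2, "typhoon usagi hits hong kong"),
    (2, "mexico storm flood kills dozens"),
    (3, "mexico storm flood toll rises"),
]
titles = [title for _, title in dated_titles]

for cluster in topic_clusters(titles):
    print(sorted(cluster), popularity(dated_titles, cluster))
```

Because the clusters come from a partition of the term set, no term can belong to two topics, and repeated runs on the same corpus give the same grouping, which is the property the summary contrasts with LDA.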
Language eng
DOI 10.1007/s11280-014-0293-1
Field of Research 080109 Pattern Recognition and Data Mining
Socio Economic Objective 810105 Intelligence
HERDC Research category C1 Refereed article in a scholarly journal
ERA Research output type C Journal article
Grant ID DE140100387
Copyright notice ©2014, Springer
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: Cited 7 times in TR Web of Science
Cited 0 times in Scopus
Access Statistics: 551 Abstract Views, 2 File Downloads
Created: Fri, 13 Mar 2015, 10:18:18 EST
