Clustering Hashtags Using Temporal Patterns
conference contribution
posted on 2020-01-01, 00:00 authored by Borui Cai, Guangyan Huang, S Yang, Yong Xiang, C H Chi
Twitter hashtags provide a high-level summary of tweets, and clustering hashtags has many applications. Existing text-based methods (which rely on the explicit words in tweets) are greatly affected by the sparsity of short tweet texts and the low co-occurrence rates of hashtags in tweets. Meanwhile, semantically related hashtags that use different text expressions may show similar temporal patterns (i.e., how the frequency of hashtag usage changes over time), which can help capture events, opinions and synonyms. In this paper, we propose a novel method, clustering hashtags by their temporal patterns (CHTP), as a complement to text-based methods. In CHTP, hashtags are represented as hashtag time series that show their temporal patterns, so hashtag clusters can be discovered by clustering hashtag time series. Density-based clustering algorithms are suitable for discovering naturally shaped hashtag clusters, but they are not fine-grained enough (they use a single distance threshold to define density) to differentiate clusters of various density levels. Therefore, we develop a new parameter-free Density-Sensitive Clustering (DSC) algorithm to discover clusters of different density levels and use it in CHTP to group hashtags by temporal patterns. DSC recursively partitions the dataset from coarse-grained to fine-grained (using adaptive distance thresholds) to discover hashtag clusters of different density levels. Experiments conducted on Twitter datasets show that the DSC algorithm finds hashtag clusters of different densities more effectively than counterpart methods, and that CHTP (using DSC) can discover meaningful hashtag clusters, 36% of which cannot be found by text-based approaches.
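The idea of representing a hashtag as a time series of usage frequencies can be illustrated with a minimal Python sketch. This is not the authors' CHTP or DSC implementation: the hourly bin width, the z-normalisation step, the Euclidean distance, the helper names and the toy hashtags are all assumptions chosen for illustration only.

```python
import math
from datetime import datetime, timedelta


def hashtag_time_series(timestamps, start, n_bins, bin_seconds=3600):
    """Count usages of one hashtag in fixed-width time bins (assumed hourly)."""
    counts = [0] * n_bins
    for ts in timestamps:
        idx = int((ts - start).total_seconds() // bin_seconds)
        if 0 <= idx < n_bins:  # ignore usages outside the observed window
            counts[idx] += 1
    return counts


def z_normalize(series):
    """Rescale to zero mean and unit variance so that shape, not volume, is compared."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if std == 0:
        return [0.0] * len(series)
    return [(x - mean) / std for x in series]


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy data (hypothetical): hour offsets at which each hashtag was used.
start = datetime(2020, 1, 1)
usage_hours = {
    "#worldcup": [2, 2, 2, 3],
    "#fifa": [2, 2, 2, 2, 2, 2, 3, 3],
    "#mondaymotivation": [0, 0, 5],
}

series = {
    tag: hashtag_time_series(
        [start + timedelta(hours=h) for h in hours], start, n_bins=6
    )
    for tag, hours in usage_hours.items()
}
normalized = {tag: z_normalize(s) for tag, s in series.items()}

d_similar = euclidean(normalized["#worldcup"], normalized["#fifa"])
d_different = euclidean(normalized["#worldcup"], normalized["#mondaymotivation"])
print(series)
print(d_similar, d_different)
```

In this toy data, the two hashtags that peak in the same hours end up close together after normalisation even though they never share any text, while the hashtag with a different usage pattern lies far from both. A density-based clustering step would then operate on such pairwise distances.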
Event: Web Information Systems Engineering. Conference (2020 : Amsterdam, The Netherlands)
Volume: 12342
Series: Lecture Notes in Computer Science
Pagination: 183 - 195
Publisher: Springer International Publishing
Location: Amsterdam, The Netherlands
Place of publication: Cham, Switzerland
Start date: 2020-10-20
End date: 2020-10-24
ISSN: 0302-9743
eISSN: 1611-3349
ISBN-13: 9783030620042
Language: eng
Publication classification: E1 Full written paper - refereed
Copyright notice: 2020, Springer Nature Switzerland AG
Editor/Contributor(s): [Unknown]
Title of proceedings: WISE 2020 : Proceedings of the International Conference on Web Information Systems Engineering