File(s) under permanent embargo

Clustering Hashtags Using Temporal Patterns

conference contribution
posted on 2020-01-01, 00:00 authored by Borui Cai, Guangyan Huang, S Yang, Yong Xiang, C H Chi
Twitter hashtags provide a high-level summary of tweets, and clustering hashtags has many applications. Existing text-based methods (relying on explicit words in tweets) are greatly affected by the sparsity of short tweet texts and the low co-occurrence rates of hashtags in tweets. Meanwhile, semantically related hashtags that use different text expressions may show similar temporal patterns (i.e., the frequencies of hashtag usage changing over time), which can help capture events, opinions and synonyms. In this paper, we propose a novel method, clustering hashtags by their temporal patterns (CHTP), as a complement to text-based methods. In CHTP, hashtags are represented as hashtag time series that show their temporal patterns, so hashtag clusters can be discovered by clustering hashtag time series. Density-based clustering algorithms are suitable for discovering naturally shaped hashtag clusters, but they are not fine-grained enough (they use one distance threshold to define density) to differentiate clusters of various density levels. Therefore, we develop a new parameter-free Density-Sensitive Clustering (DSC) algorithm to discover clusters of different density levels and use it in CHTP to group hashtags by temporal patterns. DSC recursively partitions the dataset from coarse-grained to fine-grained (using adaptive distance thresholds) to discover hashtag clusters of different density levels. Experiments conducted on Twitter datasets show that the DSC algorithm finds hashtag clusters of different densities more effectively than counterpart methods, and that CHTP (using DSC) can discover meaningful hashtag clusters, 36% of which cannot be found by the text-based approaches.
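
The idea of representing hashtags as time series and grouping them by temporal pattern can be illustrated with a minimal sketch. This is not the DSC algorithm from the paper, whose details are not given in the abstract; it is the simpler single-threshold grouping that the abstract describes as a limitation of existing density-based methods. The function names (`hashtag_series`, `threshold_clusters`), the z-normalization step, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
from math import sqrt


def hashtag_series(events, n_bins):
    """Count hashtag usages per time bin.

    events: iterable of (hashtag, bin_index) pairs (hypothetical input format).
    Returns {hashtag: [count in bin 0, ..., count in bin n_bins - 1]}.
    """
    series = {}
    for tag, bin_index in events:
        series.setdefault(tag, [0] * n_bins)[bin_index] += 1
    return series


def z_normalize(values):
    """Rescale to zero mean and unit variance so only the shape matters."""
    mean = sum(values) / len(values)
    std = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def threshold_clusters(series, eps):
    """Group hashtags whose normalized series are linked by distance <= eps.

    Uses ONE fixed threshold for the whole dataset (an assumption for this
    sketch), so it cannot separate clusters of different density levels.
    """
    tags = sorted(series)
    norm = {t: z_normalize(series[t]) for t in tags}
    unvisited = set(tags)
    clusters = []
    for tag in tags:
        if tag not in unvisited:
            continue
        unvisited.remove(tag)
        stack, component = [tag], []
        while stack:
            current = stack.pop()
            component.append(current)
            near = [u for u in unvisited
                    if euclidean(norm[current], norm[u]) <= eps]
            for u in near:
                unvisited.remove(u)
                stack.append(u)
        clusters.append(sorted(component))
    return clusters


# Toy data: two hashtags peak in bin 2, two others peak in bin 5.
events = (
    [("#worldcup", 2)] * 5 + [("#worldcup", 3)]
    + [("#football", 2)] * 10 + [("#football", 3)] * 2
    + [("#election", 5)] * 4
    + [("#vote", 5)] * 8
)
series = hashtag_series(events, n_bins=6)
clusters = threshold_clusters(series, eps=1.0)
print(clusters)
```

In this toy example, hashtags that share no text but peak at the same time end up in the same group. Because `eps` is fixed, a dataset mixing tight and loose groups would need a different threshold for each, which is the gap the adaptive thresholds of DSC are meant to address.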

History

Event

Web Information Systems Engineering. Conference (2020 : Amsterdam, The Netherlands)

Volume

12342

Series

Lecture Notes in Computer Science

Pagination

183 - 195

Publisher

Springer International Publishing

Location

Amsterdam, The Netherlands

Place of publication

Cham, Switzerland

Start date

2020-10-20

End date

2020-10-24

ISSN

0302-9743

eISSN

1611-3349

ISBN-13

9783030620042

Language

eng

Publication classification

E1 Full written paper - refereed

Copyright notice

2020, Springer Nature Switzerland AG

Editor/Contributor(s)

[Unknown]

Title of proceedings

WISE 2020 : Proceedings of the International Conference on Web Information Systems Engineering