Deakin University
File(s) under permanent embargo

Clustering Hashtags Using Temporal Patterns

Version 2 2024-06-04, 00:07
Version 1 2020-12-21, 16:32
conference contribution
posted on 2024-06-04, 00:07 authored by B Cai, Guangyan Huang, S Yang, Yong Xiang, CH Chi
Twitter hashtags provide a high-level summary of tweets, and clustering hashtags has many applications. Existing text-based methods, which rely on the explicit words in tweets, are greatly affected by the sparsity of short tweet texts and the low co-occurrence rates of hashtags in tweets. Meanwhile, semantically related hashtags that use different text expressions may show similar temporal patterns (i.e., how the frequency of hashtag usage changes over time), which can help capture events, opinions and synonyms. In this paper, we propose a novel method, clustering hashtags by their temporal patterns (CHTP), as a complement to text-based methods. In CHTP, hashtags are represented as hashtag time series that show their temporal patterns, so hashtag clusters can be discovered by clustering these time series. Density-based clustering algorithms are suitable for discovering naturally shaped hashtag clusters, but because they use a single distance threshold to define density, they are not fine-grained enough to differentiate clusters of various density levels. Therefore, we develop a new parameter-free Density-Sensitive Clustering (DSC) algorithm and use it in CHTP to group hashtags by temporal patterns. DSC recursively partitions the dataset from coarse-grained to fine-grained, using adaptive distance thresholds, to discover hashtag clusters of different density levels. Experiments conducted on Twitter datasets show that the DSC algorithm finds hashtag clusters of different densities more effectively than counterpart methods, and that CHTP (using DSC) can discover meaningful hashtag clusters, 36% of which cannot be found by text-based approaches.
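
The idea of representing hashtags as time series and grouping them by distance can be illustrated with a minimal Python sketch. This is not the DSC algorithm, which the abstract describes only at a high level; it is a plain single-threshold grouping of the kind the abstract says density-based methods rely on. The hashtag names, daily counts, z-normalisation, Euclidean distance and the value of `eps` are all assumptions made for illustration.

```python
from math import sqrt

def normalize(counts):
    """Z-normalise a count series so that shape, not volume, is compared."""
    n = len(counts)
    mean = sum(counts) / n
    std = sqrt(sum((c - mean) ** 2 for c in counts) / n)
    if std == 0:
        return [0.0] * n
    return [(c - mean) / std for c in counts]

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_clusters(series, eps):
    """Group hashtags connected by chains of pairwise distances <= eps.

    One fixed threshold is used for the whole dataset (an assumption of
    this sketch); it is NOT the adaptive, recursive scheme of DSC.
    """
    tags = list(series)
    labels = {}
    cluster_id = 0
    for tag in tags:
        if tag in labels:
            continue
        labels[tag] = cluster_id
        stack = [tag]
        while stack:
            current = stack.pop()
            for other in tags:
                if other not in labels and \
                        euclidean(series[current], series[other]) <= eps:
                    labels[other] = cluster_id
                    stack.append(other)
        cluster_id += 1
    return labels

# Hypothetical daily usage counts over one week (invented for illustration).
daily_counts = {
    "#worldcup":         [1, 2, 30, 40, 3, 1, 1],
    "#fifa":             [0, 1, 15, 22, 2, 0, 1],
    "#mondaymotivation": [50, 2, 1, 1, 2, 1, 3],
    "#mondayblues":      [20, 1, 0, 1, 1, 0, 1],
}

series = {tag: normalize(counts) for tag, counts in daily_counts.items()}
labels = threshold_clusters(series, eps=1.0)
```

In this toy data the two mid-week hashtags peak together and the two Monday hashtags peak together, so each pair shares a cluster label even though the tags share no words. Because `eps` is fixed, a dataset mixing tight and loose groups would need different thresholds in different regions, which is the limitation the abstract says DSC addresses.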

History

Volume

12342

Pagination

183-195

Location

Amsterdam, The Netherlands

Start date

2020-10-20

End date

2020-10-24

ISSN

0302-9743

eISSN

1611-3349

ISBN-13

9783030620042

Language

eng

Publication classification

E1 Full written paper - refereed

Copyright notice

2020, Springer Nature Switzerland AG

Editor/Contributor(s)

[Unknown]

Title of proceedings

WISE 2020 : Proceedings of the International Conference on Web Information Systems Engineering

Event

Web Information Systems Engineering. Conference (2020 : Amsterdam, The Netherlands)

Publisher

Springer International Publishing

Place of publication

Cham, Switzerland

Series

Lecture Notes in Computer Science
