Deakin University

Discovering topic representative terms for short text clustering

journal contribution
posted on 2019-01-01, 00:00 authored by Shuiqiao Yang, Guangyan Huang, Borui Cai
© 2013 IEEE. Clustering short texts is one of the most important text analysis methods for extracting knowledge from online social media platforms such as Twitter, Facebook, and Weibo. However, the instant features of short texts (such as abbreviations and informal expressions) and their limited length make the clustering task challenging. Fortunately, short texts about the same topic often share some common terms (or term stems) that can effectively represent a topic (i.e., one supported by a cluster of short texts); we call these topic representative terms. With topic representative terms, clustering becomes much easier: each short text is assigned to the most similar group of topic representative terms. This paper provides a novel topic representative term discovery (TRTD) method for short text clustering. In our TRTD method, we discover groups of closely bound topic representative terms by exploiting the closeness and significance of terms. The closeness of topic representative terms is measured by their interdependent co-occurrence, and their significance is measured by their global term occurrences throughout the whole short text corpus. Experimental results on real-world datasets demonstrate that TRTD achieves better accuracy and efficiency in short text clustering than state-of-the-art methods.
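The general idea in the abstract can be illustrated with a minimal Python sketch. This is not the TRTD algorithm from the paper: the abstract does not give its formulas, so everything here is an assumption made for illustration. Significance is approximated by document frequency, closeness by the smaller of the two conditional co-occurrence probabilities, and the names `term_stats`, `closeness`, `term_groups`, `assign`, `min_df`, and `min_closeness` are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_stats(docs):
    """Count, per term, how many texts contain it, and per term pair,
    how many texts contain both (whitespace tokenization, an assumption)."""
    df = Counter()
    co = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        df.update(terms)
        co.update(combinations(terms, 2))  # pairs are stored in sorted order
    return df, co


def closeness(a, b, df, co):
    """Assumed interdependence score: min(P(a|b), P(b|a))."""
    pair = (a, b) if a < b else (b, a)
    n = co[pair]
    if n == 0:
        return 0.0
    return min(n / df[a], n / df[b])


def term_groups(docs, min_df=2, min_closeness=0.5):
    """Group significant terms that are mutually close (hypothetical thresholds)."""
    df, co = term_stats(docs)
    significant = [t for t in df if df[t] >= min_df]
    parent = {t: t for t in significant}

    def find(t):
        # Union-find root lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(sorted(significant), 2):
        if closeness(a, b, df, co) >= min_closeness:
            parent[find(a)] = find(b)

    groups = {}
    for t in significant:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())


def assign(doc, groups):
    """Return the index of the term group sharing the most terms with doc,
    or None when no group shares any term."""
    terms = set(doc.lower().split())
    overlaps = [len(terms & g) for g in groups]
    if not overlaps or max(overlaps) == 0:
        return None
    return overlaps.index(max(overlaps))
```

Under these assumptions, calling `term_groups` on a small corpus yields sets of terms that tend to appear together, and `assign` places a new short text in the group it overlaps most. A real implementation would likely add stemming and stop-word removal, since the abstract mentions term stems.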

History

Journal

IEEE Access

Volume

7

Pagination

92037 - 92047

Publisher

IEEE

Location

Piscataway, N.J.

ISSN

2169-3536

eISSN

2169-3536

Language

eng

Publication classification

C1 Refereed article in a scholarly journal
