Deakin University

Short text similarity measurement using context-aware weighted biterms

Version 2 2024-06-04, 13:55
Version 1 2020-05-05, 10:22
journal contribution
posted on 2024-06-04, 13:55 authored by S Yang, Guangyan Huang, Bahadorreza Ofoghi, John Yearwood
© 2020 John Wiley & Sons, Ltd.

With the development of internet technologies, social media, and mobile devices, short texts have become an increasingly popular medium for users to communicate with friends, search for information, and review products. Measuring the similarity between short texts is a fundamental task because of its importance in many applications, such as text retrieval, topic discovery, and event detection. However, short texts generally comprise sparse, noisy, and ambiguous information, so effectively measuring the distance between them is challenging. In this paper, we incorporate corpus-wide word co-occurrence information into document-level feature enrichment to mitigate the challenges that the sparseness of short texts poses for distance measurement. We propose a novel context-aware weighted Biterm method for short text Distance Measurement (BDM). In BDM, we extract biterms (i.e., word pairs) from a short text corpus and use a biterm topic model to determine the global weights of biterms in the corpus. We then determine the local importance of a biterm in different contexts (i.e., individual short texts) based on its corpus-level weight. The distance between two short texts is computed using the context-aware weighted biterms. Experimental results on three real-world datasets demonstrate the improved accuracy and effectiveness of the proposed BDM.
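The pipeline described in the abstract (extract biterms, weight them globally, re-weight them per text, then compare texts) can be illustrated with a minimal Python sketch. It is not the authors' BDM implementation. The abstract gives no formulas, so every function name here is hypothetical. The global weight uses a smoothed inverse document frequency as a stand-in for the biterm topic model. The local weighting (normalising within each text) and the cosine distance are also assumptions chosen only to show the data flow.

```python
from collections import Counter
from itertools import combinations
import math


def extract_biterms(text):
    """Return the unordered word pairs (biterms) of one short text."""
    # Sorting the distinct words gives each pair a canonical order,
    # so the same biterm has the same key in every text.
    words = sorted(set(text.lower().split()))
    return list(combinations(words, 2))


def global_weights(corpus):
    """Stand-in for corpus-level biterm weights (NOT a biterm topic model).

    Assumption: a biterm's weight is its smoothed inverse document frequency.
    """
    doc_freq = Counter(b for text in corpus for b in extract_biterms(text))
    n_docs = len(corpus)
    return {
        b: math.log((1 + n_docs) / (1 + df)) + 1.0
        for b, df in doc_freq.items()
    }


def local_weights(text, weights):
    """Hypothetical context-aware step: normalise global weights within a text."""
    biterms = extract_biterms(text)
    total = sum(weights.get(b, 0.0) for b in biterms)
    if total == 0:
        return {}
    return {b: weights.get(b, 0.0) / total for b in biterms}


def biterm_distance(text_a, text_b, weights):
    """Cosine distance between the weighted biterm vectors of two texts."""
    a = local_weights(text_a, weights)
    b = local_weights(text_b, weights)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # A text with fewer than two known words has no biterms to compare.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)
```

To use the sketch, call `global_weights` once on the whole corpus and pass the resulting dictionary to `biterm_distance` for each pair of texts. Texts that share no biterms get the maximum distance of 1.0. Replacing `global_weights` with weights derived from a trained biterm topic model would bring the sketch closer to the approach the abstract describes.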

History

Journal

Concurrency Computation

Volume

34

Article number

e5765

Pagination

1-11

Location

London, Eng.

ISSN

1532-0626

eISSN

1532-0634

Language

English

Publication classification

C1 Refereed article in a scholarly journal

Issue

8

Publisher

Wiley