Deakin University

File(s) under permanent embargo

Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis

conference contribution
posted on 2015-08-25, 00:00 authored by Adham Beykikhoshk, Ognjen Arandjelovic, Quoc-Dinh Phung, Svetha Venkatesh
Notwithstanding recent work that has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140-character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular, we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of the complex topic structure of a Twitter community. Using autism-related tweets, we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.
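The bootstrapping step described in the abstract (following URLs in tweets and treating each linked web page as a document tied to the original tweet) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `extract_urls`, `html_to_text`, and `build_documents` are hypothetical, the URL pattern is deliberately simple, and page retrieval is left to a caller-supplied `fetch_html` function. The temporal hierarchical Dirichlet process model itself is not shown.

```python
import re
from html.parser import HTMLParser

# Deliberately simple pattern (an assumption): any run of non-space
# characters starting with http:// or https://.
URL_PATTERN = re.compile(r"https?://[^\s]+")


def extract_urls(tweet_text):
    """Return the URLs found in a tweet, with trailing punctuation stripped."""
    return [u.rstrip(".,;:!?)\"'") for u in URL_PATTERN.findall(tweet_text)]


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Reduce an HTML page to its visible text, joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_documents(tweets, fetch_html):
    """Link each tweet to the text of the pages it points at.

    tweets:     iterable of (tweet_id, tweet_text) pairs
    fetch_html: hypothetical callable mapping a URL to an HTML string
    """
    documents = []
    for tweet_id, text in tweets:
        for url in extract_urls(text):
            documents.append(
                {"tweet_id": tweet_id, "url": url, "text": html_to_text(fetch_html(url))}
            )
    return documents
```

The resulting list of documents, each keyed by its originating tweet, is the kind of richer corpus on which a topic model could then be fitted in place of the tweet text alone.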

History

Event

IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (2015 : Paris, France)

Pagination

1354 - 1361

Publisher

Association for Computing Machinery (ACM)

Location

Paris, France

Place of publication

New York, N.Y.

Start date

2015-08-25

End date

2015-08-28

ISBN-13

9781450338547

Language

eng

Publication classification

E Conference publication; E1 Full written paper - refereed

Copyright notice

2015, ACM

Editor/Contributor(s)

J Pei, F Silvestri, J Tang

Title of proceedings

ASONAM 2015: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
