Openly accessible

Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis

Beykikhoshk, Adham, Arandjelović, Ognjen, Phung, Dinh and Venkatesh, Svetha 2015, Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis, in ASONAM 2015: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Association for Computing Machinery (ACM), New York, N.Y., pp. 1354-1361, doi: 10.1145/2808797.2808908.

Attached Files
Name Description MIMEType Size Downloads
beykikhoshk-overcomingdata-post-2015.pdf Accepted version application/pdf 8.76MB 0

Title Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis
Author(s) Beykikhoshk, Adham
Arandjelović, Ognjen
Phung, Dinh (ORCID iD: orcid.org/0000-0002-9977-8247)
Venkatesh, Svetha (ORCID iD: orcid.org/0000-0001-8675-6631)
Conference name IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (2015 : Paris, France)
Conference location Paris, France
Conference dates 25-28 Aug. 2015
Title of proceedings ASONAM 2015: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Editor(s) Pei, Jian
Silvestri, Fabrizio
Tang, Jie
Publication date 2015
Start page 1354
End page 1361
Total pages 8
Publisher Association for Computing Machinery (ACM)
Place of publication New York, N.Y.
Keyword(s) cs.SI
Summary Notwithstanding recent work that has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140-character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular, we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets, we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.
ISBN 9781450338547
Language eng
DOI 10.1145/2808797.2808908
Field of Research 080109 Pattern Recognition and Data Mining
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category E1 Full written paper - refereed
ERA Research output type E Conference publication
Grant ID ARC LP140100240
Copyright notice ©2015, ACM
Free to Read? Yes
Persistent URL http://hdl.handle.net/10536/DRO/DU:30083448
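
The bootstrapping step described in the Summary (following URLs included in tweets) can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression, the function names `extract_urls` and `link_tweets_to_urls`, and the `(tweet_id, text)` input format are all assumptions made for illustration. Fetching the linked pages and fitting the temporal topic model are deliberately left out.

```python
import re

# Assumed pattern: any http(s) URL up to the next whitespace or quote character.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(tweet_text):
    """Return the URLs found in a tweet, with trailing punctuation stripped."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(tweet_text)]


def link_tweets_to_urls(tweets):
    """Map each tweet id to the URLs it contains.

    `tweets` is an iterable of (tweet_id, text) pairs (a hypothetical format).
    Tweets without any URL are skipped, since they cannot bootstrap
    further data collection.
    """
    links = {}
    for tweet_id, text in tweets:
        urls = extract_urls(text)
        if urls:
            links[tweet_id] = urls
    return links
```

Each URL returned this way would then be fetched and its page text treated as a document linked to the originating tweet, as the Summary describes.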

Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.

Citation counts: Cited 6 times in TR Web of Science; cited 12 times in Scopus
Access statistics: 291 abstract views, 8 file downloads
Created: Fri, 13 May 2016, 16:37:28 EST
