6 million spam tweets: a large ground truth for timely Twitter spam detection

Chen, Chao, Zhang, Jun, Chen, Xiao, Xiang, Yang and Zhou, Wanlei 2015, 6 million spam tweets: a large ground truth for timely Twitter spam detection, in ICC 2015 : IEEE Proceedings of the International Conference on Communications, IEEE, Piscataway, N.J., pp. 7065-7070, doi: 10.1109/ICC.2015.7249453.


Title 6 million spam tweets: a large ground truth for timely Twitter spam detection
Author(s) Chen, Chao
Zhang, Jun (ORCID iD: orcid.org/0000-0002-2189-7801)
Chen, Xiao
Xiang, Yang (ORCID iD: orcid.org/0000-0001-5252-0831)
Zhou, Wanlei (ORCID iD: orcid.org/0000-0002-1680-2521)
Conference name IEEE International Conference on Communications (2015 : London, England)
Conference location London, England
Conference dates 8-12 Jun. 2015
Title of proceedings ICC 2015 : IEEE Proceedings of the International Conference on Communications
Publication date 2015
Start page 7065
End page 7070
Total pages 6
Publisher IEEE
Place of publication Piscataway, N.J.
Summary In recent years, Twitter has changed how people communicate and get news in their daily lives. Owing to its popularity, Twitter has also become a main target for spamming activities. To stop spammers, Twitter uses Google SafeBrowsing to detect and block spam links. Although blacklists can block malicious URLs embedded in tweets, their lag time hinders their ability to protect users in real time. Researchers have therefore begun to apply different machine learning algorithms to detect Twitter spam. However, there has been no comprehensive evaluation of each algorithm's performance for real-time Twitter spam detection, owing to the lack of a large ground truth. To carry out a thorough evaluation, we collected a large dataset of over 600 million public tweets. We further labelled around 6.5 million spam tweets and extracted 12 lightweight features that can be used for online detection. In addition, we conducted a number of experiments on six machine learning algorithms under various conditions to better understand their effectiveness and weaknesses for timely Twitter spam detection. We will make our labelled dataset available to researchers who are interested in validating or extending our work.
ISBN 9781467364324
ISSN 1550-3607
Language eng
DOI 10.1109/ICC.2015.7249453
Field of Research 100602 Input, Output and Data Devices
Socio Economic Objective 890205 Information Processing Services (incl. Data Entry and Capture)
HERDC Research category E1 Full written paper - refereed
ERA Research output type E Conference publication
Copyright notice ©2015, IEEE
Persistent URL http://hdl.handle.net/10536/DRO/DU:30081434

Document type: Conference Paper
Collections: School of Information Technology
2018 ERA Submission
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: TR Web of Science: cited 12 times
Scopus: cited 22 times
Access Statistics: 475 Abstract Views, 7 File Downloads
Created: Mon, 04 Apr 2016, 12:38:28 EST

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.