A performance evaluation of machine learning-based streaming spam tweets detection

Chen, Chao, Zhang, Jun, Xie, Yi, Xiang, Yang, Zhou, Wanlei, Hassan, Mohammad Mehedi, AlElaiwi, Abdulhameed and Alrubaian, Majed 2016, A performance evaluation of machine learning-based streaming spam tweets detection, IEEE transactions on computational social systems, vol. 2, no. 3, pp. 65-76, doi: 10.1109/TCSS.2016.2516039.

Title A performance evaluation of machine learning-based streaming spam tweets detection
Author(s) Chen, Chao
Zhang, Jun (ORCID iD: orcid.org/0000-0002-2189-7801)
Xiang, Yang (ORCID iD: orcid.org/0000-0001-5252-0831)
Zhou, Wanlei (ORCID iD: orcid.org/0000-0002-1680-2521)
Hassan, Mohammad Mehedi
AlElaiwi, Abdulhameed
Alrubaian, Majed
Journal name IEEE transactions on computational social systems
Volume number 2
Issue number 3
Start page 65
End page 76
Total pages 12
Publisher IEEE
Place of publication Piscataway, N.J.
Publication date 2016-09
ISSN 2329-924X
Summary The popularity of Twitter attracts more and more spammers. Spammers send unwanted tweets to Twitter users to promote websites or services, which harms normal users. To stop spammers, researchers have proposed a number of mechanisms, and recent work focuses on applying machine learning techniques to Twitter spam detection. However, tweets are retrieved in a streaming fashion, and Twitter provides the Streaming API for developers and researchers to access public tweets in real time. A performance evaluation of existing machine learning-based streaming spam detection methods has been lacking. In this paper, we bridge this gap by carrying out a performance evaluation from three different aspects: data, feature, and model. A large ground-truth dataset of over 600 million public tweets was created using a commercial URL-based security tool. For real-time spam detection, we further extracted 12 lightweight features for tweet representation. Spam detection was then transformed into a binary classification problem in the feature space, which can be solved by conventional machine learning algorithms. We evaluated the impact of different factors on spam detection performance, including the spam-to-nonspam ratio, feature discretization, training data size, data sampling, time-related data, and machine learning algorithms. The results show that streaming spam tweet detection is still a big challenge and that a robust detection technique should take into account the three aspects of data, feature, and model.
Language eng
DOI 10.1109/TCSS.2016.2516039
Field of Research 080303 Computer System Security
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category C1 Refereed article in a scholarly journal
ERA Research output type C Journal article
Copyright notice ©2016, IEEE
Persistent URL http://hdl.handle.net/10536/DRO/DU:30083971
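
Illustrative note on the Summary: the abstract treats spam detection as binary classification over per-tweet feature vectors and names the spam-to-nonspam ratio as one of the factors evaluated. The sketch below is not the authors' code; it only shows, under stated assumptions, how a labelled set might be resampled to a chosen ratio and how a classifier's output might be scored. The function names, the tuple layout, and the choice of precision, recall, and F-measure as metrics are all hypothetical and are not taken from the record.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical layout: (feature vector, label), where 1 = spam and 0 = nonspam.
Sample = Tuple[List[float], int]


def resample_to_ratio(samples: List[Sample], nonspam_per_spam: int,
                      seed: int = 0) -> List[Sample]:
    """Keep every spam sample and randomly draw nonspam samples so that
    the result has a spam:nonspam ratio of 1:nonspam_per_spam
    (or fewer nonspam samples if not enough are available)."""
    rng = random.Random(seed)
    spam = [s for s in samples if s[1] == 1]
    nonspam = [s for s in samples if s[1] == 0]
    wanted = min(len(nonspam), len(spam) * nonspam_per_spam)
    mixed = spam + rng.sample(nonspam, wanted)
    rng.shuffle(mixed)
    return mixed


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Precision, recall, and F-measure for the spam class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}
```

Any conventional classifier could be trained on the resampled set; comparing the metrics across several values of `nonspam_per_spam` is one way to see how sensitive a model is to class imbalance.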

Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: Cited 20 times in TR Web of Science
Cited 29 times in Scopus
Access Statistics: 515 Abstract Views, 1 File Downloads
Created: Mon, 06 Jun 2016, 14:14:31 EST