Deakin University

File(s) under permanent embargo

Near real-time Twitter spam detection with machine learning techniques

journal contribution
posted on 2020-01-01, 00:00 authored by Nan Sun, Guanjun Lin, Junyang Qiu, P Rimba
© 2020 Informa UK Limited, trading as Taylor & Francis Group.

The popularity of social media networks such as Twitter has led to an increasing number of spamming activities. Researchers have employed various machine learning methods to detect Twitter spam. However, most existing studies are limited to theoretical analysis, and few of them apply detection techniques to real-time scenarios. In this paper, we bridge this gap by proposing a near real-time Twitter spam detection system that provides near real-time tweet data acquisition, lightweight feature extraction from a specific Twitter account, detection model training, and online visualization of detection results. In this system, account-based and content-based features are extracted to facilitate spam detection. The models applied in our Twitter spam detection system are trained on 1.5 million public tweets using nine mainstream algorithms. In addition, to reduce the training time spent on massive data and the cost of model updating, a parallel computing technique is introduced to train and update the models in this system. Empirical results show that the model achieves satisfactory performance on our datasets. Furthermore, the implemented near real-time Twitter spam detection system can better protect users from spam. The system also acts as a tweet collection tool, allowing researchers to test the performance of trained classifiers in realistic scenarios.

Journal: International Journal of Computers and Applications
Pages: 1–11
Publisher: Taylor & Francis
Place of publication: London, Eng.

Publication classification: C1 Refereed article in a scholarly journal