Deakin University

An ensemble learning approach for addressing the class imbalance problem in Twitter spam detection

Version 2 2024-06-06, 05:42
Version 1 2016-08-31, 13:18
chapter
posted on 2024-06-06, 05:42 authored by S Liu, Y Wang, C Chen, Y Xiang
As an important source for real-time information dissemination in recent years, Twitter is inevitably a prime target of spammers. It has been shown that the damage caused by Twitter spam can reach far beyond the social media platform itself. To mitigate the threat, many recent studies use machine learning techniques to classify Twitter spam and report very satisfactory results. However, most of these studies overlook a fundamental issue that is widely seen in real-world Twitter data, i.e., the class imbalance problem. In this paper, we show that the unequal distribution between spam and non-spam classes in the data has a great impact on the spam detection rate. To address the problem, we propose an ensemble learning approach, which involves three steps. In the first step, we adjust the class distribution in the imbalanced data set using various strategies, including random oversampling, random undersampling and fuzzy-based oversampling. In the next step, a classification model is built upon each of the redistributed data sets. In the final step, a majority voting scheme is introduced to combine all the classification models. Experimental results obtained using real-world Twitter data indicate that the proposed approach can significantly improve the spam detection rate in data sets with imbalanced class distribution.
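
The three steps described in the abstract (resample, train one model per resampled set, combine by majority vote) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the fuzzy-based oversampling method or the base classifier, so the sketch makes two labeled assumptions: `jitter_oversample` is a hypothetical placeholder that adds noisy copies of minority samples, and `NearestCentroid` is a deliberately simple stand-in classifier. All function names and the toy data are illustrative only.

```python
import random
from collections import Counter


def random_oversample(X, y, rng):
    """Duplicate randomly chosen samples of smaller classes until all classes match the largest."""
    counts = Counter(y)
    target = max(counts.values())
    Xr, yr = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            Xr.append(rng.choice(pool))
            yr.append(label)
    return Xr, yr


def random_undersample(X, y, rng):
    """Keep a random subset of each class, sized to the smallest class."""
    counts = Counter(y)
    target = min(counts.values())
    Xr, yr = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            Xr.append(x)
            yr.append(label)
    return Xr, yr


def jitter_oversample(X, y, rng, scale=0.1):
    """ASSUMPTION: placeholder for fuzzy-based oversampling, which the abstract
    does not define. Adds Gaussian-perturbed copies of smaller-class samples."""
    counts = Counter(y)
    target = max(counts.values())
    Xr, yr = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            base = rng.choice(pool)
            Xr.append(tuple(v + rng.gauss(0.0, scale) for v in base))
            yr.append(label)
    return Xr, yr


class NearestCentroid:
    """ASSUMPTION: stand-in base classifier; predicts the class with the closest mean."""

    def fit(self, X, y):
        sums = {}
        for x, lab in zip(X, y):
            acc = sums.setdefault(lab, [[0.0] * len(x), 0])
            acc[0] = [a + b for a, b in zip(acc[0], x)]
            acc[1] += 1
        self.centroids = {lab: [v / n for v in s] for lab, (s, n) in sums.items()}
        return self

    def predict_one(self, x):
        def sq_dist(lab):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[lab]))
        return min(self.centroids, key=sq_dist)


def train_ensemble(X, y, seed=0):
    """Steps 1 and 2: build one model per redistributed data set."""
    rng = random.Random(seed)
    samplers = [random_oversample, random_undersample, jitter_oversample]
    return [NearestCentroid().fit(*sampler(X, y, rng)) for sampler in samplers]


def majority_vote(models, x):
    """Step 3: return the label predicted by most models."""
    votes = Counter(m.predict_one(x) for m in models)
    return votes.most_common(1)[0][0]


# Toy imbalanced data: 8 non-spam (label 0) and 2 spam (label 1) feature vectors.
X_toy = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (-0.1, 0.0), (0.0, -0.2),
         (0.3, 0.3), (-0.2, 0.1), (0.1, -0.1), (3.0, 3.0), (3.2, 2.9)]
y_toy = [0] * 8 + [1] * 2

models = train_ensemble(X_toy, y_toy)
print(majority_vote(models, (2.9, 3.1)), majority_vote(models, (0.1, 0.0)))
```

Using three resampling strategies gives an odd number of voters, so a two-class majority vote cannot tie. In practice the stand-in classifier would be replaced by whichever learner is being evaluated, and the feature vectors by real tweet or account features.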

History

Volume

9722

Chapter number

13

Pagination

215-228

ISSN

0302-9743

eISSN

1611-3349

ISBN-13

9783319402529

Language

eng

Notes

This publication is included in part 1 of the ACISP Australasian Conference held 4-6 July, Melbourne, Vic.

Publication classification

B Book chapter, B1 Book chapter

Copyright notice

2016, Springer

Extent

32

Editor/Contributor(s)

Liu J, Steinfeld R

Publisher

Springer

Place of publication

Berlin, Germany

Title of book

Information security and privacy

Series

Lecture notes in computer science