Deakin University

File(s) under permanent embargo

6 million spam tweets: a large ground truth for timely Twitter spam detection

conference contribution
posted on 2015-01-01, 00:00 authored by Chao Chen, Jun Zhang, Chao Chen, Yang Xiang, Wanlei Zhou
Twitter has changed how people communicate and get news in their daily lives in recent years. Owing to its popularity, Twitter has also become a main target for spamming activities. To stop spammers, Twitter uses Google SafeBrowsing to detect and block spam links. Although blacklists can block malicious URLs embedded in tweets, their time lag hinders their ability to protect users in real time. Researchers have therefore begun to apply different machine learning algorithms to detect Twitter spam. However, there has been no comprehensive evaluation of each algorithm's performance for real-time Twitter spam detection, owing to the lack of a large ground truth. To carry out a thorough evaluation, we collected a large dataset of over 600 million public tweets. We further labelled around 6.5 million spam tweets and extracted 12 lightweight features that can be used for online detection. In addition, we conducted a number of experiments on six machine learning algorithms under various conditions to better understand their effectiveness and weaknesses for timely Twitter spam detection. We will make our labelled dataset available to researchers who are interested in validating or extending our work.

History

Event

IEEE International Conference on Communications (2015 : London, England)

Pagination

7065 - 7070

Publisher

IEEE

Location

London, England

Place of publication

Piscataway, N.J.

Start date

2015-06-08

End date

2015-06-12

ISSN

1550-3607

ISBN-13

9781467364324

Language

eng

Publication classification

E Conference publication; E1 Full written paper - refereed

Copyright notice

2015, IEEE

Title of proceedings

ICC 2015 : IEEE Proceedings of the International Conference on Communications
