Posted on 2016-01-01, 00:00. Authored by C Zhang, X Su, Y Hu, Zili Zhang, Y Deng
Spam, also known as unsolicited bulk e-mail (UBE), has become a serious threat that degrades the usability of legitimate e-mail. In this article, an evidential spam-filtering framework is proposed. The Dempster–Shafer theory of evidence (D–S theory), a useful tool for handling uncertainty, is integrated into the proposed approach. Five representative features from an e-mail header are analyzed. Using a machine-learning algorithm, the framework is trained on e-mail headers with known classifications. When the framework is applied to a given e-mail header, its representative features are quantified. Classical probability theory forces probabilities to be assigned even when information is inadequate; in our approach, by contrast, basic probability assignments (BPAs) are assigned to every word in an e-mail subject in a more flexible way, which yields a more reasonable result. Finally, the BPAs are combined and transformed into pignistic probabilities for decision-making. Empirical trials on real-world datasets show the efficiency of the proposed framework.
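The last two steps named in the abstract, combining BPAs and converting the result into pignistic probabilities, can be illustrated with a minimal Python sketch. It uses the standard Dempster rule of combination and the standard pignistic transformation over a two-element frame {spam, ham}. The mass values, the names `combine`, `pignistic`, and `word_bpas`, and the choice of frame are assumptions made for illustration only; they are not taken from the article, and the article's method for deriving BPAs from header features is not reproduced here.

```python
from itertools import product

SPAM, HAM = "spam", "ham"
THETA = frozenset({SPAM, HAM})  # whole frame: "cannot tell" (ignorance)


def combine(m1, m2):
    """Dempster's rule of combination for two BPAs (dict: frozenset -> mass)."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def pignistic(m):
    """Pignistic transformation: split each focal set's mass evenly
    among its elements."""
    betp = {}
    for focal_set, mass in m.items():
        for element in focal_set:
            betp[element] = betp.get(element, 0.0) + mass / len(focal_set)
    return betp


# Hypothetical BPAs for two subject-line words (illustrative values only).
word_bpas = [
    {frozenset({SPAM}): 0.6, THETA: 0.4},
    {frozenset({SPAM}): 0.5, frozenset({HAM}): 0.2, THETA: 0.3},
]

fused = word_bpas[0]
for bpa in word_bpas[1:]:
    fused = combine(fused, bpa)

betp = pignistic(fused)
decision = max(betp, key=betp.get)
print(betp, decision)
```

Unlike a plain probability distribution, each BPA here can leave part of its mass on the whole frame, which is how a word that carries little evidence avoids being forced toward either class. The final decision simply picks the class with the larger pignistic probability; any threshold or cost weighting would be a further assumption.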
History
Journal
Cybernetics and systems: an international journal
Volume
47
Pagination
427-444
Location
Abingdon, Eng.
ISSN
0196-9722
eISSN
1087-6553
Language
eng
Publication classification
C Journal article, C1.1 Refereed article in a scholarly journal