Wikipedia vandal early detection: from user behavior to user embedding
conference contribution
posted on 2018-01-01, 00:00 authored by S Yuan, P Zheng, X Wu, Y Xiang
© 2017, Springer International Publishing AG.
Wikipedia is the largest online encyclopedia that allows anyone to edit articles. In this paper, we propose the use of deep learning to detect vandals based on their edit history. In particular, we develop a multi-source long short-term memory network (M-LSTM) to model user behaviors, using a variety of user edit aspects as inputs, including the history of edit reversion information, edit page titles and categories. With M-LSTM, we can encode each user into a low-dimensional real vector, called a user embedding. Moreover, as a sequential model, M-LSTM updates the user embedding each time the user commits a new edit. Thus, we can dynamically predict whether a user is benign or a vandal based on the up-to-date user embedding. Furthermore, these user embeddings are crucial for discovering collaborative vandals. Code and data related to this chapter are available at: https://bitbucket.org/bookcold/vandal_detection.
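The idea of a user embedding that is refreshed after every edit can be illustrated with a minimal sketch. It is not the authors' M-LSTM: the class name `EditLSTM`, the feature layout (a reverted flag plus small title and category vectors), the simple concatenation of sources, and the untrained random weights are all assumptions made for illustration. It only shows how a recurrent cell's hidden state can serve as an embedding that changes with each edit and feeds a vandal score.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class EditLSTM:
    """Toy LSTM cell; its hidden state plays the role of the user embedding.

    Hypothetical illustration only: weights are random and untrained.
    """

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        rows, cols = 4 * hidden_dim, input_dim + hidden_dim
        # One stacked weight matrix for the input, forget, output and candidate gates.
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        self.b = [0.0] * rows
        # Weights of a logistic layer that scores the embedding (also untrained).
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.h = [0.0] * hidden_dim  # hidden state = current user embedding
        self.c = [0.0] * hidden_dim  # cell state

    def update(self, reverted, title_vec, category_vec):
        """Consume one edit and return the refreshed user embedding."""
        # Assumption: the sources are merged by plain concatenation.
        x = [float(reverted)] + list(title_vec) + list(category_vec)
        if len(x) != self.input_dim:
            raise ValueError("unexpected feature length")
        xh = x + self.h
        z = [sum(w * v for w, v in zip(row, xh)) + b
             for row, b in zip(self.W, self.b)]
        n = self.hidden_dim
        i = [sigmoid(v) for v in z[0:n]]            # input gate
        f = [sigmoid(v) for v in z[n:2 * n]]        # forget gate
        o = [sigmoid(v) for v in z[2 * n:3 * n]]    # output gate
        g = [math.tanh(v) for v in z[3 * n:4 * n]]  # candidate cell values
        self.c = [fk * ck + ik * gk
                  for fk, ck, ik, gk in zip(f, self.c, i, g)]
        self.h = [ok * math.tanh(ck) for ok, ck in zip(o, self.c)]
        return list(self.h)

    def vandal_probability(self):
        """Logistic score computed from the current embedding."""
        return sigmoid(sum(w * h for w, h in zip(self.w_out, self.h)))


# Two made-up edits: (reverted flag, title features, category features).
edits = [
    (0, [0.1, 0.0, 0.3], [1.0, 0.0]),
    (1, [0.9, 0.2, 0.0], [0.0, 1.0]),
]
model = EditLSTM(input_dim=1 + 3 + 2, hidden_dim=4)
history = []
for reverted, title_vec, category_vec in edits:
    embedding = model.update(reverted, title_vec, category_vec)
    history.append((embedding, model.vandal_probability()))
```

Each entry in `history` pairs the embedding after an edit with the score derived from it, so a prediction is available as soon as the first edit arrives. In a trained model the weights would be learned from labeled edit histories, and the text inputs would come from learned title and category representations rather than hand-written vectors.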
History
Volume: 10534
Pagination: 832-846
Location: Skopje, Macedonia
Start date: 2017-09-18
End date: 2017-09-22
ISSN: 0302-9743
eISSN: 1611-3349
ISBN-13: 9783319712482
Language: eng
Publication classification: E Conference publication, E1.1 Full written paper - refereed
Copyright notice: 2017, Springer International Publishing AG
Title of proceedings: ECML PKDD 2017 : Proceedings, Part I : Machine Learning and Knowledge Discovery in Databases
Event: Machine Learning and Knowledge Discovery in Databases. Joint European Conference (2017 : Skopje, Macedonia)
Publisher: Springer
Place of publication: Cham, Switzerland
Series: Lecture Notes in Computer Science