Authorship analysis of aliases: Does topic influence accuracy?
Version 2 2024-06-04, 15:49
Version 1 2018-06-12, 09:57
journal contribution
posted on 2024-06-04, 15:49 authored by R Layton, PA Watters, Richard Dazeley
Copyright © Cambridge University Press 2013.
Aliases play an important role in online environments by facilitating anonymity, but they can also be used to hide the identity of cybercriminals. Previous studies have investigated this alias matching problem in an attempt to determine whether two aliases belong to the same author, which can assist with identifying users. Those studies create their training data by randomly splitting the documents associated with an alias into two sub-aliases. Models have been built that can regularly achieve over 90% accuracy for recovering the linkage between these 'random sub-aliases'. In this paper, random sub-alias generation is shown to enable these high accuracies, and thus does not adequately model the real-world problem. In contrast, creating sub-aliases using topic-based splitting drastically reduces the accuracy of all authorship methods tested. We then present a methodology that can be performed on non-topic-controlled datasets to produce topic-based sub-aliases that are more difficult to match. Finally, we present an experimental comparison between many authorship methods to see which methods better match aliases under these conditions, finding that local n-gram methods perform better than others.
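The contrast between random and topic-based sub-aliases can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names `random_split` and `topic_split`, the `topic_of` callback, and the greedy balancing rule are all hypothetical. It also assumes every document already carries a known topic label (a string), whereas the paper targets non-topic-controlled datasets, where topics would first have to be inferred.

```python
import random
from collections import defaultdict


def random_split(docs, seed=0):
    """Shuffle one alias's documents and cut the list in half,
    giving two 'random sub-aliases' that tend to share topics."""
    rng = random.Random(seed)
    shuffled = list(docs)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def topic_split(docs, topic_of):
    """Place whole topics on one side or the other, so the two
    sub-aliases have no topic in common (illustrative assumption)."""
    by_topic = defaultdict(list)
    for doc in docs:
        by_topic[topic_of(doc)].append(doc)

    left, right = [], []
    # Greedy balancing: largest topic first, always onto the smaller side.
    # Ties are broken by topic name, so topic labels must be sortable.
    for topic in sorted(by_topic, key=lambda t: (-len(by_topic[t]), t)):
        target = left if len(left) <= len(right) else right
        target.extend(by_topic[topic])
    return left, right


if __name__ == "__main__":
    # Hypothetical documents: (topic label, text) pairs for one alias.
    docs = [
        ("sport", "a"), ("sport", "b"),
        ("politics", "c"), ("politics", "d"),
        ("music", "e"), ("music", "f"),
    ]
    print(random_split(docs, seed=1))
    print(topic_split(docs, lambda d: d[0]))
```

Under a random split, both halves usually contain documents from every topic, so topical vocabulary alone may be enough to link them. Under the topic split, a matcher has to rely on features that persist across topics.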
Journal: Natural language engineering
Volume: 21
Pagination: 497-518
Location: London, Eng.
ISSN: 1351-3249
eISSN: 1469-8110
Language: eng
Publication classification: C Journal article, C1.1 Refereed article in a scholarly journal
Copyright notice: 2013, Cambridge University Press
Issue: 4
Publisher: Cambridge