Deakin University

File(s) under permanent embargo

Authorship analysis of aliases: Does topic influence accuracy?

journal contribution
Posted on 2015-08-01; authored by R Layton, P A Watters, Richard Dazeley
Aliases play an important role in online environments by facilitating anonymity, but they can also be used to hide the identity of cybercriminals. Previous studies have investigated this alias matching problem, which asks whether two aliases belong to the same author; solving it can assist with identifying users. Those studies create their training data by randomly splitting the documents associated with an alias into two sub-aliases, and models built this way regularly achieve over 90% accuracy in recovering the linkage between these 'random sub-aliases'. In this paper, we show that random sub-alias generation enables these high accuracies, and that it therefore does not adequately model the real-world problem. In contrast, creating sub-aliases by splitting on topic drastically reduces the accuracy of all authorship methods tested. We then present a methodology that can be applied to datasets that are not topic-controlled in order to produce topic-based sub-aliases that are more difficult to match. Finally, we present an experimental comparison of many authorship methods to see which best match aliases under these conditions, finding that local n-gram methods perform better than the others.
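The Python sketch below illustrates the contrast the abstract draws between random and topic-based sub-alias generation, together with a simple character n-gram matcher for scoring how often sub-aliases are linked back to their author. It is not the authors' implementation. The function names, the greedy assignment of whole topics to halves, the `topic_of` labelling function, and the use of a dissimilarity in the style of the Common N-Grams (CNG) profile measure are all illustrative assumptions; this record does not say which local n-gram methods the paper evaluated or how its topic-based splitting works.

```python
import random
from collections import Counter, defaultdict


def random_sub_aliases(documents, seed=0):
    """Split one alias's documents into two halves uniformly at random."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    half = len(docs) // 2
    return docs[:half], docs[half:]


def topic_sub_aliases(documents, topic_of):
    """Split one alias's documents so that no topic appears in both halves.

    `topic_of` is a hypothetical callable mapping a document to a topic label.
    Whole topic groups are assigned greedily to the currently smaller half.
    """
    by_topic = defaultdict(list)
    for doc in documents:
        by_topic[topic_of(doc)].append(doc)
    first, second = [], []
    # Largest topic groups first; ties broken by label so the split is deterministic.
    for topic in sorted(by_topic, key=lambda t: (-len(by_topic[t]), t)):
        target = first if len(first) <= len(second) else second
        target.extend(by_topic[topic])
    return first, second


def ngram_profile(texts, n=3, top=500):
    """Relative frequencies of the `top` most common character n-grams."""
    counts = Counter()
    for text in texts:
        counts.update(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values()) or 1
    return {gram: c / total for gram, c in counts.most_common(top)}


def cng_dissimilarity(p, q):
    """CNG-style dissimilarity: squared relative frequency differences summed
    over every n-gram present in either profile (0.0 for identical profiles)."""
    score = 0.0
    for gram in set(p) | set(q):
        f1, f2 = p.get(gram, 0.0), q.get(gram, 0.0)
        score += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return score


def match_aliases(left, right, n=3, top=500):
    """Pair each left sub-alias with its least dissimilar right sub-alias.

    `left` and `right` map an author label to that sub-alias's documents.
    """
    right_profiles = {name: ngram_profile(docs, n, top) for name, docs in right.items()}
    matches = {}
    for name, docs in left.items():
        profile = ngram_profile(docs, n, top)
        matches[name] = min(
            right_profiles,
            key=lambda other: cng_dissimilarity(profile, right_profiles[other]),
        )
    return matches


def linkage_accuracy(matches):
    """Fraction of left sub-aliases matched back to their own author."""
    return sum(a == b for a, b in matches.items()) / len(matches)
```

Under these assumptions, an experiment would split each author's documents with either `random_sub_aliases` or `topic_sub_aliases`, place the two halves into `left` and `right`, and compare `linkage_accuracy(match_aliases(left, right))` across the two splitting schemes. Assigning whole topic groups to one half is the key design choice: it prevents shared topical vocabulary from bridging the two sub-aliases, which is one plausible reason random splits are easier to match.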

Journal

Natural Language Engineering

Volume

21

Issue

4

Pagination

497-518

Publisher

Cambridge University Press

Location

London, England

ISSN

1351-3249

eISSN

1469-8110

Language

English

Publication classification

C Journal article; C1.1 Refereed article in a scholarly journal

Copyright notice

2013, Cambridge University Press