Authorship analysis of aliases: Does topic influence accuracy?

Layton, R; Watters, PA; Dazeley, Richard

journal contribution

posted on 2024-06-04, 15:49 authored by R Layton, PA Watters, Richard Dazeley

Copyright © Cambridge University Press 2013. Aliases play an important role in online environments by facilitating anonymity, but can also be used to hide the identity of cybercriminals. Previous studies have investigated the alias matching problem in an attempt to identify whether two aliases are shared by an author, which can assist with identifying users. Those studies create their training data by randomly splitting the documents associated with an alias into two sub-aliases. Models have been built that can regularly achieve over 90% accuracy for recovering the linkage between these 'random sub-aliases'. In this paper, random sub-alias generation is shown to enable these high accuracies, and thus does not adequately model the real-world problem. In contrast, creating sub-aliases using topic-based splitting drastically reduces the accuracy of all authorship methods tested. We then present a methodology that can be performed on datasets that are not topic-controlled, to produce topic-based sub-aliases that are more difficult to match. Finally, we present an experimental comparison between many authorship methods to see which methods better match aliases under these conditions, finding that local n-gram methods perform better than others.
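The difference between the two ways of generating sub-aliases can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `random_split` and `topic_split`, the `topic_of` callback, and the greedy balancing step are all assumptions made for illustration, and the sketch assumes each document already carries a topic label, which the abstract does not state.

```python
import random
from collections import defaultdict


def random_split(docs, seed=0):
    """Shuffle one alias's documents and cut them into two sub-aliases.

    Both halves usually end up covering the same topics.
    """
    rng = random.Random(seed)
    shuffled = list(docs)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def topic_split(docs, topic_of):
    """Split one alias's documents so that no topic appears in both sub-aliases.

    `topic_of` is a hypothetical callback returning a topic label for a
    document; how topics are obtained is outside the scope of this sketch.
    """
    by_topic = defaultdict(list)
    for doc in docs:
        by_topic[topic_of(doc)].append(doc)

    sub_a, sub_b = [], []
    # Greedy balancing (an assumption): place the largest topic groups first,
    # each into whichever sub-alias currently holds fewer documents.
    ordered = sorted(by_topic, key=lambda t: (-len(by_topic[t]), str(t)))
    for topic in ordered:
        target = sub_a if len(sub_a) <= len(sub_b) else sub_b
        target.extend(by_topic[topic])
    return sub_a, sub_b


if __name__ == "__main__":
    # Toy documents as (topic, text) pairs; the data is invented.
    docs = [
        ("sports", "a"), ("sports", "b"),
        ("politics", "c"), ("politics", "d"),
        ("music", "e"),
    ]
    print(random_split(docs))
    print(topic_split(docs, topic_of=lambda d: d[0]))
```

Under a random split, a matching model can lean on shared topical vocabulary between the two halves; under a topic-based split it cannot, which is consistent with the accuracy drop the abstract reports.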

History

Journal

Natural Language Engineering

Volume

21

Pagination

497-518

Location

London, Eng.

ISSN

1351-3249

eISSN

1469-8110

Language

eng

Publication classification

C Journal article, C1.1 Refereed article in a scholarly journal

Copyright notice

2013, Cambridge University Press

Issue

4

Publisher

Cambridge

Keywords

School of Information Technology
