Automated unsupervised authorship analysis using evidence accumulation clustering

Layton, R; Watters, P; Dazeley, Richard

journal contribution

posted on 2013-01-01, 00:00 authored by R Layton, P Watters, Richard Dazeley

Authorship analysis aims to extract information about the authorship of documents from features within those documents. Typically, this is performed as a classification task: given a set of documents of known authorship, the aim is to identify the author of a new document. Unsupervised methods, by contrast, have been developed primarily as visualisation tools that help analysts manually discover clusters of authorship within a corpus. However, many fields need more sophisticated unsupervised methods that automate the discovery, profiling and organisation of related information by clustering documents according to authorship. This paper proposes an automated, unsupervised methodology for clustering documents by authorship, named NUANCE (n-gram Unsupervised Automated Natural Cluster Ensemble). Testing indicates that the derived clusters correlate strongly with the true authorship of unseen documents.
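The abstract names the general technique (evidence accumulation over an ensemble of n-gram-based clusterings) without giving its details. What follows is a minimal sketch of generic evidence accumulation clustering, not the NUANCE implementation from the paper. It rests on four assumptions:

- Documents are represented as character trigram profiles.
- Each base clustering is a spherical k-means run with a randomly chosen k.
- The runs are combined into a co-association matrix.
- The final partition keeps document pairs grouped together in more than half of the runs.

All names and parameters here (`char_ngrams`, `kmeans_labels`, `evidence_accumulation`, `runs`, `k_range`, `threshold`) are hypothetical.

```python
# Hypothetical sketch of generic evidence accumulation clustering; not the NUANCE code.
import math
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (lower-cased) in one document."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def normalise(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def kmeans_labels(vectors, k, rng, iterations=10):
    """One spherical k-means run (a base clusterer); returns a label per document."""
    centroids = [dict(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its most similar centroid (ties go to the lowest index).
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Recompute each non-empty centroid as the normalised sum of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = normalise(total)
    return labels


def evidence_accumulation(texts, runs=30, k_range=(2, 4), threshold=0.5, seed=0):
    """Combine many base clusterings through a co-association matrix."""
    n = len(texts)
    if n < 2:
        return [0] * n
    rng = random.Random(seed)
    vectors = [normalise(char_ngrams(t)) for t in texts]
    # co[i][j] = fraction of runs in which documents i and j shared a cluster.
    co = [[0.0] * n for _ in range(n)]
    for _ in range(runs):
        k = rng.randint(k_range[0], min(k_range[1], n))
        labels = kmeans_labels(vectors, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / runs
    # Final partition: connected components of the graph linking pairs whose
    # co-association exceeds the threshold (a single-link cut at that level).
    final = [-1] * n
    next_label = 0
    for start in range(n):
        if final[start] != -1:
            continue
        final[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if final[j] == -1 and co[i][j] > threshold:
                    final[j] = next_label
                    stack.append(j)
        next_label += 1
    return final


docs = ["aaa aaa aaa", "aaa aaa aaa", "zzz zzz zzz", "zzz zzz zzz"]
print(evidence_accumulation(docs))  # [0, 0, 1, 1]
```

Varying k across runs is one simple way to make the base clusterings diverse. The 0.5 cut is a majority-vote rule over those runs. The features, base clusterers and final cut actually used by NUANCE are described in the paper itself and may differ from these choices.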

History

Journal

Natural language engineering

Volume

19

Pagination

95-120

Location

Cambridge, Eng.

ISSN

1351-3249

eISSN

1469-8110

Language

eng

Publication classification

C Journal article, C1.1 Refereed article in a scholarly journal

Copyright notice

2011, Cambridge University Press.

Issue

1

Publisher

Cambridge University Press

Keywords

School of Information Technology