Deakin University

File(s) under permanent embargo

Using psycholinguistic features for profiling first language of authors

journal contribution
posted on 2012-06-01, 00:00 authored by R Torney, P Vamplew, John Yearwood
This study empirically evaluates the effectiveness of different feature types for the classification of the first language of an author. In particular, it examines the utility of psycholinguistic features, extracted by the Linguistic Inquiry and Word Count (LIWC) tool, that have not previously been applied to the task of author profiling. As LIWC is a tool that has been developed in the psycholinguistic field rather than the computational linguistics field, it was hypothesized that it would be effective, both as a single-type feature set because of its psycholinguistic basis, and in combination with other feature sets, because it should be sufficiently different to add insight rather than redundancy. It was found that LIWC features were competitive with previously used feature types in identifying the first language of an author, and that combined feature sets including LIWC features consistently showed better accuracy rates and average F measures than were achieved by the same feature sets without the LIWC features. As a secondary issue, this study also examined how effectively first language classification scaled up to a larger number of possible languages. It was found that the classification scheme scaled up effectively to the entire 16-language collection from the International Corpus of Learner English, when compared with results achieved on just 5 languages in previous research.

History

Journal

Journal of the Association for Information Science and Technology

Volume

63

Issue

6

Pagination

1256 - 1269

Publisher

Wiley-Blackwell

Location

London, Eng.

ISSN

2330-1635

eISSN

1532-2890

Language

eng

Publication classification

C Journal article; C1.1 Refereed article in a scholarly journal

Copyright notice

2012, ASIS&T