Deakin University

File(s) under permanent embargo

Evaluating and improving morpho-syntactic classification over multiple corpora using pre-trained, 'off-the-shelf', parts-of-speech tagging tools

conference contribution
posted on 2007-01-01, 00:00 authored by K Glass, Shaun Bangay
This paper evaluates six commonly available part-of-speech tagging tools over corpora other than those on which they were originally trained. In particular, this investigation measures the performance of the selected tools over varying styles and genres of text without retraining, under the assumption that domain-specific training data is not always available. An investigation is also performed to determine whether results can be improved by combining the tagging tools into ensembles that use voting schemes to select the best tag for each word. It is found that although accuracy drops because of non-domain-specific training and the mapping of tags between corpora, it remains very high, with the support vector machine-based tagger and the decision tree-based tagger performing best across the different corpora. It is also found that an ensemble containing a support vector machine-based tagger, a probabilistic tagger, a decision tree-based tagger and a rule-based tagger, combined using the Precision-Recall voting scheme, produces the largest increase in accuracy and the largest reduction in error across the different corpora.
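To make the ensemble idea concrete, the following is a minimal sketch of per-word voting over several taggers' outputs, together with tag-mapping into a shared tagset and token-level accuracy. It is an illustration only and not the paper's implementation. It shows plain majority voting and a simpler precision-weighted vote, where each tagger votes for its tag with a weight equal to its precision on that tag. It does not reproduce the Precision-Recall voting scheme used in the paper. The tagger names, the sentence, the per-tag precision figures and the "ART" to "DT" mapping are all hypothetical.

```python
from collections import Counter, defaultdict


def map_tags(tags, mapping):
    """Translate tags into the shared tagset; unmapped tags pass through unchanged."""
    return [mapping.get(tag, tag) for tag in tags]


def majority_vote(predictions):
    """For each token, choose the tag proposed by the most taggers.

    predictions maps tagger name -> list of tags, one per token.
    Ties go to the earliest tagger (in dict order) that proposed a tied tag,
    an arbitrary rule chosen for this sketch.
    """
    names = list(predictions)
    result = []
    for i in range(len(predictions[names[0]])):
        votes = Counter(predictions[name][i] for name in names)
        best = max(votes.values())
        result.append(next(predictions[n][i] for n in names
                           if votes[predictions[n][i]] == best))
    return result


def precision_weighted_vote(predictions, precision):
    """Each tagger votes for its tag with weight equal to its estimated
    precision on that tag (0.0 if the tag is missing from its table)."""
    names = list(predictions)
    result = []
    for i in range(len(predictions[names[0]])):
        scores = defaultdict(float)
        for name in names:
            tag = predictions[name][i]
            scores[tag] += precision[name].get(tag, 0.0)
        # max() keeps the first maximal tag, i.e. tagger order breaks ties
        result.append(max(scores, key=scores.get))
    return result


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold sequences differ in length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical gold tags and four taggers' output for one sentence.
gold = ["DT", "NN", "VBZ", "RB"]
predictions = {
    "prob": ["DT", "NN", "NNS", "JJ"],
    "svm": ["DT", "NN", "VBZ", "RB"],
    "tree": ["DT", "VB", "VBZ", "JJ"],
    # This tagger uses a different tagset, so its "ART" is mapped to "DT".
    "rule": map_tags(["ART", "NN", "NNS", "JJ"], {"ART": "DT"}),
}
# Hypothetical per-tag precision, e.g. as estimated on held-out data.
precision = {
    "prob": {"DT": 0.98, "NN": 0.93, "NNS": 0.80, "JJ": 0.60},
    "svm": {"DT": 0.99, "NN": 0.95, "VBZ": 0.93, "RB": 0.90},
    "tree": {"DT": 0.98, "VB": 0.70, "VBZ": 0.92, "JJ": 0.60},
    "rule": {"DT": 0.97, "NN": 0.90, "NNS": 0.75, "JJ": 0.55},
}

maj = majority_vote(predictions)
wtd = precision_weighted_vote(predictions, precision)
print(maj, accuracy(maj, gold))  # ['DT', 'NN', 'NNS', 'JJ'] 0.5
print(wtd, accuracy(wtd, gold))  # ['DT', 'NN', 'VBZ', 'JJ'] 0.75
```

In this toy case the third token is a 2-2 tie. Majority voting falls back on tagger order and picks the wrong tag. The precision weights break the tie in favour of the correct tag. With four taggers, such ties are common, which is one reason weighted schemes can outperform a simple vote.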

History

Event

International Symposium of the Pattern Recognition Association of South Africa (18th : 2007 : Pietermaritzburg, South Africa)

Pagination

19 - 24

Publisher

PRASA

Location

Pietermaritzburg, South Africa

Place of publication

Durban, South Africa

Start date

2007-11-28

End date

2007-11-30

ISBN-13

9781868406562

Language

eng

Publication classification

E1.1 Full written paper - refereed

Copyright notice

2007, PRASA

Editor/Contributor(s)

J Tapamo, F Nicolls

Title of proceedings

PRASA 2007 : Proceedings of the 18th International Symposium of the Pattern Recognition Association of South Africa
