
Evaluating and improving morpho-syntactic classification over multiple corpora using pre-trained, “off-the-shelf”, parts-of-speech tagging tools

Glass, Kevin and Bangay, Shaun 2008, Evaluating and improving morpho-syntactic classification over multiple corpora using pre-trained, “off-the-shelf”, parts-of-speech tagging tools, South African Computer Journal, vol. 40, pp. 4-10.


Title Evaluating and improving morpho-syntactic classification over multiple corpora using pre-trained, “off-the-shelf”, parts-of-speech tagging tools
Author(s) Glass, Kevin
Bangay, Shaun
Journal name South African Computer Journal
Volume number 40
Start page 4
End page 10
Total pages 7
Publisher Computer Society of South Africa
Place of publication Halfway House, South Africa
Publication date 2008
ISSN 1015-7999
Summary This paper evaluates six commonly available parts-of-speech tagging tools over corpora other than those upon which they were originally trained. In particular, this investigation measures the performance of the selected tools over varying styles and genres of text without retraining, under the assumption that domain-specific training data is not always available. An investigation is performed to determine whether improved results can be achieved by combining the set of tagging tools into ensembles that use voting schemes to determine the best tag for each word. It is found that while accuracy drops due to non-domain-specific training and tag-mapping between corpora, accuracy remains very high, with the support vector machine-based tagger and the decision tree-based tagger performing best over different corpora. It is also found that an ensemble containing a support vector machine-based tagger, a probabilistic tagger, a decision tree-based tagger and a rule-based tagger produces the largest increase in accuracy and the largest reduction in error across different corpora, using the Precision-Recall voting scheme.
Language eng
Field of Research 080107 Natural Language Processing
Socio Economic Objective 890299 Computer Software and Services not elsewhere classified
HERDC Research category C1.1 Refereed article in a scholarly journal
Copyright notice ©2008, Computer Society of South Africa
Persistent URL http://hdl.handle.net/10536/DRO/DU:30039228
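
The summary describes combining several taggers into an ensemble that votes on the best tag for each word. As a rough illustration only, the sketch below shows two generic voting schemes: a simple majority vote and a weighted vote. It is not the paper's Precision-Recall scheme, whose exact definition is not given in this record. The function names, the tag labels, and the per-tag weights are all hypothetical, and the sketch assumes every tagger's output has already been mapped to a common tag set and aligned word for word.

```python
from collections import Counter


def majority_vote(tags_per_tagger):
    """Pick one tag per word by simple majority.

    tags_per_tagger: list of tag sequences, one per tagger, all the same
    length and already mapped to a common tag set (an assumption here).
    Ties are broken in favour of the tagger listed first.
    """
    if not tags_per_tagger:
        return []
    length = len(tags_per_tagger[0])
    if any(len(seq) != length for seq in tags_per_tagger):
        raise ValueError("all taggers must tag the same number of words")
    result = []
    for candidates in zip(*tags_per_tagger):
        counts = Counter(candidates)
        best = max(counts.values())
        # First tag, in tagger order, that reaches the top count.
        result.append(next(t for t in candidates if counts[t] == best))
    return result


def weighted_vote(tags_per_tagger, tag_weights, default=0.0):
    """Pick one tag per word by summing per-tagger, per-tag weights.

    tag_weights: one dict per tagger mapping a tag to a weight, for example
    that tagger's measured precision on the tag (hypothetical values).
    Ties are broken in favour of the tag proposed by the earliest tagger.
    """
    if len(tags_per_tagger) != len(tag_weights):
        raise ValueError("need exactly one weight table per tagger")
    result = []
    for candidates in zip(*tags_per_tagger):
        scores = {}
        for tagger_index, tag in enumerate(candidates):
            weight = tag_weights[tagger_index].get(tag, default)
            scores[tag] = scores.get(tag, 0.0) + weight
        result.append(max(scores, key=scores.get))
    return result
```

With three hypothetical taggers, `majority_vote([["DT", "NN", "VB"], ["DT", "VB", "VB"], ["DT", "NN", "NN"]])` keeps the tag that at least two taggers agree on for each word. The weighted variant lets a single tagger that is reliable on a given tag outvote two less reliable ones, which is the general idea behind precision-based voting.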

Document type: Journal Article
Collection: School of Information Technology
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: TR Web of Science Citation Count  Cited 0 times in TR Web of Science
Scopus Citation Count Cited 0 times in Scopus
Access Statistics: 189 Abstract Views, 3 File Downloads
Created: Mon, 24 Oct 2011, 11:44:36 EST
