Evaluating and improving morpho-syntactic classification over multiple corpora using pre-trained, "off-the-shelf", parts-of-speech tagging tools

Glass, Kevin and Bangay, Shaun 2007, Evaluating and improving morpho-syntactic classification over multiple corpora using pre-trained, "off-the-shelf", parts-of-speech tagging tools, in PRASA 2007 : Proceedings of the 18th International Symposium of the Pattern Recognition Association of South Africa, PRASA, Durban, South Africa, pp. 19-24.

Title Evaluating and improving morpho-syntactic classification over multiple corpora using pre-trained, "off-the-shelf", parts-of-speech tagging tools
Author(s) Glass, Kevin
Bangay, Shaun
Conference name International Symposium of the Pattern Recognition Association of South Africa (18th : 2007 : Pietermaritzburg, South Africa)
Conference location Pietermaritzburg, South Africa
Conference dates 28-30 Nov. 2007
Title of proceedings PRASA 2007 : Proceedings of the 18th International Symposium of the Pattern Recognition Association of South Africa
Editor(s) Tapamo, J. R.
Nicolls, F.
Publication date 2007
Conference series International Symposium of the Pattern Recognition Association of South Africa
Start page 19
End page 24
Total pages 170
Publisher PRASA
Place of publication Durban, South Africa
Summary This paper evaluates six commonly available parts-of-speech tagging tools over corpora other than those on which they were originally trained. In particular, the investigation measures the performance of the selected tools over varying styles and genres of text without retraining, under the assumption that domain-specific training data is not always available. It also examines whether improved results can be achieved by combining the tagging tools into ensembles that use voting schemes to determine the best tag for each word. Although accuracy drops because of non-domain-specific training and tag-mapping between corpora, it remains very high, with the support vector machine-based tagger and the decision tree-based tagger performing best over different corpora. An ensemble containing a support vector machine-based tagger, a probabilistic tagger, a decision tree-based tagger and a rule-based tagger, using the Precision-Recall voting scheme, produces the largest increase in accuracy and the largest reduction in error across different corpora.
ISBN 9781868406562
Language eng
Field of Research 080107 Natural Language Processing
Socio Economic Objective 890299 Computer Software and Services not elsewhere classified
HERDC Research category E1.1 Full written paper - refereed
Copyright notice ©2007, PRASA
Persistent URL http://hdl.handle.net/10536/DRO/DU:30039198

Document type: Conference Paper
Collection: School of Information Technology
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: Cited 0 times in TR Web of Science
Cited 0 times in Scopus
Access Statistics: 198 Abstract Views, 10 File Downloads
Created: Mon, 24 Oct 2011, 11:42:57 EST
