A hybrid approach for NER system for Scarce Resourced Language-URDU: Integrating n-gram with rules and gazetteers
journal contribution
posted on 2015-10-01, 00:00 authored by Saeeda Naz, Arif Iqbal Umar, Imran Razzak
We present a hybrid NER (Named Entity Recognition) system for Urdu script that integrates an n-gram model (unigram and bigram) with rules and gazetteers. The rules are constructed from prefix and suffix characters, rather than from first name and last name lists or potential terms, and are applied to the output list produced by the n-gram model. The system is evaluated on two Urdu corpora, the IJCNLP NE (Named Entity) corpus and the CRL NE corpus. It achieved 92.65% and 87.6% using the hybrid unigram model and 92.47% and 86.83% using the hybrid bigram model on the IJCNLP NE corpus and the CRL NE corpus, respectively.
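The combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`train_unigram`, `hybrid_tag`), the tag set, the romanized toy tokens, the single `abad` suffix rule, and the order in which the three components are consulted are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sentences):
    """Map each token to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for token, tag in sentence:
            counts[token][tag] += 1
    return {token: tags.most_common(1)[0][0] for token, tags in counts.items()}


def hybrid_tag(tokens, unigram_model, gazetteer, suffix_rules):
    """Tag tokens with the unigram model, then fall back to gazetteer and rules.

    Assumed order (hypothetical): unigram lookup first, gazetteer lookup for
    tokens the model leaves as "O", suffix rules for whatever is still "O".
    """
    tags = []
    for token in tokens:
        tag = unigram_model.get(token, "O")
        if tag == "O":
            tag = gazetteer.get(token, "O")
        if tag == "O":
            for suffix, rule_tag in suffix_rules:
                # Require a non-empty stem so the bare suffix is not tagged.
                if token.endswith(suffix) and len(token) > len(suffix):
                    tag = rule_tag
                    break
        tags.append(tag)
    return tags


# Toy, romanized data (hypothetical; the paper works on Urdu script).
training = [[("Ali", "PERSON"), ("lives", "O"), ("in", "O"), ("Lahore", "LOCATION")]]
model = train_unigram(training)
gazetteer = {"Karachi": "LOCATION"}
suffix_rules = [("abad", "LOCATION")]

result = hybrid_tag(["Ali", "visited", "Karachi", "and", "Islamabad"],
                    model, gazetteer, suffix_rules)
print(result)
```

In this sketch "Ali" is resolved by the unigram model, "Karachi" by the gazetteer, and "Islamabad" by the suffix rule, which shows how each component can cover tokens the others miss.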
History
Journal
Mehran University Research Journal Of Engineering & Technology
Volume
34
Pagination
349-358
Location
Jamshoro, Pakistan
ISSN
0254-7821
eISSN
2413-7219
Language
eng
Publication classification
C1.1 Refereed article in a scholarly journal
Copyright notice
2015, Mehran University of Engineering & Technology