File(s) under permanent embargo
A hybrid approach for NER system for Scarce Resourced Language-URDU: Integrating n-gram with rules and gazetteers
journal contribution
posted on 2015-10-01, 00:00 authored by Saeeda Naz, Arif Iqbal Umar, Imran Razzak
We present a hybrid NER (Named Entity Recognition) system for Urdu script that integrates an n-gram model (unigram and bigram) with rules and gazetteers. We used prefix and suffix characters for rule construction, instead of first name and last name lists or potential terms, on the output list produced by the n-gram model. The system is evaluated on two Urdu corpora, the IJCNLP NE (Named Entity) corpus and the CRL NE corpus. It achieved 92.65% and 87.6% using the hybrid unigram model and 92.47% and 86.83% using the hybrid bigram model on the IJCNLP NE corpus and the CRL NE corpus, respectively.
History
Journal
Mehran University Research Journal Of Engineering & Technology
Volume
34
Issue
4
Pagination
349 - 358
Publisher
Mehran University of Engineering and Technology
Location
Jamshoro, Pakistan
ISSN
0254-7821
eISSN
2413-7219
Language
eng
Publication classification
C1.1 Refereed article in a scholarly journal
Copyright notice
2015, Mehran University of Engineering & Technology