A hybrid approach for NER system for Scarce Resourced Language-URDU: Integrating n-gram with rules and gazetteers

Naz, Saeeda; Umar, Arif Iqbal; Razzak, Imran

File(s) under permanent embargo

journal contribution

posted on 2015-10-01, authored by Saeeda Naz, Arif Iqbal Umar, Imran Razzak

We present a hybrid NER (Named Entity Recognition) system for Urdu script that integrates an n-gram model (unigram and bigram), rules, and gazetteers. We construct rules from prefix and suffix characters, rather than from first-name and last-name lists or from potential terms in the output list produced by the n-gram model. The system is evaluated on two Urdu corpora: the IJCNLP NE (Named Entity) corpus and the CRL NE corpus. It achieves 92.65% and 87.6% with the hybrid unigram model, and 92.47% and 86.83% with the hybrid bigram model, on the IJCNLP NE corpus and the CRL NE corpus, respectively.
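The abstract does not say exactly how the three components are combined, so the following is only a minimal Python sketch of one plausible arrangement, not the authors' implementation. It assumes a most-frequent-tag unigram model and a bigram model that backs off to the unigram model; tokens the n-gram model has never seen fall back to a gazetteer lookup and then to prefix/suffix character rules. The tag names (PERSON, LOCATION, O), the romanized tokens, the gazetteer entry, and the affix rules are all hypothetical placeholders; the actual system works on Urdu script with its own rules and gazetteers.

```python
from collections import Counter, defaultdict

OUTSIDE = "O"   # hypothetical "not an entity" tag
START = "<s>"   # sentence-start marker used as bigram context

def _most_frequent(counts):
    """Collapse key -> Counter(tag) into key -> most frequent tag."""
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def train_unigram(tagged_sentences):
    """Unigram model: word -> the tag it carried most often in training."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return _most_frequent(counts)

def train_bigram(tagged_sentences):
    """Bigram model: (previous word, word) -> most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            counts[(prev, word)][tag] += 1
            prev = word
    return _most_frequent(counts)

def ngram_tags(tokens, unigram, bigram=None):
    """Tag from n-gram evidence; None marks tokens the model has never seen."""
    tags, prev = [], START
    for tok in tokens:
        if bigram is not None and (prev, tok) in bigram:
            tags.append(bigram[(prev, tok)])
        else:
            tags.append(unigram.get(tok))  # back off to unigram; None if unseen
        prev = tok
    return tags

def affix_tag(tok, prefix_rules, suffix_rules):
    """Character prefix/suffix rules (hypothetical); None if no rule fires."""
    for prefix, tag in prefix_rules:
        if tok.startswith(prefix):
            return tag
    for suffix, tag in suffix_rules:
        if tok.endswith(suffix):
            return tag
    return None

def hybrid_tag(tokens, unigram, gazetteer, prefix_rules, suffix_rules, bigram=None):
    """n-gram tags first; unseen tokens fall back to gazetteer, then affix rules."""
    result = []
    for tok, tag in zip(tokens, ngram_tags(tokens, unigram, bigram)):
        if tag is None:
            tag = gazetteer.get(tok)
        if tag is None:
            tag = affix_tag(tok, prefix_rules, suffix_rules)
        result.append(tag if tag is not None else OUTSIDE)
    return result

# Toy romanized data for illustration only (not from the IJCNLP or CRL corpora).
train = [
    [("Ahmed", "PERSON"), ("Lahore", "LOCATION"), ("gaya", "O")],
    [("Sara", "PERSON"), ("Karachi", "LOCATION"), ("gayi", "O")],
]
gazetteer = {"Quetta": "LOCATION"}     # hypothetical gazetteer entry
prefix_rules = [("Abdul", "PERSON")]   # hypothetical prefix rule
suffix_rules = [("abad", "LOCATION")]  # hypothetical suffix rule

unigram = train_unigram(train)
bigram = train_bigram(train)
sentence = ["Ahmed", "Quetta", "aur", "Faisalabad", "gaya"]
print(hybrid_tag(sentence, unigram, gazetteer, prefix_rules, suffix_rules))
# ['PERSON', 'LOCATION', 'O', 'LOCATION', 'O']
print(hybrid_tag(sentence, unigram, gazetteer, prefix_rules, suffix_rules, bigram))
# ['PERSON', 'LOCATION', 'O', 'LOCATION', 'O']
```

Running the fallback chain only on unseen tokens keeps the n-gram model authoritative wherever it has training evidence; other orderings, such as letting gazetteer matches override n-gram tags, are equally plausible readings of the abstract.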

History

Journal

Mehran University Research Journal Of Engineering & Technology

Volume

34

Issue

4

Pagination

349 - 358

Publisher

Mehran University of Engineering and Technology

Location

Jamshoro, Pakistan

ISSN

0254-7821

eISSN

2413-7219

Language

eng

Publication classification

C1.1 Refereed article in a scholarly journal

Copyright notice

2015, Mehran University of Engineering & Technology

Keywords

Entity Recognition; Named Entities; N-Gram Model; Gazetteer Lists; Science & Technology; Technology; Engineering, Multidisciplinary; Engineering; Linguistics--Study and teaching; Arabic language; Research--Methodology
