Deakin University

File(s) under permanent embargo

A hybrid approach for NER system for Scarce Resourced Language-URDU: Integrating n-gram with rules and gazetteers

journal contribution
posted on 2015-10-01, 00:00 authored by Saeeda Naz, Arif Iqbal Umar, Imran Razzak
We present a hybrid NER (Named Entity Recognition) system for Urdu script that integrates an n-gram model (unigram and bigram), rules and gazetteers. For rule construction we used prefix and suffix characters, instead of first-name and last-name lists or potential terms in the output list produced by the n-gram model. We evaluated the system on two Urdu corpora: the IJCNLP NE (Named Entity) corpus and the CRL NE corpus. The hybrid unigram configuration achieved 92.65% on the IJCNLP NE corpus and 87.6% on the CRL NE corpus; the hybrid bigram configuration achieved 92.47% and 86.83%, respectively.
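The hybrid pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy training sentences, the romanized tokens, the gazetteer entries, the `-abad`/`abdul-` affix rules, the function names and the back-off order (bigram, then unigram, then gazetteer, then affix rules for unseen tokens) are all hypothetical assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: sentences of (token, tag) pairs.
# Tokens are romanized Urdu purely for readability; a real system would
# operate on Urdu-script tokens from a tagged corpus.
TRAIN = [
    [("imran", "PERSON"), ("islamabad", "LOCATION"), ("gaya", "O")],
    [("saeeda", "PERSON"), ("lahore", "LOCATION"), ("mein", "O"), ("hai", "O")],
]

# Hypothetical gazetteer and affix rules, for illustration only.
GAZETTEER = {"karachi": "LOCATION", "arif": "PERSON"}
SUFFIX_RULES = [("abad", "LOCATION")]  # e.g. place names ending in -abad
PREFIX_RULES = [("abdul", "PERSON")]   # e.g. personal names starting with Abdul-


def train_unigram(sentences):
    """Map each token to the tag it carries most often in training."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for tok, tag in sent:
            counts[tok][tag] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def train_bigram(sentences):
    """Map each (previous token, token) pair to its most frequent tag."""
    counts = defaultdict(Counter)
    for sent in sentences:
        prev = "<s>"  # sentence-start marker
        for tok, tag in sent:
            counts[(prev, tok)][tag] += 1
            prev = tok
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


def rule_tag(tok):
    """Fallback for tokens the n-gram tables have never seen."""
    if tok in GAZETTEER:
        return GAZETTEER[tok]
    for suffix, tag in SUFFIX_RULES:
        if tok.endswith(suffix):
            return tag
    for prefix, tag in PREFIX_RULES:
        if tok.startswith(prefix):
            return tag
    return "O"


def tag_sentence(tokens, unigram, bigram=None):
    """Tag with bigram, back off to unigram, then to gazetteer and rules."""
    tags, prev = [], "<s>"
    for tok in tokens:
        tag = bigram.get((prev, tok)) if bigram is not None else None
        if tag is None:
            tag = unigram.get(tok)
        if tag is None:
            tag = rule_tag(tok)
        tags.append(tag)
        prev = tok
    return tags


unigram = train_unigram(TRAIN)
bigram = train_bigram(TRAIN)
print(tag_sentence(["imran", "faisalabad", "karachi", "abdullah", "gaya"], unigram))
# ['PERSON', 'LOCATION', 'LOCATION', 'PERSON', 'O']
print(tag_sentence(["saeeda", "islamabad", "gaya"], unigram, bigram))
# ['PERSON', 'LOCATION', 'O']
```

Backing off from the bigram table to the unigram table lets the sketch tag known tokens even when their left context was never seen in training, while the gazetteer and affix rules only handle tokens that neither n-gram table covers.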

History

Journal

Mehran University Research Journal Of Engineering & Technology

Volume

34

Issue

4

Pagination

349 - 358

Publisher

Mehran University of Engineering and Technology

Location

Jamshoro, Pakistan

ISSN

0254-7821

eISSN

2413-7219

Language

eng

Publication classification

C1.1 Refereed article in a scholarly journal

Copyright notice

2015, Mehran University of Engineering & Technology
