Lexicon reduction for Urdu/Arabic script based character recognition: A multilingual OCR

Naz, Saeeda, Umar, Arif Iqbal and Razzak, Muhammad Imran 2016, Lexicon reduction for Urdu/Arabic script based character recognition: A multilingual OCR, Mehran University Research Journal Of Engineering & Technology, vol. 35, no. 2, pp. 209-216.

Title Lexicon reduction for Urdu/Arabic script based character recognition: A multilingual OCR
Author(s) Naz, Saeeda
Umar, Arif Iqbal
Razzak, Muhammad Imran (ORCID iD: orcid.org/0000-0002-3930-6600)
Journal name Mehran University Research Journal Of Engineering & Technology
Volume number 35
Issue number 2
Start page 209
End page 216
Total pages 8
Publisher Mehran University of Engineering and Technology
Place of publication Jamshoro, Pakistan
Publication date 2016-04
ISSN 0254-7821
2413-7219
Keyword(s) Science & Technology
Technology
Engineering, Multidisciplinary
Engineering
Urdu Optical Character Recognition
Multilingual Optical Character Recognition
Naskh
Nasta'liq
Optical character recognition devices
Scripting languages (Computer science)
Research--Methodology
Multilingualism
Summary Arabic script character recognition is a challenging task because of the complexity of the script and the huge number of ligatures. We present a method for developing a multilingual Arabic script OCR (Optical Character Recognition) system and for reducing the lexicon of Arabic script and its derivative languages. The objective of the proposed method is to overcome the large dataset required for Urdu and similar scripts by using the GCT (Ghost Character Theory) concept. Arabic and its sibling script languages share a similar character set; the character sets differ mainly in diacritics and in writing style, such as Naskh or Nasta'liq. With the proposed method, the lexicon for Arabic and Arabic script based languages can be reduced by up to approximately 20 times. The proposed multilingual Arabic script OCR approach has been evaluated on online Arabic and derivative languages such as Urdu using a BPNN. The results show that the proposed method not only reduces the lexicon but also helps in developing a multi-language character recognition system for Arabic script.
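The lexicon reduction described in the summary can be illustrated with a minimal sketch. It assumes that a "ghost" form is obtained by replacing each dotted letter with its dotless base shape, so that entries differing only in diacritic dots fall into one class. The mapping table, the function names `ghost_form` and `reduce_lexicon`, and the sample letter sequences are hypothetical illustrations, not the authors' implementation or data.

```python
# Hypothetical, partial table: dotted letter -> dotless base ("ghost") shape.
# This is an assumption for illustration, not the table used in the paper.
GHOST_SHAPE = {
    "\u0628": "\u066E",  # BEH   -> DOTLESS BEH
    "\u067E": "\u066E",  # PEH   -> DOTLESS BEH (letter used in Urdu)
    "\u062A": "\u066E",  # TEH   -> DOTLESS BEH
    "\u062B": "\u066E",  # THEH  -> DOTLESS BEH
    "\u062C": "\u062D",  # JEEM  -> HAH
    "\u0686": "\u062D",  # TCHEH -> HAH (letter used in Urdu)
    "\u062E": "\u062D",  # KHAH  -> HAH
    "\u0630": "\u062F",  # THAL  -> DAL
    "\u0688": "\u062F",  # DDAL  -> DAL (letter used in Urdu)
    "\u0632": "\u0631",  # ZAIN  -> REH
    "\u0691": "\u0631",  # RREH  -> REH (letter used in Urdu)
    "\u0698": "\u0631",  # JEH   -> REH
    "\u0634": "\u0633",  # SHEEN -> SEEN
    "\u0636": "\u0635",  # DAD   -> SAD
    "\u0638": "\u0637",  # ZAH   -> TAH
    "\u063A": "\u0639",  # GHAIN -> AIN
}


def ghost_form(word):
    """Replace every mapped letter with its base shape; keep other letters."""
    return "".join(GHOST_SHAPE.get(ch, ch) for ch in word)


def reduce_lexicon(words):
    """Group lexicon entries by ghost form; each key is one reduced entry."""
    classes = {}
    for word in words:
        classes.setdefault(ghost_form(word), []).append(word)
    return classes


# Sample letter sequences (illustrative only, not taken from the paper).
sample = [
    "\u0628\u0627\u0628",  # BEH ALEF BEH
    "\u062A\u0627\u0628",  # TEH ALEF BEH
    "\u067E\u0627\u067E",  # PEH ALEF PEH
    "\u062F\u0627\u0631",  # DAL ALEF REH
    "\u0630\u0627\u0631",  # THAL ALEF REH
]
reduced = reduce_lexicon(sample)
print(len(sample), "entries ->", len(reduced), "ghost classes")
```

In this sketch a recognizer would only need to distinguish the ghost classes; the diacritics would then be resolved in a separate step, which is the part that varies between Arabic and its derivative languages.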
Language eng
Indigenous content off
Field of Research 0801 Artificial Intelligence and Image Processing
HERDC Research category C1.1 Refereed article in a scholarly journal
Copyright notice ©2016, Mehran University of Engineering & Technology
Persistent URL http://hdl.handle.net/10536/DRO/DU:30146647

Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts TR Web of Science: cited 1 time; Scopus: cited 0 times
Access statistics 28 abstract views, 0 file downloads
Created: Tue, 12 Jan 2021, 10:24:33 EST
