Urdu Nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks
journal contribution
posted on 2016-01-01, 00:00 authored by S Naz, A I Umar, R Ahmed, Imran Razzak, S F Rashid, F Shafait

The recognition of Arabic script and its derivatives, such as Urdu, Persian, and Pashto, is a difficult task because of the complexity of the script. Urdu text recognition is particularly difficult because of its Nasta’liq writing style. The Nasta’liq style has a complex calligraphic nature, which poses major problems for recognition of Urdu text owing to diagonality in writing, high cursiveness, context sensitivity, and overlapping of characters. Therefore, work done on recognition of Arabic script cannot be applied directly to Urdu recognition. We present Multi-dimensional Long Short Term Memory (MDLSTM) Recurrent Neural Networks with an output layer designed for sequence labeling for recognition of printed Urdu text-lines written in the Nasta’liq writing style. Experiments show that MDLSTM attained a recognition accuracy of 98% for unconstrained printed Urdu Nasta’liq text, which significantly outperforms state-of-the-art techniques.
Journal: SpringerPlus
Volume: 5
Article number: 2010
Pagination: 1 - 16
Publisher: Springer
Location: London, Eng.
ISSN: 2193-1801
eISSN: 2193-1801
Language: eng
Publication classification: C1.1 Refereed article in a scholarly journal
Copyright notice: 2016, The Author(s)