Urdu Nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks

Naz, S; Umar, A I; Ahmed, R; Razzak, Imran; Rashid, S F; Shafait, F

journal contribution

posted on 2016-01-01, 00:00 authored by S Naz, A I Umar, R Ahmed, Imran Razzak, S F Rashid, F Shafait

The recognition of Arabic script and its derivatives, such as Urdu, Persian and Pashto, is a difficult task because of the complexity of the script. Urdu text recognition is particularly difficult because of its Nasta’liq writing style. Nasta’liq has a complex calligraphic nature, which poses major challenges for recognition of Urdu text owing to diagonality in writing, high cursiveness, context sensitivity and overlapping of characters. Therefore, work done on recognition of Arabic script cannot be applied directly to Urdu recognition. We present Multi-dimensional Long Short Term Memory (MDLSTM) Recurrent Neural Networks with an output layer designed for sequence labeling for the recognition of printed Urdu text-lines written in the Nasta’liq writing style. Experiments show that MDLSTM attained a recognition accuracy of 98% on unconstrained printed Urdu Nasta’liq text, which significantly outperforms state-of-the-art techniques.
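As a rough illustration of what "implicit segmentation" with a sequence-labeling output layer means in practice, the sketch below assumes that the output layer is a CTC layer (CTC appears in the record's keywords, but the abstract does not spell this out). It shows best-path decoding only: the network emits one score vector per image column, and the label sequence is recovered by collapsing repeats and dropping blanks, so no character boundaries are ever marked explicitly. The function name `ctc_greedy_decode`, the toy alphabet and the scores are hypothetical and are not taken from the paper.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding: pick the top class per frame,
    merge consecutive repeats, then drop the blank symbol."""
    decoded = []
    prev = blank
    for row in frame_scores:
        # Index of the highest-scoring class for this frame.
        idx = max(range(len(row)), key=row.__getitem__)
        # Emit a label only when it is not blank and differs from the previous frame.
        if idx != blank and idx != prev:
            decoded.append(alphabet[idx])
        prev = idx
    return decoded


# Hypothetical toy input: 5 frames, 3 classes (index 0 is the CTC blank).
alphabet = ["<blank>", "bay", "alef"]
frames = [
    [0.1, 0.8, 0.1],    # bay
    [0.1, 0.7, 0.2],    # bay (repeat, merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank (separates two genuine occurrences)
    [0.2, 0.7, 0.1],    # bay
    [0.1, 0.2, 0.7],    # alef
]
labels = ctc_greedy_decode(frames, alphabet)
```

The blank frame is what allows the same character to appear twice in a row in the output; without it, the two runs of "bay" would be merged into one.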

History

Journal

SpringerPlus

Volume

5

Article number

2010

Pagination

1 - 16

Publisher

Springer

Location

London, Eng.

Publisher DOI

https://doi.org/10.1186/s40064-016-3442-4

Link to full text

http://www.dx.doi.org/10.1186/s40064-016-3442-4

ISSN

2193-1801

eISSN

2193-1801

Language

eng

Publication classification

C1.1 Refereed article in a scholarly journal

Copyright notice

2016, The Author(s)

Keywords

Urdu OCR; BLSTM; MDLSTM; CTC; Science & Technology; Multidisciplinary Sciences; Science & Technology - Other Topics
