Deakin University

A novel dataset for English-Arabic scene text recognition (EASTR)-42K and its evaluation using invariant feature extraction on detected extremal regions

journal contribution
posted on 2019-01-01, 00:00 authored by S B Ahmed, S Naz, Imran Razzak, R B Yusof
© 2019 IEEE. The recognition of text in natural scene images is a practical yet challenging task because of large variations in backgrounds, textures, fonts, and illumination. English is extensively used as a secondary language in Gulf countries alongside Arabic script. This paper therefore introduces the English-Arabic scene text recognition 42K (EASTR-42K) scene text image dataset. The dataset includes text images in English and Arabic scripts, with the primary focus on Arabic script. It can be employed for the evaluation of text segmentation and recognition tasks. To provide insight for other researchers, experiments were carried out on the segmentation and classification of Arabic and English text, with reported error rates of 5.99% and 2.48%, respectively. The paper presents a novel technique that uses an adapted maximally stable extremal region (MSER) technique and extracts scale-invariant features from the MSER-detected regions. To select discriminant and comprehensive features, the size of the invariant features is restricted, and only those features that lie within the extremal region are considered. An adapted MDLSTM network is presented to tackle the complexities of cursive scene text. Because research on Arabic scene text is in its infancy, this paper presents benchmark work in the field of text analysis.

History

Journal

IEEE Access

Volume

7

Pagination

19801 - 19820

Publisher

IEEE

Location

Piscataway, N.J.

ISSN

2169-3536

eISSN

2169-3536

Language

eng

Publication classification

C1.1 Refereed article in a scholarly journal

Copyright notice

2019, IEEE
