An end-to-end text spotter with text relation networks

Jiang, J; Wei, B; Yu, M; Li, Gang; Li, B; Liu, C; Li, M; Huang, W

journal contribution

posted on 2021-01-01, 00:00 authored by J Jiang, B Wei, M Yu, Gang Li, B Li, C Liu, M Li, W Huang

Abstract

Reading text in images automatically has become an attractive research topic in computer vision. In particular, end-to-end spotting of scene text has attracted significant research attention, and high accuracy has been achieved on several datasets. However, most existing works overlook the semantic connections between scene text instances and are limited in situations such as occlusion, blurring, and unseen characters, where some semantic information in the text regions is lost. Texts within the same scene image are generally related to one another. From the perspective of cognitive psychology, humans often draw on nearby, easy-to-recognize texts to infer text that is otherwise unidentifiable. In this paper, we propose a novel graph-based method for enhancing intermediate semantic features, called Text Relation Networks. Specifically, we model the co-occurrence relationship of scene texts as a graph. The nodes in the graph represent the text instances in a scene image, and the corresponding semantic features are defined as the representations of the nodes. The relative positions between text instances are measured and used as the weights of the edges in the graph. A convolution operation is then performed on the graph to aggregate semantic information and enhance the intermediate features corresponding to the text instances. We evaluate the proposed method through comprehensive experiments on several mainstream benchmarks and obtain highly competitive results. For example, on the , our method surpasses the previous top works by 2.1% on the word spotting task.
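The graph construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how relative position becomes an edge weight, so the Gaussian of the distance between box centers, the row normalisation, the single linear projection with ReLU, and the residual addition are all assumptions made here for illustration, as are the names `edge_weights`, `graph_conv`, `enhance`, and `sigma`.

```python
import math


def edge_weights(centers, sigma=100.0):
    """Edge weights from relative position.

    Assumption: a Gaussian of the distance between box centers,
    normalised so that each row sums to 1.
    """
    n = len(centers)
    weights = []
    for i in range(n):
        row = []
        for j in range(n):
            dx = centers[i][0] - centers[j][0]
            dy = centers[i][1] - centers[j][1]
            row.append(math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2)))
        total = sum(row)
        weights.append([w / total for w in row])
    return weights


def graph_conv(features, weights, projection):
    """One graph convolution layer: weighted aggregation, linear map, ReLU."""
    n = len(features)
    d_in, d_out = len(projection), len(projection[0])
    out = []
    for i in range(n):
        # Aggregate the features of all nodes, weighted by edge strength.
        agg = [sum(weights[i][j] * features[j][k] for j in range(n))
               for k in range(d_in)]
        # Project the aggregated feature and apply ReLU.
        out.append([max(0.0, sum(agg[k] * projection[k][m] for k in range(d_in)))
                    for m in range(d_out)])
    return out


def enhance(features, centers, projection, sigma=100.0):
    """Add aggregated context back onto each node feature (assumed residual).

    Requires a square projection so input and output dimensions match.
    """
    context = graph_conv(features, edge_weights(centers, sigma), projection)
    return [[f + c for f, c in zip(f_row, c_row)]
            for f_row, c_row in zip(features, context)]


# Three hypothetical text instances: two close together, one far away.
centers = [(0.0, 0.0), (10.0, 0.0), (1000.0, 0.0)]
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
enhanced = enhance(features, centers, identity)
```

In this toy input the two neighboring instances exchange roughly half of each other's features, while the distant instance is left almost unchanged, which mirrors the idea of inferring hard text from nearby readable text.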

Journal

Cybersecurity

Volume

4

Article number

7

Pagination

1 - 13

Location

Berlin, Germany

Open access

Yes

Link to full text

http://doi.org/10.1186/s42400-021-00073-x

ISSN

2096-4862

eISSN

2523-3246

Language

English

Publication classification

C1 Refereed article in a scholarly journal

Issue

1

Publisher

Springer Nature

Keywords

Science & Technology; Technology; Computer Science, Information Systems; Computer Science, Interdisciplinary Applications; Computer Science, Software Engineering; Computer Science; Scene text spotting; Graph convolutional network; Visual reasoning; 4603 Computer vision and multimedia computation
