DOI: 10.1145/3570991.3571010
short-paper

Designing a Vision Transformer based Enhanced Text Extractor for Product Images

Published: 04 January 2023

Abstract

Product images, such as those that appear on e-commerce sites, exhibit characteristics that are typically absent from natural images. The primary distinguishing characteristic is the presence of text (e.g., brand names, price, constituents) together with high local entropy (i.e., a large amount of visual information, in the form of both text and brightly coloured pictures, condensed into a small region). Extracting the text from these images can have multiple benefits, including catalogue enrichment, product matching, and offensive content identification. However, the images are sometimes so unclear and blurry that the text is difficult to recognise even for a human reader. The text is often written in non-standard fonts (at times each character in a word has a different colour and/or style), oriented at odd angles, or printed on curved surfaces. Moreover, many of the words, such as brand names, do not appear in dictionaries. In this work, we present a vision transformer based text extractor that handles these challenges effectively for product images and considerably outperforms our earlier model. We further compare our new end-to-end text extraction solution with the Google and Azure cloud text extraction offerings, and demonstrate its efficacy in terms of both accuracy and latency.
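
The abstract does not say how "local entropy" is measured. As an illustrative assumption only, one common proxy is the Shannon entropy of the pixel-intensity histogram inside a small window; the sketch below computes it over non-overlapping tiles of a grayscale image. The function names, the tile size, and the toy image are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def local_entropy(image, tile=8):
    """Entropy of each non-overlapping tile x tile block of a grayscale image.

    `image` is a list of rows of integer intensities (0-255). Edge blocks
    that do not fill a whole tile are skipped for simplicity.
    """
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(0, rows - tile + 1, tile):
        row_out = []
        for c in range(0, cols - tile + 1, tile):
            block = [image[r + i][c + j] for i in range(tile) for j in range(tile)]
            row_out.append(shannon_entropy(block))
        result.append(row_out)
    return result


# Toy 8 x 16 image: a flat left half next to a "busy" right half
# in which every pixel has a different intensity.
flat = [[128] * 8 for _ in range(8)]
busy = [[(i * 8 + j) * 4 for j in range(8)] for i in range(8)]
image = [f + b for f, b in zip(flat, busy)]

entropies = local_entropy(image, tile=8)
```

In this toy case the flat tile scores zero bits, while the busy tile reaches the maximum for 64 distinct values (six bits). Densely packed text and colourful graphics on product packaging would be expected to push tiles toward the high end of such a measure.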


    Published In

    CODS-COMAD '23: Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD)
    January 2023
    357 pages
    ISBN:9781450397971
    DOI:10.1145/3570991
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. computer vision
    2. supervised learning
    3. text extraction
    4. vision transformer

    Qualifiers

    • Short-paper
    • Research
    • Refereed limited

    Conference

    CODS-COMAD 2023

    Acceptance Rates

    Overall Acceptance Rate 197 of 680 submissions, 29%
