DOI: 10.1145/3430984.3431001
research-article

MoDest: Multi-module Design Validation for Documents

Published: 02 January 2021

ABSTRACT

Information extraction (IE) from Visually Rich Documents (VRDs) is a common business need, where the extracted information is used for purposes such as verification, design validation, or compliance. Most research on IE from VRDs has focused on textual documents such as invoices and receipts, while extracting information from multi-modal VRDs remains challenging. This research presents a novel end-to-end framework for validating the design of multi-modal VRDs, which contain both textual and visual components, for compliance with a pre-defined set of rules. The proposed Multi-mOdule DESign validaTion (MoDest) framework consists of two steps: (i) information extraction using five modules to obtain the textual and visual components, followed by (ii) validation of the extracted components against a pre-defined set of design rules. Given an input multi-modal VRD image, MoDest either accepts or rejects its design and provides an explanation for the decision. The framework is evaluated on design validation for one type of VRD, banking cards, under the real-world constraint of limited and highly imbalanced training data, with more than 99% of card designs belonging to one class (accepted). Experimental evaluation on real-world images from our in-house dataset demonstrates the effectiveness of MoDest, and analysis of its real-world deployment further supports its utility for design validation.
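The second step of the pipeline, checking extracted components against pre-defined rules and returning an accept/reject decision with an explanation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the MoDest implementation: the component fields (`logo_present`, `card_number_digits`, `text_contrast`), the rule set, and its thresholds are all hypothetical, and step (i) is stubbed with already-extracted values instead of the framework's five extraction modules.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ExtractedComponents:
    """Hypothetical output of step (i): textual and visual components of one card image."""
    logo_present: bool
    card_number_digits: int
    text_contrast: float  # hypothetical contrast score in [0, 1]


# A rule pairs a human-readable description with a predicate over the components.
Rule = Tuple[str, Callable[[ExtractedComponents], bool]]

# Hypothetical pre-defined design rules, for illustration only.
DESIGN_RULES: List[Rule] = [
    ("network logo must be present", lambda c: c.logo_present),
    ("card number must have 16 digits", lambda c: c.card_number_digits == 16),
    ("text contrast must be at least 0.5", lambda c: c.text_contrast >= 0.5),
]


def validate(
    components: ExtractedComponents, rules: List[Rule] = DESIGN_RULES
) -> Tuple[bool, List[str]]:
    """Step (ii): accept only if every rule passes; the explanation lists the violated rules."""
    violations = [desc for desc, check in rules if not check(components)]
    return (len(violations) == 0, violations)


if __name__ == "__main__":
    card = ExtractedComponents(logo_present=True, card_number_digits=15, text_contrast=0.8)
    accepted, reasons = validate(card)
    print("accepted" if accepted else "rejected", reasons)
```

Keeping each rule as a description plus a predicate makes the explanation fall out directly: a rejected design reports exactly which rules it failed, and an accepted one reports none.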


  • Published in

    CODS-COMAD '21: Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD)
    January 2021
    453 pages

    Copyright © 2021 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research
    • Refereed limited

    Acceptance Rates

    Overall acceptance rate: 197 of 680 submissions, 29%