ABSTRACT
Information extraction (IE) from Visually Rich Documents (VRDs) is a common need for businesses, where the extracted information is used for purposes such as verification, design validation, or compliance. Most research on IE from VRDs has focused on textual documents such as invoices and receipts, while extracting information from multi-modal VRDs remains a challenging task. This research presents a novel end-to-end design validation framework for multi-modal VRDs containing textual and visual components, which checks compliance with a pre-defined set of rules. The proposed Multi-mOdule DESign validaTion (MoDest) framework comprises two steps: (i) information extraction, using five modules to obtain the textual and visual components, followed by (ii) validation of the extracted components against a pre-defined set of design rules. Given an input multi-modal VRD image, the MoDest framework either accepts or rejects its design and provides an explanation for the decision. The proposed framework is tested for design validation on a particular type of VRD, banking cards, under the real-world constraint of limited and highly imbalanced training data, with more than 99% of card designs belonging to one class (accepted). Experimental evaluation on real-world images from our in-house dataset demonstrates the effectiveness of the proposed MoDest framework. Analysis drawn from the real-world deployment of the framework further strengthens its utility for design validation.
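The second step described above, validating extracted components against design rules and explaining the decision, can be sketched as follows. This is a minimal illustration only, not the MoDest implementation: the `Rule` class, the `validate_design` function, the component keys (`logo_found`, `min_text_height_px`), and the 8-pixel threshold are all hypothetical names and values chosen for the example, and the extraction step is assumed to have already produced a dictionary of components.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: the extraction step is assumed to yield
# a dictionary of named components (not specified in the abstract).
Components = Dict[str, object]


@dataclass
class Rule:
    name: str                             # short identifier for the rule
    check: Callable[[Components], bool]   # returns True when the rule is satisfied
    message: str                          # explanation reported on violation


def validate_design(components: Components, rules: List[Rule]) -> Tuple[bool, List[str]]:
    """Accept the design only if every rule holds; otherwise list the violations."""
    violations = [f"{r.name}: {r.message}" for r in rules if not r.check(components)]
    return (len(violations) == 0, violations)


# Illustrative rules; the names and the 8 px threshold are assumptions.
RULES = [
    Rule("logo_present",
         lambda c: c.get("logo_found") is True,
         "required logo was not detected"),
    Rule("min_text_height",
         lambda c: c.get("min_text_height_px", 0) >= 8,
         "text is smaller than 8 px"),
]

accepted, reasons = validate_design(
    {"logo_found": True, "min_text_height_px": 6}, RULES
)
print("accepted" if accepted else "rejected", reasons)
```

Keeping each rule as a named predicate with its own message is one simple way to return an explanation alongside the accept or reject decision.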