Intervention of light convolutional neural network in document survey form processing

Multimedia Tools and Applications

Abstract

Public surveys are widely used across domains to collect feedback from service users, either to improve the quality of the service provided or to learn what users think. Processing the collected data usually takes time because of manual intervention, and processing errors may distort the results. This research therefore proposes a method for processing paper survey forms rapidly so that statistical results are available on time. The paper form template must contain at least one question with a “Yes” and a “No” check box or radio button as its response field. Respondents are free to reply in any of three ways: a normal response, no response, or both boxes marked. Each box on the form may contain a different type of handwritten mark, and the marks are recognized with mathematical tools combined with neural networks. First, the physical paper holding the information is converted to an image by digitization. Any image format is accepted; the image is handled by a preprocessing module before it is passed to the Lite Convolutional Neural Network. Once the neural network has finished its processing, a postprocessing module computes the information and inserts it into a database. The distinctive feature of this technique is the use of the Lite Convolutional Neural Network as the recognition tool, which has advantages over standard deep learning in terms of computation time, accuracy/precision, and number of parameters. The approach is validated on the RVL-CDIP dataset and on real surveys of a mobile money service and a satellite TV service, where the accuracy reaches 96%, 97%, and 98% respectively.
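The abstract describes the postprocessing step only at a high level. As a minimal sketch, not the authors' implementation, the Python below shows one way the three reply types could be tallied into statistics. It assumes the recognition stage yields one boolean per box (marked or not) and that a "normal" reply means exactly one box is marked. The names `Answer`, `interpret_answer`, and `tally` are hypothetical and do not come from the article.

```python
from collections import Counter
from enum import Enum


class Answer(Enum):
    """Reply categories for one Yes/No question (hypothetical labels)."""
    YES = "yes"
    NO = "no"
    NO_RESPONSE = "no_response"
    BOTH = "both"


def interpret_answer(yes_marked: bool, no_marked: bool) -> Answer:
    """Map the recognized state of the two boxes to a reply category.

    Assumption: the recognizer reports each box as marked or unmarked.
    """
    if yes_marked and no_marked:
        return Answer.BOTH
    if yes_marked:
        return Answer.YES
    if no_marked:
        return Answer.NO
    return Answer.NO_RESPONSE


def tally(forms: list[tuple[bool, bool]]) -> dict[str, float]:
    """Return the share of each reply category over all processed forms."""
    counts = Counter(interpret_answer(yes, no) for yes, no in forms)
    total = len(forms)
    # Guard against an empty batch so the shares are well defined.
    return {a.value: (counts[a] / total if total else 0.0) for a in Answer}


if __name__ == "__main__":
    # Each tuple is (yes_marked, no_marked) for one scanned form.
    batch = [(True, False), (True, False), (False, True),
             (False, False), (True, True)]
    print(tally(batch))
```

Keeping "no response" and "both" as separate categories, rather than discarding them, means the reported statistics can show how many forms were ambiguous or left blank.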

Data Availability

Data is contained within the article.

Author information

Corresponding author

Correspondence to M. A. Rafidison.

Ethics declarations

Conflict of Interest

The authors declare no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rafidison, M.A., Rakotomihamina, A.H., Rajaonarison, F.T.M. et al. Intervention of light convolutional neural network in document survey form processing. Multimed Tools Appl 82, 32583–32605 (2023). https://doi.org/10.1007/s11042-023-16076-4

  • DOI: https://doi.org/10.1007/s11042-023-16076-4
