Intelligent automation of invoice parsing using computer vision techniques

Chazhoor, Anisha; Sarobin, Vergin Raja

doi:10.1007/s11042-022-12916-x

Published: 04 April 2022

Volume 81, pages 29383–29403, (2022)

Multimedia Tools and Applications

Anisha Chazhoor¹ & Vergin Raja Sarobin¹

Abstract

Manual parsing of invoices is tedious and error-prone. Because the problem matters to both academia and business, it has attracted considerable attention from the machine learning community. Automated invoice parsing faces several challenges, including a paucity of useful datasets, widely varying template formats, and poor performance of algorithms in real-life scenarios. This problem can be addressed by having object detection algorithms such as YOLO, SSD and R-CNN traverse the invoices automatically. These state-of-the-art algorithms are trained to detect the various fields, or entities, present in an invoice. In this paper, a dataset of 315 invoices has been generated using web testing tools. The dataset has been annotated for eight entities: billing address, shipping address, invoice date, invoice number, product name, price, quantity, and total amount. The text boxes detected by the models are converted to machine-encoded text using text extraction methods such as Optical Character Recognition (OCR). Hyperparameter tuning has been performed to improve model accuracy. The models have been evaluated on several metrics, including mean Average Precision (mAP), the Common Objects in Context (COCO) evaluation metrics, and total loss during training and validation. The loss versus iteration graph has been visualized using TensorBoard. A front-end application encapsulates all the functions described in the paper and allows the various models to be tested.
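
The detect-then-read flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Detection class, the parse_invoice function, the min_score threshold, and the entity label strings are hypothetical names chosen for illustration. The OCR step is passed in as a plain callable, so no particular detector or OCR library is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

# Axis-aligned box in pixels: (x_min, y_min, x_max, y_max).
Box = Tuple[int, int, int, int]

# The eight entities listed in the abstract; the label spellings are assumed.
ENTITIES = (
    "billing_address", "shipping_address", "invoice_date", "invoice_number",
    "product_name", "price", "quantity", "total_amount",
)


@dataclass(frozen=True)
class Detection:
    """One box predicted by an object detector (hypothetical container)."""
    label: str
    box: Box
    score: float


def parse_invoice(
    detections: Iterable[Detection],
    ocr: Callable[[Box], str],
    min_score: float = 0.5,  # assumed confidence cut-off, not from the paper
) -> Dict[str, List[str]]:
    """Run OCR on each confident detection and group the text by entity.

    Detections are visited top to bottom, then left to right, so repeated
    entities such as line-item prices keep their reading order.
    """
    fields: Dict[str, List[str]] = {}
    for det in sorted(detections, key=lambda d: (d.box[1], d.box[0])):
        # Skip unknown labels and low-confidence boxes before calling OCR.
        if det.label not in ENTITIES or det.score < min_score:
            continue
        text = ocr(det.box).strip()
        if text:
            fields.setdefault(det.label, []).append(text)
    return fields
```

In practice the ocr argument would wrap whichever text extraction method is in use, cropping the invoice image to the given box and returning the recognized string.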

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
Anisha Chazhoor & Vergin Raja Sarobin

Corresponding author

Correspondence to Vergin Raja Sarobin.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chazhoor, A., Sarobin, V.R. Intelligent automation of invoice parsing using computer vision techniques. Multimed Tools Appl 81, 29383–29403 (2022). https://doi.org/10.1007/s11042-022-12916-x

Received: 14 May 2021
Revised: 26 November 2021
Accepted: 09 March 2022
Published: 04 April 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s11042-022-12916-x
