Abstract
Manual parsing of invoices is a tedious and error-prone task. Because of its academic and business importance, the problem has attracted the attention of the machine learning community. Automated invoice parsing faces several challenges, including a scarcity of useful datasets, widely varying template formats, and the poor performance of algorithms in real-world scenarios. This paper addresses the problem with object detection algorithms such as YOLO, SSD and R-CNN, which automatically locate the relevant fields on invoice images. These state-of-the-art algorithms are trained to detect the various fields, or entities, present in an invoice. In this paper, a dataset of 315 invoices has been generated using web testing tools. The dataset has been annotated for eight entities: billing address, shipping address, invoice date, invoice number, product name, price, quantity, and total amount. The text regions detected by the models are converted to machine-encoded text using text extraction methods such as Optical Character Recognition (OCR). Hyperparameter tuning has been performed to improve model accuracy. The models have been evaluated on several metrics, including mean Average Precision (mAP), the Common Objects in Context (COCO) evaluation metrics, and the total loss during training and validation. The loss-versus-iteration curves have been visualized using TensorBoard. A front-end application integrates all the functions described in this paper and allows the various models to be tested.
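The mean Average Precision (mAP) metric named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: it assumes axis-aligned boxes given as (x1, y1, x2, y2), a single IoU threshold of 0.5, and Pascal VOC-style all-point interpolation. The function names (iou, average_precision, mean_average_precision) and the entity labels in the example are hypothetical.

```python
# Hypothetical helpers for illustration; boxes are (x1, y1, x2, y2) in pixels.

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_thr=0.5):
    """All-point interpolated AP for a single entity class (VOC-style assumption).

    detections:   list of (image_id, confidence, box)
    ground_truth: dict mapping image_id -> list of annotated boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    # Each ground-truth box may be matched by at most one detection.
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    precisions, recalls = [], []
    tp = fp = 0
    # Process detections from most to least confident.
    for img, _conf, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tp += 1
        else:
            fp += 1  # no overlap, too little overlap, or a duplicate detection
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # Pad the curve, then make precision monotonically non-increasing.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the interpolated precision-recall curve.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(dets_by_class, gt_by_class, iou_thr=0.5):
    """Mean of the per-class APs over every annotated entity class."""
    aps = [average_precision(dets_by_class.get(c, []), gt, iou_thr)
           for c, gt in gt_by_class.items()]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Toy data for one invoice image (id 0); labels are illustrative only.
    gt = {"invoice_number": {0: [(10, 10, 50, 30)]},
          "total_amount": {0: [(0, 0, 10, 10), (20, 20, 30, 30)]}}
    dets = {"invoice_number": [(0, 0.9, (10, 10, 50, 30)), (0, 0.6, (100, 100, 120, 110))],
            "total_amount": [(0, 0.8, (0, 0, 10, 10))]}
    print(mean_average_precision(dets, gt))  # 0.75 (AP 1.0 and 0.5)
```

In this toy case the extra low-confidence invoice-number box does not lower AP because it comes after full recall is reached, while the missed total-amount box caps recall at 0.5. COCO-style evaluation instead averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05 with 101-point interpolation, so its figures are typically lower than this single-threshold value.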
Funding
No funding was received to assist with the preparation of this manuscript.
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Chazhoor, A., Sarobin, V.R. Intelligent automation of invoice parsing using computer vision techniques. Multimed Tools Appl 81, 29383–29403 (2022). https://doi.org/10.1007/s11042-022-12916-x
DOI: https://doi.org/10.1007/s11042-022-12916-x