Intelligent automation of invoice parsing using computer vision techniques

Multimedia Tools and Applications

Abstract

Manual parsing of invoices is tedious and error-prone. Because the problem matters to both academia and business, it has attracted attention from the machine learning community. Automated invoice parsing faces several challenges, including a paucity of useful datasets, widely varying template formats, and poor performance of algorithms in real-life scenarios. This paper addresses the problem by applying object detection algorithms such as YOLO, SSD and R-CNN to invoice images. These state-of-the-art algorithms are trained to detect the fields, or entities, present in an invoice. A dataset of 315 invoices has been generated using web testing tools and annotated for eight entities: billing address, shipping address, invoice date, invoice number, product name, price, quantity, and total amount. The text boxes detected by the models are converted to machine-encoded text using text extraction methods such as Optical Character Recognition (OCR). Hyperparameter tuning has been performed to improve model accuracy. The models have been evaluated using mean Average Precision (mAP), the Common Objects in Context (COCO) evaluation metrics, and total loss during training and validation. The loss-versus-iteration graph has been visualized using TensorBoard. A front-end application wraps all the functions described in the paper and allows the various models to be tested.
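To make the mAP metric named in the abstract concrete, the following is a minimal sketch of how average precision (AP) for a single entity class could be computed from predicted and ground-truth boxes; mAP is then the mean of AP over the eight entity classes. It is not the authors' evaluation code. The function names `iou` and `average_precision`, the box layout `(x_min, y_min, x_max, y_max)`, the 0.5 IoU threshold, and the simplified greedy matching are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    preds: List[Tuple[float, Box]],  # (confidence score, predicted box)
    truths: List[Box],               # ground-truth boxes of the same class
    iou_threshold: float = 0.5,      # hypothetical default, not from the paper
) -> float:
    """AP for one class: area under the interpolated precision-recall curve."""
    if not truths:
        return 0.0

    matched = set()  # indices of ground-truth boxes already claimed
    hits = []        # 1 for a true positive, 0 for a false positive
    # Visit predictions from most to least confident.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, truth in enumerate(truths):
            if idx in matched:
                continue  # simplification: skip boxes that are already matched
            overlap = iou(box, truth)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou >= iou_threshold:
            matched.add(best_idx)
            hits.append(1)
        else:
            hits.append(0)

    # Running precision and recall at each rank.
    precisions, recalls = [], []
    true_positives = 0
    for rank, hit in enumerate(hits, start=1):
        true_positives += hit
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(truths))

    # Interpolate: precision at a rank is the best precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas under the interpolated curve.
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Toy example: two ground-truth "total amount" boxes, three predictions.
    truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [
        (0.9, (0, 0, 10, 10)),
        (0.8, (50, 50, 60, 60)),
        (0.7, (20, 20, 30, 30)),
    ]
    print(round(average_precision(preds, truths), 4))
```

The COCO metrics mentioned in the abstract average this kind of score over a range of IoU thresholds rather than a single one, so a production evaluation would normally rely on an established toolkit instead of a hand-written routine like this.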

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Corresponding author

Correspondence to Vergin Raja Sarobin.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chazhoor, A., Sarobin, V.R. Intelligent automation of invoice parsing using computer vision techniques. Multimed Tools Appl 81, 29383–29403 (2022). https://doi.org/10.1007/s11042-022-12916-x

  • DOI: https://doi.org/10.1007/s11042-022-12916-x
