Towards Intelligent Processing of Electronic Invoices: The General Framework and Case Study of Short Text Deep Learning in Brazil

  • Conference paper
Web Information Systems and Technologies (WEBIST 2020, WEBIST 2021)

Abstract

An electronic invoice (E-invoice) is a document that records transactions of goods or services and is stored and exchanged electronically. E-invoicing is an emerging practice and a valuable source of information for many areas. Processing these invoices is usually challenging, because the information reported is often incomplete or contains mistakes. Before any meaningful treatment of these invoices, it is necessary to evaluate the product represented in each file. This research puts forward a conceptual framework that explains how to apply machine learning to extract meaningful information from invoices at different levels of aggregation. Related work in the field is contextualized within this framework. We present a case study based on real data from Electronic Invoice (NF-e) and Electronic Consumer Invoice (NFC-e) documents in Brazil, related to B2B and retail transactions. We compare traditional term frequency models with convolutional sentence classification models. Our experiments show that even though invoice text descriptions are short and contain many errors and typos, simple term frequency models can achieve high baseline results on product code assignment.
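
As a rough illustration of what a term frequency baseline for product code assignment can look like, the sketch below builds one term-frequency centroid per product code and assigns a new description to the most similar centroid by cosine similarity. It is a minimal sketch under stated assumptions, not the models evaluated in the paper: the function names, the toy descriptions, and the codes `BEVERAGE` and `GRAIN` are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a short description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_centroids(samples):
    """Sum the term-frequency vectors of all descriptions sharing a code."""
    centroids = defaultdict(Counter)
    for description, code in samples:
        centroids[code].update(tokenize(description))
    return dict(centroids)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(centroids, description):
    """Return the code whose centroid is most similar to the description."""
    vector = Counter(tokenize(description))
    return max(centroids, key=lambda code: cosine(vector, centroids[code]))


# Hypothetical toy data: descriptions and codes are invented for illustration
# and are not taken from the NF-e / NFC-e data used in the paper.
TRAIN = [
    ("refrig coca cola 2l", "BEVERAGE"),
    ("refrigerante guarana lata 350ml", "BEVERAGE"),
    ("arroz branco tipo 1 5kg", "GRAIN"),
    ("feijao carioca 1kg", "GRAIN"),
]

centroids = train_centroids(TRAIN)
print(predict(centroids, "coca cola lata 350ml"))
```

Whole-word tokens keep the sketch short; a practical baseline on noisy descriptions might instead weight terms (for example with TF-IDF) or use character n-grams to tolerate typos, but those choices are assumptions beyond what the abstract states.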

Acknowledgements

This work has been partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq) under grant number 309545/2021-8. Thanks to Mr. Sergio Neto and other colleagues from the Department of Economy of the Federal District in Brasilia.

Author information

Corresponding author

Correspondence to Diego Santos Kieckbusch.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Kieckbusch, D.S., Filho, G.P.R., Di Oliveira, V., Weigang, L. (2023). Towards Intelligent Processing of Electronic Invoices: The General Framework and Case Study of Short Text Deep Learning in Brazil. In: Marchiori, M., Domínguez Mayo, F.J., Filipe, J. (eds) Web Information Systems and Technologies. WEBIST 2020, WEBIST 2021. Lecture Notes in Business Information Processing, vol 469. Springer, Cham. https://doi.org/10.1007/978-3-031-24197-0_5

  • DOI: https://doi.org/10.1007/978-3-031-24197-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24196-3

  • Online ISBN: 978-3-031-24197-0

  • eBook Packages: Computer Science (R0)
