A Novel Natural Language Processing Strategy to Improve Digital Accounting Classification Approach for Supplier Invoices ERP Transaction Process

  • Conference paper
Computational Science and Its Applications – ICCSA 2023 (ICCSA 2023)

Abstract

Natural language processing (NLP) is a developing field that offers increasing potential to simplify accounting-related tasks. This research studies a novel NLP approach that classifies invoices into categories based on the invoice text description. Preprocessing comprises three parts: text cleaning, semantic enrichment using the labels as an information source, and text augmentation. A total of 12 training datasets were prepared from the raw invoice data, each the output of a unique combination of the preprocessing steps. Each training dataset was then used to train one traditional classifier, Linear Support Vector Machine (LSVM), and two deep learning classifiers, Bi-directional Long Short-Term Memory (Bi-LSTM) and Bidirectional Encoder Representations from Transformers (BERT). Overall, the best approach improved accuracy by up to 6.7 percentage points (ppts) and macro F1 score by up to 20 ppts. Noise and overfitting were reduced when only English text was retained for modelling. Using label data to semantically enrich invoice text descriptions improved the model's generalizability. For short text augmentation, lexical synonym substitution preserved semantics more effectively than the word embedding approach. BERT outperformed Bi-LSTM and LSVM, and its performance improved further as the training data grew, confirming the superiority of deep learning classifiers over traditional classifiers. Multi-class balancing by lexical-based data augmentation improved model generalizability, as evidenced by a high macro F1 score. These findings contribute to automating invoice text classification, which to date has remained largely a manual task in practice.
The classification approach is well suited to integration with other artificial intelligence solutions, such as Optical Character Recognition (OCR) and Robotic Process Automation (RPA), to form a completely automated invoice processing system. Since invoice classification is a repetitive and non-value-added process, combining this novel text classification method with RPA can reduce overhead costs by approximately 90%.
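
As a rough illustration of the class-balancing idea described above, the following minimal Python sketch augments minority-class invoice descriptions by lexical synonym substitution until each class matches the largest one. It is not the authors' implementation: the synonym table, the sample texts, the labels, and the function names (`substitute_synonyms`, `balance_by_augmentation`) are all hypothetical, and the abstract does not state which lexical resource the paper used.

```python
import random

# Hypothetical toy synonym table (assumption). A real pipeline would draw
# synonyms from a lexical resource; the abstract does not name one.
SYNONYMS = {
    "repair": ["fix", "mend"],
    "fee": ["charge"],
    "purchase": ["buy"],
}


def substitute_synonyms(text, rng, synonyms=SYNONYMS):
    """Return a variant of `text` in which every word found in the
    synonym table is replaced by one randomly chosen synonym."""
    return " ".join(
        rng.choice(synonyms[word]) if word in synonyms else word
        for word in text.split()
    )


def balance_by_augmentation(samples, seed=0, max_tries=200):
    """Add augmented copies of minority-class texts until every class has
    as many samples as the largest class, where enough distinct variants
    exist. `samples` is a non-empty list of (text, label) pairs."""
    rng = random.Random(seed)

    # Group the original texts by label.
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append(text)

    target = max(len(texts) for texts in by_label.values())
    balanced = list(samples)  # originals are kept unchanged, in order

    for label, texts in by_label.items():
        seen = set(texts)  # avoid adding exact duplicates
        needed = target - len(texts)
        tries = 0
        # Stop once the class is balanced, or give up after max_tries
        # attempts if the synonym table cannot yield new variants.
        while needed > 0 and tries < max_tries:
            tries += 1
            candidate = substitute_synonyms(rng.choice(texts), rng)
            if candidate not in seen:
                seen.add(candidate)
                balanced.append((candidate, label))
                needed -= 1
    return balanced
```

Only the training split should be augmented in this way. Leaving the evaluation data untouched keeps metrics such as macro F1, which weights every class equally regardless of size, from being inflated by near-duplicate texts.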

Acknowledgement

This work was supported in part by Sunway University and Sunway Business School under Kick Start Grant Scheme (KSGS) NO: GRTIN-KSGS-DBA[S]-02-2022. This work is also part of the Sustainable Business Research Cluster and Research Centre for Human-Machine Collaboration (HUMAC) at Sunway University. We also wish to thank those who have supported this research.

Author information

Corresponding author

Correspondence to Tiong Yew Tang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chi, W.W., Tang, T.Y., Salleh, N.M., Hwang, H.J. (2023). A Novel Natural Language Processing Strategy to Improve Digital Accounting Classification Approach for Supplier Invoices ERP Transaction Process. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2023. ICCSA 2023. Lecture Notes in Computer Science, vol 13956. Springer, Cham. https://doi.org/10.1007/978-3-031-36805-9_38

  • DOI: https://doi.org/10.1007/978-3-031-36805-9_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36804-2

  • Online ISBN: 978-3-031-36805-9

  • eBook Packages: Computer Science, Computer Science (R0)
