Abstract
Each business must submit an annual sustainability report that assesses its influence on the economy, the environment, society, and human rights. The report states the annual use of fuel, water, heating, waste disposal, materials, and electricity. Calculating these figures requires assessing and recording all invoices for these categories, which means classifying and processing a large quantity of unstructured documents to extract the relevant information, either manually or with machine learning methods. This paper investigates different machine learning approaches for allocating each submitted invoice to a sustainable category and extracting the required information.
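To make the classification task concrete, the following is a minimal sketch of how invoice text might be allocated to a category. It is not the method evaluated in the paper: it assumes the invoice text has already been extracted (for example by OCR), and the category names, training snippets, and function names are hypothetical and chosen only for illustration. It uses a simple bag-of-words centroid classifier with cosine similarity.

```python
import math
import re
from collections import Counter

# Hypothetical labelled snippets of invoice text, invented for illustration.
TRAINING = {
    "electricity": [
        "electricity supply kwh meter reading tariff",
        "power grid kwh consumption charge",
    ],
    "water": [
        "water supply cubic meter sewage fee",
        "drinking water consumption meter wastewater",
    ],
    "fuel": [
        "diesel litres fuel station pump",
        "petrol fuel card litres refuel",
    ],
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def centroid(docs):
    """Sum term counts over all documents of one category.

    The sum is proportional to the mean vector, which is sufficient
    because cosine similarity ignores vector length.
    """
    total = Counter()
    for doc in docs:
        total.update(tokenize(doc))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


CENTROIDS = {label: centroid(docs) for label, docs in TRAINING.items()}


def classify(invoice_text):
    """Return the category whose centroid is most similar to the invoice."""
    vector = Counter(tokenize(invoice_text))
    return max(CENTROIDS, key=lambda label: cosine(vector, CENTROIDS[label]))


if __name__ == "__main__":
    print(classify("Invoice: 320 kWh electricity, tariff B"))
    print(classify("Diesel 45 litres at pump 3"))
```

A real pipeline would need far more training data, weighting such as TF-IDF, and a separate step to extract quantities and units from each invoice; the sketch only shows the allocation step.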
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Vikrant, Mim, S.S., Logofatu, D. (2023). Efficient Approaches to Categorize Unstructured Documents into Sustainable Categories by Using Machine Learning. In: Braubach, L., Jander, K., Bădică, C. (eds) Intelligent Distributed Computing XV. IDC 2022. Studies in Computational Intelligence, vol 1089. Springer, Cham. https://doi.org/10.1007/978-3-031-29104-3_21
DOI: https://doi.org/10.1007/978-3-031-29104-3_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29103-6
Online ISBN: 978-3-031-29104-3
eBook Packages: Intelligent Technologies and Robotics (R0)