Efficient Approaches to Categorize Unstructured Documents into Sustainable Categories by Using Machine Learning

  • Conference paper
Intelligent Distributed Computing XV (IDC 2022)

Part of the book series: Studies in Computational Intelligence (SCI, volume 1089)

Abstract

Each business must submit an annual sustainability report that assesses its influence on the economy, the environment, society, and human rights. The report records the yearly use of fuel, water, heating, waste disposal, materials, and electricity. Compiling these figures means assessing and recording every invoice in those categories, which in turn requires classifying and processing a large quantity of unstructured documents to extract the relevant information, either manually or with machine learning methods. This paper investigates different machine learning approaches for allocating each submitted invoice to a sustainable category and extracting the required information.
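To make the classification task concrete, the following is a minimal sketch of assigning invoice text to a category by cosine similarity to per-category word-count centroids. It is an illustration only, not the method evaluated in the paper: the training snippets, the category labels, and the helper names (`tokenize`, `train_centroids`, `classify`) are all hypothetical, and the text is assumed to have already been extracted from the invoice (for example by OCR).

```python
import math
import re
from collections import Counter

# Hypothetical toy training data: (extracted invoice text, category label).
TRAINING = [
    ("diesel fuel refill litres petrol station", "fuel"),
    ("petrol fuel card litres pump", "fuel"),
    ("water supply cubic metres meter reading", "water"),
    ("drinking water utility meter sewage", "water"),
    ("electricity kwh power grid tariff", "electricity"),
    ("power consumption kwh electricity meter", "electricity"),
    ("waste disposal container collection recycling", "waste"),
    ("recycling waste bin pickup tonnes", "waste"),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train_centroids(samples):
    """Sum the word counts of all samples that share a label."""
    centroids = {}
    for text, label in samples:
        centroids.setdefault(label, Counter()).update(tokenize(text))
    return centroids


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, centroids):
    """Return the label whose centroid is most similar to the text."""
    vector = Counter(tokenize(text))
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


centroids = train_centroids(TRAINING)
print(classify("Invoice: 120 kWh electricity, grid tariff", centroids))
```

A real pipeline would need far more training data and weighting such as TF-IDF; the sketch only shows the shape of the "invoice text in, category out" step.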

Corresponding authors

Correspondence to Vikrant or Doina Logofatu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Vikrant, Mim, S.S., Logofatu, D. (2023). Efficient Approaches to Categorize Unstructured Documents into Sustainable Categories by Using Machine Learning. In: Braubach, L., Jander, K., Bădică, C. (eds) Intelligent Distributed Computing XV. IDC 2022. Studies in Computational Intelligence, vol 1089. Springer, Cham. https://doi.org/10.1007/978-3-031-29104-3_21
