DOI: 10.1145/3543712.3543721
research-article

Automatic Risks Detection and Comparison Techniques for General Conditions of Technical Documents in Purchasing Order

Published: 20 September 2022

ABSTRACT

This research develops a technique that reads the technical documents of a purchase order (PO) exchanged between the owner (buyer) and the supplier (seller) in capital investment projects, such as equipment maintenance and replacement, automatically detects clauses that contain specific potential risks, and presents the comparison results. As the proof of concept (PoC), this research selected two types of clauses that the plant owner must check and compare with the utmost care when reviewing technical documents: (1) performance guarantee clauses for the purchased equipment and (2) delivery schedule requirement clauses. The PoC was implemented in the Python programming language with the spaCy library and was further developed into a cloud-platform-based application for end users. The technique preprocesses the PO technical documents in PDF format, recognizes and converts them into full text, and then detects and extracts the risk-related sentences using logic derived from analyzing the patterns of the PoC clauses. This research also built a database of all units and formats that can appear in PoC clauses and developed knowledge-based rules to normalize PoC clauses that are expressed differently in the two (buyer's and seller's) documents. Finally, the result of comparing the PoC clauses, unified to the same unit and format, is output to an Excel or CSV file. The techniques and comparison results were verified using a confusion matrix and an accuracy check. This study is expected to reduce workload and improve practitioners' productivity in engineering procurement processes for capital investment projects.
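
The normalize-and-compare step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses spaCy and a purpose-built unit database, whereas this example substitutes a plain regular expression and a tiny hypothetical unit table (`UNIT_TO_BASE`) so that it stays self-contained. The function names `normalize`, `compare`, `to_csv`, and `accuracy` are likewise assumptions made for illustration.

```python
import csv
import io
import re

# Hypothetical unit table (an assumption, not the paper's database):
# maps a lowercase unit token to (base unit, conversion factor).
UNIT_TO_BASE = {
    "day": ("day", 1.0), "days": ("day", 1.0),
    "week": ("day", 7.0), "weeks": ("day", 7.0),
    "kw": ("kW", 1.0), "mw": ("kW", 1000.0),
}

# A number followed by an alphabetic unit token, e.g. "12 weeks" or "2 MW".
VALUE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)")


def normalize(clause):
    """Return (value, base_unit) for the first known quantity, or None."""
    for match in VALUE_RE.finditer(clause):
        unit = match.group(2).lower()
        if unit in UNIT_TO_BASE:
            base, factor = UNIT_TO_BASE[unit]
            return float(match.group(1)) * factor, base
    return None


def compare(buyer_clause, seller_clause):
    """Normalize both clauses and report whether the quantities agree."""
    buyer, seller = normalize(buyer_clause), normalize(seller_clause)
    if buyer is None or seller is None:
        status = "unparsed"
    elif buyer[1] != seller[1]:
        status = "unit mismatch"
    elif abs(buyer[0] - seller[0]) < 1e-9:
        status = "match"
    else:
        status = "differs"
    return {
        "buyer_value": buyer[0] if buyer else "",
        "seller_value": seller[0] if seller else "",
        "unit": buyer[1] if buyer else "",
        "status": status,
    }


def to_csv(rows):
    """Serialize comparison rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["buyer_value", "seller_value", "unit", "status"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def accuracy(tp, tn, fp, fn):
    """Accuracy from confusion-matrix counts: correct / total."""
    return (tp + tn) / (tp + tn + fp + fn)


rows = [
    compare("Delivery within 12 weeks after PO.",
            "Equipment shall be delivered in 90 days."),
    compare("Guaranteed output 2 MW.", "Guaranteed output of 2000 kW."),
]
print(to_csv(rows))
```

In this sketch, the first pair normalizes to 84 and 90 days and is flagged as differing, while the second pair normalizes to the same value in kW and is reported as a match. The clause sentences are invented examples, not data from the paper.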


  • Published in

    ICCTA '22: Proceedings of the 2022 8th International Conference on Computer Technology Applications
    May 2022
    286 pages
    ISBN:9781450396226
    DOI:10.1145/3543712

    Copyright © 2022 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited