ABSTRACT
This research develops a technique that recognizes the technical documents exchanged as part of a purchase order (PO) between the owner (buyer) and the supplier (seller) in capital investment projects, such as equipment maintenance and replacement, automatically detects clauses that contain specific potential risks, and presents the comparison results. As a proof of concept (PoC), the research targets two kinds of clauses that plant owners check and compare with the utmost care when reviewing technical documents: (1) performance guarantee clauses for the purchased equipment and (2) delivery schedule requirement clauses. The PoC was implemented in the Python programming language with the spaCy library and was further developed into a cloud-platform-based application for end users. The technique preprocesses the PO technical documents in PDF format, converts them into full text, and then detects and extracts risk-related sentences using logic created by analyzing the patterns of the PoC clauses. The research also built a database of all units and formats that can be used in PoC clauses and developed knowledge-based rules to normalize PoC clauses that are expressed differently in the two (buyer's and seller's) documents. Finally, the result of comparing the PoC clauses, unified to the same unit and format, is output to an Excel or CSV file. The techniques and comparison results were verified using a confusion matrix and an accuracy check. This study is expected to reduce workload and improve practitioners' productivity in engineering procurement processes for capital investment projects.
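The extract, normalize, compare, and export steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain regular expressions from the Python standard library in place of spaCy, starts from text that has already been extracted from the PDF, and handles only delivery-schedule clauses. The unit table `TO_DAYS` (including the 30-day month), the `DURATION` pattern, the "deliver" keyword filter, and all function names are assumptions made for illustration.

```python
import csv
import io
import re

# Hypothetical unit table: factor converting each unit to days.
# Treating a month as 30 days is an assumption of this sketch.
TO_DAYS = {"day": 1, "days": 1, "week": 7, "weeks": 7, "month": 30, "months": 30}

# Hypothetical clause pattern: a number followed by a time unit.
DURATION = re.compile(r"(\d+(?:\.\d+)?)\s*(days?|weeks?|months?)\b", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter standing in for an NLP library's segmenter."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_delivery_days(text):
    """Return (sentence, days) for the first delivery sentence with a duration."""
    for sentence in split_sentences(text):
        if "deliver" not in sentence.lower():
            continue
        match = DURATION.search(sentence)
        if match:
            value, unit = float(match.group(1)), match.group(2).lower()
            return sentence, value * TO_DAYS[unit]
    return None, None


def compare(buyer_text, seller_text):
    """Normalize both delivery clauses to days and flag a seller overrun."""
    _, buyer_days = extract_delivery_days(buyer_text)
    _, seller_days = extract_delivery_days(seller_text)
    if buyer_days is None or seller_days is None:
        status = "not found"
    elif seller_days <= buyer_days:
        status = "ok"
    else:
        status = "risk"
    return {
        "clause": "delivery",
        "buyer_days": buyer_days,
        "seller_days": seller_days,
        "status": status,
    }


def to_csv(rows):
    """Serialize comparison rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example clauses, not taken from the study's documents.
buyer = "The pump shall meet the rated flow. Equipment shall be delivered within 12 weeks after PO."
seller = "We confirm the rated flow. Delivery will take 3 months from order."
row = compare(buyer, seller)
print(to_csv([row]))
```

In this made-up example the buyer's 12 weeks and the seller's 3 months are both converted to days before comparison, so the seller's longer lead time is flagged even though the two documents use different units.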