Abstract
This paper presents the development and evaluation of an automatic approach that identifies causality boundaries in causality expressions. The approach focuses on explicitly expressed causalities extracted from Root Cause Analysis (RCA) reports in engineering domains. A causality expression contains a Cause and Effect pair, and multiple expressions can occur in a single sentence. Causality boundaries are semantically annotated text fragments that explicitly indicate which parts of a fragment denote Causes and which denote the corresponding Effects. Identifying them requires linguistic analysis using natural language processing (NLP). Current off-the-shelf NLP tools are mostly built on language models of general-purpose texts, e.g. newspapers. The poor portability of these tools to engineering domains has been identified as a barrier to achieving comparable analysis accuracy in new domains. One reason is the rare and unpredictable behaviour of certain words in closed domains. Ill-formed sentences, abbreviations and capitalization of common words add to the difficulty. The proposed approach addresses this problem with a probability-based method that learns the probability distribution of the boundaries not only from the NLP analysis but also from local contexts that exploit language conventions occurring in the RCA reports. In a test on a collection of RCA reports obtained from an aerospace company, the proposed approach achieved 86% accuracy, outperforming a baseline approach that relied only on the NLP analysis.
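To make the idea of learning boundary probabilities from both NLP analysis and local context more concrete, the following is a minimal sketch only. The abstract does not name the probabilistic model, so a naive Bayes token classifier with add-one smoothing is assumed here purely for illustration. The toy sentences, the cue list `CUES`, the feature names and the labels `C` (Cause), `E` (Effect) and `O` (outside) are all hypothetical and are not taken from the RCA corpus or from the authors' implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: each token is (word, POS tag, label).
# Labels: C = inside a Cause, E = inside an Effect, O = outside both.
TRAIN = [
    [("SEAL", "NN", "E"), ("failed", "VBD", "E"), ("due", "IN", "O"),
     ("to", "TO", "O"), ("excessive", "JJ", "C"), ("wear", "NN", "C")],
    [("crack", "NN", "E"), ("caused", "VBN", "O"), ("by", "IN", "O"),
     ("thermal", "JJ", "C"), ("fatigue", "NN", "C")],
]

# Assumed list of explicit causal cue words (illustrative only).
CUES = {"due", "to", "caused", "by", "because"}


def features(sent, i):
    """Combine an NLP-derived feature (POS tag) with local-context features."""
    word, pos = sent[i][0], sent[i][1]
    seen_cue = any(tok[0].lower() in CUES for tok in sent[:i])
    return [
        "pos=" + pos,                                   # from the NLP analysis
        "is_cue=" + str(word.lower() in CUES),          # local context
        "after_cue=" + str(seen_cue),                   # local context
        "prev=" + (sent[i - 1][0].lower() if i else "<s>"),
    ]


def train(sentences):
    """Count labels and label-conditioned features for naive Bayes."""
    label_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for sent in sentences:
        for i, tok in enumerate(sent):
            label = tok[2]
            label_counts[label] += 1
            for f in features(sent, i):
                feat_counts[label][f] += 1
                vocab.add(f)
    return label_counts, feat_counts, vocab


def predict(model, sent):
    """Return the most probable label for each token in the sentence."""
    label_counts, feat_counts, vocab = model
    total = sum(label_counts.values())
    labels = []
    for i in range(len(sent)):
        best, best_score = None, -math.inf
        for label, n in label_counts.items():
            denom = sum(feat_counts[label].values()) + len(vocab)
            score = math.log(n / total)
            for f in features(sent, i):
                # Add-one smoothing so unseen features do not zero the score.
                score += math.log((feat_counts[label][f] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        labels.append(best)
    return labels


if __name__ == "__main__":
    model = train(TRAIN)
    unseen = [("pump", "NN"), ("failed", "VBD"), ("due", "IN"),
              ("to", "TO"), ("corrosion", "NN")]
    print(list(zip([w for w, _ in unseen], predict(model, unseen))))
```

In this sketch, dropping the `is_cue`, `after_cue` and `prev` features leaves a classifier that sees only POS tags, which is loosely analogous to a baseline relying on NLP analysis alone. Contiguous runs of `C` or `E` labels would then be read off as the Cause and Effect boundaries.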
Cite this article
Kim, S., Aurisicchio, M. & Wallace, K. Towards automatic causality boundary identification from root cause analysis reports. J Intell Manuf 20, 581–591 (2009). https://doi.org/10.1007/s10845-008-0143-z
DOI: https://doi.org/10.1007/s10845-008-0143-z