Towards automatic causality boundary identification from root cause analysis reports

Journal of Intelligent Manufacturing

Abstract

This paper presents the results of developing and evaluating an automatic approach that identifies causality boundaries from causality expressions. The approach focuses on explicitly expressed causalities extracted from Root Cause Analysis (RCA) reports in engineering domains. Causality expressions contain Cause and Effect pairs, and multiple expressions can occur in a single sentence. Causality boundaries are semantically annotated text fragments that explicitly indicate which parts of a fragment denote Causes and which denote the corresponding Effects. Identifying them requires linguistic analysis using natural language processing (NLP). Current off-the-shelf NLP tools are mostly built on language models of general-purpose texts, e.g. newspapers. The lack of portability of these tools to engineering domains has been identified as a barrier to achieving comparable analysis accuracy in new domains. One reason is the rare and unpredictable behaviour of certain words in closed domains. Ill-formed sentences, abbreviations and capitalization of common words also contribute to the difficulty. The proposed approach addresses this problem with a probability-based method that learns the probability distribution of the boundaries not only from the NLP analysis but also from local contexts that exploit language conventions occurring in the RCA reports. In a test on a collection of RCA reports obtained from an aerospace company, the proposed approach achieved 86% accuracy, outperforming a baseline approach that relied only on the NLP analysis.
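
To make the idea of learning a boundary distribution from both NLP analysis and local context more concrete, the following is a minimal sketch only, not the authors' implementation. It assumes a naive Bayes classifier with add-one smoothing (the abstract does not name the model), a single-token cue word, hand-supplied part-of-speech tags standing in for the NLP analysis, and invented toy sentences. The names `features`, `BoundaryModel` and `TRAIN` are hypothetical.

```python
import math
from collections import Counter, defaultdict

LABELS = ("CAUSE", "EFFECT")

def features(tokens, pos_tags, cue_index, i):
    """Features for token i: one NLP-derived, two from local context."""
    side = "before_cue" if i < cue_index else "after_cue"
    return [
        "pos=" + pos_tags[i],                                   # NLP analysis (assumed POS tag)
        "cue_side=" + tokens[cue_index].lower() + ":" + side,   # local context around the cue
        "caps=" + str(tokens[i].isupper()),                     # capitalised common words
    ]

class BoundaryModel:
    """Naive Bayes over token features (an assumed stand-in model)."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        # Each example: (tokens, pos_tags, cue_index, labels); the cue token is labelled None.
        for tokens, pos_tags, cue_index, labels in examples:
            for i, label in enumerate(labels):
                if label is None:
                    continue
                self.label_counts[label] += 1
                for f in features(tokens, pos_tags, cue_index, i):
                    self.feature_counts[label][f] += 1
                    self.vocab.add(f)

    def log_prob(self, label, feats):
        # Smoothed log prior plus smoothed log likelihood of each feature.
        total = sum(self.label_counts.values())
        lp = math.log((self.label_counts[label] + 1) / (total + len(LABELS)))
        denom = sum(self.feature_counts[label].values()) + len(self.vocab)
        for f in feats:
            lp += math.log((self.feature_counts[label][f] + 1) / denom)
        return lp

    def predict(self, tokens, pos_tags, cue_index):
        out = []
        for i in range(len(tokens)):
            if i == cue_index:
                out.append("CUE")
                continue
            feats = features(tokens, pos_tags, cue_index, i)
            out.append(max(LABELS, key=lambda lab: self.log_prob(lab, feats)))
        return out

# Invented toy data (not from the paper's aerospace corpus).
TRAIN = [
    (["seal", "failed", "because", "bolt", "loosened"],
     ["NN", "VBD", "IN", "NN", "VBD"], 2,
     ["EFFECT", "EFFECT", None, "CAUSE", "CAUSE"]),
    (["vibration", "caused", "blade", "crack"],
     ["NN", "VBD", "NN", "NN"], 1,
     ["CAUSE", None, "EFFECT", "EFFECT"]),
    (["LEAK", "occurred", "because", "gasket", "aged"],
     ["NN", "VBD", "IN", "NN", "VBD"], 2,
     ["EFFECT", "EFFECT", None, "CAUSE", "CAUSE"]),
    (["corrosion", "caused", "pipe", "failure"],
     ["NN", "VBD", "NN", "NN"], 1,
     ["CAUSE", None, "EFFECT", "EFFECT"]),
]

model = BoundaryModel()
model.train(TRAIN)

prediction = model.predict(
    ["pump", "stopped", "because", "filter", "blocked"],
    ["NN", "VBD", "IN", "NN", "VBD"], 2)
print(prediction)
```

In this sketch the `cue_side` feature carries most of the signal: the same part-of-speech tag can fall on either side of the boundary, so the tag alone cannot separate Cause from Effect. That mirrors, in a very simplified way, the abstract's comparison with a baseline relying only on NLP analysis.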

Author information

Corresponding author

Correspondence to Sanghee Kim.

About this article

Cite this article

Kim, S., Aurisicchio, M. & Wallace, K. Towards automatic causality boundary identification from root cause analysis reports. J Intell Manuf 20, 581–591 (2009). https://doi.org/10.1007/s10845-008-0143-z

  • DOI: https://doi.org/10.1007/s10845-008-0143-z
