Abstract
Identifying incorrect content (i.e., semantic errors) in text is difficult because written natural language is ambiguous and because many factors can make a statement semantically erroneous. Current methods identify semantic errors in a sentence by determining whether it contradicts the domain to which the sentence belongs. However, because these methods are built on expected logic contradictions, they cannot handle new or unexpected semantic errors. In this paper, we propose a new method for detecting semantic errors that is based on logic reasoning. Our method converts text into logic clauses, which an automatic reasoner then checks against a domain ontology to determine whether the text is consistent with it. This approach can provide a complete analysis of the text, since it can analyze either a single sentence or a set of multiple sentences. When there are multiple sentences to analyze, reasoning over a large set of logic clauses is highly complex; to avoid this, we propose rules that reduce the set of sentences to analyze based on the logic relationships between sentences. In our evaluation, we found that the proposed method can identify a significant percentage of semantic errors and that, in the case of multiple sentences, it does so without significant computational cost. We also found that both the quality of the information extraction output and the modeling elements of the ontology (i.e., property domain and range) affect the capability of detecting errors.
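The idea of checking extracted statements against an ontology can be illustrated with a minimal sketch. It is not the authors' implementation: a real system would hand the logic clauses to an OWL reasoner, whereas this toy version only checks property domain and range against declared class disjointness. The names `ToyOntology` and `find_errors`, the classes, and the example triples are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOntology:
    """Hypothetical stand-in for a domain ontology (not a real OWL model)."""
    # property name -> (domain class, range class)
    signatures: dict
    # set of frozenset({A, B}) pairs declaring classes A and B disjoint
    disjoint: set
    # known individual -> asserted class
    types: dict = field(default_factory=dict)


def find_errors(onto, triples):
    """Return the triples whose subject or object clashes with domain/range."""
    errors = []
    for subj, prop, obj in triples:
        if prop not in onto.signatures:
            continue  # unknown property: nothing to check in this toy model
        domain, range_ = onto.signatures[prop]
        for individual, required in ((subj, domain), (obj, range_)):
            actual = onto.types.get(individual)
            # A clash arises only if the asserted class is disjoint
            # from the class that the property requires.
            if actual is not None and frozenset({actual, required}) in onto.disjoint:
                errors.append((subj, prop, obj))
                break
    return errors


# Assumed example domain: animals eat organisms; plants are not animals.
onto = ToyOntology(
    signatures={"eats": ("Animal", "Organism")},
    disjoint={frozenset({"Plant", "Animal"})},
    types={"grass": "Plant", "rabbit": "Animal"},
)
# Triples as an information extraction step might produce them.
triples = [("rabbit", "eats", "grass"), ("grass", "eats", "rabbit")]
errors = find_errors(onto, triples)
print(errors)
```

In this sketch the second triple is flagged because `grass` is asserted to be a `Plant`, which is disjoint from the `Animal` domain of `eats`. This mirrors, in a very reduced form, why the modeling of property domain and range affects which errors can be detected.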
Acknowledgements
This research is partially supported by the National Science Foundation Grant IIS-1118050 and Grant IIS-1013054. This research is also partially supported by the Fondo Nacional De Ciencia Y Tecnologia (FONDECYT), Chile, Grant No. 3170971. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the NSF or FONDECYT.
Cite this article
Gutierrez, F., Dou, D., de Silva, N. et al. Online Reasoning for Semantic Error Detection in Text. J Data Semant 6, 139–153 (2017). https://doi.org/10.1007/s13740-017-0079-6
DOI: https://doi.org/10.1007/s13740-017-0079-6