Inferring the semantic properties of sentences by mining syntactic parse trees
Introduction
Proceeding from parsing to the semantic level is an important step toward natural language understanding, with immediate applications in tasks such as information extraction and question answering [1], [10], [30], [45]. Over the last decade, there has been a dramatic shift in computational linguistics from the manual construction of grammars and knowledge bases to partially or totally automating these processes using statistical learning methods trained on large annotated or unannotated natural language corpora.
In this paper, we explore the possibility of high-level semantic classification of natural language sentences based on syntactic (constituency) parse trees. We address semantic classes appearing in information extraction (IE) and knowledge integration problems that usually require deep natural-language understanding [6], [8], [12].
We attempt to combine the best of the two worlds of linguistics and machine learning:
- 1) Rely on rich linguistic data such as constituency parse trees, and
- 2) Apply a systematic way to tackle these data, such as graph-oriented deterministic machine learning.
Notice that (1) gives us a rather rich set of features compared to a bag-of-words approach or shallow parsing. We need to tackle such a rich set of features, with its inherent structure, using a structured machine learning approach. In this study we will evaluate how this richer set of tree-based features, as a subject of graph-based learning, outperforms keyword-based approaches in a number of text relevance problems.
Our approach is inspired by the notion of anti-unification [26], [29], which is capable of generalizing arbitrary formulas in a formal language. We extend this notion toward anti-unifying arbitrary linguistic structures such as constituency parse trees. In this paper we propose a definition of and an algorithm for syntactic generalization, which allow us to treat syntactic natural language expressions in a unified way, as logic formulas.
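As a toy illustration of the anti-unification operation that inspires this approach, the following sketch computes the least general generalization of two first-order terms represented as nested tuples. The representation, function name, and variable-naming scheme are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of anti-unification (least general generalization) of two
# first-order terms. Terms are atoms (strings) or tuples (functor, arg1, ...).
# All names here are illustrative, not the authors' code.

def anti_unify(t1, t2, subst=None):
    """Return the least general generalization of two terms.

    Mismatching subterms are generalized to a shared variable; repeated
    occurrences of the same mismatch reuse the same variable.
    """
    if subst is None:
        subst = {}
    if t1 == t2:
        return t1
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        # Same functor and arity: generalize argument-wise.
        return (t1[0],) + tuple(anti_unify(a, b, subst)
                                for a, b in zip(t1[1:], t2[1:]))
    # Different subterms map to a fresh (but consistently reused) variable.
    key = (t1, t2)
    if key not in subst:
        subst[key] = "X%d" % len(subst)
    return subst[key]

# f(a, g(b)) generalized with f(c, g(b)) yields f(X0, g(b))
print(anti_unify(("f", "a", ("g", "b")), ("f", "c", ("g", "b"))))
# → ('f', 'X0', ('g', 'b'))
```

The same argument-wise descent is what the paper lifts from logic terms to linguistic structures such as parse trees.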
Learning based on syntactic parse tree generalization differs from kernel methods, which are nonparametric density estimation techniques that compute a kernel function between data instances (which can include keywords as well as their syntactic parameters), where the kernel function can be considered a similarity measure. Given a set of labeled instances, kernel methods determine the label of a novel instance by comparing it to the labeled training instances using the kernel function. Nearest neighbor classification and support-vector machines (SVMs) are two popular examples of kernel methods [62], [63]. Compared to kernel methods, syntactic generalization can be considered structure-based and deterministic; linguistic features retain their structures and are not represented as numeric values. An analogue of the edit-distance class of similarity methods in the anti-unification setting is discussed in Section 6.1; such methods are better suited to objects with distinctive structures, such as syntactic parse trees.
The main question considered in this study is whether these semantic patterns, unobservable at the level of keyword statistics, can be inferred from a complete parse tree structure. Moreover, the argumentative structures by which authors communicate their conclusions (as expressed in their syntactic structures) are important in relating a sentence to the above classes. Studies [13], [14] have demonstrated that graph-based machine learning can predict the plausibility of complaint scenarios based on their argumentation structures. Furthermore, we observed that learning the communicative structure of inter-human conflict scenarios can successfully classify the scenarios into a series of domains, from complaints to security-related domains. These findings convince us that applying a similar graph-based machine learning technique to such structures as syntactic trees, which have even weaker links to high-level semantic properties than these settings, can deliver satisfactory classification results. Graph-based learning has been applied in a number of domains beyond linguistics (see, e.g., [60]).
Most of the current learning research on NLP employs particular statistical techniques inspired by research on speech recognition, such as hidden Markov models (HMMs) and probabilistic context-free grammars (PCFGs). A variety of learning methods, including decision tree and rule induction, neural networks, instance-based methods, Bayesian network learning, inductive logic programming, explanation-based learning, and genetic algorithms can also be applied to natural-language problems and can present significant advantages in particular applications [25], [46]. In addition to specific learning algorithms, a variety of general ideas from traditional machine learning, such as active learning, boosting, reinforcement learning, constructive induction, learning with background knowledge, theory refinement, experimental evaluation methods, and PAC learnability, may also be usefully applied to natural-language problems [10]. In this study, we employ the nearest neighbor type of learning, which is relatively simple, to focus our investigation on how expressive the similarity between syntactic structures can be in the detection of weak semantic signals. Other, more complex learning techniques can be applied, being more sensitive or more cautious, after we confirm that our measure of the syntactic similarity between texts is adequate.
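The nearest neighbor scheme adopted here is simple enough to sketch directly. In the illustration below, `similarity` is a deliberately crude bag-of-words overlap standing in for the parse-tree generalization score developed in the paper; the function names and example data are invented for this sketch.

```python
# A minimal nearest-neighbor classifier over a pluggable similarity score.
# The `similarity` function below is a placeholder (word overlap); the paper
# substitutes a structural parse-tree generalization score in its place.

def similarity(text1, text2):
    """Placeholder score: size of the shared vocabulary of two texts."""
    words1, words2 = set(text1.lower().split()), set(text2.lower().split())
    return len(words1 & words2)

def nearest_neighbor_label(query, labeled_examples):
    """Return the label of the training example most similar to `query`."""
    best_label, best_score = None, -1
    for text, label in labeled_examples:
        score = similarity(query, text)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train = [("the camera takes sharp pictures", "informative"),
         ("I like it a lot", "uninformative")]
print(nearest_neighbor_label("this camera takes blurry pictures", train))
# → informative
```

Because the classifier is this transparent, any change in accuracy can be attributed to the similarity measure itself rather than to the learner.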
The computational linguistics community has assembled large data sets for a range of interesting NLP problems. Some of these problems can be reduced to a standard classification task by appropriately constructing their features; however, others require using and/or producing complex data structures, such as complete parse trees and operations on these trees. In this paper, we introduce a generalization operation on a pair of parse trees for two sentences and demonstrate its role in sentence classification. The operation of generalization is defined starting at the level of lemmas and continuing through chunks/phrases all the way up to paragraphs/texts.
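At the lowest level of this hierarchy, generalization of two aligned chunks can be sketched as a token-pair-wise intersection of (lemma, part-of-speech) pairs. The function name, the Penn Treebank-style tags, and the alignment-by-position assumption below are illustrative simplifications, not the paper's exact operation.

```python
# Hedged sketch of lemma-level generalization: two POS-tagged chunks of the
# same type are intersected position by position. Matching lemmas survive;
# lemmas that differ but share a POS tag become a POS wildcard ('*');
# positions with different POS tags contribute nothing.

def generalize_tokens(chunk1, chunk2):
    """Generalize two chunks given as lists of (lemma, pos) pairs."""
    result = []
    for (lemma1, pos1), (lemma2, pos2) in zip(chunk1, chunk2):
        if pos1 != pos2:
            continue  # no common syntactic role at this position
        if lemma1 == lemma2:
            result.append((lemma1, pos1))   # identical lemma survives
        else:
            result.append(("*", pos1))      # differing lemmas -> wildcard
    return result

np1 = [("the", "DT"), ("digital", "JJ"), ("camera", "NN")]
np2 = [("the", "DT"), ("compact", "JJ"), ("camera", "NN")]
print(generalize_tokens(np1, np2))
# → [('the', 'DT'), ('*', 'JJ'), ('camera', 'NN')]
```

The same intersection idea, applied recursively, extends from tokens to phrases, sentences, and paragraphs.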
Learning syntactic parse trees allows one to conduct semantic inference in a domain-independent manner without using ontologies or other manually built resources. Training sets for text classification problems still need to be collected, but class assignment can be automated. At the same time, in contrast to most semantic inference projects, we will be restricted to a very specific semantic domain (a limited set of classes), solving a number of practical problems.
The paper is organized as follows. We introduce three distinct problems of different complexities in which one or another semantic feature must be inferred from natural language sentences. We then describe the algorithm of the generalization of parse trees, followed by the nearest neighbor learning of the generalization results. The paper concludes with a comparative analysis of classification in selected problem domains, a search engine description, and a brief review of other studies with semantic inferences.
Section snippets
Application areas of syntactic generalization
In this study, we leverage the parse tree generalization technique in the automation of a content management and delivery platform [15], [57] referred to as the Integrated Opinion Delivery Environment. This platform combines data mining of the web and social networks, content aggregation, reasoning, information extraction, question answering and advertising to support distributed recommendation forums for a wide variety of products and services. In addition to human users, automated agents
Generalizing portions of text
To measure the similarity of abstract entities expressed by logic formulas, a least-general generalization was proposed for a number of machine learning approaches, including explanation-based learning and inductive logic programming. Least-general generalization was originally introduced by Plotkin [26]. It is the opposite of most-general unification [27]; therefore, it is also known as anti-unification. Anti-unification was first studied in Plotkin and Robinson [26], [27]. As its name
From generalization to logical form representation
We now demonstrate how the generalization framework can be combined with semantic representations, such as logic forms, to perform the learning of a text's meaning. We have demonstrated how semantic features can be deduced from syntactic parse trees when an appropriate similarity operation is found. However, in a number of applications, certain semantic knowledge is available in advance and therefore, does not have to be learned. In this section, we show how to combine preset semantic
Syntactic generalization-based search engine and its evaluation
The search engine based on syntactic generalization is designed to provide opinion data in an aggregated form obtained from various sources. Conventional search results and Google-sponsored link formats are selected because they are the most effective and are already accepted by a vast community of users.
Comparative performance analysis in text classification domains
To evaluate the expressiveness and sensitivity of the syntactic generalization operation and its associated scoring system, we applied the Nearest Neighbor algorithm to the series of text classification tasks outlined in Section 2 (Table 3). We formed several datasets for each problem, conducted an independent evaluation for each dataset and averaged the resultant accuracies (F-measure). The training and evaluation datasets of the texts and the class assignments were made by the authors. Half of
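The averaging step described above can be sketched as follows; the per-dataset precision/recall figures are invented purely for illustration.

```python
# Sketch of the evaluation protocol: per-dataset precision and recall are
# combined into F-measures, which are then averaged across datasets.
# The numbers below are made up for illustration.

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

per_dataset = [(0.80, 0.70), (0.75, 0.85), (0.90, 0.60)]  # (P, R) pairs
scores = [f_measure(p, r) for p, r in per_dataset]
average = sum(scores) / len(scores)
print(round(average, 3))
# → 0.755
```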
Related work
Most of the work on automated semantic inference from syntax deals with much lower semantic levels than the semantic classes we manage in this study. de Salvo Braz et al. [21] present a principled, integrated approach to semantic entailment. These authors developed an expressive knowledge representation that provides a hierarchical encoding of the structural, relational and semantic properties of the text and populated it using a variety of machine learning-based tools. An inferential mechanism
Conclusions
In this study, we demonstrated that high-level semantic features of sentences, such as being informative, can be learned from the low-level linguistic data of a complete parse tree. Unlike traditional approaches to the multilevel derivation of semantics from syntax, we explored the possibility of linking low-level but detailed syntactic levels directly with high-level pragmatic and semantic levels.
In recent decades, most approaches to NL semantics have relied on mapping to First Order Logic
Acknowledgments
We are grateful to our colleagues SO Kuznetsov, B Kovalerchuk and others for valuable discussions and to our anonymous reviewers for their suggestions. This research is partially funded by the EU Project No. 238887, a unique European Citizens' attention service (iSAC6+) IST-PSP. This research is also funded by the Spanish MICINN (Ministerio de Ciencia e Innovación) IPT-430000-2010-13 project Social powered Agents for Knowledge search Engine (SAKE), TIN2010-17903 Comparative approaches to the
References (68)
- Generalized subsumption and its applications on induction and redundancy, Artificial Intelligence (1988)
- A survey on tree edit distance and related problems, Theoretical Computer Science (2005)
- et al., A novel approach for classifying customer complaints through graphs similarities in argumentative dialogue, Decision Support Systems (2009)
- et al., Concept-based learning of human behavior for customer relationship management, Special Issue on Information Engineering Applications Based on Lattices, Information Sciences (2011)
- et al., Corpus-based semantic role approach in information retrieval, Data & Knowledge Engineering (2007)
- et al., A general framework for subjective information extraction from unstructured English text, Data & Knowledge Engineering (August 2007)
- et al., Exploring syntactic structured features over parse trees for relation extraction using kernel methods, Information Processing and Management (March 2008)
- et al., The refined process structure tree, Data & Knowledge Engineering (September 2009)
- et al., Refining non-taxonomic relation labels with external structured data to support ontology learning, Data & Knowledge Engineering (August 2010)
- Natural Language Understanding (1987)
- Semantic inference at the lexical-syntactic level, AAAI-05
- Duplicate code detection using anti-unification
- Open information extraction from the web
- Semantic role labeling with chunk sequences
- Generic parsing for multi-domain semantic interpretation
- Semantic role labeling by tagging syntactic chunks
- Machine learning and natural language, Machine Learning
- Introduction to the CoNLL-2004 shared task: semantic role labeling
- Natural language question answering system: technique of semantic headers
- Learning communicative actions of conflicting human agents, Journal of Experimental & Theoretical Artificial Intelligence
- Using generalization of syntactic parse trees for taxonomy capture on the web, ICCS
- Increasing the relevance of meta-search using parse trees
- Semantic classification based on machine learning of parse trees
- A logic-based semantic approach to recognizing textual entailment
- The necessity of syntactic parsing for semantic role labeling, IJCAI-05
- An inference model for semantic entailment in natural language
- DIRT: discovery of inference rules from text
- A system of logic, ratiocinative and inductive
- COGEX: a logic prover for question answering
- A note on inductive generalization
- A machine-oriented logic based on the resolution principle, Journal of the Association for Computing Machinery
- Investigating a generic paraphrase-based approach for relation extraction
- Transformational systems and the algebraic structure of atomic formulas, Machine Intelligence
- Learning surface text patterns for a Question Answering system
Boris Galitsky has been contributing natural language-related technologies to Silicon Valley, USA start-ups over the last two decades. In 1999 he co-founded iAskWeb, which provided tax and investment recommendations to customers of several Fortune 500 companies. He contributed his linguistic technology to Xoopit, acquired by Yahoo; Uptake, acquired by Groupon; and LogLogic, acquired by Tibco. He received his PhD in natural language understanding in 1994 and an ANECA/EU Associate Professorship degree in 2011. Boris has authored more than 70 publications, a book and multiple patents in the field of natural language understanding. Boris is currently a lead scientist at eBay.
Prof. Josep Lluís de la Rosa, [email protected], h-index = 15, received his MSc and PhD in Computer Engineering from the Autonomous University of Barcelona (UAB), Barcelona, in 1989 and 1993, and an MBA in 2002. He is a professor at the Universitat de Girona (UdG) and director of the ARLab (Agents Research Laboratory — GRCT69). He has published more than 100 papers in international journals and 300 papers in international conferences, holds 4 patents and has founded 3 spin-off companies. He was a visiting professor at Rensselaer Polytechnic Institute (RPI) in 2008–2010. His research interests focus on intelligent agents, understanding the agency property of introspection or self-awareness, as well as understanding its impact on the emergent behaviour of billions of agents by means of computational ecology models. Digital preservation, social networks and complementary currencies are the areas of application. He has participated in several successful EU projects such as ONE, Open Negotiation Environments, FP6-2005-IST-5, grant agreement num. 34744 (2006–2009), and PROTAGE, Preservation of Digital Information with Intelligent Agents (2007–2011).
Gábor Dobrocsi is an Informatics Engineer who earned his M.Sc. at the University of Miskolc (Hungary). After receiving his degree he became a visiting scientist at the Rensselaer Polytechnic Institute (USA), where he participated in academic research to develop an alternative review system for scientific publications. He then joined the development team of a high-end commercial citizens' information and assessment service system featuring natural language processing techniques at EASY Innova (Spain). Currently he is a PhD student at the University of Girona (Spain), researching agent technologies, social search and recommendation systems, and natural language processing.