
Knowledge-Based Systems

Volume 18, Issue 1, February 2005, Pages 19-35

A natural language help system shell through functional programming

https://doi.org/10.1016/j.knosys.2004.04.002

Abstract

This paper investigates the development of a natural language (NL) interface for mixed initiative dialogues within a constrained domain and demonstrates the applicability of the functional approach to NL system development. The system consists of two major components. The natural language subsystem comprises a general-purpose parser that interprets a ‘plug and play’ tagged BNF grammar (which may be ambiguous) to parse natural language input and extract semantic information. The knowledge-based subsystem uses the semantic tags extracted by the natural language subsystem to generate a focused query to select the most appropriate script for a guided dialogue with the user. The system was written entirely in a purely functional language, which resulted in a surprisingly small and simple program.

Introduction

With the deregulation of the Internet and the development of ever more complex and sophisticated systems such as enterprise resource planning, customer relationship management systems, OLAP systems and operating systems, it was inevitable that a parallel requirement for more sophisticated online user assistance systems would follow. To this end, academic investigators and commercial developers have drawn upon diverse research areas including natural language understanding (NLU), ergonomics, database systems, human factors and knowledge-based systems to make their systems more intuitive, user friendly and intelligent. This has resulted in the emergence of new technologies and platforms in several areas:

  • Email handling systems, e.g. Jeeves Answers (www.jeevesolutions.com), e-Dialogue's Quick Reply (http://www.edialogue.com/) and ROI Direct.com's Customer Response (http://www.tele-direct.com/), being deployed by organizations such as the US Congress and Office Depot that receive large volumes of email correspondence.

  • Natural-language web site search technology, e.g. iPhrase Technology's (www.iphrase.com) One Step, utilized by Charles Schwab to enable its 75 million end users to type simple natural language queries against its site.

  • Call center enhancement, e.g. InQuira (www.inquira.com) has developed a natural language (NL) interaction technology that helps simultaneously improve the quality of call center representatives' responses to customer queries and decrease the time taken to reach a solution, avoiding the decision-tree type of analysis that is usually performed.

  • Computer-based training systems that utilize natural language interaction have been receiving increased attention [1], and products such as WexTech's (http://www.wextech.com/kipr.htm) AnswerWorks ‘question answering engine’ are aimed at using natural language understanding to enhance web-based training.

  • The use of NL queries of databases remains a difficult problem [2], [52], [53], [54], [55], [56], however, researchers such as Owei, who developed a ‘conceptual query language-with-natural language (CQL/NL)’ [3] to assist in the filtering of natural language queries, have continued to develop solutions for subproblem categories.

All of these systems have at their core some form of computational natural language processing (NLP) system, an area with a long history of investigation and research. The most basic technique for analyzing natural language is keyword matching, of which Weizenbaum's ELIZA [4], [5] system is a well-known example. The major criticism of this type of NLP is that the dialogues generated tend to be very shallow and superficial, not allowing users to probe for solutions involving, for example, inductive meta-knowledge. The second level of NLP techniques was inspired by Chomsky's work on grammars [6], which has radically influenced NLP. These grammars are used to ‘parse’, or break down, the structure of a sentence to help establish its meaning; keyword matchers, by contrast, rely on the expectation that particular keywords will be present in the sentence presented to them, and extract very little meaning from the input. Many types of grammars have been used within NL research, including ‘phrase structured grammars’ [7], transformational grammars [6], case grammars [8] and syntactic grammars [9], and as such parsing is a central construct upon which NLP is based. Through the use of these grammatical rules in conjunction with other knowledge sources, the function of words within an input stream can be determined and the relations between them used to extract some degree of meaning from the sentences. Building upon the work of these early systems, researchers have taken a variety of approaches to building computational NLP systems, including Refs. [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22]. For a significant and comprehensive bibliography of texts in computational NLU refer to Mark Kantrowitz's ‘Bibliography of Research in Natural Language Generation’ [23], the survey by Varile and Zampolli [24] or the digital archive of research papers in computational linguistics at the University of Pennsylvania (http://www.cis.upenn.edu/~adwait/penntools.html).

This paper builds upon the previous research to present a new two-part computational NLP system based upon the use of an executable set of functional equations, and through this notation demonstrates its applicability to the creation of knowledge-based online help systems. A prototype solution to the UNIX help assistant problem [13] is presented; this was felt to be a suitable domain through which the operational aspects of the approach could be tested, as its solution space is formally defined yet facilitates mixed initiative dialogues.

The natural language processing subsystem accepts as input a BNF description of the language. This approach has the advantage that the language module can be replaced or upgraded, without any programming, by any user who understands formal grammars; it was largely abandoned in mainstream NLP research partly because of the ambiguous nature of natural languages. The approach taken here is to produce all possible parses of the input query. In the relatively restricted domain of help systems, inputs are not large: queries tend not to be multi-paragraph compositions, so although ambiguities may still produce more than one possible parse, the sometimes exponential explosion of possibilities is not a debilitating problem.

The BNF [25] accepted by the parser is extended with simple semantic tags that essentially say ‘if a successful parse comes through here, make a note of xxxx’. The output from the parser is not just a list of possible parse trees, but also a list of possible tag sets. Each tag set contains the semantic tags that were encountered in an ultimately successful parse of the input. For example, in the famous example ‘fruit flies like a banana’, the word ‘flies’ can be either a verb or a noun. In the extended BNF, where ‘flies’ is listed as a possible verb, the tag ‘action-travel’ could be specified; where it is listed as a possible noun, the tag ‘actor-insect’ could appear. With a similar treatment for ‘fruit’, ‘like’, and ‘banana’, the parser would produce two possible tag sets:

  • (actor-insect, action-enjoy, object-food)

  • (actor-food, action-travel, manner-food)
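
For illustration, the parser's output for this sentence can be thought of as a list of tag sets, one per successful parse. The following sketch simply restates the two readings above in a functional notation; Haskell is used here as a stand-in, since the paper's AFL code is not reproduced in this preview.

```haskell
-- Tag sets from the text above: one per successful parse of
-- 'fruit flies like a banana' (illustrative Haskell, not the paper's AFL).
type Tag    = String
type TagSet = [Tag]

fruitFliesReadings :: [TagSet]
fruitFliesReadings =
  [ ["actor-insect", "action-enjoy",  "object-food"]   -- 'fruit flies' as a noun phrase, 'like' as the verb
  , ["actor-food",   "action-travel", "manner-food"]   -- 'flies' as the verb
  ]
```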

Of course, the difference between subjects and objects can be noted by the same means. Associating semantic tags with a position in a parse tree rather than with a linear lexeme stream allows subtle but important distinctions of meaning to be discerned: ‘the broken printer is grey’ may deliver useful information but is obviously not reporting a problem, whereas ‘the grey printer is broken’ requires a definite reaction, even though both use the same words as the same parts of speech (POS). The disadvantage of this is that input can only be processed if it conforms exactly to the given grammar. The ability to handle ambiguity without any problems means that much more forgiving grammars can be used, and on-line help systems are more likely to be used by grammatically competent users, but the need for ‘correct’ input cannot be completely ignored.
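
As a concrete, entirely hypothetical illustration of such position-dependent tagging, the tag attached to ‘broken’ could differ according to whether it appears in the subject noun phrase or in the predicate, so the two sentences above would yield different tag sets even though the words and parts of speech are identical. The tag names below are invented for the example, not taken from the paper.

```haskell
-- Hypothetical tag sets showing how parse-tree position changes the meaning
-- extracted from the same words (illustrative names only).
type Tag    = String
type TagSet = [Tag]

brokenPrinterIsGrey, greyPrinterIsBroken :: TagSet
brokenPrinterIsGrey = ["object-printer", "qualifier-broken", "state-grey"]     -- merely describes a printer
greyPrinterIsBroken = ["object-printer", "qualifier-grey",   "problem-broken"] -- reports a fault
```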

This simplified, linear representation of meaning would clearly not be sufficient for a full natural language understanding system, but in the restricted domain of on-line help systems with its much smaller expected inputs, it provides an appropriate level of detail for further analysis.

Given the list of possible tag sets, it is often found that all of them are the same: ambiguities in parsing do not always reflect semantic ambiguities. When all the syntax has been discarded and the input reduced to a set of tags that represents its meaning, there will often be no ambiguity left. The matching agent searches through a knowledge base of scripts and selects those whose indexes most closely match the tag sets. The knowledge base is stored as an association list connecting scripts to sets of semantic tags that must be matched as closely as possible. This is another process that helps to resolve ambiguity: similar but not identical tag sets may select the same script. If more than one script is still selected, the user may be asked to clarify their meaning by selecting from the topic summaries associated with each script (a simple ‘Did you mean A or B?’ type of question), upon which a dialogue is entered into. The process is shown in Fig. 1.
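
The preview does not show how the matching agent scores candidate scripts, so the following is only a minimal sketch under the assumption that the ‘closest match’ is the template sharing the most tags with the query's tag set; the names (KnowledgeBase, bestScripts) and the simple overlap count are illustrative, not the paper's.

```haskell
import Data.List (intersect)

-- Minimal sketch of script selection by tag overlap (assumed scoring rule).
type Tag    = String
type TagSet = [Tag]
type Script = String                      -- stands in for a full script structure

type KnowledgeBase = [(TagSet, Script)]   -- association list: tag template -> script

score :: TagSet -> TagSet -> Int
score query template = length (query `intersect` template)

-- All scripts whose templates achieve the best (non-zero) overlap with the query.
bestScripts :: KnowledgeBase -> TagSet -> [Script]
bestScripts kb query
  | best == 0 = []
  | otherwise = [ s | (tpl, s) <- kb, score query tpl == best ]
  where best = maximum (0 : [ score query tpl | (tpl, _) <- kb ])
```

When bestScripts returns more than one script, the topic summaries of those scripts would be offered to the user as the ‘Did you mean A or B?’ clarification described above.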

The script selection process is a variation of the approach advocated by Hobbs (1995) for the creation of generic information extraction systems and extends Plant's work on knowledge-based help systems [12].

Section snippets

Language choice

A major issue to decide upon is how to represent the BNF syntax as a data structure. However, a prerequisite influence on the nature and form of the data structure is the language in which the whole system is to be implemented. There is a wide choice of languages with which the implementation could be performed:

  • Procedural (imperative) languages such as Pascal [26], [27], C [28], Perl [29]

  • Specialist languages for NLP such as LIFER [10].

  • Assumptive Logic Programming [30].

  • Object languages such as

A functional specification of BNF

Having decided upon the BNF form of grammar and the use of the functional language AFL with which to ultimately implement the parser, the next stage was to devise a representation of the BNF in AFL.

There are four components making up the BNF: (i) terminal symbols, e.g. words like ‘dog’; (ii) non-terminal symbols, e.g. <noun>, these being the names of structural units and denoted by enclosure within angle-brackets; (iii) the disjunction of two or more components, the ‘or’ being represented by the
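
The AFL representation itself is not reproduced in this preview, and the description above is cut off before the fourth component. As a rough analogue, a tagged BNF could be represented by an algebraic data type along the following lines; sequencing is assumed here to be the fourth component, Haskell stands in for AFL, and all constructor names are illustrative rather than the paper's.

```haskell
-- An assumed functional representation of tagged BNF (illustrative only).
type Tag = String

data BNF = Terminal String      -- a literal word, e.g. "dog"
         | NonTerminal String   -- a named structural unit, e.g. <noun>
         | Or  BNF BNF          -- disjunction of two components
         | Seq BNF BNF          -- sequencing (assumed fourth component)
         | Tagged Tag BNF       -- note a semantic tag on a successful parse

type Grammar = [(String, BNF)]  -- association of rule names to their bodies

-- A tiny fragment for the 'fruit flies like a banana' example in the text.
verb :: BNF
verb = Or (Tagged "action-travel" (Terminal "flies"))
          (Tagged "action-enjoy"  (Terminal "like"))
```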

The parser

Having devised the BNF specification and its functional representation the next step was to develop a parser which would accept input in the form of a sentence in English and the BNF for the grammar, check to see if the sentence is legal according to the grammar, parse it and extract semantic information from it.

The utilization of a parser-based system allows the system to extract meaning from the interaction and not just attempt to match keywords at random. It is important to note the
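
The published parser is likewise not shown in this preview. A minimal ‘list of successes’ style parser over the assumed representation above might look like the following sketch, in which each result pairs the unconsumed input with the tags collected on that parse path, and complete parses are those that consume the whole sentence; the type declarations are repeated so the sketch stands alone.

```haskell
-- Minimal 'list of successes' parser over the assumed tagged-BNF type
-- (illustrative Haskell, not the paper's AFL implementation).
type Tag    = String
type TagSet = [Tag]

data BNF = Terminal String | NonTerminal String
         | Or BNF BNF | Seq BNF BNF | Tagged Tag BNF

type Grammar = [(String, BNF)]

-- Each result pairs the remaining input with the tags collected so far.
parse :: Grammar -> BNF -> [String] -> [([String], TagSet)]
parse _ (Terminal t) (w:ws) | t == w = [(ws, [])]
parse _ (Terminal _) _               = []
parse g (NonTerminal n) ws           = maybe [] (\b -> parse g b ws) (lookup n g)
parse g (Or a b)  ws                 = parse g a ws ++ parse g b ws
parse g (Seq a b) ws                 = [ (ws2, ta ++ tb)
                                       | (ws1, ta) <- parse g a ws
                                       , (ws2, tb) <- parse g b ws1 ]
parse g (Tagged t b) ws              = [ (rest, t:ts) | (rest, ts) <- parse g b ws ]

-- Tag sets from parses that consume the entire input.
tagSets :: Grammar -> BNF -> [String] -> [TagSet]
tagSets g start ws = [ ts | ([], ts) <- parse g start ws ]
```

Because alternatives are explored with ordinary list concatenation, an ambiguous grammar simply yields several results, which matches the strategy described in the introduction of producing all possible parses and their tag sets.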

The knowledge base processor

Having constructed the parser and a functional specification of the BNF grammar, the next stage was to make the system respond to the user queries with meaningful answers. This was achieved through the use of a knowledge base that used as its representational structure an association list.

The association list connects pre-constructed scripts and informative texts to patterns or templates for semantic tag lists. For example, a user wanting to know how to print a file may type ‘I want to print a
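
The actual knowledge-base entries are not visible in this preview; the fragment below is a hypothetical illustration of such an association list for the UNIX help domain, with invented tag names and script summaries, in the same Haskell notation used in the earlier sketches.

```haskell
-- Hypothetical knowledge-base fragment: tag templates paired with the scripts
-- they select (all names are illustrative, not taken from the paper).
type Tag    = String
type TagSet = [Tag]

knowledgeBase :: [(TagSet, String)]
knowledgeBase =
  [ (["action-print",  "object-file"],      "script: printing a file with lpr")
  , (["action-delete", "object-file"],      "script: removing a file with rm")
  , (["action-list",   "object-directory"], "script: listing a directory with ls")
  ]
```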

Towards a mixed initiative dialogue

In order to build a mixed initiative framework it was necessary to extend the knowledge base. However, it was felt that the specialized knowledge required for a dialogue should be separated from the general information and facts stored in the knowledge base [44].

One of the underlying aims of developing this functional online help system was to utilize the functional programming system's formality. This powerful aspect of the language was beneficial in several ways, for example the heterogeneous

Future work

Research in this area can be extended in both the development of the theory of NLP and in the development of functional programming as applied to the area of NLP. The two areas could be extended along the lines suggested by Hobbs (1995) and allow for the automatic generation of the functional equations of specified grammars, together with the automatic generation of scripts and templates from databanks as suggested in the emerging area of information extraction [47]. Adaptive systems that learn

References (57)

  • P. Postal

    Limitations of phrase structured grammars

  • C. Fillmore

    The case for case

  • M.A.K. Halliday

    Categories of the theory of grammar

    Word

    (1961)
  • G.G. Hendrix et al.

    Developing a natural language interface to complex data

    ACM Transaction Database System

    (1978)
  • J. Slocum

    A Practical Comparison of Parsing Strategies

    (1981)
  • R.T. Plant, An investigation of knowledge-based help facilities. MSc Dissertation. Oxford University Computing...
  • K. Sikkel

    How to compare the structure of parsing algorithms

  • M. Mosny

    Semantic information preprocessing for natural language interfaces to databases

    (1995)
  • C.I. Guinn

    Mechanisms for mixed-initiative human–computer collaborative discourse

  • P. Callaghan, An evaluation of LOLITA and related natural language processing systems. PhD Thesis. University of...
  • N. Webb et al.

    Natural language engineering: slot-filling in the YPA

    Proceedings of the Workshop on Natural Language Interfaces, Dialogue and Partner Modeling, at the Fachtagung für Künstliche Intelligenz KI'99, Bonn, Germany

    (1999)
  • N. Lesh et al.

    Using plan recognition in human–computer collaboration

    (1999)
  • R. Mooney

    Learning for semantic interpretation: scaling up without dumbing down

  • M. Kantrowitz

    Bibliography of Research in Natural Language Generation

    (1993)
  • J. Backus

    The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM Conference

    (1959)
  • J.R. Hobbs, Monotone decreasing quantifiers in a scope-free logical form, in: K. van Deemter, S. Peters (Eds.),...
  • J. Welsh et al.

    Introduction to Pascal

    (1982)