
Data & Knowledge Engineering

Volume 42, Issue 3, September 2002, Pages 343-360

Conversational natural language understanding interfacing city event information

https://doi.org/10.1016/S0169-023X(02)00050-2

Abstract

The article describes aspects of the development of a conversational natural language understanding (NLU) system carried out during the first year of the European research project CATCH-2004 (Converse in AThens Cologne and Helsinki) [http://www.catch2004.org]. The project is co-funded by the European Union within the scope of the IST programme (IST 1999-11103).

Its objectives focus on multi-modal, multi-lingual conversational natural language access to information systems. The paper emphasises the architecture and the telephony-based speech and NLU components, as well as aspects of the implementation of a city event information (CEI) system in English, Finnish, German and Greek. The CEI system accesses two different databases in Athens and Helsinki using a common retrieval interface. Furthermore, the paper singles out the methodologies involved in acoustic and language modelling for the speech recognition component, and in the parsing techniques and dialog modelling for the conversational natural language subsystem. For the implementation it outlines an incremental system refinement methodology necessary to adapt the system components to real-life data. It addresses the implementation of language-specific characteristics and a common dialog design for all four languages, and also deals with aspects of moving towards a multilingual conversational system. Finally, it presents prospects for further developments of the project.

Introduction

Information services are becoming ever more widely available and increasingly complex. It has therefore become desirable to provide users with conversational access to information through human-like system behaviour. The intention is to resolve user queries in a mixed-initiative dialog, in which the system shares applicable information and steers the dialog towards retrieving the desired information.

With a variety of thin devices (such as telephones and smart wireless devices) that can potentially be used by anyone to access information, designing interfaces that give any user, expert or novice, friendly and pervasive access to automated information services is becoming of paramount importance.

CATCH-2004 aims to develop a multilingual, conversational system providing access to multiple applications and sources of information. The system architecture is designed to support multiple client devices such as kiosks, telephones and smart wireless devices. It will also allow users to interact through multiple input modalities. The architecture is composed of two major frameworks: a server-side multi-modal portal providing flexible middleware technology for interacting with multiple clients in multiple modalities and languages, and a telephony-based conversational natural language understanding (NLU) system [2], [4].
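To make the division of labour concrete, the following is a minimal, purely illustrative sketch of how a single telephony dialog turn could flow through such an architecture; all class and method names (TelephonyNLUServer, handle_turn, next_action, and so on) are hypothetical and are not taken from the CATCH-2004 implementation.

```python
# Illustrative sketch only: names are hypothetical, not the project's code.

class TelephonyNLUServer:
    """Coordinates the per-turn pipeline of a telephony-based conversational NLU system."""

    def __init__(self, recognizer, parser, dialog_manager, retriever):
        self.recognizer = recognizer          # telephony speech recognition
        self.parser = parser                  # natural language understanding
        self.dialog_manager = dialog_manager  # mixed-initiative dialog control
        self.retriever = retriever            # common retrieval interface

    def handle_turn(self, audio, language):
        """Process one caller utterance and return the system response."""
        text = self.recognizer.transcribe(audio, language=language)
        interpretation = self.parser.parse(text, language=language)
        action = self.dialog_manager.next_action(interpretation)
        if action.needs_database:
            results = self.retriever.query(action.constraints)
            return self.dialog_manager.present(results, language=language)
        # Otherwise the dialog manager prompts the caller for missing information.
        return action.prompt
```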

The paper focuses on architectural and methodological aspects of the telephony-based conversational NLU system, and describes the application development carried out for a city event information (CEI) system for two cities: Athens and Helsinki.

Section snippets

Architecture and components of a telephony based conversational natural language understanding system

In the following sections we describe the architecture and components of the telephony-based conversational NLU system.

Speech and language resources for conversational system

A conversational CEI system was implemented applying the architecture and components described above. CEI demonstrators are to be deployed for two cities: Athens, supporting English and German, and Helsinki, supporting English and Finnish.

The CEI design considers the intention of people generally interested in cultural events to retrieve information about a particular event's date, location and time. Requests may demand information about the published events and can be
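As a purely illustrative sketch of such a request, the following shows how a caller's query might be represented as a slot-filling frame and resolved through a common retrieval interface shared by the Athens and Helsinki backends; the slot names and the EventRequest/EventDatabase types are hypothetical illustrations, not the project's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EventRequest:
    """Hypothetical slot structure for a city event request."""
    event_type: Optional[str] = None   # e.g. "concert", "theatre"
    date: Optional[str] = None         # requested date of the event
    location: Optional[str] = None     # venue or district
    time: Optional[str] = None         # time of day

    def missing_slots(self) -> List[str]:
        """Slots the dialog manager still needs to ask the caller for."""
        return [name for name, value in vars(self).items() if value is None]

class EventDatabase:
    """Common retrieval interface shared by the Athens and Helsinki backends."""
    def query(self, request: EventRequest) -> List[dict]:
        raise NotImplementedError

# Usage: a partially specified request drives the mixed-initiative dialog.
request = EventRequest(event_type="concert", date="2001-06-15")
print(request.missing_slots())   # ['location', 'time'] -> prompt the caller
```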

Tuning and performance

Currently the developments focus on refining the speech and language components, in particular speech recognition, two-level parsing and dialog behaviour. As outlined in one of the sections above, an initial conversational system is used to collect real-life acoustic data and to test the dialog design. These acoustic data are transcribed by simply correcting insertions, deletions or substitutions in the actual speech recognition output. The following paragraphs address
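The correction of insertions, deletions and substitutions corresponds to the alignment underlying the usual word error rate measure. The following is a minimal sketch of that word-level edit-distance computation (standard dynamic programming, not the project's own tooling):

```python
def word_error_rate(reference: str, hypothesis: str):
    """Word-level Levenshtein distance between a reference transcription and a
    recognizer hypothesis; the total counts insertions, deletions and
    substitutions together, and is normalised by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimal edit cost aligning ref[:i] with hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        d[i][0] = i
    for j in range(1, len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    errors = d[len(ref)][len(hyp)]
    return errors, errors / max(len(ref), 1)

# Usage: one substitution and one deletion -> 2 errors, WER 1/3.
print(word_error_rate("which concerts are in helsinki tonight",
                      "which concert in helsinki tonight"))
```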

Conclusion

In this paper we addressed the architecture and components that have been used to build a conversational NLU application. As part of the CATCH-2004 project developments, a telephony-based conversational CEI system has been implemented for four languages, accessing city event databases in Athens and Helsinki. A first implementation was done to bootstrap the application based on written data, which does not necessarily reflect the behaviour of a human-machine dialog.

As the refinement methodology is


References (9)

  • A. Berger et al., A maximum entropy approach to natural language processing, Computational Linguistics (1996)
  • CATCH2004 European Research Project––Official Website:...
  • V. Demesticha, H. Harrikari, M. Mast, L. Polymenakos, T. Ross, H. Schulz, J. Stadermann, Y. Vamvakoulas, A...
  • V. Demesticha, J. Gergic, J. Kleindienst, M. Mast, L. Polymenakos, H. Schulz, L. Seredi. Aspects of design and...
There are more references available in the full text version of this article.


Dr. Marion Mast received her diploma (1988) and doctoral degree (1993) in Computer Science, both from the University of Erlangen-Nuernberg (FAU), Germany. Between 1988 and 1996 she was a member of the research staff of the Institute for Pattern Recognition, working on dialog management in different research projects (SUNDIAL, VERBMOBIL). In 1996 she joined the European Speech Research Team at the IBM Science Center in Heidelberg and focused on language modelling, natural language understanding, dialog management and multilinguality for conversational systems. She is the author of one book and has published about 20 scientific papers in various international conferences and journals.

Dr. Thomas Ross received his diploma in Computer Science (1989) from the University of Erlangen-Nuernberg, Germany, and his doctoral degree (Dr. rer. nat., 1997) at the Faculty of Technical Sciences of the Medical University of Luebeck, Germany. He was a member of the research staff at the Institute for Medical Informatics, working on projects in tele-radiology (KAMEDIN), pattern recognition and image processing. In 1997 he joined the European Speech Research Team at the IBM Science Center in Heidelberg with an emphasis on acoustic modelling in telephony environments. He is the author of one book and has published about 10 scientific papers in national and international conferences and journals.

Henrik Schulz obtained his degree in Computer Science from the Technical University of Ilmenau, Germany, in 1996. He was a member of the research staff of the department for Digital Signal Processing, working on speech recognition with emphasis on neural networks and statistical modelling. In 1997 he joined the HLT group of Anite-Systems Luxembourg, focusing on speech recognition and natural language understanding within the European Research Project MELISSA. Since the beginning of 2000 he has worked on language modelling and natural language understanding for multilingual conversational systems within the scope of the IST project CATCH-2004 at IBM European Speech Research in Heidelberg, Germany. He is the author of several publications in national and international conferences and journals.

Dr. Lazaros Polymenakos obtained his Electrical Engineering and Computer Science degree from NTUA in 1989. He pursued graduate studies in the same area at MIT, where he obtained his Master's and Ph.D. Since 1995 he has worked with the IBM HLT group on methods for automatic speech recognition and on improving recognition accuracy in embedded devices, and he is the author of several publications and patents. In 1996 he was a visiting professor at Rutgers University, NJ. In 1998 he joined IBM Hellas, focusing on research and development for the Greek speech recognition system. Since January 2000 he has been the technical leader of the CATCH-2004 IST project, and he manages the speech recognition research effort at IBM Hellas.

Yannis A. Vamvakoulas received a degree in Computer Engineering & Informatics (cum laude) from the Department of Computer Engineering & Informatics, University of Patras, Patras, Greece, in September 1997. Currently he is a Ph.D. candidate at NTUA and a research associate at IBM. His research interests are in the areas of software engineering, WWW technologies and man-machine interaction, in which he has published a number of papers in international conferences.

Dr. Vasiliki Demesticha obtained her Diploma in Physics from the University of Athens in 1994 and pursued her Ph.D. degree in Electrical and Computer Engineering at NTUA, with research focusing mainly on mobile communications. Since 1994 she has been involved in a number of research and development programmes. Her research interests include telecommunications, software engineering, internet services and electronic commerce, and she is a co-author of many publications. At the beginning of 2000 she joined the HLT group at IBM Hellas, where she works on speech-driven multi-language NLU applications with special emphasis on the Greek language.

Dr. Heli Harrikari obtained her MA and Ph.D. degrees in linguistics from the University of Tampere, Finland, in 1996, and from the University of Helsinki, Finland, in 2000, respectively. In addition, she has conducted extensive studies and research at both the University of Texas at Austin and the University of Massachusetts at Amherst, USA. Her academic work and publications have focused on the fields of phonology and phonetics, most extensively on theoretical phonology and various prosodic issues in speech processing. She joined the Speech and Audio Systems Laboratory of Nokia Research Center in Helsinki in 2000 and has since worked on spoken dialog systems with natural language understanding capabilities.

Jan Stadermann was born in Hilden, Germany, in 1974. He received the Diploma degree in electrical engineering in 1999 from Duisburg University, Germany. Since February 2000 he has worked there as a researcher in the Department of Computer Science. His main interests are hybrid speech recognition systems and natural language understanding.

The article represents the view of the respective authors. It is an extended version of a paper in the proceedings of NLDB'01 [3].
