The way we were: Speech technology, platforms and applications in the `Old' AT&T

https://doi.org/10.1016/S0167-6393(97)00035-6

Abstract

The last several years have been an exciting time at AT&T in the field of advanced speech applications for telecommunications: technical progress and platform/processor advances have enabled the identification, development and testing of a range of new services. During this period, prior to AT&T's divestiture of Lucent Technologies and NCR, AT&T brought together, under a single corporate `roof', a research laboratory committed to advancing speech technology, business organizations building platforms to leverage this technology for telecommunications applications, and yet other business organizations with responsibility for deploying speech-enabled services to facilitate the use and reduce the cost of telecommunications services for both consumers and businesses. While this period of our corporate history has drawn to a close, we can look back to provide an overview of how technical progress, platform advances and network service needs and opportunities interacted to make speech technology an everyday experience for millions of people – and some of the lessons we learned along the way.

Introduction

Speech communication is the foundation of civilization. The use of tools built a technological society. The ability to converse with our tools has been little noted but will have a far-reaching impact. We have crossed a great technological threshold, but it is anticlimactic. Science fiction has presaged it for decades, and people communicate by speech so effortlessly that most do not think of hearing and understanding as a difficult problem. In the future, customers will want to interact with their machines as extensions of themselves in every possible way. Although now in an embryonic stage, the technology for humans to direct machines by speaking to them will be profoundly powerful and valuable.

This paper describes several successful services offered by AT&T based on speech recognition. A key element of developing successful services, which will not be covered here, is managing the customer's expectations. When people outside speech research hear the term automatic speech recognition, their expectations are set by what they are familiar with – human speech recognition – against which the present state of the art pales. Achieving a successful service requires acquainting both the buyer and the users of the service with the constraints of the technology, so that they are not disappointed that it is not a complete human replacement.

Section snippets

Methodology

During our development of numerous successful speech technology applications both in the network and in commercial applications, the need for both technical and market trials to get the highest level of applications and technology performance has become increasingly evident. Whether the speech technology (be it speech recognition, speech synthesis, or speaker verification) can handle the task at hand to the customer's satisfaction cannot be determined in the laboratory. Frequently, laboratory

VRCP extensions

The basic automation of the telephone operator's task in the AT&T network by the project known as Voice Recognition Call Processing (VRCP) has been previously documented (Longenbaker et al., 1994). The initial system, first deployed in 1992, featured, ostensibly, a recognition vocabulary of {collect, calling card, third number, person and operator}. The vocabulary was actually somewhat larger, to allow for some common vocabulary synonyms, like collect call, third party and person to person, but

AT&T Voice Line™

The AT&T Voice Line™ trial (AVL), and later service, offered sophisticated speech technology for subscribers of the service. It was initially conceived to trial, for selected customers, custom calling features such as Voice Dialing from a Personal Calling List, secure account access through account numbers and speaker verification, Voice digit dialing, Voice Commands to control administration of a personal account, and recognition of some globally known keywords to access features generally

Universal Card Services™

At AT&T's Universal Card Services, any holder of a Universal Card can call to request financial information. Single-digit speech recognition is used to traverse menu trees to locate the correct information source. Callers speak their 16-digit Universal Card number to the system to gain access to their accounts. Once an account number is recognized, the caller is asked to speak his or her five-digit ZIP code as further confirmation of access privilege before information, like an account balance,

Other network applications of speech recognition

We have experience with other telephone network services that incorporate speech recognition. Like VRCP, the Voice Dialing system, and the transactions at Universal Card Services, these services also require the Speech Recognition (SR) technology to be very robust to extraneous speech, non-responsive inputs, and varying levels of background noise.

The AT&T True Ties™ service (personal 800 numbers) uses speech recognition for entry of Personal Identification Numbers (PINs). For 800 numbers the

Implementation

All of the services described in this paper were implemented using Lucent Technologies' INTUITY™ CONVERSANT™ product, a voice processing system based on many open standards. This system can stand alone or be used as a building block to assemble large-scale services that connect to various network switches. Within the system, a set of signal processing boards is used to perform the real-time recognition in a multi-channel environment. Similar configurations are employed in both network and

Future directions

Speech communication with machines is profound. It makes many practical tasks possible, both lowering the cost of existing services and providing new services that were previously uneconomical because they required a person.

The ultimate goal is to provide a machine replacement that is practically indistinguishable from a person, regardless of the application. We are a long way from that goal. However, we will be able to approach the goal in a few well-defined and well-designed applications. The speech technology we

References (4)

  • Huang, B.H., Perdue, R.J., Thomson, D.L., 1995. Deployable automatic speech recognition systems: Advances and...
  • Longenbaker, W.E., Perdue, R.J., Salchenberger, S.M., 1994. Automation of operator services: A successful application...
