A PC-based very large vocabulary isolated word speech recognition system

doi:10.1016/0167-6393(90)90027-7

Speech Communication

Volume 9, Issues 5–6, December 1990, Pages 521-530

https://doi.org/10.1016/0167-6393(90)90027-7

Abstract

The Speech Recognition Group at Olivetti has designed an isolated word recognition system for Italian which can accept a very large vocabulary in excess of 60,000 words and recognize words in real time. The system fits into a 386-based personal computer and runs under MS-DOS. Among its most appealing features are the very quick and application-independent procedure for training the system to the user's voice and the easy way in which application dictionaries can be created. What makes short training possible is the fact that the system extracts target spectral distributions of phonemes from the training words and combines this speaker-dependent information with speaker-independent information concerning other aspects of the speech signal such as energy profile, phoneme durations, and phoneme similarity. The system accepts natural language without any restriction and can be used to dictate documents of any kind, provided the dictionary of the application domain is specified. Dictation, however, is not the only application we envisage. In this paper we describe our approach to speech recognition and provide some information on the actual implementation and performance of our system.

Zusammenfassung (English translation)

The Speech Recognition Group at Olivetti has developed an isolated word recognition system for Italian that is designed for a very large vocabulary (more than 60,000 words) and recognizes words in real time. The system runs on a personal workstation (386-AT) under the MS-DOS operating system. One of the interesting features of the new system is that, although it is in principle speaker-dependent, it adapts very quickly to a new speaker. A new speaker is only required to pronounce 40 words. The spectral information that the system obtains from these 40 words is the only speaker-dependent information in the entire speech recognition system; it is complemented by significant statistical data that are already stored in the computer and are representative of a larger number of speakers. The system accepts natural language without restrictions and can be used to dictate documents of any content. Dictation, however, is not the only application we have in mind. In this paper we describe our approach to speech recognition and provide some information on the current state of the implementation.

Résumé (English translation)

The speech recognition group at Olivetti has built an isolated word recognition system for Italian. The system supports a very large vocabulary (more than 60,000 words) and works in real time. It fits in a 386-type personal computer and runs under MS-DOS. Among its most interesting features are the ability to adapt the system quickly to the speaker, independently of the application, and the ease of creating application-specific dictionaries. Adaptation is fast because the system extracts target spectral distributions of phonemes from the training words and combines this information with speaker-independent data on other aspects of the signal, such as energy profile, phoneme duration, and phoneme similarity. The system accepts natural language without any restriction and can be used to dictate documents of any kind, provided that the dictionary of the application domain is specified. Dictation, however, is not the only application envisaged. In this article we describe our approach to speech recognition and provide some information on the implementation and current performance of our system.


