Abstract
Speech and language processing systems can be categorised according to whether they rely on predefined linguistic information and rules, or are data driven and therefore exploit machine learning techniques to automatically extract and process relevant units of information, which are then indexed and retrieved as appropriate. For example, most state-of-the-art automatic speech processing systems rely on a representation based on predefined phonetic symbols. The use of language-dependent representations, whilst linguistically intuitive, has several drawbacks, e.g. poor portability across languages and lengthy development time. Therefore, in this article, we review and present our recent experiments exploiting the idea inherent in the ALISP (Automatic Language Independent Speech Processing) approach, with particular respect to speech processing, where the intermediate representation between the acoustic and linguistic levels is automatically inferred from speech data. We then present prospective directions in which the ALISP principles could be exploited in different domains such as audio, speech, text, image and video processing.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chollet, G., McTait, K., Petrovska-Delacrétaz, D. (2005). Data Driven Approaches to Speech and Language Processing. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_8
DOI: https://doi.org/10.1007/11520153_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27441-4
Online ISBN: 978-3-540-31886-6
eBook Packages: Computer Science (R0)