Codage de la parole à bas et très bas débits

Speech coding at low and very low bit rates

Annales Des Télécommunications

Abstract

This paper reviews the main techniques for speech coding at low and very low bit rates, from 50 bit/s to 4000 bit/s. It then presents in detail the HSX method for coding at 1200 bit/s and a new segmental approach that uses acoustic units derived in an unsupervised manner for bit rates below 400 bit/s.

Author information

Correspondence to Geneviève Baudoin, Jan Cernocky, Philippe Gournay or Gérard Chollet.

About this article

Cite this article

Baudoin, G., Cernocky, J., Gournay, P. et al. Codage de la parole a bas et tres bas debits. Ann. Télécommun. 55, 462–482 (2000). https://doi.org/10.1007/BF02995202

  • DOI: https://doi.org/10.1007/BF02995202
