Abstract
Code-excited linear prediction (CELP) coders deliver excellent communications-quality speech below 8 kbps, which has given this architecture a predominant role in medium-rate and low-rate speech coding, as evidenced by the adoption of several recent fixed-rate and variable-rate standards. Unfortunately, some of these CELP-based schemes are not completely described in the literature, and consequently they are difficult to understand and to implement efficiently. This paper presents an original study of the G.723.1 codec. The G.723.1 encoder is designed to compress voice signals with a bandwidth of up to 4 kHz efficiently and to deliver an encoded data stream at a very low bit rate with good quality of the transmitted speech (typical applications being the encoding of the voice signal for video conferencing over the GSTN and for Voice over IP). We perform a detailed, step-by-step analysis, describing the MP-MLQ/ACELP speech coder from the point of view of a classical CELP structure. This approach allows us to identify, using theoretical considerations, the initial internal structure of each processing block in the encoder scheme. These results are then used to break down the main loop of the encoding algorithm. Finally, using the initial internal structure revealed in this way, we derive the algorithm for the pitch predictor block, which is one of the most difficult parts of the ITU-T G.723.1 encoder. The accompanying comments, explanations and diagrams allow efficient implementation and debugging of the corresponding software by ordinary DSP programmers.
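For orientation, the rate figures behind the "very low" bit rate can be checked with a few lines of arithmetic. The sketch below assumes the frame parameters of the ITU-T G.723.1 Recommendation (8 kHz sampling, 30 ms frames, 189 coded bits per frame in the MP-MLQ mode and 158 in the ACELP mode). None of these values is stated in the abstract, and the function name and the 16-bit PCM baseline are illustrative choices.

```python
# Frame parameters assumed from the ITU-T G.723.1 Recommendation
# (not stated in the abstract): 8 kHz sampling, 30 ms frames.
SAMPLE_RATE_HZ = 8000
FRAME_MS = 30
# Coded bits per frame for the two modes (assumed from the Recommendation).
BITS_PER_FRAME = {"MP-MLQ": 189, "ACELP": 158}
# Baseline chosen for comparison only: 16-bit linear PCM input.
PCM_BITS_PER_SAMPLE = 16


def frame_stats(bits_per_frame):
    """Return (samples per frame, bit rate in bit/s, ratio vs. 16-bit PCM)."""
    samples = SAMPLE_RATE_HZ * FRAME_MS // 1000
    rate = bits_per_frame * 1000 / FRAME_MS
    pcm_rate = SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE
    return samples, rate, pcm_rate / rate


for name, bits in BITS_PER_FRAME.items():
    samples, rate, ratio = frame_stats(bits)
    print(f"{name}: {samples} samples/frame, "
          f"{rate / 1000:.2f} kbit/s, {ratio:.1f}:1 vs 16-bit PCM")
```

Under these assumptions each frame holds 240 samples. The higher-rate mode comes out at exactly 6.3 kbit/s, and the nominal 5.3 kbit/s mode at slightly under that figure.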
Cite this article
Negrescu, C., Stanomir, D. & Burileanu, D. On Rationally DSP Implementation of the MP-MLQ/ACELP Dual Rate Speech Encoder for Multimedia Communications. International Journal of Speech Technology 5, 281–300 (2002). https://doi.org/10.1023/A:1020253109447
DOI: https://doi.org/10.1023/A:1020253109447