Elsevier

Signal Processing

Volume 88, Issue 6, June 2008, Pages 1528-1538
Optimum switched split vector quantization of LSF parameters

https://doi.org/10.1016/j.sigpro.2008.01.001

Abstract

We address the issue of rate–distortion (R/D) performance optimality of the recently proposed switched split vector quantization (SSVQ) method. The distribution of the source is modeled using a Gaussian mixture density, and thus the non-parametric SSVQ is analyzed in a parametric model based framework for achieving optimum R/D performance. Using high rate quantization theory, we derive the optimum bit allocation formulae for the intra-cluster split vector quantizer (SVQ) and the inter-cluster switching.

For wide-band speech line spectrum frequency (LSF) parameter quantization, it is shown that the Gaussian mixture model (GMM) based optimum parametric SSVQ method provides a 1 bit/vector advantage over the non-parametric SSVQ method.

Introduction

Most speech coders use linear prediction (LP) analysis, and thus a more effective scheme for quantizing the LP coefficients (LPCs), or equivalently the line spectrum frequencies (LSFs), is in great demand. Vector quantization (VQ) of LSFs is the best way to reach the lowest bitrate, but the prohibitive complexity of a full-search VQ limits its usage. Many different product code VQ methods [1], [2], [3], [4], [5] have been reported for LSF coding, which reduce complexity with a moderate loss of quantization performance. One of the widely reported techniques is the split vector quantization (SVQ) method, which was first proposed by Paliwal and Atal [6] for telephone-band speech and then further explored for wide-band speech [7], [8], [9]. Recently, So and Paliwal have proposed the switched split vector quantization (SSVQ) method [10], [11], which is shown to provide better R/D performance than the traditional SVQ method for both telephone-band and wide-band speech. The SSVQ is further explored in [12], [13] to show its performance advantage over many other product code VQ methods.

The SSVQ is a non-parametric product code VQ method, where the vector space is divided into non-overlapping Voronoi regions and a separate SVQ is designed for each region. Thus, the SSVQ is composed of multiple SVQs. An input vector to be quantized is first classified to a Voronoi region and then the region specific SVQ is used for quantization. Though the SSVQ provides better rate–distortion (R/D) performance than the SVQ, it does not address the optimality of its R/D performance.
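The two-stage encoding described above (classify to a Voronoi region, then apply that region's SVQ) can be sketched as follows. This is a minimal illustration, not the implementation of [10], [11]: the function names, the codebook layout (`switch_codebook`, `svq_codebooks`, `split_dims`) and the use of squared Euclidean distance for both stages are assumptions made for the example.

```python
import numpy as np

def nearest(codebook, x):
    """Index of the codebook row closest to x in squared Euclidean distance."""
    return int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))

def ssvq_encode(x, switch_codebook, svq_codebooks, split_dims):
    """Pick a region, then quantize each sub-vector with that region's SVQ.

    switch_codebook: (M, p) array of region centroids (hypothetical layout).
    svq_codebooks:   list of M lists; svq_codebooks[m][s] is the codebook
                     (rows = code vectors) for split s of region m.
    split_dims:      sub-vector lengths, summing to p.
    """
    m = nearest(switch_codebook, x)               # stage 1: switching
    bounds = np.cumsum([0] + list(split_dims))    # sub-vector boundaries
    indices = [nearest(cb, x[bounds[s]:bounds[s + 1]])   # stage 2: SVQ
               for s, cb in enumerate(svq_codebooks[m])]
    return m, indices

def ssvq_decode(m, indices, svq_codebooks):
    """Rebuild the vector by concatenating the selected sub-vector codewords."""
    return np.concatenate([cb[i] for cb, i in zip(svq_codebooks[m], indices)])

# Toy setup: M = 2 regions, p = 4, two splits of dimension 2 each.
switch_codebook = np.array([[0.0, 0.0, 0.0, 0.0], [10.0, 10.0, 10.0, 10.0]])
svq_codebooks = [
    [np.array([[0.0, 0.0], [1.0, 1.0]]), np.array([[0.0, 0.0], [2.0, 2.0]])],
    [np.array([[9.0, 9.0], [11.0, 11.0]]), np.array([[10.0, 10.0], [12.0, 12.0]])],
]
x = np.array([10.8, 11.1, 10.2, 9.9])
m, idx = ssvq_encode(x, switch_codebook, svq_codebooks, [2, 2])
x_hat = ssvq_decode(m, idx, svq_codebooks)
```

The transmitted index consists of the region label `m` plus one index per split, which is why the bit budget has to be shared between the switching stage and the region specific SVQs.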

Currently there is growing interest in developing parametric pdf based quantization methods using the Gaussian mixture model (GMM), such that the optimum R/D performance can be achieved at a given bitrate [14], [15], [16], [17]. In this paper, we address the R/D performance optimality of SSVQ in a GMM based framework. We derive the optimum bit allocation criteria for both stages of quantization, referred to as inter-cluster and intra-cluster bit allocation. We resort to the fixed and variable bitrate schemes of [16] for inter-cluster bit allocation. For intra-cluster bit allocation, we derive the R/D performance expression of the region specific optimum SVQ method using high rate quantization theory. We use the squared Euclidean distance (SED) as the distortion measure for ease of analysis. Focusing on wide-band speech LSF quantization, we show that the optimum parametric SSVQ method provides a 1 bit/vector advantage over the non-parametric SSVQ method.

Preliminaries

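For a source pdf given by $f_g(g)$, the high rate quantization distortion (mean square error) using a VQ is given by [18]:

$$D \geq N^{-2/h}\,\frac{1}{\pi}\,\frac{h}{h+2}\left[\frac{h}{2}\,\Gamma\!\left(\frac{h}{2}\right)\right]^{2/h}\left[\int [f_g(g)]^{h/(h+2)}\,dg\right]^{(h+2)/h}, \tag{1}$$

where $N = 2^{b_g}$ is the number of Voronoi regions and $b_g$ is the number of bits/vector allocated to quantize the source; $h$ is the dimension of the vector $g$ and $\Gamma(\cdot)$ is the usual gamma function.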

Let us consider the lower bound (equality in Eq. (1)) for a multi-variate Gaussian source. Suppose $f_g(g)$ is multi-variate Gaussian distributed as $f_g(g) = \mathcal{N}(\mu_g, C_g)$.
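As a worked illustration of the Gaussian case, the integral in Eq. (1) can be evaluated in closed form for $\mathcal{N}(\mu_g, C_g)$, which gives $D = 2\,N^{-2/h}\left[\tfrac{h}{2}\Gamma\!\left(\tfrac{h}{2}\right)\right]^{2/h}\left(\tfrac{h+2}{h}\right)^{h/2}|C_g|^{1/h}$. This expression is our own evaluation of the bound under the equality assumption; the full text may present it in a different form. The sketch below computes it numerically. The function name and argument names are hypothetical.

```python
import math
import numpy as np

def gaussian_high_rate_distortion(cov, bits):
    """High-rate MSE of an h-dimensional VQ with 2**bits cells on N(mu, cov).

    Evaluates Eq. (1) with equality for a Gaussian pdf (an assumption of
    this sketch). Works in the log domain to avoid overflow at high rates.
    """
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    h = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    log_d = (math.log(2.0)
             - (2.0 * bits / h) * math.log(2.0)                        # N^(-2/h)
             + (2.0 / h) * (math.log(h / 2.0) + math.lgamma(h / 2.0))  # gamma term
             + (h / 2.0) * math.log((h + 2.0) / h)
             + logdet / h)                                             # |C|^(1/h)
    return math.exp(log_d)

# Scalar, unit-variance check: reduces to sqrt(3) * pi / 2 at 0 bits.
d_scalar = gaussian_high_rate_distortion([[1.0]], 0)
# Adding one bit per dimension divides the distortion by 4 (6.02 dB).
d_a = gaussian_high_rate_distortion(np.eye(4), 8)
d_b = gaussian_high_rate_distortion(np.eye(4), 12)
```

For $h = 1$ the expression reduces to the familiar scalar high-rate result $\tfrac{\sqrt{3}\pi}{2}\sigma^2 2^{-2b}$, which is a useful sanity check on the constants.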

Optimum SVQ

The SSVQ consists of multiple SVQs. Thus, we first address the R/D performance optimality of the SVQ method using high rate quantization theory. For analyzing the optimum SVQ, let us assume that the source vector is multi-variate Gaussian distributed. This assumption is well justified in the context of analyzing SSVQ, since the SVQ is applied to a subset of the data, occurring within a Voronoi region. Also, let the source be quantized using c bits/vector.
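To make the intra-cluster allocation problem concrete, the sketch below splits c bits among the sub-vectors of an SVQ by a generic Lagrangian argument. It is not the formula derived in the paper, which is not reproduced in this excerpt. It assumes each split s of dimension p_s is Gaussian with its own covariance block, that its high-rate distortion is K_s 2^(-2 b_s / p_s), and that the total SED is the sum over splits. Minimizing that sum subject to the bits summing to c gives the closed form used in `allocate_split_bits`. The result is real-valued (it can be fractional or even negative); rounding to integer codebook sizes is left out. All names are hypothetical.

```python
import math
import numpy as np

def high_rate_constant(cov):
    """Return (K, h) with high-rate distortion K * 2**(-2*b/h) on N(mu, cov)."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    h = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    log_k = (math.log(2.0)
             + (2.0 / h) * (math.log(h / 2.0) + math.lgamma(h / 2.0))
             + (h / 2.0) * math.log((h + 2.0) / h)
             + logdet / h)
    return math.exp(log_k), h

def allocate_split_bits(split_covs, total_bits):
    """Real-valued bits per split minimizing the summed high-rate distortion.

    The stationarity condition equalizes (K_s / p_s) * 2**(-2*b_s/p_s)
    across splits, which yields the expression below.
    """
    consts, dims = zip(*(high_rate_constant(c) for c in split_covs))
    k = np.array(consts)
    p = np.array(dims, dtype=float)
    base = 0.5 * p * np.log2(k / p)
    return base + (p / p.sum()) * (total_bits - base.sum())

# Two 2-dimensional splits; the first has 4x the variance of the second.
bits = allocate_split_bits([4.0 * np.eye(2), np.eye(2)], 10)
equal = allocate_split_bits([np.eye(2), np.eye(2)], 10)
```

In this toy case the higher-variance split receives two more bits than the other, one extra bit per dimension for each factor of four in variance, which matches the usual 6 dB per bit rule.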

Let X be the p-dimensional vector which is

Optimum switched split VQ

In this section, we address the R/D performance optimality of the SSVQ method using a parametric model of the source pdf. The basis of the SSVQ method is to populate the vector space with M SVQs and to switch to one of them for quantization, based on a nearest neighbor criterion [12], [13]. While SSVQ has been shown to perform better than SVQ, the issue of its R/D performance optimality has not been addressed so far. We address this issue using a GMM based framework for the source signal. Each

Quantization experiments

To test the LSF quantization performance, we consider wide-band speech LSFs. The speech data used in the experiments are from the TIMIT database. The specification of the AMR-WB speech codec [19] is used to compute the 16th order LPCs, which are then converted to LSFs. We briefly describe the LPC analysis method of the AMR-WB speech codec [19]. The 16 kHz speech is processed in two sub-bands, 0.05–6.4 and 6.4–7 kHz, to allocate the bits optimally according to the subjective importance of the lower band. In

Conclusion

We address the rate–distortion (R/D) performance optimality of the recently proposed switched split VQ (SSVQ) method. Using the GMM based framework, the optimality of SSVQ is addressed using a linearized approximation to the total average distortion. This results in optimum inter-cluster and intra-cluster bit allocation schemes. For wide-band speech LSF quantization, we show that the new parametric optimum SSVQ methods perform better than the non-parametric SSVQ method.
