A new algorithm for translating psycho-acoustic information to the wavelet domain
Introduction
Since the 1980s, CD quality has been the main reference for audio-signal representation. It is defined as an audio signal in which each channel is sampled at 44.1 kHz and linearly coded with 16 bits/sample. This corresponds to 705.6 kbit/s per channel, a very high bit rate for the multimedia applications that are appearing nowadays. This justifies the use of compression techniques, whose objective is to achieve a low bit rate in the digital representation of signals with the minimum perceived loss in quality.
A proper choice of the coding technique has to take into consideration the nature of the audio signals. These are neither Gaussian nor stationary. A very important issue in audio coding is the possibility of exploiting the masking phenomenon in the inner ear. To do this, subband decompositions must be used, and each subband should be coded as an isolated signal, ensuring that the introduced quantization noise remains below the masking threshold. The overall binary rate which ensures transparent coding is defined as the ‘perceptual entropy’, proposed by Johnston [9] to be used in audio coding instead of the classical entropy defined by Shannon.
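The requirement that quantization noise stay below the masking threshold can be illustrated with a minimal sketch. It assumes a uniform quantizer under the usual high-resolution model (noise power equal to the squared step divided by 12); the function name and parameters are hypothetical, and this is neither the coder's actual bit allocation nor Johnston's perceptual-entropy formula.

```python
import math


def bits_for_subband(peak_amplitude, masking_threshold_power):
    """Return (bits per sample, quantizer step) for one subband.

    Assumption: a uniform quantizer with step `delta` adds noise of power
    delta**2 / 12, so the largest admissible step is sqrt(12 * threshold).
    """
    if peak_amplitude <= 0 or masking_threshold_power <= 0:
        raise ValueError("amplitude and threshold must be positive")
    delta = math.sqrt(12.0 * masking_threshold_power)
    # Number of quantizer cells needed to cover [-peak, +peak].
    levels = max(1, math.ceil(2.0 * peak_amplitude / delta))
    bits = math.ceil(math.log2(levels))
    return bits, delta
```

A higher masking threshold allows a coarser step and therefore fewer bits for the same subband amplitude, which is the mechanism a perceptual coder exploits.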
Another important issue is the non-stationary nature of the audio signal, which forces us to use local analysis. Several techniques have been proposed to do that. The most novel is the wavelet transform, and its generalization, the wavelet-packet decomposition. Due to the usual implementation of the wavelet transform as a filter bank, audio coders based on this transform can be understood as subband coders.
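The view of the wavelet transform as a subband coder can be sketched with the simplest orthonormal filter pair. The example below assumes the Haar filters, chosen only for brevity (the coder described in this paper is not restricted to them), and iterates a two-channel analysis bank on the low-pass branch; all function names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_analysis(x):
    """Two-channel analysis bank: filter and downsample by 2."""
    if len(x) % 2:
        raise ValueError("length must be even")
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high


def haar_synthesis(low, high):
    """Two-channel synthesis bank: inverts haar_analysis exactly."""
    x = []
    for a, d in zip(low, high):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x


def wavelet_transform(x, levels):
    """Iterate the analysis bank on the low-pass branch (dyadic tree).

    Returns [d1, d2, ..., dL, aL]: detail subbands from finest to
    coarsest, followed by the final approximation.
    """
    subbands = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_analysis(approx)
        subbands.append(detail)
    subbands.append(approx)
    return subbands


def inverse_wavelet_transform(subbands):
    """Rebuild the signal by running the synthesis bank up the tree."""
    approx = subbands[-1]
    for detail in reversed(subbands[:-1]):
        approx = haar_synthesis(approx, detail)
    return approx
```

Each returned list is one subband that a coder could quantize separately. A wavelet-packet decomposition would also split the high-pass branches instead of only the low-pass one.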
Nowadays, the international interest in audio coding is centered on the ISO/MPEG audio standards. The first one, called ISO/MPEG-1 [3], manages sampling rates of 32, 44.1 and 48 kHz. The last one, the ISO/MPEG-4 standard, is composed of several speech and audio coders that support bit rates from kbps per channel.
In parallel with the definition of the ISO/MPEG standards, several audio-coding algorithms have been proposed which use the wavelet transform as the tool for decomposing the signal. The most often cited was designed by Sinha and Tewfik [12]. It is a high-complexity audio coder which provides high compression rates and high-quality reconstructed signals. This perceptual coder includes a method for translating psycho-acoustic information to the wavelet domain. Several drawbacks related to this method have been reported: it is computationally complex, and its simplification implies the use of high-selectivity filters to implement the wavelet transform.
This paper focuses on the problem of translating the psycho-acoustic information to the wavelet domain. We demonstrate that this can be achieved easily using filters that generate wavelets with any number of vanishing moments, provided that several constraints are imposed on the analysis and synthesis filter banks. We also describe a new perceptual-coding algorithm for CD-quality monophonic audio signals that combines simplicity with high compression rates and ensures high perceptual quality of the decoded signal.
The sections of the paper are organized as follows:
Section 2 describes multiresolution analysis and the wavelet transform, showing the properties that make this transform suitable for audio coding. Of particular importance is the way the discrete-time wavelet transform is implemented as an iterated filter bank.
In Section 3, the coder and decoder structures are presented; they follow the general scheme of subband coding. Several original contributions are included, such as an improved psycho-acoustic model, a wavelet-packet decomposition of the audio signal and, most importantly, a new method for translating psycho-acoustic information from the Fourier domain to the wavelet domain.
Section 4 describes the psycho-acoustic model used. It is based on the model proposed in [9], which is also the basis of the second model included in the MPEG-1 standard.
Section 5 describes the criteria for the design of the filter banks used in subband coders. The filter bank used to decompose the time-domain input data into subband components is crucial to the performance of an audio coding system [6], [8].
Wavelet-packet decompositions have many advantages, but one serious drawback: the low selectivity of the equivalent subband filters. This makes it impossible to apply the psycho-acoustic information, which is estimated in the Fourier domain, directly to the wavelet coefficients.
Because of this drawback, an audio coder based on the wavelet transform needs a translation algorithm. Section 6 describes a new algorithm and discusses the conditions that must be fulfilled for its correct use.
A double-blind test has been carried out to check the behaviour of the audio coder. The results for a wide set of audio signals are presented in Section 7.
Finally, conclusions are drawn and new lines of research are presented in Section 8.
Section snippets
Multiresolution analysis and wavelet transform
Multiresolution analysis is the process that allows us to express a function f(t)∈L2(R) as a linear combination of its projections onto a set of closed subspaces that fulfil the following properties [1], [5], [10]:
- All subspaces Vm are nested:
- Completeness:
- Scaling property:
- Basis property: there is a scaling function φ(t)∈V0 such that, ∀m∈Z, the set is an orthonormal basis for Vm, that is
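For reference, the four properties listed above are commonly written as follows. This is a sketch under an assumed indexing convention (larger m means a coarser subspace, as in Daubechies' formulation); the paper's own notation and indexing may differ.

```latex
\begin{align}
&\text{Nesting:} && V_m \subset V_{m-1}, \quad \forall m \in \mathbb{Z} \\
&\text{Completeness:} && \overline{\bigcup_{m \in \mathbb{Z}} V_m} = L^2(\mathbb{R}),
  \qquad \bigcap_{m \in \mathbb{Z}} V_m = \{0\} \\
&\text{Scaling:} && f(t) \in V_m \iff f(2^m t) \in V_0 \\
&\text{Basis:} && \varphi_{m,n}(t) = 2^{-m/2}\,\varphi(2^{-m} t - n), \quad
  \langle \varphi_{m,n}, \varphi_{m,n'} \rangle = \delta_{n,n'}, \quad n, n' \in \mathbb{Z}
\end{align}
```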
Audio coder and decoder structure
In this section, the coder and decoder structures are described. They work with monophonic audio signals sampled at 44.1 kHz, where each sample has been linearly coded with 16 bits. The objective is to obtain a digital representation of the original signal that is as small as possible while preserving its psycho-acoustic properties as far as possible. To do that, the following scheme has been implemented (see [15]): (1) The proposed audio coder analyses the signal with a filter bank that
Masking threshold
We have used a psycho-acoustic model similar to the well-known one proposed in [9], which is the basis of the second psycho-acoustic model of the ISO/MPEG-1 standard for audio coding. The only improvement we introduce concerns the estimation of the tonality coefficient that must be assigned to each critical band.
To estimate the masking threshold, we first estimate the power spectrum of the audio frame (x[n]). To do that, we use the modified periodogram with a Hann window (w[n])
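The power-spectrum estimate can be sketched as follows. The example assumes a periodic Hann window and normalization by the window energy, which is the usual definition of the modified periodogram; the frame length, scaling and function names used by the authors are not given here, so these are illustrative choices. A direct DFT is used to keep the example free of dependencies.

```python
import cmath
import math


def hann_window(n_samples):
    """Periodic Hann window: w[n] = 0.5 * (1 - cos(2*pi*n/N))."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / n_samples))
            for n in range(n_samples)]


def modified_periodogram(frame):
    """Estimate the power spectrum of `frame` for bins 0 .. N/2.

    P[k] = |sum_n w[n] x[n] exp(-j*2*pi*k*n/N)|**2 / sum_n w[n]**2
    """
    n_samples = len(frame)
    window = hann_window(n_samples)
    window_energy = sum(v * v for v in window)
    windowed = [w * x for w, x in zip(window, frame)]
    spectrum = []
    for k in range(n_samples // 2 + 1):
        acc = sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n in range(n_samples))
        spectrum.append(abs(acc) ** 2 / window_energy)
    return spectrum
```

In practice an FFT would replace the inner sum; the direct form is shown only because it maps term by term onto the definition.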
Filter bank design
The design of the filter-bank structure must serve the general objective of representing the input signal with as few bits as possible. Several design goals are desirable [8]: (1) The decomposition should be invertible, i.e., the filter bank should be a perfect-reconstruction filter bank. (2) Both the analysis filter bank and its inverse must maintain a high degree of frequency selectivity, in order to make the application of the perceptual threshold as simple as possible.
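The selectivity goal can be made concrete by comparing how much gain two short orthonormal low-pass filters retain in their nominal stopband. The sketch below assumes the Haar and 4-tap Daubechies filters as examples and an arbitrary probe frequency of 3π/4; neither choice is taken from the paper.

```python
import cmath
import math

SQRT2 = math.sqrt(2.0)
SQRT3 = math.sqrt(3.0)

# Orthonormal low-pass filters, each normalized so that sum(h) = sqrt(2).
HAAR = [1.0 / SQRT2, 1.0 / SQRT2]
DB2 = [(1.0 + SQRT3) / (4.0 * SQRT2),
       (3.0 + SQRT3) / (4.0 * SQRT2),
       (3.0 - SQRT3) / (4.0 * SQRT2),
       (1.0 - SQRT3) / (4.0 * SQRT2)]


def magnitude_response(h, omega):
    """|H(e^{j*omega})| for FIR coefficients h[0..L-1]."""
    return abs(sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h)))


def stopband_leakage(h, omega=0.75 * math.pi):
    """Gain at a stopband frequency relative to the gain at DC.

    An ideal half-band low-pass filter would return 0 for omega > pi/2.
    """
    return magnitude_response(h, omega) / magnitude_response(h, 0.0)
```

The longer filter leaks less than the Haar filter, but both remain far from the ideal value of zero. This is the low selectivity that prevents thresholds computed in the Fourier domain from being applied directly to the subbands.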
Algorithm for translating psycho-acoustic information to the wavelet domain
We were interested in developing a procedure which allows the direct translation of masking and auditory thresholds from the Fourier to the wavelet domain, even when low-selectivity filters are used to implement the wavelet transform or the wavelet-packet decomposition. We demonstrate that under several constraints, audio coders based on the wavelet transform can be developed using filters that generate compactly supported wavelets with any number of vanishing moments (filters with any length).
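As a concrete instance of a filter that generates a compactly supported wavelet with a given number of vanishing moments, the sketch below checks the standard orthonormality conditions and the two vanishing moments of the 4-tap Daubechies pair. It only verifies well-known properties of that filter; the specific constraints the translation algorithm imposes on the filter banks are not reproduced here, and the helper names are hypothetical.

```python
import math


def daubechies4_lowpass():
    """4-tap Daubechies low-pass filter, normalized so that sum(h**2) = 1."""
    s2, s3 = math.sqrt(2.0), math.sqrt(3.0)
    return [(1.0 + s3) / (4.0 * s2), (3.0 + s3) / (4.0 * s2),
            (3.0 - s3) / (4.0 * s2), (1.0 - s3) / (4.0 * s2)]


def highpass_from_lowpass(h):
    """Alternating flip: g[n] = (-1)**n * h[L-1-n]."""
    length = len(h)
    return [(-1) ** n * h[length - 1 - n] for n in range(length)]


def shifted_inner_product(a, b, shift):
    """sum_n a[n] * b[n + 2*shift], the double-shift correlation."""
    return sum(a[n] * b[n + 2 * shift]
               for n in range(len(a)) if 0 <= n + 2 * shift < len(b))


def discrete_moment(g, order):
    """sum_n n**order * g[n]; zero for each vanishing moment of the wavelet."""
    return sum((n ** order) * c for n, c in enumerate(g))
```

For this pair the double-shift products give 1 at shift 0 and 0 elsewhere, the low-pass and high-pass filters are mutually orthogonal, and the moments of order 0 and 1 of the high-pass filter vanish while the moment of order 2 does not.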
Results
To assess the quality of signals encoded with the proposed coder, we have obtained both subjective and objective results. Six music samples considered hard to encode have been used. Special attention has been paid to signals which consist of impulsive energy bursts, like ‘drums’ or ‘piano solo’. These signals are extremely susceptible to the presence of ‘pre-echoes’. As we shall see, the proposed coder with the new algorithm for translating psycho-acoustic information to the wavelet domain
Conclusions
In this paper, we have presented an improved masking model in which we propose a new method for calculating the tonality coefficient that characterizes the tonal or noise-like nature of the analysed signal. This method is based on a linear predictor of the magnitude and phase spectra. We have also presented a new algorithm for translating psycho-acoustic information to the wavelet domain, which allows us to implement transparent audio coders based on the orthonormal wavelet transform, using filters
References (15)
- et al., Biorthonormal bases of compactly supported wavelets, Commun. Pure Appl. Math. (1992)
- R. Coifman, Y. Meyer, S. Quake, M. Wickerhauser, Signal processing and compression with wave packets, Dept. Math., Yale...
- I.T. Committee, Coding of moving pictures and associated audio for digital storage media up to about 1.5 Mbit/s,...
- et al., Efficient audio coding using perfect reconstruction noncausal IIR filter banks, IEEE Trans. Speech Audio Process. (1996)
- Orthonormal bases of compactly supported wavelets, Commun. Pure Appl. Math. (1988)
- Advances in speech and audio compression, Proc. IEEE (1994)
- Asymmetry of masking between noise and tone, Perception Psychophys. (1972)
Cited by (15)
- Auditory-motivated Gammatone wavelet transform
2014, Signal Processing
Citation excerpt: A psychoacoustic model based on the WPT with applications to audio compression and watermarking was proposed by He and Scordilis [30]. Zurera et al. proposed an algorithm for translating psychoacoustic information to the wavelet domain, using which a WPT-based audio coder was developed [31]. Reyes et al. developed an adaptive WPT by minimizing different perceptual-entropy-based cost functions, and showed that it leads to a low bit-rate representation of audio signals without affecting perceptual quality [32].
- An enhanced psychoacoustic model based on the discrete wavelet packet transform
2006, Journal of the Franklin Institute
Citation excerpt: However, in contrast to other approaches (e.g. [14]) here the computation uses more precise critical bank approximations as well as simultaneous masking results obtained in the wavelet domain. As mentioned in Section 2, the DWPT can conveniently decompose the signal into a critical band-like partition [14,16,17]. The standard critical bands are included in Table 1.
- Novel wavelet domain Wiener filtering de-noising techniques: Application to bowel sounds captured by means of abdominal surface vibrations
2006, Biomedical Signal Processing and Control
- Use of the symmetrical extension for improving a time-varying wavelet-packet-based audio coder
2003, Digital Signal Processing: A Review Journal
- Adaptive wavelet-packet analysis for audio coding purposes
2003, Signal Processing
Citation excerpt: To reduce this complexity, a simplification was proposed using long wavelet filters. In [10], an algorithm for translating psycho-acoustic information to the wavelet domain is presented. It can be used with filters that generate orthogonal wavelets with any compact support, but the analysis tree must be close to the critical band division.
- Comparison of different wavelet decomposition techniques for PEAQ model to assess the quality of audio codecs
2015, 2nd International Conference on Electronics and Communication Systems, ICECS 2015