Audio coding in wireless acoustic sensor networks☆
Introduction
Because the sensor nodes in a Wireless Acoustic Sensor Network (WASN) are wireless, their resources, namely power and bandwidth, must be spent parsimoniously [1]. Two possible remedies are to allow sensor nodes to communicate with neighboring nodes in order to save communication power, and to use source coding to compress the data acquired by the microphones before transmission in order to reduce the required transmission rate, leading to a better power-bandwidth trade-off.
There are three main consumers of power in a typical sensor node, namely the sensing, signal processing, and communication units, of which the last is dominant [2]. The required transmit power grows highly superlinearly with distance, since electromagnetic wave power fades rapidly as the distance increases. This means that a multihop scenario, in which a message reaches a far-end destination through a sequence of short-distance transmissions instead of a single direct transmission, can be more power-efficient. To implement such a scenario, one must assume that neighboring nodes are able to communicate, so that nodes far from the base station can deliver their messages by sequential forwarding through neighboring nodes. This is illustrated in Fig. 1.
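The power saving from multihop forwarding can be sketched numerically. The snippet below is a minimal illustration, assuming a generic path-loss model in which the transmit power needed to reach a receiver grows as distance to the power alpha; the values of alpha, the total distance, and the hop count are illustrative assumptions, not parameters from this paper.

```python
# Sketch: why multihop can save transmit power under a path-loss model
# P_tx ∝ d**alpha (alpha and distances are illustrative assumptions).

def tx_power(distance, alpha=3.0, c=1.0):
    """Transmit power required to reach a receiver at `distance` (arbitrary units)."""
    return c * distance ** alpha

d_total = 100.0          # node-to-base-station distance
n_hops = 4               # relay through 3 intermediate nodes

direct = tx_power(d_total)
multihop = n_hops * tx_power(d_total / n_hops)

print(direct, multihop)  # the multihop total is far lower whenever alpha > 1
```

For alpha = 3, relaying over four equal hops costs 4·(d/4)³ = d³/16, a sixteen-fold reduction compared to the direct transmission.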
To lower the required transmission rate, one may consider source coding for data compression. The audio signals acquired by the microphones are typically highly redundant and can therefore be compressed before transmission. The redundancy is due to correlation in time between the samples of the data acquired by a microphone, as well as spatial correlation between the measurements of the sound field performed by microphones placed at different locations in the environment.
Applying a data compression technique separately to each sequence (the data acquired by a microphone as well as the data received from neighboring nodes) exploits the temporal correlation between the samples within a sequence, but ignores the spatial correlation between different sequences.
There are several possibilities for exploiting the spatial correlation to further reduce the rate. One possibility is to jointly encode the sequences received by a node and the node's own measurement into one single message instead of encoding them separately [3]. Another possibility is to exploit the spatial correlation between the message sent from a node and the measurement of the sound field available at the destination node. The latter can be utilized as side information at the decoder to reduce the rate at the encoder. Finally, assuming that nodes transmitting to a neighboring receiving node know which other nodes are sending messages to the same receiving node, it is possible to reduce the rates of the transmitting nodes by exploiting the correlation between their messages, e.g. through asymmetric or non-asymmetric Slepian–Wolf coding [4]. However, in this work, we do not make this assumption, and thus consider only the first two cases, as done in [5].
To exploit the side information available at the decoder for a reduction in the encoding rate, we rely on results from distributed source coding (DSC), a field initiated by Slepian and Wolf in [6]. In particular, it is possible to separately encode two correlated discrete sources at a sum-rate equal to the joint entropy of the two sources and to reconstruct both sources at a joint decoder. This result was extended in [7], [8] to continuous sources, assuming that one source is available at the decoder while the other is to be discretized and encoded subject to a given distortion. It was shown that for Gaussian sources under an MSE distortion constraint, the rate can be as low as in the case where the side information is also available at the encoder.
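The Slepian–Wolf sum-rate result can be made concrete with a small numeric example: two correlated binary sources can be encoded separately at a sum-rate H(X, Y), which is strictly less than H(X) + H(Y) whenever they are correlated. The joint distribution below is an assumption chosen only to make the numbers concrete.

```python
import math

# Illustration of the Slepian–Wolf result [6]: separate encoding of two
# correlated sources is possible at sum-rate H(X, Y) < H(X) + H(Y).
# The joint pmf below is an illustrative assumption.

p_xy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

def H(probs):
    """Entropy in bits of a probability list."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

H_xy = H(p_xy.values())                                        # joint entropy
H_x = H([p_xy[0, 0] + p_xy[0, 1], p_xy[1, 0] + p_xy[1, 1]])    # marginal of X
H_y = H([p_xy[0, 0] + p_xy[1, 0], p_xy[0, 1] + p_xy[1, 1]])    # marginal of Y

print(H_xy, H_x + H_y)  # H(X, Y) ≈ 1.47 bits vs. H(X) + H(Y) = 2 bits
```

Here each source alone needs 1 bit per symbol, yet the achievable sum-rate is only about 1.47 bits, the saving coming entirely from the correlation.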
Measurements made by wireless microphones are generally digitized to discrete-time and discrete-amplitude sequences. This incurs some distortion depending on the quantization step-size. For a given distortion, there is a minimum achievable rate given by the so-called rate-distortion function (RDF). The RDF can be used as a lower bound to assess the performance of any source coding scheme. The interested reader is referred to [9], [10] for more information on source coding theory.
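For the scalar memoryless Gaussian case, the RDF and its side-information variant have simple closed forms that illustrate the rate savings discussed above. The sketch below uses the classical formulas R(D) = max(0, ½ log₂(σ²/D)) and, for Gaussian side information with correlation coefficient ρ at the decoder [8], the same formula with σ² replaced by the conditional variance σ²(1 − ρ²); the numeric values of σ², D, and ρ are illustrative assumptions.

```python
import math

# Scalar Gaussian rate-distortion function under an MSE constraint, and the
# Wyner–Ziv variant [8] with decoder side information of correlation rho.
# The parameter values are illustrative assumptions.

def rdf_gaussian(var, D):
    """R(D) = max(0, 0.5*log2(var/D)) in bits per sample."""
    return max(0.0, 0.5 * math.log2(var / D))

def rdf_wyner_ziv(var, D, rho):
    """Side information reduces the effective variance to var*(1 - rho**2)."""
    return rdf_gaussian(var * (1.0 - rho ** 2), D)

var, D, rho = 1.0, 0.1, 0.9
print(rdf_gaussian(var, D))        # rate without side information (~1.66 bits)
print(rdf_wyner_ziv(var, D, rho))  # lower rate thanks to side information
```

Strong correlation (ρ = 0.9) cuts the required rate from about 1.66 to about 0.46 bits per sample at the same distortion, which is the mechanism the coding scheme in this paper exploits.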
In this paper, we derive the local RDF for an arbitrary node in a WASN, assuming that multiple sources at a transmitting node are jointly encoded while taking into account (in a distributed sense) the side information available at the receiving node. We then show in simulations that the local RDFs can be used for rate allocation in the network to achieve the optimal sum-rate. We also design a coding scheme which, in theory and under certain asymptotic conditions, achieves the theoretical RDF. Vector sources are considered to allow for modeling of memory in the sources, and distortion constraints are defined in the form of covariance matrices for generality. This paper extends our previous work [5] by first proving that the Gaussian RDF is an upper bound to the RDF of other source distributions, including audio signals, and then applying our results to Gaussian as well as audio signals. We also provide a complete proof of the RDF, which was only sketched in [5]. In Section 2, the problem is formulated, the notation and assumptions are introduced, and the acoustic channel model used in this paper is discussed. In Section 3, sufficient statistics are used for joint encoding of correlated sources into a single source in a DSC scenario. The RDF is then derived for vector Gaussian sources with noisy measurements under covariance matrix distortion constraints. We further prove that for this setup, the Gaussian distribution is the worst case for coding in terms of rate-distortion. In Section 4, we provide simulation results for Gaussian sources and real audio measurements. Section 5 concludes the paper.
Section snippets
Notation, assumptions, and problem formulation
We denote random vectors by lowercase boldface, matrices by uppercase boldface, and scalars by italic letters. The operators tr(·), I(·;·), h(·), and E[·] stand for the trace of a matrix, mutual information, differential entropy, and expectation, respectively. Markov chains are denoted by two-headed arrows, e.g. x ↔ y ↔ z. Probability density functions are denoted by f, and covariance and cross-covariance matrices are denoted by the symbol Σ followed by a subscript indicating the random vectors involved.
Distributed source coding for WASN
In this section, we solve the problem formulated in Section 2.1. First we consider Gaussian sources and reduce the problem to a DSC problem with a single source at the encoder. Then we derive the RDF for this problem. Finally, we show that for our DSC problem with a covariance distortion constraint, the Gaussian distribution is the worst case for source coding, i.e. for a given distortion and source covariance, any other distribution requires a rate no higher than the Gaussian case.
Simulation results
We apply our results on sufficient statistics and distributed source coding to Gaussian and audio signals separately. To make use of the side information in a DSC manner, we use a suboptimal implementation called zero error coding (ZEC) [19], [20]. It is based on the fact that, owing to the correlation between the source at the encoder and the side information, knowledge of the side information limits the range of probable values for the source, thus allowing quantization at a lower rate.
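The binning principle underlying such side-information schemes can be sketched in a few lines. The following is a minimal modulo-index illustration, not the actual ZEC scheme of [19], [20]: the encoder transmits only the quantizer index modulo a number of cosets, and the decoder resolves the ambiguity by picking the coset member closest to its side information. The step size, bin count, and signal values are illustrative assumptions.

```python
# Minimal sketch of coset binning with decoder side information. This is an
# illustration of the general principle, not the ZEC scheme of [19], [20].
# STEP and N_BINS are illustrative assumptions; decoding is correct only if
# N_BINS exceeds |x - y| / STEP by a sufficient margin.

STEP = 0.25      # quantizer step size
N_BINS = 8       # number of cosets; the encoder sends log2(N_BINS) = 3 bits

def encode(x):
    """Transmit only the quantizer index modulo N_BINS."""
    return round(x / STEP) % N_BINS

def decode(bin_index, y):
    """Pick the coset member whose reconstruction is closest to the side info y."""
    k0 = round(y / STEP)
    candidates = [k for k in range(k0 - N_BINS, k0 + N_BINS + 1)
                  if k % N_BINS == bin_index]
    best = min(candidates, key=lambda k: abs(k * STEP - y))
    return best * STEP

x, y = 1.07, 1.12                 # correlated source sample and side information
x_hat = decode(encode(x), y)
print(x_hat)                      # recovers the quantized value of x (1.0)
```

The full quantizer index would cost many more bits; because the side information y is close to x, the 3-bit coset index suffices to recover the quantized value exactly.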
Conclusions
We considered a source coding problem for wireless acoustic sensor networks with the possibility of communication between the nodes. We used sufficient statistics to losslessly combine the messages and the measurement available at a given node into a single source, which was then encoded and sent to the next node. We also proposed to make use of the measurement available at the receiving node as side information to reduce the rate.
Acknowledgments
The authors would like to thank Morten Lydolf for his help in making the audio measurements at Bang & Olufsen, and Jesper Kjær Nielsen for insightful discussions related to our acoustical channel model.
References (23)
- I.F. Akyildiz et al., Wireless sensor networks: a survey, Comput. Netw. (2002)
- A.D. Wyner, The rate-distortion function for source coding with side information at the decoder—II: general sources, Inf. Control (1978)
- A. Bertrand, Applications and trends in wireless acoustic sensor networks: a signal processing perspective, in:...
- J. Østergaard, M.S. Derpich, Sequential remote source coding in wireless acoustic sensor networks, in: European Signal...
- P.L. Dragotti et al. (Eds.), Distributed Source Coding: Theory, Algorithms and Applications (2009)
- A. Zahedi, J. Østergaard, S.H. Jensen, P. Naylor, S. Bech, Distributed remote vector Gaussian source coding for...
- D. Slepian et al., Noiseless coding of correlated information sources, IEEE Trans. Inf. Theory (1973)
- A.D. Wyner et al., The rate-distortion function for source coding with side information at the decoder, IEEE Trans. Inf. Theory (1976)
- Source Coding Theory (1990)
- T.M. Cover et al., Elements of Information Theory (1991)
- Bayesian Approach to Inverse Problems
- ☆
The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement no. ITN-GA-2012-316969.