
Signal Processing

Volume 107, February 2015, Pages 141-152

Audio coding in wireless acoustic sensor networks

https://doi.org/10.1016/j.sigpro.2014.07.021

Highlights

  • We treat the problem of source coding for wireless acoustic sensor networks.

  • We consider vector sources to make use of the time correlation in the audio sequences.

  • We use the measurements at the receiving nodes as side information using distributed source coding.

  • We derive local rate-distortion functions to be used for rate allocation for an optimal sum-rate.

  • Our encoding/decoding process is joint with dereverberation.

Abstract

In this paper, we consider the problem of source coding for a wireless acoustic sensor network in which each node makes its own noisy measurement of the sound field and communicates with other nodes by sending and receiving encoded versions of the measurements. To make use of the correlation between the sources available at the nodes, we consider combining the measurement and the received messages into a single message at each node, instead of forwarding the received messages and encoding the measurement separately. Moreover, to exploit the correlation between the messages received by a node and the node's own measurement of the source, we propose to use the measurement as side information, thereby forming a distributed source coding (DSC) problem. Assuming that the sources are Gaussian, we then derive the rate-distortion function (RDF) for the resulting remote DSC problem under covariance matrix distortion constraints. We further show that for this problem the Gaussian source is the worst to code; the Gaussian RDF thus provides an upper bound for other sources, such as audio signals. We then turn our attention to audio signals. We consider an acoustical model based on the room impulse response (RIR) and provide simulation results for the rate-distortion performance in a practical setup where a set of microphones records the sound in a standard listening room. Since our reconstruction scheme and distortion measure are defined over the direct sound source, coding and dereverberation are performed jointly.

Introduction

Since the sensor nodes in a Wireless Acoustic Sensor Network (WASN) are wireless, the available resources, namely power and bandwidth, must be spent parsimoniously [1]. Two possible remedies are to let the sensor nodes communicate with neighboring nodes in order to save communication power, and to use source coding to compress the data acquired by the microphones before transmission in order to reduce the required transmission rate, leading to a better power-bandwidth trade-off.

There are three main consumers of power in a typical sensor node, namely the sensing, signal processing, and communication units, of which the last is the dominant one [2]. The transmit power required to reach a receiver grows superlinearly with distance, since the received electromagnetic power fades rapidly as the distance increases. This means that a multihop scenario, where instead of being sent directly to a far-end destination the message goes through a sequence of short-distance transmissions, can be more power-efficient. To implement such a scenario, one must assume that neighboring nodes are able to communicate, so that nodes far from the base station can deliver their messages by sequential forwarding through neighboring nodes. This is illustrated in Fig. 1.
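The power advantage of multihop forwarding can be illustrated with a toy path-loss model (an assumption for illustration, not a model from the paper): if the transmit power needed to cover distance d scales as d^alpha with alpha > 1, then splitting the link into n equal hops costs n·(d/n)^alpha = d^alpha / n^(alpha−1), which is strictly less than direct transmission.

```python
# Toy path-loss sketch (illustrative; P = c * d**alpha is an assumed model,
# not the acoustic or radio model used in the paper).

def transmit_power(distance, alpha=2.0, c=1.0):
    """Power needed to reach a receiver at `distance` under P = c * d**alpha."""
    return c * distance ** alpha

def multihop_power(distance, n_hops, alpha=2.0, c=1.0):
    """Total power for n equal-length hops covering `distance`."""
    hop = distance / n_hops
    return n_hops * transmit_power(hop, alpha, c)

direct = transmit_power(100.0)          # 100**2 = 10000.0
two_hop = multihop_power(100.0, 2)      # 2 * 50**2 = 5000.0
print(direct, two_hop)
```

With alpha = 2 the two-hop relay already halves the total transmit power; larger path-loss exponents make the saving even more pronounced.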

To lower the required transmission rate, one may consider source coding for data compression. The audio signals acquired by the microphones are typically highly redundant and can therefore be compressed before transmission. The redundancy is due to correlation in time between the samples of the data acquired by a microphone, as well as spatial correlation between the measurements of the sound field performed by microphones placed at different locations in the environment.

Applying a data compression technique separately to each sequence (the data acquired by a microphone as well as the data received from neighboring microphones) exploits the temporal correlation between the samples of that sequence, but ignores the spatial correlation between the different sequences.

There are several possibilities for exploiting the spatial correlation to further reduce the rate. One possibility is to jointly encode the sequences received by a microphone and the microphone's own measurement into a single message instead of encoding them separately [3]. Another possibility is to exploit the spatial correlation between the message sent from a node and the measurement of the sound field available at the destination node. The latter can be utilized as side information at the decoder to reduce the rate at the encoder. Finally, if the nodes transmitting to a common receiving node know which other nodes are sending messages to the same receiver, the correlation between their messages can be exploited to reduce their rates by applying, e.g., asymmetric or non-asymmetric Slepian–Wolf coding [4]. However, in this work, we do not make this assumption and thus consider only the first two cases, as done in [5].

To exploit the side information available at the decoder for reducing the encoding rate, we rely on results from distributed source coding (DSC), which originated with Slepian and Wolf [6]. In particular, it is possible to separately encode two correlated discrete sources at a sum-rate equal to their joint entropy and to reconstruct both sources at a joint decoder. This result was extended in [7], [8] to continuous sources, assuming that one source is available at the decoder and the other is to be discretized and encoded subject to a given distortion. It was shown that for Gaussian sources under an MSE distortion constraint, the rate can be as low as in the case where the side information is also available at the encoder.
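For intuition, the last statement takes a simple closed form in the scalar case (a standard textbook expression, not a result derived in this paper): for a Gaussian source $x$ with variance $\sigma_x^2$, side information $y$ with correlation coefficient $\rho$, and MSE distortion $D$, the Wyner–Ziv RDF is

```latex
R_{\mathrm{WZ}}(D) \;=\; \max\!\left(0,\; \tfrac{1}{2}\log_2 \frac{\sigma_{x|y}^2}{D}\right),
\qquad \sigma_{x|y}^2 \;=\; \sigma_x^2\,(1-\rho^2),
```

which coincides with the conditional RDF, i.e. the rate achievable when the side information is also known at the encoder.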

Measurements made by wireless microphones are generally digitized to discrete-time and discrete-amplitude sequences. This incurs some distortion depending on the quantization step-size. For a given distortion, there is a minimum achievable rate given by the so-called rate-distortion function (RDF). The RDF can be used as a lower bound to assess the performance of any source coding scheme. The interested reader is referred to [9], [10] for more information on source coding theory.
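As a reminder of the standard definitions (not specific to this paper), the RDF is the minimum mutual information between the source and its reconstruction over all encodings meeting the distortion constraint:

```latex
R(D) \;=\; \min_{f(\hat{x}\mid x)\,:\;\mathbb{E}[d(x,\hat{x})]\,\le\, D} I(x;\hat{x}),
```

and for a memoryless Gaussian source with variance $\sigma^2$ under MSE distortion this evaluates to $R(D) = \tfrac{1}{2}\log_2(\sigma^2/D)$ for $0 < D \le \sigma^2$.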

In this paper, we derive the local RDF for an arbitrary node in a WASN, assuming that the multiple sources at a transmitting node are jointly encoded while taking into account (in a distributed sense) the side information available at the receiving node. We then show in the simulations that the local RDFs can be used for rate allocation in the network to achieve the optimal sum-rate. We also design a coding scheme which, in theory and under certain asymptotic conditions, achieves the theoretical RDF. Vector sources are considered to allow for modeling of memory in the sources, and distortion constraints are defined in the form of covariance matrices for generality. This paper extends our previous work [5] by first proving that the Gaussian RDF is an upper bound to that of other distributions, including audio, and then applying our results to Gaussian as well as audio signals. We also provide a complete proof of the RDF, which was only sketched in [5]. In Section 2, the problem is formulated, notation and assumptions are introduced, and the acoustic channel model used in this paper is discussed. In Section 3, sufficient statistics are used for joint encoding of correlated sources into a single source in a DSC scenario. The RDF is then derived for vector Gaussian sources with noisy measurements under covariance matrix distortion constraints. We further prove that for this setup, the Gaussian distribution is the worst case for coding in terms of rate-distortion. In Section 4, we provide simulation results for Gaussian sources and real audio measurements. Section 5 concludes the paper.

Section snippets

Notation, assumptions, and problem formulation

We denote random vectors by lowercase boldface, matrices by uppercase boldface, and scalars by italic letters. The operations tr(·), I(·;·), h(·), and E[·] stand for the trace of a matrix, mutual information, differential entropy, and expectation, respectively. Markov chains are denoted by two-headed arrows, e.g. x ↔ y ↔ z. Probability density functions are denoted by f(·), and covariance and cross-covariance matrices are denoted by the symbol Σ followed by a subscript indicating the random vectors involved

Distributed source coding for WASN

In this section, we solve the problem formulated in Section 2.1. First we consider Gaussian sources, and reduce the problem to a DSC problem with a single source at the encoder. Then we derive the RDF for this problem. Finally, we will show that for our DSC problem with covariance distortion constraint, a Gaussian distribution is the worst for source coding, i.e. for a given distortion and source covariance, any other distribution requires a lower rate compared to the Gaussian case. The use of

Simulation results

We apply our results on sufficient statistics and distributed source coding to Gaussian and audio signals separately. To make use of the side information in a DSC manner, we use a suboptimal implementation called zero error coding (ZEC) [19], [20], which is based on the fact that, due to the correlation between the source at the encoder and the side information, knowledge of the side information limits the range of probable values for the source, thus requiring lower-rate quantization
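The idea that side information shrinks the set of probable source values can be illustrated with a toy modulo-binning scheme (purely illustrative; this is not the ZEC construction of [19], [20]): if an integer source x and the decoder's side information y are known to satisfy |x − y| ≤ t, the encoder need only send x modulo M = 2t + 1, and the decoder recovers x as the unique member of that coset within distance t of y.

```python
# Toy zero-error binning sketch under the assumption |x - y| <= t.
# The encoder sends log2(2t+1) bits instead of the full range of x.

def encode(x, t):
    """Send only the coset index of x modulo M = 2t + 1."""
    return x % (2 * t + 1)

def decode(s, y, t):
    """Return the unique integer congruent to s (mod 2t+1) within [y-t, y+t]."""
    m = 2 * t + 1
    return y + ((s - y + t) % m) - t

x, y, t = 1234, 1230, 5
assert decode(encode(x, t), y, t) == x
```

Stronger correlation (smaller t) means a smaller modulus and hence fewer bits, mirroring how ZEC exploits correlation to reduce the quantization rate.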

Conclusions

We considered a source coding problem for wireless acoustic sensor networks with the possibility of communication between the nodes. We used sufficient statistics to losslessly combine the messages and the measurement available in a given node into a single source, which was then encoded and sent to the next node. We also proposed to make use of the measurement available in receiving nodes as side information to reduce the rate. For the resulting distributed source coding problem with the

Acknowledgments

The authors would like to thank Morten Lydolf for his help in making the audio measurements at Bang & Olufsen, and Jesper Kjær Nielsen for insightful discussions related to our acoustical channel model.

References (23)

  • I.F. Akyildiz et al.

Wireless sensor networks: a survey

    Comput. Netw.

    (2002)
  • A. Wyner

The rate-distortion function for source coding with side information at the decoder—II: general sources

    Inf. Control

    (1978)
  • A. Bertrand, Applications and trends in wireless acoustic sensor networks: a signal processing perspective, in:...
  • J. Østergaard, M.S. Derpich, Sequential remote source coding in wireless acoustic sensor networks, in: European Signal...
  • P.L. Dragotti et al.

    Distributed Source Coding, Theory, Algorithms and Applications

    (2009)
  • A. Zahedi, J. Østergaard, S.H. Jensen, P. Naylor, S. Bech, Distributed remote vector Gaussian source coding for...
  • D. Slepian et al.

    Noiseless coding of correlated information sources

    IEEE Trans. Inf. Theory

    (1973)
  • A. Wyner et al.

    The rate-distortion function for source coding with side information at the decoder

    IEEE Trans. Inf. Theory

    (1976)
  • R. Gray

    Source Coding Theory

    (1990)
  • J. Thomas et al.

    Elements of Information Theory

    (1991)
  • J. Idier

    Bayesian Approach to Inverse Problems

    (2008)

The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under Grant agreement n° ITN-GA-2012-316969.
