An Adaptive Non Reference Anchor Array Framework for Audio Retrieval in Teleconferencing Environment

Journal of Signal Processing Systems

Abstract

This paper proposes an adaptive framework for audio retrieval in live teleconferencing environments with multiple participants. In addition to the primary array that captures the speech source of interest (SOI), the framework uses a non reference anchor array (NRA) to capture the interfering speech sources. A linearly constrained minimum variance (LC-MV) beamformer preserves the signal arriving from the look direction while nulling interference arriving from non-look directions. The reverberant component of the acquired speech is then removed by a novel method based on the linear prediction (LP) residual cepstrum. Because this method does not require computing the acoustic impulse response (AIR) of the teleconferencing room, it is computationally efficient. The NRA framework can therefore both remove correlated noise arriving from the direction of the SOI and dereverberate the noise-free signal. The framework is evaluated through experiments on clean speech acquisition from distant microphone arrays, and through distant speech recognition experiments on the TIMIT and MONC databases. The results indicate a reasonable improvement over correlation, subspace, and standard minimum variance beamforming methods. The application of the framework to audio retrieval in a live teleconferencing environment with multiple participants is also discussed.
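The abstract does not give the beamformer formulation used in the paper, so the following is only a minimal sketch of the textbook narrowband LCMV solution, w = R⁻¹C(CᴴR⁻¹C)⁻¹f, which keeps unit gain in the look direction and places a null on one interferer. The uniform linear array geometry, the 1 kHz frequency, the 40° interferer angle, the synthetic covariance matrix, and the helper names `steering_vector` and `lcmv_weights` are all assumptions made for illustration; the NRA-specific processing is not modeled.

```python
import numpy as np

def steering_vector(n_mics, spacing, angle_deg, freq, c=343.0):
    """Far-field narrowband steering vector of a uniform linear array (assumed geometry)."""
    k = 2.0 * np.pi * freq / c                      # wavenumber
    positions = np.arange(n_mics) * spacing         # microphone positions in metres
    return np.exp(-1j * k * positions * np.sin(np.deg2rad(angle_deg)))

def lcmv_weights(R, C, f):
    """Textbook LCMV weights: minimise w^H R w subject to C^H w = f."""
    Rinv_C = np.linalg.solve(R, C)                  # R^{-1} C without forming the inverse
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, f)

# Illustrative values only (not taken from the paper).
n_mics, spacing, freq = 8, 0.04, 1000.0
a_look = steering_vector(n_mics, spacing, 0.0, freq)    # look direction (SOI)
a_intf = steering_vector(n_mics, spacing, 40.0, freq)   # interfering talker

# Synthetic covariance: one strong interferer plus white sensor noise.
R = 10.0 * np.outer(a_intf, a_intf.conj()) + 0.1 * np.eye(n_mics)

# Constraints: unit gain toward the look direction, zero gain toward the interferer.
C = np.column_stack([a_look, a_intf])
f = np.array([1.0, 0.0])
w = lcmv_weights(R, C, f)

gain_look = abs(w.conj() @ a_look)   # magnitude response toward the SOI
gain_intf = abs(w.conj() @ a_intf)   # magnitude response toward the interferer
print(f"look-direction gain: {gain_look:.3f}, interferer gain: {gain_intf:.2e}")
```

In practice the covariance matrix would be estimated from the array signals per frequency bin rather than constructed analytically as it is here.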

Acknowledgments

This work was supported in part by the DeITY, Government of India, and in part by the BSNL Telecom Center of Excellence, IIT Kanpur.

Author information

Corresponding author

Correspondence to Rajesh M. Hegde.

About this article

Cite this article

Nathwani, K., Shukla, A., Khunteta, S. et al. An Adaptive Non Reference Anchor Array Framework for Audio Retrieval in Teleconferencing Environment. J Sign Process Syst 74, 91–102 (2014). https://doi.org/10.1007/s11265-013-0786-7

  • DOI: https://doi.org/10.1007/s11265-013-0786-7
