
Bidirectional microphone array with adaptation controlled by voice activity detector based on multiple beamformers

Multimedia Tools and Applications

Abstract

Ambient noise in a reverberant room is usually suppressed with a microphone array. Adaptive beamforming, typified by the minimum variance distortionless response (MVDR) beamformer, is an effective method for noise suppression. However, the MVDR beamformer gives poor results in a real room because of its sensitivity to steering error and multipath wave propagation. In this paper we propose a noise suppression method based on the assumption that the positions of the speakers in the reverberant room are roughly known. Noise reduction is realized by two MVDR beamformers, each directed toward one of the speakers. Adaptation of the MVDR beamformers is controlled by a speaker activity detector whose decision is based on a power transfer model of multiple superdirective beamformers in a combined diffuse and coherent noise field. The proposed voice activity detector also provides residual noise reduction. The proposed method and its robustness to steering error were tested on a simulated room model as well as in a real room environment. The improvement of the restored speech signal was evaluated by Signal to Noise Ratio Enhancement (SNRE) and by the Perceptual Evaluation of Speech Quality (PESQ) measure.


Notes

  1. Strictly speaking, it is not a beamformer because it uses only one microphone, the fourth, which has an omnidirectional characteristic.

  2. In the experimental tests we used a small value of λ, λ = 0.25, which provides fast tracking of power changes.

  3. In practice, there is one more hypothesis: both speakers speak simultaneously. In this case we assume that the louder speaker is active.

  4. In this test case, SNRE is the ratio of the speech energy during the speech segment to the residual noise in the pause segment attenuated by (19).

  5. PESQ in this test case relates to the whole signal displayed in Fig. 8e (the signal with additional noise attenuation by (19)).
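
Notes 2 and 3 can be illustrated with a minimal sketch. It assumes that λ is the forgetting factor of a first-order recursive power estimate, p[n] = λ·p[n−1] + (1−λ)·x[n]², which is a common form but is not confirmed by the text available here. The function names `track_power` and `louder_speaker` and the test signal are hypothetical.

```python
def track_power(samples, lam=0.25, p0=0.0):
    """Recursive power estimate (assumed form, not taken from the paper):
    p[n] = lam * p[n-1] + (1 - lam) * x[n]**2
    A small lam weights the newest sample heavily, so the estimate
    follows power changes quickly."""
    p = p0
    estimates = []
    for x in samples:
        p = lam * p + (1.0 - lam) * x * x
        estimates.append(p)
    return estimates


def louder_speaker(power_a, power_b):
    """Note 3: when both speakers talk, treat the louder one as active.
    Ties are resolved in favour of speaker A (an arbitrary choice)."""
    return "A" if power_a >= power_b else "B"


# Hypothetical test signal: silence followed by a unit-amplitude step.
signal = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

fast = track_power(signal, lam=0.25)  # value reported in note 2
slow = track_power(signal, lam=0.90)  # larger lam, for comparison only

# Four samples after the step the estimates are 1 - lam**4.
print(fast[-1])  # close to 1.0
print(slow[-1])  # still far below 1.0
print(louder_speaker(fast[-1], slow[-1]))
```

With λ = 0.25 the estimate reaches about 99.6% of the new power level within four samples, whereas λ = 0.9 reaches only about 34% in the same time, which is the sense in which a small λ gives fast tracking.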


Acknowledgements

This research was supported by grants 178027, TR32032 and TR32035 from the Ministry of Education, Science and Technological Development of the Republic of Serbia.

Author information

Corresponding author

Correspondence to Zoran Šarić.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Transfer of the acoustic power by diffuse noise field

The transfer of the diffuse component of the acoustic power from the acoustic source to the output of the beamformer is defined by a linear transfer factor

$$ {\beta}_k=\frac{P_{diff,k}}{P_s} $$
(22)

where $P_{diff,k}$ is the total diffuse power at the output of beamformer $k$ and $P_s$ is the power of the acoustic source measured at a distance of 1 m. Taking into account the directivity of the microphone array defined by the beam pattern $h_k(j,\phi,\theta)$, the diffuse power component is

$$ {P}_{diff,k}=\frac{P_{dif\_ array}}{D_k(j)} $$
(23)

where $D_k(j)$ is the directivity factor and $P_{dif\_ array}$ is the diffuse power component at the microphone array position. The diffuse power is uniformly distributed in the room and is equal to the direct-path power at the critical distance $d_c$:

$$ {P}_{dif\_ array}={P}_{direct}={P}_s{\left(1/{d}_c\right)}^2 $$
(24)

Substituting (23) and (24) into (22), we obtain

$$ {\beta}_k=\frac{1}{d_c^2{D}_k(j)} $$
(25)
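
As a numerical check of (25), the short sketch below evaluates the transfer factor for given values of the critical distance and the directivity factor, and verifies that it agrees with dividing the diffuse power at the array position from (24) by the directivity factor. The function names and the numerical values (d_c = 1 m, D_k = 4, P_s = 2) are illustrative assumptions, not values from the paper.

```python
import math


def diffuse_transfer_factor(d_c, directivity):
    """Eq. (25): beta_k = 1 / (d_c**2 * D_k).
    d_c is the critical distance in metres, directivity is the
    (linear, not dB) directivity factor D_k at one frequency."""
    if d_c <= 0 or directivity <= 0:
        raise ValueError("d_c and directivity must be positive")
    return 1.0 / (d_c ** 2 * directivity)


def to_db(power_ratio):
    """Express a power ratio in decibels."""
    return 10.0 * math.log10(power_ratio)


# Illustrative values only (assumptions, not measurements).
d_c = 1.0   # critical distance [m]
D_k = 4.0   # directivity factor of beamformer k
P_s = 2.0   # source power referred to 1 m

beta_k = diffuse_transfer_factor(d_c, D_k)

# Cross-check: Eq. (24) gives the diffuse power at the array position;
# dividing by D_k must equal beta_k * P_s, consistent with Eq. (22).
P_dif_array = P_s * (1.0 / d_c) ** 2
P_diff_k = P_dif_array / D_k
assert math.isclose(P_diff_k, beta_k * P_s)

print(beta_k)         # 0.25
print(to_db(beta_k))  # about -6.02 dB
```

In this example a directivity factor of 4 attenuates the diffuse component by about 6 dB relative to the source power at 1 m. A larger critical distance or a more directive beamformer lowers the factor further.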


Cite this article

Šarić, Z., Subotić, M., Bilibajkić, R. et al. Bidirectional microphone array with adaptation controlled by voice activity detector based on multiple beamformers. Multimed Tools Appl 78, 15235–15254 (2019). https://doi.org/10.1007/s11042-018-6895-3


  • DOI: https://doi.org/10.1007/s11042-018-6895-3
