Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations

  • Conference paper
Speech and Computer (SPECOM 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13721)

Abstract

Overlapped speech contains the simultaneous speech of multiple speakers. Its presence is one of the main sources of error for speaker diarization, speech recognition, and speaker recognition systems. Most existing work uses magnitude-spectrum-based features for overlap detection. This work instead detects overlapped speech by exploring the instantaneous phase and amplitude information of the speech signal. Phase characteristics are captured by the Instantaneous Frequency Spectrogram (IFSpec), while Teager-Kaiser Energy Operator (TEO) based pyknograms represent the instantaneous amplitude. Features are learned automatically from the IF spectrogram and the TEO-based pyknogram using a Fully-Convolutional Neural Network (F-CNN). The approach is evaluated on the SSC corpus, which has previously been used for this task. A significant performance improvement is observed when both representations are combined in an early fusion framework, which indicates that the two feature representations carry complementary information. Classification is performed over three segment durations (1 s, 500 ms, and 250 ms) to analyze the effect of segment duration on overlap detection. The effect of the gender of the speakers present in overlapped speech is also studied.
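To make the amplitude and frequency terminology concrete, the following is a minimal sketch of the discrete Teager-Kaiser energy operator together with one classical energy-separation estimate (DESA-2) of instantaneous frequency and amplitude. It is an illustration only, not the pipeline described in the paper: the function names `teager_energy` and `desa2` are hypothetical, no band-pass filterbank or pyknogram construction is included, and the abstract does not state which demodulation algorithm is actually used.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, i.e. for n = 1 .. len(x) - 2.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x):
    """DESA-2 estimates of instantaneous frequency (rad/sample) and amplitude.

    Assumes x is a single narrowband AM-FM component (illustrative assumption).
    Output index k corresponds to input sample n = k + 2.
    """
    # Symmetric difference y[n] = x[n+1] - x[n-1], defined for n = 1 .. N-2.
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager_energy(x)  # psi_x[i] belongs to x sample i + 1
    psi_y = teager_energy(y)  # psi_y[k] belongs to x sample k + 2

    freqs, amps = [], []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]  # align both energies to the same sample
        if px <= 0.0 or py <= 0.0:
            # The energy operator can be non-positive on real signals; skip.
            freqs.append(float("nan"))
            amps.append(float("nan"))
            continue
        arg = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        freqs.append(0.5 * math.acos(arg))
        amps.append(2.0 * px / math.sqrt(py))
    return freqs, amps


# Usage on a synthetic constant-amplitude, constant-frequency tone.
tone = [2.0 * math.cos(0.3 * n + 0.1) for n in range(50)]
energy = teager_energy(tone)
inst_freq, inst_amp = desa2(tone)
```

For a pure tone A cos(Ωn + φ), the operator returns A² sin²(Ω) at every sample, so it tracks amplitude and frequency jointly; the energy-separation step then splits that product into separate amplitude and frequency estimates.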

Author information

Corresponding author

Correspondence to Shikha Baghel.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Baghel, S., Prasanna, S.R.M., Guha, P. (2022). Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_4

  • DOI: https://doi.org/10.1007/978-3-031-20980-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20979-6

  • Online ISBN: 978-3-031-20980-2

  • eBook Packages: Computer Science, Computer Science (R0)
